Templates and a Clean Start

Before I get into the meat of the topic, which will eventually lead to a self-modifying grammar (yes, you heard me, self-modifying…), I have a confession to make: a series of articles on the old site may have led people astray. I wrote that series thinking to make parsing things where no grammar existed easier.

It may have backfired. So, as a penance, I’m simultaneously pointing theperlfisher.{com,net} to this new site, and starting a new series of articles on Raku programming. This time I’ll be incorporating more of my own thinking along the way, in what will hopefully be a different approach.

Begin as you mean to go on.

I would love to dump the CMS I’m currently using for something written in Raku. Among the many challenges that presents is displaying HTML, and to paraphrase Clint Eastwood, I do know my limitations. So, I don’t want to write HTML. Ideally, not ever.

So, that means stealing, er, borrowing HTML from other sites and making it my own. Since those are usually Perl 5 sites, that means dealing with Template Toolkit. And already I can hear some of you screaming “Raku already handles everything TT used to! Just use interpolated here-docs!”

And, for the most part, you’re absolutely correct. Instead of the clunky ‘[% variable_name %]’ notation you can use clean inline interpolation with ‘{$variable-name}’, and being able to insert blocks of code inline means you don’t have to go through many of the hoops that you’re required to jump through with Template Toolkit.
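For comparison, here is a small sketch of what that inline interpolation looks like in an interpolating here-doc. The variable name and its value are made up purely for illustration.

```raku
# $variable-name and its value are invented for this illustration
my $variable-name = 'Jeff';

# qq:to starts an interpolating here-doc; {...} runs inline Raku code
my $html = qq:to/END/;
    <h1>Hello, {$variable-name}!</h1>
    <p>Your name has {$variable-name.chars} characters.</p>
    END

print $html;
```

The indentation in front of the END marker is stripped from every line, so the here-doc can be indented along with the surrounding code.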

That’s all absolutely true, and I hope to be able to use all of those features and more in the final CMS, whatever that happens to be. But this approach ignores the fact that much of the HTML I want to borrow is written with Template Toolkit, and that rewriting HTML, even if it’s just a few tiny tags, is an investment of time that could be better spent elsewhere.

If only there were Template Toolkit for Raku…

Let’s dive in!

If you’re not familiar with Template Toolkit, it’s a fairly lightweight programming language for writing HTML templates, among other things. Please don’t confuse it with a markup language designed to be rendered into HTML. This is a language that lets you combine your own code with a template and generate dynamic displays.

<h1>Hello, [% name %]!</h1>

That is a simple bit of Template Toolkit. Doesn’t look like much, does it? It’s obviously a fragment of a proper HTML document because there’s no ‘<html>’..'</html>’ bracketing it, and obviously whatever’s between ‘[%’ and ‘%]’ is being treated specially. In this case, it’s being rendered by an engine that fills in the name, maybe something like…

$tt.render( 'hello.tt', :name( 'Jeff' ) );

where hello.tt is the name of the template file containing the previous code, and ‘Jeff’ is the name we want to substitute. We’ve got a lot of work to go through before we can get there, though. If you’ve read previous articles of mine on the subject, please try to ignore what I’ve said there.

Off the Deep End

First things first, we need a package to work in. For this, I generally rely on App::Mi6 to do the hard work for me. Start by installing the package with zef, and then we’ll get down to business. (zef should be installed by default. If you’re still using rakudobrew, please don’t.)

$ zef install App::Mi6
{a bit of noise}
$ mi6 new Template::Toolkit
Successfully created Template-Toolkit
$ cd Template-Toolkit

Ultimately, we want this test (in t/01-basic.t – go ahead and add it) to pass:

use Test;
use Template::Toolkit;
my $tt = Template::Toolkit.new;
is $tt.render( 'hello.tt', :name( 'Jeff' ) ), '<h1>Hello, Jeff!</h1>';

It’ll fail (and miserably, at that) but at least it’ll give us a goal. Also it should give us an idea of how others will use our API. Let’s think about that for a few moments, just to make sure we’re not painting ourselves into any obvious corners.

In order to be useful, our module has to parse Perl 5 Template Toolkit files, and process them in a way that’s useful in Raku. Certain things will go by the wayside, to be sure, but the core will be a module that lets us load, maybe compile, and fill in a template.

Hrm, I just said ‘fill in’ rather than ‘render’, which is what I said above. Should I change the method name? No, not really. The new module will still do what the Perl 5 code used to, it just won’t do it using Perl 5, so some of the old conventions won’t work. Let’s leave that decision for now, and go on.

Retrograde is all the rage

Let’s apply some basic retrograde logic to what we’ve got here, given what we know of Raku tools. In order to get the string ‘<h1>Hello, Jeff!</h1>’ from ‘<h1>Hello, [% name %]!</h1>’, we need a lot of mechanics at work.

At first glance, it seems pretty obvious that ‘[% name %]’ is a substitution marker, so let’s just do a quick regexp like this:

$text ~~ s:g{ '[%' \s* (\w+) \s* '%]' } = %args{$0};

That should replace every marker in the text with something from an %args hash that render() supplies to us. End of column, end of story. But not so fast. If all Template Toolkit supplied to us was the ability to substitute values for keys, then … there’s really no need for the module. And in fact, if you look at the docs, it can do many more things for us.
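To see the substitution at work, here is a minimal sketch wrapped in a sub. The name fill-in is a hypothetical helper, not the API the module will end up with. Whitespace inside a Raku regex is not significant, so the blanks inside ‘[% name %]’ have to be matched explicitly, here with \s*.

```raku
# fill-in is a hypothetical helper, not the module's final API
sub fill-in( Str $template, *%args ) {
    my $text = $template;    # work on a copy; parameters are read-only

    # \s* accounts for the blanks inside '[% name %]'
    $text ~~ s:g{ '[%' \s* (\w+) \s* '%]' } = %args{~$0};

    return $text;
}

say fill-in( '<h1>Hello, [% name %]!</h1>', :name( 'Jeff' ) );
# <h1>Hello, Jeff!</h1>
```

A marker with no matching key would interpolate an undefined value and warn, which is one more reason this can only be a starting point.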

For example, ‘[% INCLUDE %]’ lets us include other template files in our own, ‘[% IF %]’ .. ‘[% END %]’ lets us do things conditionally, and a whole host of other “directives” are available. But you’ll see here the one thing they have in common is they all start with ‘[%’ and end with ‘%]’.

Hold the phone

That isn’t entirely true, and in fact there’s going to be another article in the series about that. But it’s a good starting point. We may not know much about what the language itself looks like, but I can tell you that tags are balanced, not nested, and every ‘[%’ opening tag has a ‘%]’ tag that closes it.

I’ll also point out that directives ( ‘[% foo %]’ ) can occur one after another without any intervening white space, and may not occur at all. So already some special cases are starting to creep in.

In fact, let’s put this in as a separate test file entirely. So separate that we’re going to put it in a nested directory, in fact. Let’s open t/parser/01-basic.t and add this set of tests:

use Test;
use Template::Toolkit::Parser;

my $p = Template::Toolkit::Parser.new;

0000, AAAA
0001, AAAB
0010, AABA
0011, AABB
0100, ABAA
0101, ABAB
... # and so on up to
1110, BBBA
1111, BBBB

Now just HOLD THE PHONE here… we’re testing directives for Template Toolkit, not binary numbers, and whatever that other column is! Well, that’s true. We want to test text and directives, and make sure that we can get back text when we want it, and directives when we want them.

At first blush you might think it’s just enough to make sure that ‘<h1> Hello,’ is parsed as text, and that ‘[% name %]’ is parsed as a directive, and just leave it at that. But those of you that have worked with regular expressions for a while might wonder how ‘[% name %][% other %]’ gets parsed… does it end at the first ‘%]’, or continue on to the next one?
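That question is easy to check by hand. The sketch below only contrasts a greedy and a frugal quantifier on that exact string; it isn’t the parser we’ll end up writing.

```raku
my $text = '[% name %][% other %]';

# Greedy .* runs on to the last '%]' it can find
my $greedy = ~( $text ~~ / '[%' .* '%]' / );

# Frugal .*? stops at the first '%]'
my $frugal = ~( $text ~~ / '[%' .*? '%]' / );

say $greedy;   # [% name %][% other %]
say $frugal;   # [% name %]
```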

And what about text mixed with directives? Leading? Trailing text? Wow, a lot of combinations. In fact, if you wanted to be thorough, it wouldn’t hurt to cover all possible combinations of text and directives up to… say, 4 in a row.

Let’s call text ‘T’, and directives ‘D’. I’ve got 4 slots, and only two choices for each. Filling the first slot gives me ‘T_ _ _’ and ‘D_ _ _’, for two choices. I can fill the next slot with ‘T T _ _’, ‘T D _ _’, ‘D T _ _’, and ‘D D _ _’, and I think you can see where we’re going with this.

In fact, replace T with 0 and D with 1, and you’ve got the binary numbers from 0000 to 1111. So, let’s take advantage of this fact, and do some clever editing in our editor of choice:

0010, AABA                            =>
is-deeply the-tree( '0010, AABA       =>
is-deeply the-tree( '0010' ), [ AABA  =>
is-deeply the-tree( '0010' ), [ AABA ];

A few quick search-and-replace commands should get you from the first line to the last line. Now it’s looking more like a Raku test, right? We’re not quite there yet, ‘0010’ still doesn’t look like a string of text and directives, and what’s this AABA thing? One more search-and-replace pass, this time global, should solve that.

is-deeply the-tree( '0010' ), [ AABA ]; =>
is-deeply the-tree( 'xx1x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ 'a', 'a', B'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
          [ 'a', 'a', B'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
    [ 'a', 'a', Directive.new( :content( 'name' ) ), 'a', ];

Starting out with the padded binary numbers covers every combination of text and directive possible (at least for sequences 4 long). A clever bit of search-and-replace in your favorite editor gives us a working set of test cases that check a set of “real-world” strings, and a file you can almost run. Next time we’ll fill in the details, and get from zero to a minimal (albeit working) Template Toolkit implementation.
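If you’d rather generate the list than edit it by hand, a few lines of Raku will do it. This is only a sketch: the names @lines and @inputs are mine, and the ‘x’ and ‘[% name %]’ stand-ins simply follow the search-and-replace steps above.

```raku
# One line per 4-slot combination: 0/A is text, 1/B is a directive
my @lines = (^16).map: -> $n {
    my $bits = $n.fmt( '%04b' );                  # zero-padded binary
    $bits ~ ', ' ~ $bits.trans( '01' => 'AB' );
}
.say for @lines;

# The matching test inputs: 'x' for text, '[% name %]' for a directive
my @inputs = (^16).map: -> $n {
    $n.fmt( '%04b' ).comb.map( { $_ eq '1' ?? '[% name %]' !! 'x' } ).join;
}
say @inputs[2];   # xx[% name %]x
```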

As always, dear reader, feel free to post whatever comments, questions, and/or suggestions that you may have, including ideas for future articles. I read and respond to every comment, and thank you for your time.

Spacing Out

After having had some comments about the grammar approach I’ve been using, I’ve started to rethink things. I may have isolated at least one problem people may have been having. I’m working on a grammar for a language called ‘picat’ – you can look up a quick explanation at picat.org.

It’s a constraint-based programming language that maps insanely well onto Raku. A fragment of the grammar I’m working on follows, done in a top-down fashion. The actual grammar rule <comment> isn’t the important thing, because this problem can occur with anything.

If you must know, it’s a C-style /* .. */ comment. Of course I ran the test to make sure this little block of code properly matched beforehand. This way I could go along making one small change at a time, simply because it’s fairly late at night and I’ve got a flight to catch tomorrow.

<comment>
<comment>
<comment>

'go' '=>'
   'doors(10).'

Breaking up is hard to do

The natural thing to do here is, of course, say to yourself “Hrm, I’ve got 3 <comment> comment blocks in a row. We all know there are only 3 important numbers in computer science, 0, 1, and Infinity. So 3 is wrong and should be replaced with <comment>+.”

<comment>+

'go' '=>'
   'doors(10).'

I then rerun the test, because I’m sticking to my nighttime rule of “one change, one retest”, and to my horror it breaks. I’ve only changed one thing, but … why is it breaking? Surely <A>+ should at least match <A> <A> <A> … that’s how DFA equivalences work in finite automata.

That’s also one point where Raku and traditional DFAs (Deterministic Finite Automata) part ways. After a few years of doing Raku programming, I see Raku as almost overly helpful. Tools like flex and bison made me think of grammars as something that belonged outside the language.

Where it all breaks down

Unfortunately modules like Grammar::Debugger, through no fault of their own, can’t quite help here. While it’s a great module to tell you what particular rule or token failed, the problem here is between the terms.

<A> {whitespace-optional} <A> is subtly different than <A>+ because <A> <A> lets the parser read whitespace between the two terms; <A>+ assumes the terms come one after the other, whitespace be darned.
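The difference can be reproduced in a few lines. The grammar below is a cut-down stand-in rather than the picat grammar itself, and the names Demo, three and many are invented for the example.

```raku
grammar Demo {
    token comment { '/*' .*? '*/' }
    # In a rule, the blanks between terms match optional whitespace
    rule  three   { <comment> <comment> <comment> }
    # No blank between <comment> and +, so nothing may separate repeats
    rule  many    { <comment>+ }
}

my $src = '/* a */ /* b */ /* c */';

my $three-ok = so Demo.parse( $src, :rule<three> );
my $many-ok  = so Demo.parse( $src, :rule<many> );

say $three-ok;   # True
say $many-ok;    # False
```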

So, the simplest solution I have to offer is to let the comment eat the whitespace after it as well, so you can insert your <comment> token anywhere you like and it’ll still eat the whitespace no matter how you add it.

Another solution proposed on Reddit would be to use <A>˽+, with a space between the closing ‘>’ and the modifier. Said user went beyond the call of duty and composed a “Seven stages of whitespace” post to make the point.

The <comment> token I have, like I said, is for C/C++ style “balanced” comments. Here they’re not really balanced, because they don’t nest: /* This is a comment */ but this is not */, and /* This is a comment /* so is this */ this looks like it should but really isn’t. */

# .*? rather than .+? so that an empty comment, /**/, still matches
token comment
  {
  '/*' .*? '*/' \s*
  }

And all is well with the grammar. You can put this rule anywhere you like and it’ll behave whether you write <comment> <comment> or <comment>+. This little article was prompted by a Twitter user who had read my first tutorial series. They got into the actual work of creating a grammar and problems started to happen.
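As a quick check, here is a small stand-in grammar using a comment token that eats its own trailing whitespace. Again, the names Fixed, three and many are made up for the example.

```raku
grammar Fixed {
    # The trailing \s* lets each comment swallow the whitespace after it
    token comment { '/*' .*? '*/' \s* }
    rule  three   { <comment> <comment> <comment> }
    rule  many    { <comment>+ }
}

my $src = '/* a */ /* b */ /* c */';

my $three-ok = so Fixed.parse( $src, :rule<three> );
my $many-ok  = so Fixed.parse( $src, :rule<many> );

say $three-ok;   # True
say $many-ok;    # True
```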

Wrapping up

My original tutorial series was just that, a tutorial. I felt that getting too deep into the process would interrupt the flow, so I didn’t talk about the work that went into it. Now that the series is pretty much done, I think it’ll be beneficial to talk about the actual problems of debugging one of these beasts.

And these things can most definitely be beasts. Using my ANTLR4 to Raku converter you can generate some incredibly huge grammars. But just generating them doesn’t necessarily mean they’ll compile, although a few do right out of the box, which I’m genuinely amazed at.

The full test suite actually chooses a few grammars, converts them to Raku, compiles them and tests against sample input. I’m not sure how faithful they are to the real grammar, but they work.

Raku does amazing things with precompiling and JITing. Grammars and regular expressions are among the hardest-working things in Raku, so they get compiled down to functions. This means I can’t step into them even inside NQP, the dark side of Raku.

I’ve got ideas, so I’m going to keep working on grammar stuff. That means when I run into problems, well, it’s time to write another article. So look forward to a new series. Likely with a prosaic name of “Raku Grammars Debun^wDebugged” or something similar. Thank you again, dear reader. Comments, clarifications and questions are of course welcome.