Templates and a Clean Start

Before I get into the meat of the topic, which will eventually lead to a self-modifying grammar (yes, you heard me, self-modifying…) I have a confession to make: a series of articles on the old site may have led people astray. I wrote that series hoping to make it easier to parse things for which no grammar existed.

It may have backfired. So, as a penance, I’m simultaneously pointing theperlfisher.{com,net} to this new site, and starting a new series of articles on Perl 6 programming with a different approach. This time I’ll be incorporating more of my own thinking along the way.

Begin as you mean to go on.

I would love to dump the CMS I’m currently using for something written in Perl 6. Among the many challenges that presents is displaying HTML, and to paraphrase Clint Eastwood, I do know my limitations. So, I don’t want to write HTML. Ideally, not ever.

So, that means stealing, er, borrowing HTML from other sites and making it my own. Since those are usually Perl 5 sites, that means dealing with Template Toolkit. And already I can hear some of you screaming “Perl 6 already handles everything TT used to! Just use interpolated here-docs!”

And, for the most part, you’re absolutely correct. Instead of the clunky ‘[% variable_name %]’ notation you can use clean inline interpolation with ‘{$variable-name}’, and being able to insert blocks of code inline means you don’t have to go through many of the hoops that you’re required to jump through with Template Toolkit.
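For anyone who hasn’t seen it, here is roughly what that inline interpolation looks like. This is only a sketch; the variable names and the markup are made up for illustration.

```raku
# Hypothetical values, purely for illustration.
my $variable-name = 'Jeff';
my @items = <apples pears plums>;

# qq:to is an interpolating here-doc; anything inside { } is run as code.
my $html = qq:to/END/;
    <h1>Hello, {$variable-name}!</h1>
    <p>You have {@items.elems} items: {@items.join(', ')}</p>
    END

print $html;
```

The here-doc strips the indentation of the END marker from every line, so the generated HTML starts at the left margin.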

That’s all absolutely true, and I hope to be able to use all of those features and more in the final CMS, whatever that happens to be. But this approach ignores the fact that most HTML out there is written with Template Toolkit, and that rewriting HTML, even if it’s just a few tiny tags, is an investment of time that could be better spent elsewhere.

If only there were Template Toolkit for Perl 6…

Let’s dive in!

If you’re not familiar with Template Toolkit, it’s a fairly lightweight programming language for writing HTML templates, among others. Please don’t confuse it with a markup language, designed to be rendered into HTML. This is a language that lets you combine your own code with a template and generate dynamic displays.

<h1>Hello, [% name %]!</h1>

That is a simple bit of Template Toolkit. Doesn’t look like much, does it? It’s obviously a fragment of a proper HTML document because there’s no ‘<html>’ .. ‘</html>’ bracketing it, and obviously whatever’s between ‘[%’ and ‘%]’ is being treated specially. In this case, it’s being rendered by an engine that fills in the name, maybe something like…

$tt.render( 'hello.tt', :name( 'Jeff' ) );

where hello.tt is the name of the template file containing the previous code, and ‘Jeff’ is the name we want to substitute. We’ve got a lot of work to go through before we can get there, though. If you’ve read previous articles of mine on the subject, please try to ignore what I’ve said there.

Off the Deep End

First things first, we need a package to work in. For this, I generally rely on App::Mi6 to do the hard work for me. Start by installing the package with zef, and then we’ll get down to business. (zef should be installed by default; if you’re still using rakudobrew, please don’t.)

$ zef install App::Mi6
{a bit of noise}
$ mi6 new Template::Toolkit
Successfully created Template-Toolkit
$ cd Template-Toolkit

Ultimately, we want this test (in t/01-basic.t – go ahead and add it) to pass:

use Test;
use Template::Toolkit;
my $tt = Template::Toolkit.new;
is $tt.render( 'hello.tt', :name( 'Jeff' ) ), '<h1>Hello, Jeff!</h1>';
done-testing;  # no plan declared, so tell Test we are finished

It’ll fail (and miserably, at that) but at least it’ll give us a goal. Also it should give us an idea of how others will use our API. Let’s think about that for a few moments, just to make sure we’re not painting ourselves into any obvious corners.

In order to be useful, our module has to parse Perl 5 Template Toolkit files, and process them in a way that’s useful in Perl 6. Certain things will go by the wayside, to be sure, but the core will be a module that lets us load, maybe compile, and fill in a template.

Hrm, I just said ‘fill in’ rather than ‘render’, which is what I said above. Should I change the method name? No, not really; the new module will still do what the Perl 5 code used to, it just won’t do it using Perl 5, so some of the old conventions won’t work. Let’s leave that decision for now, and go on.

Retrograde is all the rage

Let’s apply some basic retrograde logic to what we’ve got here, given what we know of Perl 6 tools. In order to get the string ‘<h1>Hello, Jeff!</h1>’ from ‘<h1>Hello, [% name %]!</h1>’, we need a lot of mechanics at work.

At first glance, it seems pretty obvious that ‘[% name %]’ is a substitution marker, so let’s just do a quick regexp like this:

$text ~~ s:g{ '[%' \s* (\w+) \s* '%]' } = %args{$0};  # \s* because spaces in a regex don't match spaces

That should replace every marker in the text with something from the %args hash that render() supplies to us. End of column, end of story. But not so fast: if all Template Toolkit supplied to us was the ability to substitute values for keys, then … there’s really no need for the module. And in fact, if you look at the docs, it can do many more things for us.
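To see the naive substitution idea running end to end, here is a minimal sketch. The name fill-in() and the way it takes its arguments are my own inventions for this snippet, not part of any real Template Toolkit API, and it allows optional whitespace inside the marker.

```raku
# A sketch only: fill-in() and its arguments are made-up names.
sub fill-in( Str $template, %args ) {
    $template.subst(
        / '[%' \s* (\w+) \s* '%]' /,               # a marker such as [% name %]
        -> $match { %args{ ~$match[0] } // '' },   # unknown keys become ''
        :g                                         # replace every marker
    );
}

say fill-in( '<h1>Hello, [% name %]!</h1>', { name => 'Jeff' } );
# <h1>Hello, Jeff!</h1>
```

It handles plain key-for-value markers and nothing else, which is exactly the limitation the rest of this article is about.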

For example, ‘[% INCLUDE %]’ lets us include other template files in our own, ‘[% IF %]’ .. ‘[% END %]’ lets us do things conditionally, and a whole host of other “directives” are available. But you’ll see here the one thing they have in common is they all start with ‘[%’ and end with ‘%]’.

Hold the phone

That isn’t entirely true, and in fact there’s going to be another article in the series about that. But it’s a good starting point. We may not know much about what the language itself looks like, but I can tell you that tags are balanced, not nested, and every ‘[%’ opening tag has a ‘%]’ tag that closes it.

I’ll also point out that directives ( ‘[% foo %]’ ) can occur one after another without any intervening white space, and may not occur at all. So already some special cases are starting to creep in.

In fact, let’s put this in as a separate test file entirely. So separate that we’re going to put it in a nested directory, in fact. Let’s open t/parser/01-basic.t and add this set of tests:

use Test;
use Template::Toolkit::Parser;

my $p = Template::Toolkit::Parser.new;

0000, AAAA
0001, AAAB
0010, AABA
0011, AABB
0100, ABAA
0101, ABAB
... # and so on up to
1110, BBBA
1111, BBBB

Now just HOLD THE PHONE here… we’re testing directives for Template Toolkit, not binary numbers, and whatever that other column is! Well, that’s true. We want to test text and directives, and make sure that we can get back text when we want it, and directives when we want them.

At first blush you might think it’s just enough to make sure that ‘<h1> Hello,’ is parsed as text, and that ‘[% name %]’ is parsed as a directive, and just leave it at that. But those of you that have worked with regular expressions for a while might wonder how ‘[% name %][% other %]’ gets parsed… does it end at the first ‘%]’, or continue on to the next one?
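A quick experiment answers the question for the simplest possible patterns. This is a sketch using throwaway regexes, not the parser we’re going to build:

```raku
my $text = '[% name %][% other %]';

# Greedy: .* grabs as much as it can, so the match runs to the LAST '%]'.
my $greedy = ~( $text ~~ / '[%' .* '%]' / );

# Frugal: .*? grabs as little as it can, so the match stops at the FIRST '%]'.
my $frugal = ~( $text ~~ / '[%' .*? '%]' / );

say $greedy;   # [% name %][% other %]
say $frugal;   # [% name %]
```

So the answer depends entirely on how the pattern is written, which is a good reason to pin the behavior down with tests.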

And what about text mixed with directives? Leading? Trailing text? Wow, a lot of combinations. In fact, if you wanted to be thorough, it wouldn’t hurt to cover all possible combinations of text and directives up to… say, 4 in a row.

Let’s call text ‘T’, and directives ‘D’. I’ve got 4 slots, and only two choices for each. Filling the first slot gives me ‘T_ _ _’ and ‘D_ _ _’, for two choices. I can fill the next slot with ‘T T _ _’, ‘T D _ _’, ‘D T _ _’, and ‘D D _ _’, and I think you can see where we’re going with this.
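If you’d rather not fill the slots in by hand, the cross operator will do it for you. This is just a sketch of the counting argument above:

```raku
# Cross the two choices with themselves four times: 2 * 2 * 2 * 2 = 16 rows.
my @combinations = ( [X] <T D> xx 4 ).map( *.join );

.say for @combinations;     # TTTT, TTTD, TTDT, ... DDDD
say @combinations.elems;    # 16
```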

In fact, replace T with 0 and D with 1, and you’ve got the binary numbers from 0000 to 1111. So, let’s take advantage of this fact, and do some clever editing in our editor of choice:

0010, AABA                            =>
is-deeply the-tree( '0010, AABA       =>
is-deeply the-tree( '0010' ), [ AABA  =>
is-deeply the-tree( '0010' ), [ AABA ];

A few quick search-and-replace commands should get you from the first line to the last line. Now it’s looking more like a Perl 6 test, right? We’re not quite there yet, ‘0010’ still doesn’t look like a string of text and directives, and what’s this AABA thing? One more search-and-replace pass, this time global, should solve that.

is-deeply the-tree( '0010' ), [ AABA ]; =>
is-deeply the-tree( 'xx1x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ AABA ]; =>
is-deeply the-tree( 'xx[% name %]x' ), [ 'a', 'a', B'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
          [ 'a', 'a', B'a', ]; =>
is-deeply the-tree( 'xx[% name %]x' ),
    [ 'a', 'a', Directive.new( :content( 'name' ) ), 'a', ];

Starting out with the padded binary numbers covers every combination of text and directive possible (at least 4 long). A clever bit of search-and-replace in your favorite editor gives us a working set of test cases that check a set of “real-world” strings, and a file you can almost run. Next time we’ll fill in the details, and get from zero to a minimal (albeit working) Template Toolkit implementation.
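If search-and-replace isn’t your thing, the same inputs can be generated in code. This is a sketch under the convention used above, where 0 stands for a bit of text (‘x’) and 1 for a directive (‘[% name %]’); sample-input() is a name I made up for this snippet.

```raku
# Assumed convention from the walkthrough: 0 is text ('x'),
# 1 is a directive ('[% name %]').
sub sample-input( Int $n ) {
    $n.fmt('%04b')                                       # 2 becomes '0010'
      .comb                                              # one character per slot
      .map( { $_ eq '1' ?? '[% name %]' !! 'x' } )
      .join;
}

say sample-input( 0b0010 );        # xx[% name %]x
say sample-input( $_ ) for ^16;    # all sixteen inputs, 0000 through 1111
```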

As always, dear reader, feel free to post whatever comments, questions, and/or suggestions that you may have, including ideas for future articles. I read and respond to every comment, and thank you for your time.

Logic Programming in Perl 6

This is a small example of conference-driven development. I’m sitting in the board room at TPCiP (TPC in Pittsburgh), surrounded by people doing both Perl 5 and Perl 6 programming, and decided to look again at Picat, working on some simple examples. I was thinking that I might be able to translate some of the simpler backtracking examples from Picat to Perl 6, and here’s a simple example.

First the Picat code:

fib(0,F) => F=1.
fib(1,F) => F=1.
fib(N,F),N>1 => fib(N-1,F1),fib(N-2,F2),F=F1+F2.

Now here’s my equivalent Perl 6 code:

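multi fib( 0, $F is rw ) { $F = 1 }
multi fib( 1, $F is rw ) { $F = 1 }
# $N is only read, so it is not 'is rw'; that would reject a literal like 7.
multi fib( $N where * > 1, $F is rw ) {
  my ( $F1, $F2 ) = 0, 0;
  my $N1 = $N - 1;
  my $N2 = $N - 2;
  # Parenthesize the assignment so && cannot swallow its left-hand side.
  fib( $N1, $F1 ) && fib( $N2, $F2 ) && ( $F = $F1 + $F2 )
}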

The Perl 6 version is slightly larger because I need to declare some variables that Picat would ordinarily declare for me ($F1, $F2). There may be a way to work around declaring ($N1, $N2), but otherwise the two versions are identical.

How does it work?

You’ve probably guessed based on the inputs that N is the index of the Fibonacci number, and F is the Fibonacci number itself. Picat doesn’t require you to declare variables, so you could ask it for the 7th Fibonacci number by calling fib(7,F) and looking at F.

my $Fib = 0;
fib(7,$Fib);
say $Fib     # 21

Or you could do the above in Perl 6, letting the code populate $Fib for you. This code relies on the fact that Perl 6 lets you dispatch not just on signatures, not just on argument types, but on values. Look at the base case above:

multi fib( 0, $F is rw ) { ... }

fib(…) is the function signature, and this function will get called whenever the first argument is 0, like so: fib(0, $Fib). This happens even if ‘multi fib( $N, $F )’ is the one doing the calling, everything gets run through the same dispatcher each time.

So fib(2, $Fib) ends up calling fib with 1 as its first argument, which dispatches to ‘multi fib( 1, $F )’ and gives us a base case. This lets the general-case multi call our base cases, and still get the right value.
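Stripped of the output parameter, dispatching on values looks like this. It’s a toy sketch, and countdown() is a name invented for the occasion:

```raku
# The literal 0 in the signature means "only when the argument is 0".
multi countdown( 0 )                  { 'liftoff' }
# Every other positive integer lands here and recurses toward the base case.
multi countdown( Int $n where * > 0 ) { "$n, " ~ countdown( $n - 1 ) }

say countdown( 3 );   # 3, 2, 1, liftoff
```

There is no if statement anywhere; the dispatcher picks the base case for us.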

What are we missing?

Well, the Picat code can do something the Perl 6 code can’t, at least for the moment, and this is what I want to spend some time working on. In Picat, I can call ‘fib(6,F)’ and F will be 13 when the code is done. This works in Perl 6 too.

But Picat will also let you call ‘fib(N,21)’ and N will be 7 when the calculation is finished. Take some time to let that settle. Yes, you can run the calculation both forward and backward. Give N a value, and F will be the Nth Fibonacci number. Give F a value, and it will tell you what N is.
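Just to make the “backward” direction concrete, here is what the answer looks like if we cheat. This is emphatically not backtracking, only a linear search over a generated sequence, and fib-index() is a helper name I made up. It uses the same numbering as the code above, where the 0th and 1st Fibonacci numbers are both 1.

```raku
# NOT backtracking: a plain linear search, for illustration only.
sub fib-index( Int $F where * >= 1 ) {
    # Generate Fibonacci numbers until we reach or pass $F.
    my @fibs = 1, 1, * + * ... * >= $F;
    # :k asks for the index of the first match rather than the value.
    @fibs.first( * == $F, :k );
}

say fib-index( 21 );   # 7
say fib-index( 13 );   # 6
```

A number that isn’t in the sequence simply comes back undefined. The point of the rest of this article is to get the same answer without writing a second, special-purpose function.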

In fact, Picat will go one step further. If you don’t specify a value for either parameter but just specify variables, like ‘fib(N,F)’, then it will generate all the Fibonacci numbers and their indexes until you tell it to stop.

This is because of the backtracking engine that it uses, which I want to see if I can mimic. ‘F=F1+F2’ doesn’t mean “Assign the sum of F1 and F2 to F”. Instead, it means “If any values are missing, find values that satisfy the equation, and keep generating them until you run out of possibilities.”

That’s a bit of a mouthful, so let’s look at just F1. Supposing F=8 and F2=5, the backtracking engine would search all values of F1, and return just the matching value of 3. Now of course, it can’t search all values, because that means you’d be waiting forever, so there are pruning algorithms at work here.

But the same logic can work with any combination of arguments, so if both F1 and F2 were missing, then the backtracking engine would run through all possible combinations of values (pruned appropriately) until it found a combination that would work.

In this case, since in our example F=8, it would return a bunch of combinations, starting with (F1=1, F2=7), (F1=2, F2=6) and so on. But why, then, you ask, does it only return (F1=3, F2=5)? That’s because each value F1 also has to satisfy fib(N1,F1), which means that F1 has to be a Fibonacci number, as does F2.

Breakdown

This is the part where Perl 6 breaks down a little bit. But what I think I might be able to do is use a trick I used a while ago, relying on the fact that operators are just functions, and they dispatch just like other functions. So I should be able to start out with something crude like:

my $F = Operator.new( :lhs(3) );
my Value ($F1, $F2);
$F = $F1 + $F2;

This way both $F1 and $F2 are bound to backtracking Values, adding them returns an Operator, and the Operator is part of the backtracking engine. Once the Operator engine determines the range of possible combinations of $F1 and $F2 that add to 3, it can assign them concurrently to $F1.value and $F2.value.

Smooth Operators

The Operator and Value classes, along with their overloaded operators, would look something like this:

class Operator {
  has ( $.lhs, $.rhs ); }
class PlusOperator is Operator { }
class AssignmentOperator is Operator {
  method make-combinations() {...} }
class Value {
  has @.value }
multi infix:<=>( Operator $lhs, Operator $rhs ) {
  AssignmentOperator.new( :$lhs, :$rhs );
}
multi infix:<+>( Value $lhs, Value $rhs ) {
  PlusOperator.new( :$lhs, :$rhs );
}

This is purely a sketch that I haven’t tried out at all. My idea here is that once you’ve executed ‘$F = $F1 + $F2’, $F will be an AssignmentOperator instance. You should be able to call $F.make-combinations() that will solve the equation ‘3 = $F1 + $F2’ for all (constrained) values of $F1 and $F2.

That would populate $F1.value and $F2.value with (1,2) and (2,1) respectively. I’m about to play my first game of Azul, so I’ll leave the article here. The next article will hopefully implement this so you can see this all working. It won’t quite be a true backtracking engine, but it’s a start.

Dear Reader, thank you for your attention, and please feel free to add comments, questions and suggestions.

Quantum Tunneling

Introducing the new Perl Fisher site

Before I get on to the meat of the article, welcome to the new home of The Perl Fisher. I intend to cover both Perl 5 and Perl 6 programming here, but it’ll be mostly Perl 6 content because that’s the language I find the most fun. Please excuse the dust, I’m still very much settling into the new home, and the overall look of the site is bound to change while I play with the new toys available to me.

Defeating Thanos with Perl 6

Don’t worry, no spoilers here. We’re just going to talk about a little-known feature of Perl, the quantum-tunneling variable type. If you’ve worked with Perl 6 for any length of time, you’ve probably seen or written a class declaration that looks something like below.

class Point2D {
  has Real $.x;
  has Real $.y;
}

While the word ‘has’ does the real work, second-sigil syndrome strikes as well, in the shape of the ‘.’ between the scalar sigil ‘$’ and the variable name. Here it’s syntactical sugar for being an attribute name, but we can enlarge that ‘.’ to a ‘*’ and open up a world of possibilities.

When we add the ‘*’ twigil to a variable name, we turn that variable into a dynamic variable, one that can quantum tunnel between scopes and solve problems that you probably used to solve with a global variable. You can read more about dynamic variables and how they differ from ordinary globals at The_*_twigil at docs.perl6.org.
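Before we put this to work, here’s the smallest demonstration I can think of. The sub and variable names are made up for this sketch:

```raku
sub who-is-there() {
    # No parameter, no global: look up $*VISITOR in whoever called us.
    # An undeclared dynamic variable is undefined, so // supplies a default.
    $*VISITOR // 'nobody';
}

sub front-door() {
    my $*VISITOR = 'Ant-Man';   # visible to everything called from here
    who-is-there();
}

say front-door();      # Ant-Man
say who-is-there();    # nobody
```

The second call comes from outside front-door(), so there is no $*VISITOR in its call chain and the default kicks in.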

Testing, testing

I’m working on a project to try to augment the Perl 6 grammar debugger with an emulator. The tools we have on CPAN and modules.perl6.org respectively are wonderful, but they’re limited because Perl 6 compiles grammar rules down to single methods, which is wonderful for speed, but makes it almost impossible to look into.

The grammar I’m writing isn’t important at the moment, but the testing part is. Below is a sample subtest that I’m writing for each term of a grammar that’s probably going to have ~50 terms by the time I’m done.

subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );
    ok fails( '3g', 'binary-number' );
  };

  is build-ast( '0101', 'binary-number' ), 5;
};

This tests the ‘binary-number’ rule to see if it properly fails on ‘0b’ and ‘3g’. ‘0b’ fails because it’s only the prefix of a binary number, and ‘3g’ because neither ‘3’ nor ‘g’ is a binary digit. It also makes certain that ‘0101’ gets translated into the decimal number 5. All important when testing a grammar that parses … well, itself eventually. Ouroboros redux, as it were.

Dry up, will you…

The test is simple, and straightforward. ‘0b’ should fail, ‘0101’ should be built into a node of an abstract syntax tree. But it’s got some flaws. It talks too much. See how ‘binary-number’ repeats itself? If I want to copy that, rename it to ‘hex-number’ and add a few changes, I have to copy the block, rename all the instances of ‘binary-number’ to ‘hex-number’ and then fix the existing tests.

Thus I run the risk of forgetting to update the name ‘binary-number’. And there’s an even greater bugaboo there. If I don’t, the test won’t fail. Because the subtest doesn’t know that it’s supposed to be testing the ‘binary-number’ rule. There are a bunch of ways to solve this problem, of course, but for this post we’re going to use Ant-Man(tm).

Entering the Quantum Realm

I don’t want to do too much work here, I just want to get rid of the duplicate ‘binary-number’ entries. So, let’s take a look at what fails() does.

sub fails( Str $sample, Str $rule-name ) returns Bool {
  !?( $g.parse( $sample, :rule( $rule-name ) ) );
}

The ‘!?(…)’ casts $g.parse(…) to a Boolean and negates it, so if $g can’t parse the statement, it returns True. So, first let’s make $rule-name optional.
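If you want to try fails() on its own, you need something for $g to be. The grammar below is a tiny stand-in I invented for this sketch, since the real one isn’t shown here:

```raku
# A stand-in grammar; the real $g in the article is not shown.
grammar Tiny {
    token binary-number { <[01]>+ }
}
my $g = Tiny;

sub fails( Str $sample, Str $rule-name ) returns Bool {
    # .parse gives back a Match (true) or Nil (false);
    # ? turns that into a Bool and ! flips it.
    !?( $g.parse( $sample, :rule( $rule-name ) ) );
}

say fails( '3g',   'binary-number' );   # True
say fails( '0101', 'binary-number' );   # False
```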

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $rule-name ) ) );
}

Opening the wormhole

Now, we’re going to summon Ant-Man(tm). Remember earlier I mentioned that quantum variables use a wormhole? Well, we’re going to open one end of the wormhole right here in our fails() function, just like this.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $*ANT-MAN // $rule-name ) ) );
}

Rerun our tests, and … wait, they should fail, we haven’t declared $*ANT-MAN anywhere! Well, just like in quantum physics, $*ANT-MAN doesn’t have enough energy to tunnel over the quantum barrier because we haven’t defined him yet.

So let’s do that, but remember that $*ANT-MAN is a quantum variable, so he can tunnel through the quantum barrier of a function scope. In fact, he can tunnel through any number of them. So, let’s define a new version of subtest() that looks and acts like the old one first before we go boldly where no Perl 6 programmer has gone before.

sub Subtest( Str $rule-name, Block $test-code ) {
  subtest $rule-name, $test-code;
}

We should be able now to replace the outer subtest() block with our new Subtest() block, and it should act just as it used to.

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );
    ...
  };
  ...
};

Tunneling through

Our test suite still works, and the output still is what we expect. Now, let’s give $*ANT-MAN enough energy to tunnel through the quantum barrier by defining him as the subtest name we want:

sub Subtest( Str $rule-name, Block $test-code ) {
  my $*ANT-MAN = $rule-name;
  subtest $rule-name, $test-code;
}

And now run our test suite. Which… doesn’t change. Come to think of it, we don’t want it to change. If it did change, we’d have to go through and change all of our test suites, which would be bad. So, putting things together, this code works just fine.

sub fails( Str $sample, Str $rule-name? ) returns Bool {
  !?( $g.parse( $sample, :rule( $*ANT-MAN // $rule-name ) ) );
}
sub Subtest( Str $rule-name, Block $test-code ) {
  my $*ANT-MAN = $rule-name;
  subtest $rule-name, $test-code;
}

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b', 'binary-number' );
  };
};

Notice by the way that $*ANT-MAN has tunneled through not one but two function signatures to get to where he is. And to prove it, finally, delete the inside ‘binary-number’.

Subtest 'binary-number', {
  subtest 'failing', {
    ok fails( '0b' );
  };
};

So ‘binary-number’ gets passed along to $*ANT-MAN who jumps into the quantum realm, tunnels through the outer and inner pairs of braces, and finally lands in fails() where he passes the value on to the :rule() declaration. Whew, that’s a lot of work.

Oh, snap.

(sorry, couldn’t resist.) If you’re like me, and I know I am, you’ve probably come across a few cases where this technique would come in handy. Especially if you’re dealing with legacy code. Sometimes you need to add just one little flag to a function and set that flag in a top-level handler on one page of a website.

The catch is that between the lower level and the top level there’s a chain of 8 function calls where you have to add that as a parameter. Wouldn’t it be nice if there was a workaround? Well, in Perl 6 there is.

Thanks for getting all the way to the bottom of this, my inaugural article on Perl 6 on the next generation of the Perl Fisher website. Feel free to leave comments and constructive criticism in the comments section below, and come back every so often to watch the website grow over the coming months.

Spacing Out

After having had some comments about the grammar approach I’ve been using, I’ve started to rethink things. I may have isolated at least one problem people may have been having. I’m working on a grammar for a language called ‘picat’ – you can look up a quick explanation at picat.org.

It’s a constraint-based programming language that maps insanely well onto Perl 6. A fragment of the grammar I’m working on follows, done in a top-down fashion. The actual grammar rule <comment> isn’t the important thing, because this problem can occur with anything.

If you must know, it’s a C-style /* .. */ comment. Of course I ran the test to make sure this little block of code properly matched beforehand. This way I could go along making one small change at a time, simply because it’s fairly late at night and I’ve got a flight to catch tomorrow.

<comment>
<comment>
<comment>

'go' '=>'
   'doors(10).'

Breaking up is hard to do

The natural thing to do here is, of course, say to yourself “Hrm, I’ve got 3 <comment> comment blocks in a row. We all know there are only 3 important numbers in computer science, 0, 1, and Infinity. So 3 is wrong and should be replaced with <comment>+.”

<comment>+

'go' '=>'
   'doors(10).'

I then rerun the test, because I’m sticking to my nighttime rule of “one change, one retest”, and to my horror it breaks. I’ve only changed one thing, but … why is it breaking? Surely <A>+ should at least match <A> <A> <A> … that’s how DFA equivalences work in finite automata.

That’s also one point where Perl 6 and traditional DFAs (Deterministic Finite Automata) part ways. After a few years of doing Perl 6 programming, I see Perl 6 as almost overly helpful. Tools like flex and bison made me think of grammars as something that belonged outside the language.

Where it all breaks down

Unfortunately modules like Grammar::Debugger, through no fault of their own, can’t quite help here. While it’s a great module to tell you what particular rule or token failed, the problem here is between the terms.

<A> {whitespace-optional} <A> is subtly different than <A>+ because <A> <A> lets the parser read whitespace between the two terms; <A>+ assumes the terms come one after the other, whitespace be darned.

So, the simplest solution I have to offer is to let the comment eat the whitespace after it as well, so you can insert your <comment> token anywhere you like and it’ll still eat the whitespace no matter how you add it.

Another solution proposed on Reddit would be to use <A>˽+, with a space between the closing ‘>’ and the modifier. Said user went beyond the call of duty and composed a “Seven stages of whitespace” post to make the point.

The <comment> token I have, like I said, is for C/C++ style “balanced” comments. They’re balanced but they don’t nest; /* This is a comment */ but this is not */, and /* This is a comment /* so is this */ this looks like it should be but really isn’t. */

token comment
  {
  '/*' .*? '*/' \s*
  }

And all is well with the grammar. You can put this rule anywhere you like and it’ll behave whether you write <comment> <comment> or <comment>+. This little article was inspired by a Twitter user who had read my first tutorial series. They got into the actual work of creating a grammar and problems started to happen.
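Here is the whole problem and its fix boiled down to something you can paste and run. The grammar and rule names are made up for this sketch, and it assumes the default whitespace handling that rules get inside a grammar:

```raku
grammar Comments {
    token comment    { '/*' .*? '*/' }        # no trailing whitespace
    token comment-ws { '/*' .*? '*/' \s* }    # eats trailing whitespace

    rule three   { <comment> <comment> <comment> }   # whitespace allowed between terms
    rule plus    { <comment>+ }                      # terms must be back to back
    rule plus-ws { <comment-ws>+ }                   # each term brings its own whitespace
}

my $input = '/* one */ /* two */ /* three */';

say so Comments.parse( $input, :rule('three') );     # True
say so Comments.parse( $input, :rule('plus') );      # False
say so Comments.parse( $input, :rule('plus-ws') );   # True
```

The only difference between the failing rule and the passing one is which token is responsible for the whitespace.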

Wrapping up

My original tutorial series was just that, a tutorial, I felt that getting too deep into the process interrupts the flow, so I didn’t talk about the work that went into it. Now that the series is pretty much done, I think it’ll be beneficial to talk about the actual problems of debugging one of these beasts.

And these things can most definitely be beasts. Using my ANTLR4 to Perl 6 converter you can generate some incredibly huge grammars. But just generating them doesn’t necessarily mean they’ll compile, although a few do right out of the box, which I’m genuinely amazed at.

The full test suite actually chooses a few grammars, converts them to Perl 6, compiles them and tests against sample input. I’m not sure how faithful they are to the real grammar, but they work.

Perl 6 does amazing things with precompiling and JITing. Grammars and regular expressions are among the hardest-working things in Perl 6, so they get compiled down to functions. This means I can’t step into them even inside NQP, the dark side of Perl 6.

I’ve got ideas, so I’m going to keep working on grammar stuff. That means when I run into problems, well, it’s time to write another article. So look forward to a new series. Likely with a prosaic name of “Perl 6 Grammars Debun^wDebugged” or something similar. Thank you again, dear reader. Comments, clarifications and questions are of course welcome.