Rewriting Perl Code for Raku IV: A New Hope

Back in Part III of our series on Raku programming, we talked about some of the basics of OO programming. This time we’ll talk about another aspect of OO programming. Perl objects can be made from any kind of reference, although the most common is a hash. I think Raku objects can do the same, but in this article we’ll just talk about hash-style Perl objects.

Raku objects let you superclass and subclass them, instantiate them, run methods on them, and store data in them. In previous articles we’ve talked about all but storing data. It’s time to remedy that, and talk about attributes.

Instance attributes

We used unit class OLE::Storage_Lite; to declare our class, and method save( $x, $y ) { ... } to create methods or, in our case, to rewrite existing functions as methods. Now we turn our attention to some of the variables that should really be instance attributes, and why.

Let’s get to know which variables behave like attributes, and which don’t. This will change how we write our Raku code, but hopefully for the better. We’ll start from the outside in, and look at the API. There are a few “test” scripts that use the module, and this fragment is pretty common.

use OLE::Storage_Lite;
my $oOl = OLE::Storage_Lite->new('test.xls');
my $oPps = $oOl->getPpsTree(1);
die( "test.xls must be a OLE file") unless($oPps);

The author creates an object ($oOl) from an existing file, then fetches a tree of “Pps” objects, whatever they are. So, one OLE::Storage_Lite object equals one file. This gives me my first instance variable, the filename.

sub new($$) {
  my($sClass, $sFile) = @_;
  my $oThis = {
    _FILE => $sFile,
  };
  bless $oThis;
  return $oThis;
}

Above is how they wrote it in Perl, and below is how we’d write it (exactly as specified) in Raku:

has $._FILE;

multi method new( $sFile ) {
  self.new( _FILE => $sFile );
}

Later on, we can call my $file = OLE::Storage_Lite.new( 'test.xls' ); just like we did in Perl. We wouldn’t even need the new method if we had users call my $file = OLE::Storage_Lite.new( _FILE => 'test.xls' );. This gives users the option of calling the API in the old Perl fashion or the new Raku fashion without additional work on our part.
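To see both calling styles side by side, here is a minimal sketch. The class name Storage-Demo is a stand-in invented for this example so that it runs without the real module; only the attribute and the multi method come from the code above.

```raku
# Storage-Demo is a hypothetical stand-in for OLE::Storage_Lite.
class Storage-Demo {
    has $._FILE;

    # Perl-style positional call, forwarded to the default named constructor.
    multi method new( $sFile ) {
        self.new( _FILE => $sFile );
    }
}

my $perl-style = Storage-Demo.new( 'test.xls' );          # old Perl fashion
my $raku-style = Storage-Demo.new( _FILE => 'test.xls' ); # new Raku fashion

say $perl-style._FILE;
say $raku-style._FILE;
```

Both objects should end up holding the same filename, which is the whole point of keeping the extra multi method around.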

Strict Raku-style

There’s a problem lurking here, though. The constructor Raku provides us lets us call my $file = OLE::Storage_Lite.new(); without specifying a value for $._FILE. If you know Perl’s Moose module, though, the ‘has’ there just might look familiar.

And for good reason. A lot of the ideas from Moose migrated into Raku during its design, and the attributes were one of those. Moose lets you do a lot of things with attributes, and so does Raku. One of those is you can add “traits” to them. Let’s do that now.

has $._FILE is required;

Calling OLE::Storage_Lite.new() now fails, because you’re not passing in the _FILE argument. That solves one problem. Actually, it solves two, come to think of it. In the original Perl code, you could call OLE::Storage_Lite->new() too, and it wouldn’t complain. Now we’ve fixed that, with one new term.
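A small sketch of that failure, assuming a throwaway class named Required-Demo (invented here) in place of the real module:

```raku
# Required-Demo is a hypothetical stand-in for OLE::Storage_Lite.
class Required-Demo {
    has $._FILE is required;
}

# try returns Nil and leaves the exception in $! when construction fails.
my $missing = try Required-Demo.new;
my $error   = $!;

say $missing.defined;                   # was an object built?
say $error ~~ X::Attribute::Required;   # is this the "required" failure?

# Supplying the attribute works as before.
my $ok = Required-Demo.new( _FILE => 'test.xls' );
say $ok._FILE;
```

Wrapping the call in try is only there so the script keeps running; in real code you would normally let the exception propagate.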

Progressive Typing

No, we’re not talking about some new editor like Comma (the link does work, despite the certificate problem.) Our code would run just fine, as-is. Users could call our .new() API, Raku would make sure the filename existed, and we could go on with translating.

But there’s something more we can take advantage of here, and that is the fact that any Raku object (and anything we can instantiate is an object) is a type as well. We haven’t mentioned that because we really couldn’t use that information until now.

The original Perl code is littered with clues to types, hidden in the variable names. When we wrote our own API call, the Perl code called the file name $sFile. The ‘s’ tells the Perl compiler nothing, but it tells us that $sFile is a String type. Perl may not have true types, but Raku does. Let’s fix our attribute with that in mind.

has Str $._FILE is required;

We knew all along that $._FILE is a string of some sort, but telling Raku that lets it reject anything that isn’t a Str as soon as the object is built. Making sure it’s a required attribute lets anyone that calls new() know if they forget an argument. We could go a little farther with this, but locking down attributes will help in the long run, when we start dealing with the pack and unpack built-ins.
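Here is a hedged sketch of the typed attribute at work. Typed-Demo is a made-up class name, and the check against X::TypeCheck assumes Rakudo reports the mismatch with one of its type-check exceptions.

```raku
# Typed-Demo is a hypothetical stand-in for OLE::Storage_Lite.
class Typed-Demo {
    has Str $._FILE is required;
}

# A Str is accepted.
my $good = Typed-Demo.new( _FILE => 'test.xls' );
say $good._FILE;

# An Int is rejected when the attribute is initialized.
my $bad   = try Typed-Demo.new( _FILE => 42 );
my $error = $!;
say $bad.defined;
say $error ~~ X::TypeCheck;
```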

Packing It All In

We’re now getting to the heart of the module. There’s a lot of mechanics above us, allocating objects and doing math and checking types, and not much below us. The class’ entire purpose is to read and write OLE-formatted files. We’ll talk more about the boilerplate, but here’s the real meat of the file.

Let’s start with what should be simple, reading in data. Just like in Perl, we open a file and get back a “file handle” (assuming the file exists, of course.) By default, calling my $fh = open $._FILE; gives us a read-only file handle. The file handle itself has a bunch of attributes associated with it, but the important one right now is its encoding.

Namely, the fact that it has none. An OLE file is essentially a miniature filesystem (probably based on FAT) packed onto disk, complete with a root directory, subdirectories and files. Files have names encoded in UCS-2, but the rest is entirely dependent upon what the application requires.

The upshot of which is that we can’t read the format with something simple like my @lines = $fh.lines; which would read line after line into the @lines array. Instead we’ll use calls like read() and write() that return byte-oriented buffers.
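As a rough illustration of byte-oriented reading, the sketch below writes a tiny scratch file and reads part of it back with read(). The file name and its contents are invented for the example; it is not a real OLE document.

```raku
# Hypothetical scratch file standing in for a real OLE document.
my $path = $*TMPDIR.add( "ole-read-demo-$*PID.bin" );
$path.spurt( Blob.new( 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1, 0x00, 0x00 ) );

# Open in binary mode and ask for exactly 8 bytes.
my $fh     = open $path, :bin;
my $header = $fh.read( 8 );
$fh.close;
$path.unlink;

say $header.elems;                                   # number of bytes read
say $header.list.map( *.fmt('%02x') ).join( ' ' );   # the bytes, in hex
```

What comes back is a buffer of integers, with no notion of lines or characters attached to it.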

Buffering…

All OLE files start off with the header “\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1”, so we should probably start there. That’s important twice in the code, in fact. First, when we’re reading off disk, we can check it against what we’ve just read to make sure this file is OLE, and not, say, a JSON file. Later on, when we’re saving out an OLE file, we can write it as the header string.

constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

I’ll make it a constant as well, so when I revisit this code in a month I don’t have to go looking in specs for ‘0xd0 0xcf’ to remember what this is. Reading is straight-forward too. It needs just a byte count.

my Buf $header = $fh.read( 8 );

Something important to notice here is the type, ‘Buf’. If our file was in Markdown, or JSON we could get away with just writing my @lines = $fh.lines; like I tried earlier. But these are raw bytes, hindered by no interpretation. Let’s see what happens when we compare these bytes to our HEADER-ID.

t/01-internals.t ............ Cannot use a Buf as a string, but you called the Stringy method on it
  in method _getHeaderInfo at /home/jgoff/GitHub/drforr/raku-OLE-Storage_Lite/lib/OLE/Storage_Lite.pm6 (OLE::Storage_Lite) line 169
  in block <unit> at t/01-internals.t line 42

Another brick in the wall

Ka-blam. But… hold the phone here a minute, I just said $header eq HEADER-ID, I didn’t write anything like ‘Stringy’! There’s no ‘Stringy’ in the source… oh. HEADER-ID is a string, so Raku is being helpful. I’m trying to use string comparison (‘eq’) between something that’s not a Str ( $header ) and something that is (HEADER-ID).

Pull up the Stringy documentation, and look for the Type graph. Midway down you’ll see ‘Buf’ and ‘Str’, as of this writing Buf is on the left, and Str is popular so it’s in the middle.

Trace the inheritance paths from Buf and Str upwards, and you’ll see they pass Buf -> Blob -> Stringy and Str -> Stringy, and stop. What the error message therefore is saying is this, anthropomorphized:

You wanted to convert Buf to Str, and didn’t care how you did it. So I looked. First, on the Buf type. No .Str method there, at least without arguments. No good. So I looked in its parent, Blob. Nothing doing there. Then I looked at Stringy, and couldn’t find anything else.

There’s nothing above me, nothing below. So I’ll let you know I looked for a conversion method in a bunch of places, stopped at Stringy, and couldn’t go any farther. Sorry.

Raku

You’re probably wondering how to get out of this quandary. Reading the Blob documentation closely, you might think that the decode method is the way out of our present jam. If you look closer, though, there’s a spanner in the works. “\xD0” is the byte 0xd0, so if you try to decode to ASCII, you run into the problem that ASCII only covers 0x00-0x7f, everything outside of that is undefined.
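For comparison only, here is a sketch of what a decode-based check could look like if Latin-1 were used instead of ASCII, since Latin-1 assigns a character to every byte value. This is an alternative swapped in for illustration, not the route this article takes, and the buffer below is a hand-built stand-in for the bytes read from disk.

```raku
constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Stand-in for the 8 bytes that $fh.read( 8 ) would return.
my $header = Buf.new( 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 );

# Bytes to characters: Latin-1 maps each byte to the code point of the same value.
say $header.decode( 'latin-1' ) eq HEADER-ID;

# Characters to bytes: compare the raw byte values instead of strings.
say HEADER-ID.encode( 'latin-1' ).list eqv $header.list;
```

Either direction puts both sides of the comparison in the same type, which is all the original error was complaining about.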

Packing for vacation

If you’ve kept up with things, you might surmise by now that the key to our quandary lies in the pack and unpack builtins. Specifically unpack(), because we’re trying to “decode” a buffer into something suitable for Raku.

Unless you’ve done things like network programming or security, the pack and unpack builtins are going to be unfamiliar territory. The closest analogue of pack() is the builtin sprintf().

Both of these builtins take a format string telling the compiler how to arrange its arguments. Both of them take a mixture of string and integer arguments afterwards. But while sprintf() takes the arguments and treats its output as a UTF-8 encoded string, pack() takes the same arguments and treats its output as a raw buffer of bytes.
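A minimal side-by-side sketch of the two builtins, assuming Rakudo’s experimental pack() and using made-up arguments:

```raku
use experimental :pack;

# sprintf() builds text: the result is a Str.
my $text = sprintf( "%s-%d", "abc", 258 );
say $text.^name;
say $text;

# pack() builds bytes: "A3" packs three ASCII characters,
# "n" packs one 16-bit integer, high byte first.
my $bytes = pack( "A3n", "abc", 258 );
say $bytes.^name;
say $bytes.list;
```

The format strings look alike, but one result can be printed as text and the other is a list of byte values.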

And now you can see one way out of our little predicament. If we could just find the right invocation, pack() would be able to take our string “\xd0\xcf…” and turn it into a Buf object. Then we could compare the buffer we got by reading 8 bytes to the buffer we expected.

So instead of cluttering up the main code, let’s write a quick test.

use experimental :pack;
constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

use Test;
my $fh = open "test.xls";
my Buf $buf = $fh.read( 8 );

is $buf, pack( "A8", HEADER-ID ); # Pack 8 ASCII characters

Testing…testing…

Let’s take it from the top. We tell Raku to use the “experimental” pack() builtin, and declare the header we want to check against. Then we tell Raku we want to use the Test module, and open a new Microsoft Excel test file.

Last, we read a chunk of 8 bytes from the file into a buffer, and check to see that the 8 bytes matches the header we expect to see. Now, how did we get that weird ‘A8’ string in there? I thought pack() looked more like sprintf()?

Well, it does, to an extent. I/O routines like sscanf() and sprintf() can do all sorts of things to your strings and numbers on the way in and out; think, for example, of what ‘%-2.10f’ means as a format specifier. You can follow along with the unpack() documentation if you like.

pack(), by contrast, just takes 8, 16, or 32-bit chunks of your input, and places them into a buffer. The “A” in “A8” says that it wants to convert an ASCII-sized chunk of your input (“\xd0” in our case) into a byte in the buffer, so our Buf now looks like ( 0xd0 ).

Writing “AAAAAAAA” would not do the same job, because each “A” consumes a separate argument and packs only its first character. The ‘repeat’ option is what lets a single “A8” convert 8 characters of one string (yes, yes, I know, they’re glyphs, but let’s not confuse matters.)

I could write “A*” just as well, but “A8” makes sure that 8 and only 8 (the number that thou shalt count to…) characters get converted. I doubt that the header in an OLE file will change, but it’s a nice bit of forward planning.
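The difference between a fixed count and “*” can be sketched with a plain-ASCII string that is deliberately longer than 8 characters. The input below is invented for the example and assumes Rakudo’s experimental pack().

```raku
use experimental :pack;

# A plain-ASCII stand-in, deliberately longer than 8 characters.
my $input = "OLE-HEADER";

my $fixed = pack( "A8", $input );   # stops after 8 characters
my $all   = pack( "A*", $input );   # takes as many as the input has

say $fixed.elems;
say $all.elems;
```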


For those of you that made it this far, thank you. As usual, gentle Reader, if you have any comments, criticisms (constructive, please) or questions, feel free to post them below.

Next week I’ll delve deeper into the mysteries of pack(), unpack() and some of the tips and tricks I use to keep on my toes and make sure that I generate clean Microsoft-compatible output.

Rewriting Perl Code for Raku II: Electric Boogaloo

Picking up from Part One, we’d just finished up rewriting a Perl script into the test suite for the Raku translation of OLE::Storage_Lite. Raku programming is made easier by having lots of tools, but Microsoft documents aren’t yet well-represented in the Raku ecosystem.

Being able to read/write OLE allows us to create a whole range of Microsoft documents (at least where they’re documented.) Because of its day-to-day use, we’re focusing on Excel here. Many businesses still rely on Excel for their day-to-day task management, time tracking and home-grown processes.

I’ve been known to wax philosophical about this after a few Westmalle Tripels at various conferences. Now is the time for doing something about it. Here’s what our burgeoning test suite looked like, at least in part. The current code is in raku-OLE-Storage_Lite over on github.com. I’ve gotten rid of most of the Perl 5 test skeleton, but the essence remains.

use v6;
use Test;
use OLE::Storage_Lite;

plan 1;

my $oDt = OLE::Storage_Lite::PPS::Root.new(
  (),
  ( 0, 0, 16, 4, 10, 100 ), # 2000/11/4 16:00:00:0000
  ( $oWk, $oDir )
);
subtest 'Root', {
  isa-ok $oDt, 'OLE::Storage_Lite::PPS::Root';
  is $oDt.Name, 'Root Entry';
  is-deeply $oDt.Time2nd, [ 0, 0, 16, 4, 10, 100 ];
  # ...
};
done-testing;

Originally there really weren’t any Perl 5 tests for this module. I’m sure the original author treated the entire module as a black box, and they were happy to be able to run samples/smpsv.pl, open the new test.xls in Excel, and when it actually read the file, treat that as ‘ok 1’, push it to CPAN and call it a day.

Testing, testing

That’s wonderful, and I may eventually adopt that methodology. For the moment, the lack of a test suite leaves me a bit unsatisfied. I suppose I could treat the entire module as a black box and fix the translated version line-by-line as I go through it. I’ll have to do that eventually (spoiler alert: That’s actually where I am – I’m writing these pieces a bit after the fact.)

That leaves me with the question of what to test, and what the quickest way to get there is. The individual Directory, Root and File objects are exposed to the user, and are part of the public API. So it makes some sense to create an object, look at the internals, and do my best to match that in Raku.

I Think I’m A Clone Now

There’s always two [implementations] of me standing around… I don’t want to get sidetracked by reading the entire OLE spec. I might start to realize what a huge job this really is, and abandon ship. So, I’m going to limit myself to the following:

Create a narrowly defined 1:1 clone of the exact source of OLE::Storage_Lite in Perl 5. The objects will act exactly like the Perl 5 version, as will the API. This way I don’t have to think about what the API should do, how it should look in Raku, how the objects get laid out, anything fancy. All I need to worry about is:

  1. When I write warn $oDt.raku, does the output look the same as use YAML; warn Dump($oDt); in Perl 5?
  2. When I write the final file to disk, does the Raku code output exactly the same file as the original Perl 5 version?

That’s it. It takes away a lot of possibilities, but it lets me focus on getting the job done, not how things should look. Being able to test how the individual objects look will tell me that the read API works and saves enough data to be able to reconstruct the object in memory.

Conversely, being able to match the binary output tells me that the write API works, so I’ve effectively tested as much as the original module did. Plus I can automate some of the process, especially on the read side.
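Both checks can be sketched in a few lines. The scratch files and the sample hash below are invented stand-ins; in the real workflow the two files would come from the Perl 5 and Raku versions of the module.

```raku
# Hypothetical scratch files standing in for the Perl 5 and Raku outputs.
my $perl-out = $*TMPDIR.add( "perl5-out-$*PID.xls" );
my $raku-out = $*TMPDIR.add( "raku-out-$*PID.xls" );
$perl-out.spurt( Blob.new( 0xD0, 0xCF, 0x11, 0xE0 ) );
$raku-out.spurt( Blob.new( 0xD0, 0xCF, 0x11, 0xE0 ) );

# Write side: the two files must match byte for byte.
my $same-bytes = $perl-out.slurp( :bin ).list eqv $raku-out.slurp( :bin ).list;
say $same-bytes;

# Read side: .raku produces a dump to compare by eye against the YAML dump.
my %object = Name => 'Root Entry', Time2nd => [ 0, 0, 16, 4, 10, 100 ];
note %object.raku;

.unlink for $perl-out, $raku-out;
```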

Lost in Translation

You can check out the current source at raku-OLE-Storage_Lite, and follow along with some of the changes I’ve made. I also made sure to keep a working copy of the original OLE::Storage_Lite Perl 5 module around. My Raku tree right now is very close to Perl 5.

I can insert a debug statement like die "[$iBlockNo] [$sData]\n" in the Perl 5 code, go to the equivalent line in Raku, and expect that when I run the two test suites, that they’ll die in exactly the same way.

This way when they don’t, I can immediately narrow down the problem simply by moving the ‘die’ statements up in the code until they return the same values. The line immediately below the ‘die’ statement will be the culprit.

The Nitty Gritty Perl Band

I’ll mention one thing in passing – the original Perl 5 source code is in a single file containing all of the packages. That’s not Raku style, so I’ve unpacked it into lib/OLE/Storage_Lite/* following the usual style of one Perl 5 class – one file.

So, time to get our hands dirty. The new Raku module won’t compile for quite a while, so we’d better put this into git. I’m also using App::Mi6 to do my development and eventual push to CPAN, so all of that boilerplate is there too.

So, cue the montage scene of the dedicated Raku hacker pounding away at the keyboard, with the occasional break for food and/or adult beverage. Looking over her shoulder, we see a familiar split-screen view, with Perl 5 code on top, and a new Raku file below.

# Perl 5 (top window)
use OLE::Storage_Lite::PPS;
package OLE::Storage_Lite::PPS::Root;
use vars qw($VERSION @ISA);
@ISA = qw(OLE::Storage_Lite::PPS);

# Raku (bottom window)
use OLE::Storage_Lite::PPS;
unit class OLE::Storage_Lite::PPS::Root is OLE::Storage_Lite::PPS;

Raku has classes where Perl 5 has packages. The ‘unit’ declaration there says that the class declaration takes up the remainder of the file. This is sort of how Perl 5 does it, but gets rid of the ‘1;’ at the end of your package declaration.

It’s also useful for another reason I’m not going to show. Namely that the Perl 5 code is directly below the Raku code, commented out. I’m flipping between vim windows to delete lines as I translate them by hand. So the ‘unit class’ declaration helps in case I accidentally un-comment Perl 5 code – I’ll get big honkin’ warnings when I run the test suite.

Moosey-ears!

(for those of you that remember the module’s release)

Raku borrowed liberally from Perl 5’s Moose OO metamodel, to the point where using Raku will feel very similar. Just drop a few bits of syntactic sugar that Moose needed to work under Perl, and it’ll feel the same.

In this case the ‘is’ does the same job as in Moose, to introduce a parent class. Raku doesn’t need the sugar that Moose sweetens your code with, so you can just say your class ‘is’ a subclass of any other class.
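A self-contained sketch of ‘is’ in action, using two invented class names so it runs without the real module:

```raku
# Both class names are hypothetical stand-ins for the PPS classes.
class PPS-Demo {
    has $.Name = 'unnamed';
    method describe { "PPS named $!Name" }
}

# 'is' names the parent class; no extra sugar is needed.
class Root-Demo is PPS-Demo { }

my $root = Root-Demo.new( Name => 'Root Entry' );
say $root.describe;       # method inherited from PPS-Demo
say $root ~~ PPS-Demo;    # a Root-Demo is also a PPS-Demo
```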

Let’s keep rolling along here, with the next lines of the Perl 5 library:

# Perl 5
require Exporter;
use strict;
use IO::File;
use IO::Handle;
use Fcntl;
use vars qw($VERSION @ISA);
@ISA = qw(OLE::Storage_Lite::PPS Exporter);
$VERSION = '0.19';
sub _savePpsSetPnt($$$);
sub _savePpsSetPnt2($$$);

# Raku
use OLE::Storage_Lite::PPS;
unit class OLE::Storage_Lite::PPS::Root:ver<0.19> is OLE::Storage_Lite::PPS;

Moving along… Okay, ya caught me, ‘:ver<0.19>’ is something new that we should add. Versions are now integrated into classes, so you can check them and even instantiate based on version number.

The module actually doesn’t export anything, so we don’t need Exporter at all. Raku enables ‘strict’ automatically, has IO modules in core, and doesn’t need Fcntl. The forward declarations aren’t needed for Raku, so all that’s left is the module’s version number, which gets added to the class name. You can add other adverbs, such as :auth, too.
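As a quick sketch of what the version adverb buys you, the class below (an invented name) carries its version and can be asked for it at run time:

```raku
# Versioned-Demo is a hypothetical class used only to show :ver.
class Versioned-Demo:ver<0.19> { }

say Versioned-Demo.^ver;             # the version attached to the class
say Versioned-Demo.^ver >= v0.19;    # versions can be compared
```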

Making things functional

To keep things simple for me writing the code, and me having to read the code weeks, months or years later, I want as close to a 1:1 relation between Perl 5 and Raku as I can. Another place where this requires an accommodation (but not much of one) is just a few lines down, writing the creation method ‘new’.

sub new ($;$$$) {
    my($sClass, $raTime1st, $raTime2nd, $raChild) = @_;
    OLE::Storage_Lite::PPS::_new(
        $sClass,
        undef,
        # ...
    );
}

By this point you’ll probably see more of why I say this module is a hard worker. It’s been around a long time, and function prototypes like this are one easy way to tell. Let’s rewrite it in a more modern Perl 5 style before making the jump to Raku, with function signatures.

sub new($sClass, $raTime1st, $raTime2nd, $raChild) {
    OLE::Storage_Lite::PPS::_new(
        $sClass,
        undef,
        # ...
    )
}

Just drop the old function prototype, and replace it with the variables we need to populate. Well, almost. If you know what a subroutine prototype is, you might think I’m pulling a fast one on you. And you’d be right. Look back at the original Perl 5 code, and you’ll see ‘($;$$$)’ is the prototype.

The ‘;’ separates required variables from optional variables, and we haven’t accounted for that in our Perl 5 code. Since I’m not here to modernize Perl 5 code but convert it to Raku, I’m going to ignore that in Perl 5 and go straight to Raku.

multi method new( @aTime1st?, @aTime2nd?, @aChild? ) {
  self.bless(
    Time1st => @aTime1st,
    Time2nd => @aTime2nd,
    Child => @aChild
  );
}
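To try the constructor outside the module, here is a hedged, self-contained sketch. Root-Ctor-Demo and its three array attributes are assumptions made for the example; the real class inherits its attributes from OLE::Storage_Lite::PPS, and the strings stand in for the child objects.

```raku
# Root-Ctor-Demo is a hypothetical stand-in for OLE::Storage_Lite::PPS::Root.
class Root-Ctor-Demo {
    has @.Time1st;
    has @.Time2nd;
    has @.Child;

    # Three optional positional arrays, mirroring the Perl 5 argument order.
    multi method new( @aTime1st?, @aTime2nd?, @aChild? ) {
        self.bless(
            Time1st => @aTime1st,
            Time2nd => @aTime2nd,
            Child   => @aChild
        );
    }
}

# All three arguments, as in the test script.
my $full = Root-Ctor-Demo.new( (), ( 0, 0, 16, 4, 10, 100 ), ( 'Workbook', 'Dir' ) );
say $full.Time2nd;
say $full.Child.elems;

# Only the first argument; the other two are left empty.
my $partial = Root-Ctor-Demo.new( ( 0, 0, 16, 4, 10, 100 ) );
say $partial.Time1st.elems;
say $partial.Time2nd.elems;
```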

Under Construction

And there we are. Now, there’s quite a bit to take in, so I’ll take things slow. The first thing you’ll notice is the keyword ‘multi’. In Perl 5, you get to hand-roll your own constructors, so you can make them any way you like. In this case, the author chose to write new($raTime1st, $raTime2nd, $raChild), which is pretty common.

Raku gives me a default ‘new’ method, so I only need to hand-roll constructors when I want. Since I want to keep as close as reasonable to the original API, I’ll write a constructor that takes 3 arguments too. In my case I chose to simplify things just a bit here.

I’ve found over several years of writing Raku code that I rarely use references. In Perl 5 they were pretty much the only way to pass arrays or hashes into a function, because of its propensity to “flatten” arguments.

In Raku, you can still use the Perl 5 style, but formal argument lists are the way to go in my opinion. If you need to pass both an array and a hash to a Raku function, go for it. I encourage that in my tutorial courses, and recommend it to help break students out of their Perl 5 mindset.
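A short sketch of passing an array and a hash together, with no references involved. The sub and its arguments are invented for illustration.

```raku
# describe-entry is a hypothetical helper; it takes an array and a hash directly.
sub describe-entry( @time, %options ) {
    my $label = %options<label> // 'entry';
    "$label has {@time.elems} time fields";
}

my @time    = 0, 0, 16, 4, 10, 100;
my %options = label => 'Root Entry';

say describe-entry( @time, %options );
```

Neither argument is flattened into the other, so the signature alone tells the reader what the sub expects.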

This is not to say that there’s anything wrong with Perl 5’s argument lists. In fact, Perl 5 has taken some ideas from Raku for formal argument lists, and I encourage that. Cross-pollination of ideas should be encouraged; it’s how both languages grow and add new features.

Last week was about the overall module, this week we delved a bit into the OO workings. Next week we’ll talk about references, attributes, and maybe progressive typing.