Rewriting Perl Code for Raku IV: A New Hope

Back in Part III of our series on Raku programming, we talked about some of the basics of OO programming. This time we’ll talk about another aspect of OO programming. Perl objects can be made from any kind of reference, although the most common is a hash. I think Raku objects can do the same, but in this article we’ll just talk about hash-style Perl objects.

Raku objects let you superclass and subclass them, instantiate them, run methods on them, and store data in them. In previous articles we’ve talked about all but storing data. It’s time to remedy that, and talk about attributes.

Instance attributes

We used unit class OLE::Storage_Lite; to declare our class, and method save( $x, $y ) { ... } to create methods or, in our case, to rewrite existing functions as methods. Now we focus our attention on some of the variables that should really be instance attributes, and why.

Let’s get to know which variables behave like attributes, and which don’t. This will change how we write our Raku code, but hopefully for the better. We’ll start from the outside in, and look at the API. There are a few “test” scripts that use the module, and this fragment is pretty common.

use OLE::Storage_Lite;
my $oOl = OLE::Storage_Lite->new('test.xls');
my $oPps = $oOl->getPpsTree(1);
die( "test.xls must be a OLE file") unless($oPps);

The author creates an object ($oOl) from an existing file, then fetches a tree of “Pps” objects, whatever they are. So, one OLE::Storage_Lite object equals one file. This gives me my first instance variable, the filename.

sub new($$) {
  my($sClass, $sFile) = @_;
  my $oThis = {
    _FILE => $sFile,
  };
  bless $oThis;
  return $oThis;
}

Above is how they wrote it in Perl, and below is how we’d write it (exactly as specified) in Raku:

has $._FILE;

multi method new( $sFile ) {
  self.new( _FILE => $sFile );
}

Later on, we can call my $file = OLE::Storage_Lite.new( 'test.xls' ); just like we did in Perl. We wouldn’t even need the new method if we had users call my $file = OLE::Storage_Lite.new( _FILE => 'test.xls' );. This gives users the option of calling the API in the old Perl fashion or the new Raku fashion without additional work on our part.
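Here’s a quick sketch of both call styles side by side. The class name Storage-Sketch is made up so this can run on its own without the real module; only the attribute and the multi method come from the code above.

```raku
# Stand-in class for illustration only; the real module is OLE::Storage_Lite.
class Storage-Sketch {
    has $._FILE;

    # Positional form, matching the old Perl call style.
    multi method new( $sFile ) {
        self.new( _FILE => $sFile );
    }
}

my $perl-style = Storage-Sketch.new( 'test.xls' );          # positional
my $raku-style = Storage-Sketch.new( _FILE => 'test.xls' ); # named

say $perl-style._FILE;
say $raku-style._FILE;
```

Both objects should end up holding the same filename, since the positional form just forwards to the default named constructor.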

Strict Raku-style

There’s a problem lurking here, though. The constructor Raku provides us lets us call my $file = OLE::Storage_Lite.new(); without specifying a value for $._FILE. If you know Perl’s Moose module, though, the ‘has’ there just might look familiar.

And for good reason. A lot of the ideas from Moose migrated into Raku during its design, and the attributes were one of those. Moose lets you do a lot of things with attributes, and so does Raku. One of those is that you can add “traits” to them. Let’s do that now.

has $._FILE is required;

Calling OLE::Storage_Lite.new() now fails, because you’re not passing in the _FILE argument. That solves one problem. Actually, it solves two, come to think of it. In the original Perl code, you could call OLE::Storage_Lite->new() too, and it wouldn’t complain. Now we’ve fixed that, with one new term.
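If you want to see the failure without crashing your script, a sketch like this should do it. Required-Sketch is a hypothetical stand-in class, and I’m assuming the exception Rakudo throws here is X::Attribute::Required.

```raku
# Stand-in class for illustration only.
class Required-Sketch {
    has $._FILE is required;
}

# No _FILE supplied, so construction should throw; try catches it.
my $failed = try Required-Sketch.new;
my $error  = $!;
say "construction failed: " ~ $error.message if $error.defined;

# Supplying the attribute works as before.
my $ok = Required-Sketch.new( _FILE => 'test.xls' );
say $ok._FILE;
```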

Progressive Typing

No, we’re not talking about some new editor like Comma (the link does work, despite the certificate problem). Our code would run just fine as-is. Users could call our .new() API, Raku would make sure a filename was passed in, and we could go on with translating.

But there’s something more we can take advantage of here, and that is the fact that any Raku object (and anything we can instantiate is an object) is a type as well. We haven’t mentioned that because we really couldn’t use that information until now.

The original Perl code is littered with clues to types, hidden in the variable names. When we wrote our own API call, the Perl code called the file name $sNm. The ‘s’ tells the Perl compiler nothing, but it tells us that $sNm is a String type. Perl may not have true types, but Raku does. Let’s fix our attribute with that in mind.

has Str $._FILE is required;

We knew all along that $._FILE is a string of some sort, but telling Raku that lets it reject anything that isn’t a Str when the object is built. Making it a required attribute lets anyone who calls new() know if they forget an argument. We could go a little farther with this, but locking down attributes will help in the long run, when we start dealing with the pack and unpack built-ins.
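A small sketch of what the type buys us, again with a made-up class name (Typed-Sketch) so it runs by itself. I’m not asserting which exception class comes back, only that something does.

```raku
# Stand-in class for illustration only.
class Typed-Sketch {
    has Str $._FILE is required;
}

my $good = Typed-Sketch.new( _FILE => 'test.xls' );
say $good._FILE.^name;

# An Int is not a Str, so this construction should be refused.
my $bad   = try Typed-Sketch.new( _FILE => 42 );
my $error = $!;
say "rejected: " ~ $error.message if $error.defined;
```

Note that there’s no coercion going on here: 42 isn’t quietly turned into "42", it’s turned away at the door.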

Packing It All In

We’re now getting to the heart of the module. There’s a lot of mechanics above us, allocating objects and doing math and checking types, and not much below us. The class’ entire purpose is to read and write OLE-formatted files. We’ll talk more about the boilerplate, but here’s the real meat of the file.

Let’s start with what should be simple, reading in data. Just like in Perl, we open a file and get back a “file handle” (assuming the file exists, of course.) By default, calling my $fh = open $._FILE; gives us a read-only file handle. The file handle itself has a bunch of attributes associated with it, but the important one right now is its encoding.

Namely, the fact that it has none. An OLE file is essentially a miniature filesystem (probably based on FAT) packed onto disk, complete with a root directory, subdirectories and files. Files have names encoded in UCS-2, but the rest is entirely dependent upon what the application requires.

The upshot of which is that we can’t read the format with something simple like my @lines = $fh.lines; which would read line after line into the @lines array. Instead we’ll use calls like read() and write() that work with byte-oriented buffers.
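To make the byte-oriented idea concrete, here’s a minimal sketch. It writes a few arbitrary bytes to a scratch file in $*TMPDIR (the file name and the byte values are mine, not from the module) and reads some of them back. I’ve also passed :bin to open, which the module’s code doesn’t do, just to be explicit that no text decoding is wanted.

```raku
# Write a small scratch file so the sketch does not depend on test.xls.
my $path = $*TMPDIR.add( 'ole-read-sketch.bin' );
$path.spurt( Blob.new( 0x00, 0xFF, 0x10, 0x80, 0x0A ) );

my $fh    = open $path, :bin;   # :bin asks for a handle with no text decoding
my $chunk = $fh.read( 4 );      # first 4 bytes only, returned as a Buf
$fh.close;

say $chunk.elems;                       # how many bytes we got
say $chunk.list.fmt( '%02x', ' ' );     # the bytes, in hex

unlink $path;
```

What comes back from read() is a buffer of integers, one per byte, with no notion of lines or characters attached.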

Buffering…

All OLE files start off with the header “\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1”, so we should probably start there. That’s important twice in the code, in fact. First, when we’re reading off disk, we can check it against what we’ve just read to make sure this file is OLE, and not, say, a JSON file. Later on, when we’re saving out an OLE file, we can write it as the header string.

constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

I’ll make it a constant as well, so when I revisit this code in a month I don’t have to go looking in specs for ‘0xd0 0xcf’ to remember what this is. Reading is straight-forward too. It needs just a byte count.

my Buf $header = $fh.read( 8 );

Something important to notice here is the type, ‘Buf’. If our file was in Markdown, or JSON we could get away with just writing my @lines = $fh.lines; like I tried earlier. But these are raw bytes, hindered by no interpretation. Let’s see what happens when we compare these bytes to our HEADER-ID.

t/01-internals.t ............ Cannot use a Buf as a string, but you called the Stringy method on it
  in method _getHeaderInfo at /home/jgoff/GitHub/drforr/raku-OLE-Storage_Lite/lib/OLE/Storage_Lite.pm6 (OLE::Storage_Lite) line 169
  in block <unit> at t/01-internals.t line 42

Another brick in the wall

Ka-blam. But… hold the phone here a minute, I just said $header eq HEADER-ID, I didn’t write anything like ‘Stringy’! There’s no ‘Stringy’ in the source… oh. HEADER-ID is a string, so Raku is being helpful. I’m trying to use string comparison (‘eq’) between something that’s not a Str ( $header ) and something that is (HEADER-ID).

Pull up the Stringy documentation, and look for the Type graph. Midway down you’ll see ‘Buf’ and ‘Str’; as of this writing Buf is on the left, and Str is popular so it’s in the middle.

Trace the inheritance paths from Buf and Str upwards, and you’ll see they pass Buf -> Blob -> Stringy and Str -> Stringy, and stop. What the error message therefore is saying is this, anthropomorphized:

You wanted to convert Buf to Str, and didn’t care how you did it. So I looked. First, on the Buf type. No .Str method there, at least without arguments. No good. So I looked in its parent, Blob. Nothing doing there. Then I looked at Stringy, and couldn’t find anything else.

There’s nothing above me, nothing below. So I’ll let you know I looked for a conversion method in a bunch of places, stopped at Stringy, and couldn’t go any farther. Sorry.

Raku

You’re probably wondering how to get out of this quandary. Reading the Blob documentation closely, you might think that the decode method is the way out of our present jam. If you look closer, though, there’s a spanner in the works. “\xD0” is the byte 0xd0, so if you try to decode to ASCII, you run into the problem that ASCII only covers 0x00-0x7f, and everything outside of that is undefined.
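For what it’s worth, there is an encoding that covers the whole 0x00-0xff range one byte per code point, and that’s Latin-1. This isn’t the route the article takes, just a side sketch under that assumption. The buffer here is built by hand as a stand-in for the 8 bytes we read from disk.

```raku
constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# Stand-in for the 8 bytes read from the file.
my $header = Buf.new( 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 );

# Option 1: bytes -> Str, one code point per byte, then compare strings.
my $as-string = $header.decode( 'latin-1' );
say $as-string eq HEADER-ID;

# Option 2: Str -> Blob, then compare buffers directly.
my $as-blob = HEADER-ID.encode( 'latin-1' );
say $as-blob eq $header;
```

Either direction avoids the Stringy error, because both sides of eq end up being the same kind of thing.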

Packing for vacation

If you’ve kept up with things, you might surmise by now that the key to our quandary lies in the pack and unpack builtins. Specifically unpack(), because we’re trying to “decode” a buffer into something suitable for Raku.

Unless you’ve done things like network programming or security, the pack and unpack builtins are going to be unfamiliar territory. The closest analogue of pack() is the builtin sprintf().

Both of these builtins take a format string telling the compiler how to arrange its arguments. Both of them take a mixture of string and integer arguments afterwards. But while sprintf() takes the arguments and treats its output as a UTF-8 encoded string, pack() takes the same arguments and treats its output as a raw buffer of bytes.

And now you can see one way out of our little predicament. If we could just find the right invocation, pack() would be able to take our string “\xd0\xcf…” and turn it into a Buf object. Then we could compare the buffer we got by reading 8 bytes to the buffer we expected.

So instead of cluttering up the main code, let’s write a quick test.

use experimental :pack;
constant HEADER-ID = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

use Test;
my $fh = open "test.xls";
my Buf $buf = $fh.read( 8 );
$fh.close;

is $buf, pack( "A8", HEADER-ID ); # Pack 8 ASCII characters
done-testing; # no plan was declared, so tell Test we are finished

Testing…testing…

Let’s take it from the top. We tell Raku to use the “experimental” pack() builtin, and declare the header we want to check against. Then we tell Raku we want to use the Test module, and open a new Microsoft Excel test file.

Last, we read a chunk of 8 bytes from the file into a buffer, and check to see that the 8 bytes match the header we expect to see. Now, how did we get that weird ‘A8’ string in there? I thought pack() looked more like sprintf()?

Well, it does, to an extent. I/O routines like sscanf() and sprintf() can do all sorts of things to your strings and numbers on the way in and out; think, for example, of what ‘%-2.10f’ means in a format specifier. You can follow along with the unpack() documentation if you like.

pack(), by contrast, just takes 8, 16, or 32-bit chunks of your input, and places them into a buffer. The “A” in “A8” says that it wants to convert an ASCII-sized chunk of your input (“\xd0” in our case) into a byte in the buffer, so our Buf now looks like ( 0xd0 ).

I could just as well have said “AAAAAAAA” in order to translate all 8 characters of the buffer, but I think it’s a little tidier to use the ‘repeat’ option, and say “A8” in order to convert just 8 characters (yes, yes, I know, they’re glyphs, but let’s not confuse matters.)

I could write “A*” just as well, but “A8” makes sure that 8 and only 8 (the number that thou shalt count to…) characters get converted. I doubt that the header in an OLE file will change, but it’s a nice bit of forward planning.
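Here’s a tiny sketch of the counted and starred forms next to each other. I’ve stuck to a plain ASCII string ("OLE", my choice) so the sketch stays clear of any questions about how ‘A’ treats characters above 0x7f, and I’m assuming the experimental pack() behaves the way its documentation describes.

```raku
use experimental :pack;

# Plain ASCII input; "OLE" is just a convenient three-character string.
my $exact = pack( "A3", "OLE" );   # exactly three characters
my $star  = pack( "A*", "OLE" );   # however many characters there are

say $exact.list;
say $star.list;
say $exact eq $star;
```

With input that is exactly as long as the count, the two forms should produce the same buffer; the count only starts to matter when the input is longer or shorter than you expected.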


For those of you that made it this far, thank you. As usual, gentle Reader, if you have any comments, criticisms (constructive, please) or questions, feel free to post them below.

Next week I’ll delve deeper into the mysteries of pack(), unpack() and some of the tips and tricks I use to keep on my toes and make sure that I generate clean Microsoft-compatible output.


  • ab5tract says:

    Don’t forget that you can also wire in a quick existence check in the `has` declaration:

    has Str $._FILE is required where *.IO.e;

    I can totally understand not adding that part into the tutorial here, as it adds unneeded complexity to non-fluent readers.

    All the more reason, I thought, to add it to the comments. Thanks for this series, and the porting of OLE::Storage_Lite (though I am hoping that you will also one day address that unnecessary under-bar in the module name … 😉 ).

    • ab5tract says:

      Just noticed that this results in a Less Than Awesome error message, though.


      > class F { has $.f is required where *.IO.e }
      > F.new: f => "/tmp/not-there"
      Type check failed in assignment to $!f; expected but got Str ("/tmp/not-there")
      in block at line 1

      In the spirit of the season, I’ll look into fixing that LTA issue 🙂

      • Maybe my subset Filename of Str where *.IO.e would do the trick, haven’t checked as I’m off to lunch. But again, thanks for the comments and as always, it’s good to know that people are actually reading these articles.

      • alabamenhu says:

        You can add a bit more to the where clause to get a >LTA error message: has Str() $._FILE is required where {$^file.IO.e or die "The passed file ‘$file’ must exist."};
        If you don’t pass a string (or stringifiable — hence the Str() type), you’ll get a Str needed error message, if you do pass a Str then it has to also exist or the custom die message will be passed.

        • That’s a good solution too, though I’m not quite sure I’d use it in a module. I’m happy with the has Str $._FILE; that’s there right now, because it seems to me to create the right balance between module and script. I think I’d want the “or die…” bit to occur in the script where I can let the user customize the error message that gets returned. It’s less error checking for me, and more flexibility for the end user.
          Plus once people start to see how incredibly flexible sub MAIN(...) { ... } is in Raku – you can make it do everything GetOpt::* does in Perl – everyone will have scripts that have those sort of ‘where {…}’ type checks at the top of their script, and module authors won’t have to test as many assumptions.
          Also, I might at some point want to use other types of arguments to read an OLE format file from, and changing the type from Str to Str|IO::Handle or whatever won’t be as onerous with really strong type checking “in the way.”

    • Oh, I hadn’t forgotten. I had that very line staring at me in my editor, in fact, until I checked the word count and realized that while it was oh, so tempting, it was also beyond the scope of the article and would be something of a distraction. One of the later articles in the series will cover strong typing in more depth, in particular I’ve got so many variables floating around that should be restricted by size as well as by type, that this module is in fact a good candidate for a block of subsets like my subset UInt8 of Int where 0 <= * <= 255;.
      As to the underbar, what I'm going to do probably is release this to the ecosystem as-is, and apply lessons learned to the full OLE::Storage module, assuming it's not got too many new nodes. Thanks again, it's wonderful to see that people actually read these articles and find them useful.

  • Brad Gilbert says:

    I would have just made the HEADER-ID constant a Blob. Then eq would have just worked.

    constant HEADER-ID = Blob.new: 0xD0,0xCF,0x11,0xE0,0xA1,0xB1,0x1A,0xE1;

    I can’t see a reason you would ever need it to be in Str form.

    • Good point. I’m fixing DateTime stuff at the moment, but when the time comes to revisit the actual reading/writing of buffers I’ll probably do that. The only thing I need to stringify ATM is the internalized file name, everything else should probably remain as a Blob.
