Welcome to the first in what I hope will be an ongoing series of blog posts on research in functional programming. I see a lot of really neat papers come and go that show the great things that are happening in this design space, and I’m hoping to open their discoveries to a slightly wider audience by providing more what I hope are readable descriptions of them for people who don’t have the time to trawl through 20 pages or so. By using examples liberally I hope I can provide an alternative to the sometimes rather dry language of the abstracts. Anyway, that explained we can get on with the show!

A really cute paper just came up on Lambda the Ultimate, about the implementation of a new feature in the .NET functional programming language F# called Active Patterns. The basic idea is to let you pattern match not just structurally, like this:

> data Animal = Armadillo String
| Mink String
| FunctionalProgrammer String Integer
>
> show (Armadillo name) = name
> show (Mink name) = name
> show (FunctionalProgrammer name iq) = name ++ “, IQ of ” ++ (show iq)

But also FUNCTIONALLY, by letting you define your own match functions:

> (|Text|Element|) xmlNode = if xmlNode.ContentType = Text then Text (xmlNode.Text) else Element (xmlNode)
>
> match myXMLNodeInstance of
> Text s -> print (“Looks like a text node containing ” ++ (show s))
> Element e -> print (“Hmm, it’s something else!”)

And actually it goes beyond this because your pattern matching functions can receive arguments before the data you actually want to pattern match on, or can be partial functions (i.e. fail to pattern match on some inputs). Their “curried” nature lets the authors of the paper define a function which lets them write a pattern matching on the Nth bit of an integer, and on that integer rotated N times, where N is potentially supplied at runtime. This clearly has great potential for expressiveness!

I think that’s quite enough about active patterns, but I must say that I’m quite tempted to go and add F# to my repetoire now :-). Coming up next time, possibly: data parallel Haskell (concurrency for free!) or constructor tagging (cute optimisation for lazy functional languages).

So, last year while working at Resolver Systems I worked with the author of Movable Python, which is a fairly neat application that lets you carry Python around on a USB stick. As an intellectual exercise I reimplemented the core functions in C# with a slightly less eccentric interface (no offence meant, Michael!), but was suprised to find I had to roll my own code to setup global hotkeys to run scripts, which was a nice feature of Movable I wanted to try and add.

The Win32 APIs for this are themselves are pretty grungy (all sorts of nastiness with bitflags and message loops), but I think I’ve been fairly successful in abstracting that away in this simple class, which you can get from here. With this, you can setup a hotkey like this:

Hotkey hk = new Hotkey();
hk.KeyCode = Keys.1;
hk.Windows = true;
hk.Pressed += delegate { Console.WriteLine(“Windows+1 pressed!”); };

if (!hk.GetCanRegister(myForm))
{ Console.WriteLine(“Whoops, looks like attempts to register will fail or throw an exception, show an error/visual user feedback”); }
else
{ hk.Register(myForm); }

// .. later, at some point
if (hk.Registered)
{ hk.Unregister(); }

Obviously this is a bit of a kitchen sink example to try and show every feature, but even then it’s a damn sight easier than what you would use to set up even a basic usage manually with P/Invoke. If you are wondering about the requirement for a form parameter to Register and GetCanRegister, this is simply a requirement of the underlying APIs, probably due to the fact that they signal hotkey presses in the application message loop rather than via a callback.

I hope someone finds this useful and to facilitate that process I hereby place all the code into the public domain to be messed around with to your hearts content. Enjoy!

That’s the question I found myself asking earlier this month when I was writing a simple compiler for an OCaml dialect called MinCaml. I don’t know if you’ve ever taken a look at the Intel IA32 instruction references, but there are two volumes of detailed descriptions of all the functions one of these CPU provides: about 600 in total!

As this compiler backend was my first real go at writing out assembly (though I’d read quite a lot of it before) I found the task of potentially learning all these instructions extremely intimidating, and wished I had an assembly language “cheat sheet” where I could go to find out the most common (and hence probably the most useful) instructions. So, today I got around to writing a little program to use UDis86 to scan all the binaries in C:\Windows\System32 and my Cygwin /bin directory and spit out some instruction counts. Now, before we see the results let me first say that my sample has a bias in that it’s all precompiled stuff that will be designed to take advantage of only the lowest common denominator of the target CPU instruction sets. This means that the distribution below will probably be quite different than if you scan, say, your machine-specific Gentoo installation. Right, onto the big monstrous Excel pie chart:

IA32 Instruction Incidence

I’ve also uploaded my raw data to my personal site, along with the program I used to get the results for anyone who is interested.

In the finest traditions of top 10 lists everywhere, let’s give a quick rundown of the most popular instructions:

  1. add: is anyone really suprised by this?
  2. mov: again, it’s fairly hard to get work done without moving data around
  3. push/pop: these instructions are commonly used for setting up/destroying function stack frames, hence their popularity despite IA32s register orientation
  4. invalid: do’h! These are from areas of code where udis86 just barfs :-). Their high ranking could mean a lot of instructions are in use that it can’t deal with, or (more likely) it is trying to disassemble executable regions of the binary that just don’t have anything useful in them
  5. cmp: the absolute bedrock of branching on the IA32, but what is suprising is that it’s so much higher than jz. Are there really that many pure numeric comparisions in programs?
  6. dec/inc: I’m fairly suprised that dec beats inc here, especially since sub is so much lower than add in my data: does anyone have a theory?
  7. xor/or: xor no doubt clocks in so high because of the “xor reg, reg” idiom used to zero out registers, but or is quite popular too for something which is really only good for bit flags and the odd if expression…

Well that was fun, in a geeky sort of way! I’d be quite interested in knowing the reasons behind some of the oddities above: maybe another blog entry is in order…?

Edit: an experienced reverse engineer, t_w, commented on Reddit that my count for add was quite high, and that it might be caused by counting junk code. It turns out that the byte sequence 0x0000 disassembles to an add instruction, which skews my results since the null byte is common padding for executable .text sections. However, after correcting for this (the charts above have taken this into account) it still leaves add in first place (it had 83355055 counted occurances before excluding this sequence, but still 45303805 after exclusion).

Ugly field selection syntax
OK, the most trivial complaint first. If we have defined a record like this:

> data Bird = Bird { name :: String, wings :: Integer }

How do we go about accessing the ‘name’ and ‘wings’ fields of a record instance? If you are used to a language like C, you might say it would look something like this:

> (Bird { name = “Fred”, wings = 2 }).name

Unfortunately, this isn’t the case. Actually, declaring a record creates a named function which uses pattern matching to destroy a passed record and return the name. So access actually looks like this:

> name (Bird { name = “Fred”, wings = 2 })

Prefix notation may please the Lisp fans, but for me at least it can get a bit confusing, especially when you are dealing with nested fields. In this case, the code you write looks like:

> (innerRecordField . outerRecordField) record

Which (when read left to right, as is natural) is entirely the wrong order of accessors. However, it is possible to argue this is just a bug in my brain from having spent too long staring at C code.. anyway, let’s move onto more substantitive complaints!

Namespace pollution
Imagine you’re writing a Haskell program to model poulty farmers who work as programmers in their spare time, so naturally you want to add to the Bird record above a Person record:

> data Person = Person { name :: String, knowsHaskell :: Boolean }

But I think you’ll find the compiler has something to say about that….

Records.hs:4:23:
Multiple declarations of `Main.name’
Declared at: Records.hs:3:19
Records.hs:4:23

Ouch! This is because of the automatic creation of the ‘name’ function I alluded to earlier. Let’s see what the Haskell compilers desugaring would look like:

> newtype Bird = Bird String Integer

> name :: Bird -> String
> name (Bird value, _) = value

> wings :: Bird -> Integer
> wings (Bird _, value) = value

> newtype Person = Person String Boolean

> name :: Person -> String
> name (Person value, _) = value

> knowsHaskell :: Person -> Boolean
> knowsHaskell (Person _, value) = value

As you can see, we have two name functions in the same scope: that’s no good! In particular, this means you can’t have records which share field names. However, using the magic of type classes we can hack up something approaching a solution. Let’s desugar the records as before, but instead of those name functions add this lot:

> class NameField a where
> name :: a -> String

> instance NameField Bird where
> name (Bird value, _) = value

> instance NameField Person where
> name (Person value, _) = value

All we have done here is used the happy (and not entirely accidental) fact that the ‘name’ field is of type String in both records to create a type class with instances to let us extract it from both record types. A use of this would look something like:

> showName :: (NameField a) => a -> IO String
> showName hasNameField = print (“Name: ” ++ (name hasNameField))

> showName (Person { name = “Simon Peyton-Jones”, knowsHaskell = true })
> showName (Bird { name = “Clucker”, wings = 2 })

Great stuff! Actually, we could use this hack to establish something like a subtype relationship on records, since any record with at least the fields of another could implement all of its field type classes (like the NameField type class, in this example). Another way this could be extended is to make use of the multiparameter type classes and functional dependency extensions to GHC to let the field types differ.

Of course, this is all just one hack on top of another. Actually, considerable brainpower has been expended on improving the Haskell record system, such as in a 2003 paper by the areforementioned Simon Peyton-Jones here. This proposal would have let you write something like this:

> showName :: (r <: { name :: String }) -> IO String
> showName { name = myName, .. } = print (“Name: ” ++ myName)

The “r <: { name : String }” indicates any record which contains at least a field called name with type String can be consumed. The two dots “..” in the pattern match likewise indicate that fields other than name may be present. Note also the use of an anonymous record type: no data decleration was required in the code above. This is obviously a lot more concise than having to create the type classes yourselves, as we did, but actually we can make it even more concise by using another of the proposed extensions:

> showName { name, .. } = print (“Name: ” ++ name)

Here, we omit the “name = myName” pattern match and make use of so-called “punning” to give us access to the name field: very nice! Unfortunately, all of this record-y goodness is speculative at least until Haskell’ gets off the ground.

Record update is not first class

Haskell gives us a conventient syntax for record update. Lets say that one of our chickens strayed too close to the local nuclear reactor and sprouted an extra limb:

> exampleBird = Bird { name = “Son Of Clucker”, wings = 2 }
> exampleBird { wings = 3 }

The last line above will return a Bird identical in all respects except that the wings will have been changed to 3. The naïve amongst us at this point might then think we could write something like:

> changeWings :: Integer -> Bird -> Bird
> changeWings x = { wings = x }

The intention here is to return a function that just sets a Bird records “wings” field to x. Unfortunately, this is not even remotely legal, which does make some sense since if it was record update should, to follow normal function application convention, look more like this:

> { wings = 3 } exampleBird

Right, I think that’s got everything that’s wrong about Haskell records off my chest: do you know of any points I’ve missed?

Edit: Corrected my pattern match syntax (whoops :-). Thanks, Saizan!
Edit 2: Clarified some points in response to jaybee’s comments on the Reddit comments page.

Inspired by Yariv’s Blog, where he talks about a framework for building web applications in Erlang, and my so far abortive attempts to get into Erlang, I decided to give it another go with Erlyweb. Erlyweb depends on YAWS (Yet Another Web Server), however, and this proved to be a bit of a pain to install since I’m being difficult and using Windows on my development machine. So, in order to help any other lost souls who try to duplicate this feat in the future, I’m recording the process (tested against YAWS 1.68):

  1. Install Erlang (obviously), and make sure it is in your PATH
  2. Install Cygwin with the Perl and GNU Make packages at minimum
  3. Unpack the latest YAWS release into your home directory
  4. Now, the first trickiness: there is a small error in the YAWS makefile, so open up the yaws-X.XX\src\Makefile and for the mime_types.erl target change the first command to be not $(ERL) but “$(ERL)”. The quotes mean that for those of us with Erlang installed in a path with spaces in the name (such as Windows users who put it in Program Files) the erl executable will actually be found. If you don’t follow this step you’ll end up with some error like:

    /cygdrive/c/Program Files/Erlang/5.5.4/bin/erl -noshell -pa ../ebin -s mime_type_c compile
    make[1]: /cygdrive/c/Program: Command not found

  5. Follow the same process to add quotes around $(ERLC) in www\shopingcart\Makefile and www\code\Makefile (somewhat weirdly, every other uses of $(ERL) and $(ERLC) have been quoted for us, suggesting this is just something they overlooked, rather than that running on Windows is a blasphemy)
  6. Whack open a Bash Cygwin shell and cd into the yaws-X.XX directory
  7. Do the usual “./configure; make” dance
  8. Open up the newly created yaws file in the bin subdirectory and change the last line so that $erl is in quotes, i.e. from this:

    ${RUN_ERL} “exec $erl $XEC”

    To this:

    ${RUN_ERL} ‘exec “$erl” $XEC’

  9. From this point on I’m going to assume you need to do a local install: if you want to do your own thing, you can follow the instructions here, but you may need to adapt them based on what I’m going to talk about below. Anyway, run “make local_install” do to the install if you are following along at home
  10. Now, this is where it can get a bit confusing: although we just built YAWS under Cygwin, since we have a Windows binary of Erlang the paths in yaws.conf (which should have appeared in your home directory) must be Windows paths, but the makefile used Unix ones. Go in and fix all of those (for me, this meant putting “C:/Cygwin” in front of all of them)
  11. Point your web browser at localhost HTTPS on port 4443 to see what you have wrought
  12. Sit back and take a deep breath to cleanse your soul of the accumulated Unix gunk

Here’s one I made earlier