Cute Haskell Code

May 9, 2007

For some recent supervision work on my Security lectures, I was given the task of decoding a string encrypted with a simple shift cipher. This cipher, given a key, simply moves each letter on in the alphabet by an amount given by the key, wrapping at the end. Being a good functional programmer, I decided to implement a brute force solver for this in Haskell, and I’m rather pleased by the terseness of the result, so I felt compelled to share it.

import List(elemIndex)

shift i x = (cycle [‘A’..’Z’])!!(maybe (error “Bad character ” ++ (show x)) (+i) (elemIndex x [‘A’..’Z’]))
main = sequence_ (map (\i -> putStrLn $ map (shift i) cipherText) [0..25])

If you ignore the boilerplate, there’s only two lines of actual code there :-). I happen to think it’s rather understandable, though I’d be interested to get the opinion of someone who isn’t a Haskell hacker on the veracity of that statement. A hint you may need understanding it is that “cycle a_list” just returns the list which is “a_list” concatenated infinite times, and that “!!” is Haskells list indexing operator.

On this note, I saw an interesting blog post at retrospections the other day which compared the LOC count for a number of open source source control programs: Darcs (Haskell) and Mercurial (Python) were tied at 20KLOC, with the next best being Bazaar-NG (Python) at 47KLOC. Obviously this isn’t a strictly apples-to-apples comparison, though, so take it with a grain of salt!

Ugly field selection syntax
OK, the most trivial complaint first. If we have defined a record like this:

> data Bird = Bird { name :: String, wings :: Integer }

How do we go about accessing the ‘name’ and ‘wings’ fields of a record instance? If you are used to a language like C, you might say it would look something like this:

> (Bird { name = “Fred”, wings = 2 }).name

Unfortunately, this isn’t the case. Actually, declaring a record creates a named function which uses pattern matching to destroy a passed record and return the name. So access actually looks like this:

> name (Bird { name = “Fred”, wings = 2 })

Prefix notation may please the Lisp fans, but for me at least it can get a bit confusing, especially when you are dealing with nested fields. In this case, the code you write looks like:

> (innerRecordField . outerRecordField) record

Which (when read left to right, as is natural) is entirely the wrong order of accessors. However, it is possible to argue this is just a bug in my brain from having spent too long staring at C code.. anyway, let’s move onto more substantitive complaints!

Namespace pollution
Imagine you’re writing a Haskell program to model poulty farmers who work as programmers in their spare time, so naturally you want to add to the Bird record above a Person record:

> data Person = Person { name :: String, knowsHaskell :: Boolean }

But I think you’ll find the compiler has something to say about that….

Multiple declarations of `’
Declared at: Records.hs:3:19

Ouch! This is because of the automatic creation of the ‘name’ function I alluded to earlier. Let’s see what the Haskell compilers desugaring would look like:

> newtype Bird = Bird String Integer

> name :: Bird -> String
> name (Bird value, _) = value

> wings :: Bird -> Integer
> wings (Bird _, value) = value

> newtype Person = Person String Boolean

> name :: Person -> String
> name (Person value, _) = value

> knowsHaskell :: Person -> Boolean
> knowsHaskell (Person _, value) = value

As you can see, we have two name functions in the same scope: that’s no good! In particular, this means you can’t have records which share field names. However, using the magic of type classes we can hack up something approaching a solution. Let’s desugar the records as before, but instead of those name functions add this lot:

> class NameField a where
> name :: a -> String

> instance NameField Bird where
> name (Bird value, _) = value

> instance NameField Person where
> name (Person value, _) = value

All we have done here is used the happy (and not entirely accidental) fact that the ‘name’ field is of type String in both records to create a type class with instances to let us extract it from both record types. A use of this would look something like:

> showName :: (NameField a) => a -> IO String
> showName hasNameField = print (“Name: ” ++ (name hasNameField))

> showName (Person { name = “Simon Peyton-Jones”, knowsHaskell = true })
> showName (Bird { name = “Clucker”, wings = 2 })

Great stuff! Actually, we could use this hack to establish something like a subtype relationship on records, since any record with at least the fields of another could implement all of its field type classes (like the NameField type class, in this example). Another way this could be extended is to make use of the multiparameter type classes and functional dependency extensions to GHC to let the field types differ.

Of course, this is all just one hack on top of another. Actually, considerable brainpower has been expended on improving the Haskell record system, such as in a 2003 paper by the areforementioned Simon Peyton-Jones here. This proposal would have let you write something like this:

> showName :: (r <: { name :: String }) -> IO String
> showName { name = myName, .. } = print (“Name: ” ++ myName)

The “r <: { name : String }” indicates any record which contains at least a field called name with type String can be consumed. The two dots “..” in the pattern match likewise indicate that fields other than name may be present. Note also the use of an anonymous record type: no data decleration was required in the code above. This is obviously a lot more concise than having to create the type classes yourselves, as we did, but actually we can make it even more concise by using another of the proposed extensions:

> showName { name, .. } = print (“Name: ” ++ name)

Here, we omit the “name = myName” pattern match and make use of so-called “punning” to give us access to the name field: very nice! Unfortunately, all of this record-y goodness is speculative at least until Haskell’ gets off the ground.

Record update is not first class

Haskell gives us a conventient syntax for record update. Lets say that one of our chickens strayed too close to the local nuclear reactor and sprouted an extra limb:

> exampleBird = Bird { name = “Son Of Clucker”, wings = 2 }
> exampleBird { wings = 3 }

The last line above will return a Bird identical in all respects except that the wings will have been changed to 3. The naïve amongst us at this point might then think we could write something like:

> changeWings :: Integer -> Bird -> Bird
> changeWings x = { wings = x }

The intention here is to return a function that just sets a Bird records “wings” field to x. Unfortunately, this is not even remotely legal, which does make some sense since if it was record update should, to follow normal function application convention, look more like this:

> { wings = 3 } exampleBird

Right, I think that’s got everything that’s wrong about Haskell records off my chest: do you know of any points I’ve missed?

Edit: Corrected my pattern match syntax (whoops :-). Thanks, Saizan!
Edit 2: Clarified some points in response to jaybee’s comments on the Reddit comments page.