Moving On..

August 27, 2007

I’ve moved my blog to my own website, here, so I can take control of the formatting of my posts, and finally stop posting broken code! I’ve also put up a new article, after a long delay due to my not having Internet access all summer..

I love challenges of all kinds, both the algorithmic and cryptographic. There is just something inherently compelling about pitting your wits alone against an intellectual challenge, free to apply any and all means to beat it. I recently took part in a crypto challenge where we were meant to break a text encoded with the running key cipher, which I hadn’t encountered before, and I came up with a rather satisfying solution for I thought might be good to share here.

For those of you who aren’t familiar with the running key cipher, it’s very simple. Say you have some text you would love to be kept secret:


But still feel the need to communicate it to someone, with whom you have agreed some (usually well known) key beforehand, say:


We just “add” the two texts together, with the letters being used as numbers in the obvious way: A = 0, B = 1, …, Z = 25. This means that e.g. C + B = 2 + 1 = 3 = D. Furthermore, we do this modulo 26, so that e.g. Z + B = A. Additionally, since in our case our key is a bit longer than the secret message, we will just not use the last couple of letters of the key. This means that our encoded message is:


The receiver can just subtract their known key to retrieve the original text. Wow, great! Since we never never reuse the key at any point in that message, and assuming we never use they key again, it must constitute a one time pad! This means that we must have perfect security, right? Ah, grasshopper, if it only it were so simple! Actually, this is not very secure at all because both our key and our message exhibit distinctly non-random distributions of letters. A simple example of this can be seen if you imagine that we subtract just the letter E (the most commonly used in English) from every letter in the intercepted ciphertext. We would expect to get considerably more than 1/26 of the resulting letters being either from the key or plaintext. If we do this with our example, we obtain:

K    K       K      K  K      KP

I’ve labelled each resulting letter with a K or a P depending on whether it came from the key text or the plain text. As you can see, most of the letters are from the key because it contains very few E characters.

This is all very well, but it’s clearly a fairly crude attack. If we want to have a chance of actually breaking the cipher we need some more heavyweight methods. As far as I know, there are a choice of three major techniques:

  • Known plaintext attacks
  • Crude N-gram analysis
  • Hidden Markov model N-gram analysis

The last one is what I actually used to break the challenge, and the one that I think is just plain coolest, but we need to build up to it as it’s pretty scary :-). So, let’s take a look at the other techniques first:

Known Plaintext Attack
This is definitely the easiest attack to employ if you know something about the message being sent. For instance, if you, as the evil attacker, happen to know that the sender is a passionate gourmand, you might guess that EATING will appear somewhere in the key or plaintext. Knowing this, you can exhaustively subtract EATING from the ciphertext at every offset. If we do this for our example text, we obtain:

Text offset Text deciphered with key

Most of those look like gibberish, but one stands out as eminently readable English text: YOURBO. This is, of course, part of the key text. The diligent cryptanalyst will then typically be able to use further guesses (perhaps inspired by the discovered piece of text) to reveal more of the message and key. For instance, he might try every word beginning with BO to try and extend the known plaintext rightwards (though this is a rather difficult proposition since there are at least 368 possibilities according to my dictionary). A small aside here about my methodology: I found the optimal experience was got by combining a suitable dictionary (I use 5desk) with grep in a Cygwin Bash shell. You can then make queries such as the one below, which counts the number of words in the dictionary beginning with “bo”.

$ grep ‘^bo’ 5desk.txt | wc -l

Note that it is not strictly possible to tell whether the English words you recover are part of the key or plain text. This is because of the symmetry of the modular addition we got to perform the ciphering step. However, it is usually possible to work it out by your knowledge of the messages probable contents. Once you know which text stream is part of the key, a Google search will usually allow you to obtain the entire thing from your fragmentary knowledge, assuming that the key is a public text (as is typical: Bible extracts are popular). This will let you break the code!

However, what if you don’t know anything about the message, or, like me, you’re just a bit rubbish at making good guesses as to how to extend your known fragment of text? We probably need to employ something a little more sophisticated, like one of the N-gram analyses I mentioned earlier. However, as this blog entry is already getting overlong I shall leave exposition of these rather more clever techniques for a future entry: watch this space, especially those of you whom I have just bored senseless by explaining the absolute basics.

For those of you who are unaware, the lastest edition of the very entertaining Uninformed Journal came out recently. It’s a pretty interesting edition, though not as bulky as some of their previous releases: I hope the project isn’t losing momentum. In particular I found that the article by skape (one half of the duo who originally broke Windows Patchguard [pdf]) about dynamic analysis of memory access in arbitrary software was nice: he outlines a variety of tricks for doing this, using page tables, segment registers and runtime instrumentation, and some research areas where this would be useful, like taint propagation. There’s probably a product or two waiting in there for some enterprising company to create :-).

For those of you who enjoy Uninformed, you can also read skywings (the other half of the areformentioned duo) blog Nynaeve for regular fixes of in-depth technical information.

Well, I’m a bit behind the curve here. I hadn’t until now felt the need to install GreaseMonkey, but Experts Exchange (which frequently shows up in my Google search results) have started to blur the comments people make on the questions. You have to sign up to view them, and since I have a pathalogical aversion to such inconveniences as 30 seconds of registration I did what any good programmer would and spent 10 minutes writing a script to solve the program.

The script plugs into GreaseMonkey and first removes the blurring, which is just done by an overlay, and then replaces the answer text with its ROT13ed equivalent, since thats the “encryption” they have opted to use! Anyway, the whole thing is basically a mashup of the scripts here and here, but I include the full source below for your convenience: do with it as you will.

// ==UserScript==
// @name ExpertsExchangeFilter 2
// @namespace All
// @description Remove Experts Exchange Stuff
// @include*
// @include*
// ==/UserScript==

function rot13(src) {
var dst = new String('');
var b;
var t = new String('');
var clear = 0;
for (var ctr = 0; ctr < src.length; ctr++) {
b = src.charCodeAt(ctr);
if (60 == b || 91 == b) {
clear = 1;
if (!clear) {
if (((b>64) && (b<78)) || ((b>96) && (b<110))) {
b = b + 13;
} else {
if (((b>77) && (b<91)) || ((b>109) && (b<123))) {
b = b - 13;
t = String.fromCharCode(b);
dst = dst.concat(t);
if (b == 62 || b == 93) {
clear = 0;
return dst;

a = document.getElementsByTagName('div')
for (i = 0; i < a.length; i++)
if (a[i].className == 'infoBody')
{ a[i].removeChild(a[i].childNodes[1]); }
else if (a[i].className == 'answerBody quoted')
{ a[i].innerHTML = rot13(a[i].innerHTML); }

Cute Haskell Code

May 9, 2007

For some recent supervision work on my Security lectures, I was given the task of decoding a string encrypted with a simple shift cipher. This cipher, given a key, simply moves each letter on in the alphabet by an amount given by the key, wrapping at the end. Being a good functional programmer, I decided to implement a brute force solver for this in Haskell, and I’m rather pleased by the terseness of the result, so I felt compelled to share it.

import List(elemIndex)

shift i x = (cycle [‘A’..’Z’])!!(maybe (error “Bad character ” ++ (show x)) (+i) (elemIndex x [‘A’..’Z’]))
main = sequence_ (map (\i -> putStrLn $ map (shift i) cipherText) [0..25])

If you ignore the boilerplate, there’s only two lines of actual code there :-). I happen to think it’s rather understandable, though I’d be interested to get the opinion of someone who isn’t a Haskell hacker on the veracity of that statement. A hint you may need understanding it is that “cycle a_list” just returns the list which is “a_list” concatenated infinite times, and that “!!” is Haskells list indexing operator.

On this note, I saw an interesting blog post at retrospections the other day which compared the LOC count for a number of open source source control programs: Darcs (Haskell) and Mercurial (Python) were tied at 20KLOC, with the next best being Bazaar-NG (Python) at 47KLOC. Obviously this isn’t a strictly apples-to-apples comparison, though, so take it with a grain of salt!

OK, to follow up on my last post about the quirks of XMLHTTPRequest, fuzzyman very kindly provided most of the solution I needed.

What I was trying to accomplish is optional HTTP authentication: that is, a resource logs you in if your credentials are correct, but if they aren’t present then it just lets you go on as an anonymous user. This is essential if you are developing, say, a web shop: if you want to offer personalized item selections you need to request login, but if you require authentication just to browse the site you’ve lost a good LARGE_NUMBER% of your customers right there.

However, as fuzzyman rightly pointed out, most browers do not even bother to send the Authorization header unless they actually get a 401 on a page, even if credentials are explicitly provided (as in my use of XMLHTTPRequest). The solution to this is to “fool” the browser into thinking your site requires authentication by creating a dummy action that just requires authentication. The slight complication is that this action must be in the root directory! If you attempt to create the dummy action in a subdirectory, the browser may only send the authentication information thus forced into it when paths are accessed that appear to be in that directory. This means, for example, that if you have the authenicated action “/sessions/secret” then authorization info will be sent for “/sessions/foo”, but will not for “/products/list”. Making an action like “/secret” works around this, although it is slightly ugly.

class SecretController < ApplicationController

def index
if authenticated?
render :nothing => true
response.headers[“WWW-Authenticate”] = ‘Basic realm=”Controllr”‘
render :nothing => true, :status => 401


So then in your user creation view and login view you will have a script block which forces the logged in / newly created user to login. This example is for user creation (hence @user.password) and uses a modified version Prototype (the de-facto Rails JS library) to perform the request. I had to modify Prototype to add support for using the username and password parameters of the underling XMLHTTPRequest object, despite the fact that they are widely supported in practice.

new Ajax.Request(“<%= url_for :controller => ‘sessions’, :action => “secret” %>”, {
username: “<%= @user.username %>”,
password: “<%= @user.password %>”,

method: “get”,
asynchronous: true,
onComplete: function() {
window.location = “<%= url_for :action => “index” %>”;

Yes, this does send the username and password in plaintext over the wire :-). However, this is OK since they are sent in the clear during signup anyway. I’m also currently using Basic authentication, which means that upon ANY login the username/password are vulnerable (albeit after Base64 decoding): I will probably change this at some point, but the Digest scheme I should be using requires a bit of server side state to prevent replay attacks so is a little tricker to implement.

Right, I should probably try and get some revision done now :-). If only I could answer on Ruby on Rails instead of complexity theory…

So I just spent the last 2 hours or so of my life buggering around with Ruby on Rails and trying to get it to do a RESTful login (i.e. one using HTTP Authorization headers, as opposed to the normal cookie stuff). There are some nice articles about pulling this feat off, such as here and here: the basic trick is to use XMLHTTPRequest to force the username/password from form fields into the browers authentication cache. However, it seems that if the resource your XMLHTTPRequest is trying to talk to never returns a 401 (Access Denied) then XMLHTTPRequest never feels the need to send the Authorization header at all, even if you specify a username and password for it. I’m really at a loss as to why it has this bizarre behaviour, so I’m really hoping I’ve misdiagnosed it, but it’s looking unlikely.

This afternoon has been my first serious attempt to play with Rails, and the whole thing has been nothing but frustration! As well as the usual the-web-is-crap issues like the above, I’ve had to contend with documentation that is scattered over the Ruby and Rails websites, when it exists at all! Some of the stuff I’ve had to use (like the base.send :helper_method call to expose some things neatly to my views) seem vital but don’t appear anywhere but as cursory mentions in changelogs. Furthermore, their habit of introducing breaking changes means some code examples I find don’t work without some obscure patching, and when things go wrong there is so much framework magic going on I have a hard time debugging it! Hopefully this feeling will fade with time, as lots of other people seem to praise Rails to the heavens, but I can’t remember even being this frustrated with a new technology :-).