Hex, leet, and constrained alphabets - for me3dia

So Andrew Huff had this silly and wonderful idea: displaying six-character words as hexadecimal strings, and interpreting those strings as hex-style color codes (the #006A3F type strings used notably for declaring colors in HTML code). There are some nice graphics with a few rounds of ideas at the post above; and someone has even thrown together a live application, “L33t text in c0l0r”, to let folks play with the idea in realtime.

So what’s missing? A vocabulary list! And that’s just what I’ve put together over the morning cup of tea. Four lists, actually, available here in order of strictness (and, by association, readability):

alphabetic - only letters a through f allowed
leet - some numeral-for-letter substitutions
leeter - more aggressive substitution
leetcore - identical list to ‘leeter’, but with gratuitous substitutions for A, B and E, too.

More details about this — generation method, thoughts — after the jump.

The pure alphabetic list has two problems:

1. It’s very short.

2. All the words come out in faded pastels.

The first problem is not surprising; very tight alphabetic constraints mean very few words that will match. The second problem is more interesting as a side-effect: A-F are the high-value digits in hex, and so strings that contain only alphabetic characters will have large values (at least 0xAA, or 170 out of 255) for each of the Red, Green, and Blue components of their represented color. So introducing some numeral substitutions doesn’t just broaden the vocabulary of these hex colorstrings, it significantly expands the palette as well.
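The pastel effect is easy to check by splitting a color string into its three channels. Here’s a minimal Python sketch; the helper name `channels` and the example string 5EABED (seabed, with 5 = S) are my own illustrations, not something taken from Huff’s post or the live application.

```python
def channels(hex_word):
    """Split a six-character hex string into (red, green, blue) integers."""
    value = hex_word.lstrip("#")
    if len(value) != 6:
        raise ValueError("expected six hex digits")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))

# Letters only: the smallest two-digit value you can spell is AA = 170,
# so every channel sits in the top third of the 0-255 range.
print(channels("#DECADE"))  # (222, 202, 222)

# One numeral in a leading position pulls that channel way down.
print(channels("#5EABED"))  # (94, 171, 237)
```

The leading digit of each pair does most of the work, so a substitution in position 1, 3, or 5 changes the color far more than one in position 2, 4, or 6.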

Here’s the full (tiny, yes) alphabetic list, color coded:

#ACCEDE (accede)
#BEADED (beaded)
#BEDDED (bedded)
#BEEFED (beefed)
#CABBED (cabbed)
#DABBED (dabbed)
#DECADE (decade)
#DEEDED (deeded)
#DEFACE (deface)
#EFFACE (efface)
#FACADE (facade)

So, given the lousy constraints (both linguistic and chromatic) of a purely alphabetic list, it seemed like a good idea to allow for some leet-speak style substitutions, which is, in fact, well within the spirit of the examples that Huff has already put up. The question is this: how much substitution to allow?

Every substitution means extra parsing work for the reader. Some substitutions, like numeral 0 for letter O, don’t cause much trouble; 5 for S is a little bit more work, but not bad; 7 for T is moving into advanced math territory: parsable by the seasoned leet reader but probably a bit boggling without explanation to the uninitiated.

There are arguments in both directions, so I decided to build a small spectrum of lists and leave it at that. The leet list includes a moderate collection of substitutions: 1 = I, 2 = Z, 5 = S, 6 = G, 0 = O. (The use of 6 for G I’m wavering on a bit, but what the heck; the budding vocabularist can use their aesthetic discretion in picking words from the list).

The leeter list adds 7 = T, 1 = L. T is a common letter, so this tends to really break up the readability of the list; likewise with L (or, really, lowercase l as the justification for the replacement, despite my uppercase rendering of the output strings), and the further ambiguity between 1 = I and 1 = L mucks things up as well. Still, this is a much larger list than the ‘leet’ version, so aggressive pickers-and-choosers may want to work with this.
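To make the 1 = I versus 1 = L ambiguity concrete, here’s a small Python sketch that lists every way a reader might decode a string under the ‘leeter’ substitutions. The names `LEETER_READINGS` and `readings` are hypothetical, and the example word is mine; digits with no assigned letter in this profile (3, 4, 8, 9) are simply passed through unchanged.

```python
from itertools import product

# Digit -> possible letters under the 'leeter' profile described above.
# Hex letters A-F stand for themselves.
LEETER_READINGS = {
    "0": "O", "1": "IL", "2": "Z", "5": "S", "6": "G", "7": "T",
}

def readings(hex_word):
    """Return every letter string a reader might recover from hex_word."""
    options = [LEETER_READINGS.get(ch, ch) for ch in hex_word.upper()]
    return ["".join(combo) for combo in product(*options)]

print(readings("B0771E"))  # ['BOTTIE', 'BOTTLE']
```

Each 1 in a string doubles the number of candidate readings, which is a rough measure of how much extra work the reader has to do.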

Finally, the leetcore list is a concession to the truly obnoxious — it’s the same list as ‘leeter’, but includes substitutions for 4 = A, 3 = E, and 8 = B. The list is less readable but no more expansive. Niche product.

How this was generated:

I threw together a perl script that reads in a word list (in this case, the /usr/share/dict/american-english file on my Dreamhost server), strips out words of more or less than 6 characters, and then rejects anything that doesn’t fit the specific profile being run against (alpha, leet, leeter). It then does substitutions according to profile, and chucks out the final list with hexified and original word pairs, as in the above example for the “alpha” profile.
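For the curious, here’s roughly what that filter-then-substitute pipeline looks like as a Python sketch. To be clear, this is not the original Perl script: the names `PROFILES` and `hexify` are made up for illustration, and lowercasing every word before matching is my assumption about how capitalized dictionary entries should be handled.

```python
# Letter -> digit substitutions per profile, as described in the post.
PROFILES = {
    "alpha": {},
    "leet": {"i": "1", "z": "2", "s": "5", "g": "6", "o": "0"},
}
PROFILES["leeter"] = {**PROFILES["leet"], "t": "7", "l": "1"}
PROFILES["leetcore"] = {**PROFILES["leeter"], "a": "4", "e": "3", "b": "8"}

HEX_LETTERS = set("abcdef")

def hexify(words, profile):
    """Yield (hex_string, word) pairs for six-letter words that fit profile."""
    subs = PROFILES[profile]
    allowed = HEX_LETTERS | set(subs)
    for raw in words:
        word = raw.strip().lower()  # assumption: case is ignored
        if len(word) != 6 or not set(word) <= allowed:
            continue
        yield "".join(subs.get(ch, ch) for ch in word).upper(), word

# Tiny stand-in for a real word list.
sample = ["decade", "bottle", "seabed", "zebra", "coffee"]
for name in PROFILES:
    print(name, list(hexify(sample, name)))
```

Since `hexify` just iterates over lines, an open file handle on a dictionary file should work in place of `sample`. Words with apostrophes or other punctuation fall out automatically, because those characters are never in the allowed set.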

Pretty simple stuff, in the end — it’s no histogram steganography, certainly — but a fun little Saturday morning project. Thanks for the fun, Mr. Huff!

4 Comments »

  1. Josh Millard Said,

    March 1, 2008 @ 12:47 pm

    I was a bit late to this party (I only found out about the hexery this morning, via metachat), so credit for prior art is due:

    - Stuart Langridge dropped a comment into the me3dia.com thread to show off this elegant little bit of shell work:

    Assuming that i=1, s=5, o=0, we can have:

    $ egrep -i '^[abcdefiso]{6}$' /usr/share/dict/words | sed 's/s/5/gi;s/o/0/gi;s/i/1/gi' | xargs

    I’d link to the comment, but there are no permalinks on me3dia.com comments! Secret shame revealed, Mr. Huff.

    - Then, later yesterday evening, Ned Batchelder posted a nice wordlist with color coding (which is a rather nice way to present it), for a list that’s about the same length as my ‘leet’ variant above, and about a quarter the size of my ‘leeter’ list.

    Ned’s table there links to an older post by him of 805 hex/leet words of varying lengths, too.

  2. Josh Millard Said,

    March 1, 2008 @ 12:51 pm

    Also, a visualization thought: why not take these words and render them on some sort of color wheel, according to their hex-rgb value? Or, maybe more visually compelling and more abstract: render the non-leet version of the words over colorwheel space, but color the words according to their position and omit the actual colorwheel itself. The words become the (gappy, unevenly distributed) colorwheel. Might be neat to look at.

    Also also, it could be fun to add another dimension to this by looking at word frequency of these various hex strings. Do a word cloud thing with font size for each word weighted to commonality in a good-sized corpus, perhaps? You could even figure the “average” color of hex-string words based on the weighting.

  3. DigitalSeraphim Said,

    March 4, 2008 @ 6:00 am

    What about three letter words, using the ASCII values for the hex (don’t know if these tags are going to work, if they don’t can you fix them Josh?)->

    cow = #636f77
    COW = #434f57
    Cow = #436f77

    -ds

  4. Josh Millard Said,

    March 4, 2008 @ 8:10 am

    That, sir? High, high nerdity. I think it’s too obfuscated for this particular game, but it has some charm to it.
