Cryptic Crossword: Amateur Crypto and Reverse Engineering

bbanyc · on Feb 14, 2014

The punchline to this story is that a few years ago the Times stopped scrambling their .puz files, making all this reverse-engineering work largely irrelevant.

(At the time I was using a modified version of the "xword" program in Debian's repo, which didn't detect whether the file was scrambled. In other words, it treated every letter as wrong because it didn't match the enciphered grid. I ended up hacking in some code to detect these files and disable the check/reveal features when playing them.)
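As a rough illustration of that kind of detection, here is a minimal sketch. It assumes the header layout described in the community-written format notes: a 12-byte magic string at offset 0x02 and a 16-bit little-endian "scrambled tag" at offset 0x32 that is zero for unscrambled puzzles. The function name `is_scrambled` is hypothetical, and this is not the actual patch made to xword.

```python
import struct

# Assumed layout, taken from community documentation of the .puz format;
# none of this comes from an official specification.
MAGIC = b"ACROSS&DOWN\x00"
MAGIC_OFFSET = 0x02
SCRAMBLED_TAG_OFFSET = 0x32


def is_scrambled(data: bytes) -> bool:
    """Return True if the header's scrambled tag is nonzero."""
    if data[MAGIC_OFFSET:MAGIC_OFFSET + len(MAGIC)] != MAGIC:
        raise ValueError("magic string not found; not a .puz header")
    # "<H" reads one unsigned 16-bit integer in little-endian byte order.
    (tag,) = struct.unpack_from("<H", data, SCRAMBLED_TAG_OFFSET)
    return tag != 0
```

A player could call this once on load and grey out check/reveal whenever it returns True.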

breadbox · on Feb 15, 2014

Very true! I mentioned that fact the first time I gave this presentation, but it wound up being an anticlimactic ending, so I chose to omit it from the written essay. (And at this point, the focus is more about the process of reverse-engineering anyway.)

EDIT: To be precise, there were still a few other crossword publishers using the scrambling feature. None as important as the New York Times, though, of course.

stavros · on Feb 15, 2014

Problem solving is its own reward!

davepeck · on Feb 15, 2014

This is a great and entertaining read about reverse engineering.

It's such a good read that this is almost beside the point... but, as it happens, I worked on and reverse-engineered this same "encryption" scheme (I hesitate to use the word) for an iOS app that never shipped. I just dumped the code (which seems to have been written in late 2008) on GitHub... it's old and messy, but hey, maybe it's fun for someone:

https://github.com/davepeck/puzfile

danielpunkass · on Feb 14, 2014

Very cool rundown of the approach to trying to decode this. FWIW there is also a significant archive of information about the format here, including information about the scrambling: https://code.google.com/p/puz/wiki/FileFormat

snori74 · on Feb 14, 2014

Indeed, it's staggering how patient and determined some people can be. I love this line, from when his friend asked whether he would be able to reverse-engineer the scrambling algorithm:

"My response was: maybe. Hard to say, but I'm willing to try. Privately, though, my reaction was THIS IS MY DREAM PROJECT AND THERE IS NO WAY I'M NOT SPENDING ALL AVAILABLE FREE TIME ON THIS."

zem · on Feb 14, 2014

yeah, i found that extremely helpful when writing the acrosslite module for a crossword format converter. i decided not to bother with scrambled grids for the moment, though, since they are a very acrosslite-specific feature; once the project progresses a bit further i'll go back and add them in.

TrainedMonkey · on Feb 14, 2014

Extremely methodical and determined approach, especially the analysis of the errors that the partially successful approaches ran into. Well done.

mistercow · on Feb 15, 2014

>In a way this is just a restatement of Occam's Razor, but I like it because it clarifies why Occam's Razor is a good idea. It's not because simpler solutions are actually more likely to be true; they usually aren't. It's because it's almost always easier to improve a simple solution by adding complexity, than it is to improve a complicated solution by digging out a simple solution buried within it.

While the second part of that is an interesting observation, the first part is simply false. It basically comes down to prior probabilities and conjunctions. Every bit of new information implied by a hypothesis is another "and". It is a simple fact that P(X) ≥ P(X and Y), so the more conjunctions your hypothesis implies (the more complex it is) the lower its prior probability.
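The inequality follows from P(X and Y) = P(X) · P(Y | X) with P(Y | X) ≤ 1. A small sketch that checks it by brute force on two dice; the events `x` and `y` are arbitrary examples chosen for illustration, not anything from the essay:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Exact probability of an event over the uniform outcome space."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))


def x(o):
    return o[0] % 2 == 0   # X: the first die is even


def y(o):
    return o[1] > 4        # Y: the second die shows 5 or 6


p_x = prob(x)                             # Fraction(1, 2)
p_xy = prob(lambda o: x(o) and y(o))      # Fraction(1, 6)
```

Adding the extra condition Y can only remove outcomes from the event, so the probability can only stay the same or go down.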

bluedino · on Feb 14, 2014

Wouldn't it have been easier to disassemble the program that works with these files, and analyze the code?

odin1415 · on Feb 14, 2014

Maybe, but the author mentioned that he wanted to reverse engineer it as a black box, out of legal concerns. It makes a more interesting challenge this way too.

voltagex_ · on Feb 15, 2014

I particularly liked the approach to automation of the application, even through WINE. Setting the time with LD_PRELOAD is a really neat trick.

bluedino · on Feb 14, 2014

Ah, I skimmed the article and didn't see that part - makes for a much more interesting writeup anyway.

mistercow · on Feb 15, 2014

Man, reading little-endian binary formats makes my head hurt. I get why it's done that way, but what a nightmare for comprehending what you're reading.
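For anyone who hasn't run into this, a minimal sketch of the difference, using Python's standard `struct` module. The four bytes are an arbitrary example, not data from a .puz file:

```python
import struct

# Four bytes as they would appear, left to right, in a hexdump.
raw = bytes.fromhex("78563412")

# "<I" = unsigned 32-bit little-endian, ">I" = unsigned 32-bit big-endian.
(little,) = struct.unpack("<I", raw)
(big,) = struct.unpack(">I", raw)

print(hex(little))  # 0x12345678: least significant byte comes first
print(hex(big))     # 0x78563412: bytes read in the order shown
```

In the little-endian reading, the value has to be assembled right to left from the dump, which is the part that takes getting used to.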

userbinator · on Feb 15, 2014

After enough time spent reading hexdumps you get used to it, and then all of a sudden big-endian feels really backwards.

mistercow · on Feb 15, 2014

Well, assuming you're reading little-endian hexdumps. I cut my teeth on OS X back in the PPC days, so to me it is just the opposite.