As you can see from the screenshot or the link
the puzzle is a message consisting of 1936 hex digits, and upon further manipulation it promises to yield the answer. You can actually follow the link to see the answer if you're impatient. I wasn't able to compete (I had a party to cook for, and could not have gone as fast as these guys anyway), but I'd still like to understand how they got there.
Apparently, we need to assemble a number of overlapping 100 digit pieces of a 1936 digit message and then figure out what it means. The pieces were served up to different visitors to the site and collected as comments on reddit. The first objective is to recover the observed data. I believe reddit must have an API for this, but I don't know it. What I did was to save the source for the page with all 410 comments (in the file comments.txt) and then run scrape.py below.
I count 251 strings of 100 characters, all in "0..9a..f".
According to the instructions, the first and last strings are marked, and these were recovered and posted on reddit:
497a78705032357759326873634855675a43526c595341675044317a494852685a484a70636d39684c6e6c6f4b4434354369 //first 100
4270494630674f3349675a5831304948566c636d4e75614342764d4341374a79426c494363674f7941674944393950676f3d //last 100
[UPDATE: There is a link to the complete (known) data from reddit to github here. I count a total of 340 strings, substantially more than what I've been working with in early attempts. This should help with the assembly problem. ]
The data clearly has substantial structure, as indicated by the screenshot from a "find" search for the digit "4" in TextEdit.
Since only hexadecimal digits are present it suggests that this might actually be hexadecimal. It's interesting that not all digits are equally represented. In particular, only those bytes (two hex digits) which encode ASCII "a..zA..Z0..9" are present (although the distribution is decidedly odd). So it seems pretty likely that we just need to decode the hex to ASCII and then figure out what it means.
However, there is an assembly problem that comes first.
Output from count.py (listing at the end):
> python count.py 54 30 48 0 0 40 31 49 1 1 256 32 50 2 2 71 33 51 3 3 58 34 52 4 4 82 35 53 5 5 64 37 55 7 7 39 38 56 8 8 63 39 57 9 9 1070 41 65 A A 133 42 66 B B 1102 43 67 C C 414 44 68 D D 111 45 69 E E 38 46 70 F F 169 47 71 G G 101 48 72 H H 812 49 73 I I 151 4a 74 J J 132 4b 75 K K 667 4c 76 L L 399 4d 77 M M 532 4e 78 N N 83 4f 79 O O 82 50 80 P P 67 51 81 Q Q 65 52 82 R R 286 53 83 S S 58 54 84 T T 51 55 85 U U 12 56 86 V V 56 57 87 W W 31 58 88 X X 315 59 89 Y Y 92 5a 90 Z Z 16 61 97 a a 13 62 98 b b 278 63 99 c c 61 64 100 d d 58 65 101 e e 8 66 102 f f 938 67 103 g g 57 68 104 h h 247 69 105 i i 201 6a 106 j j 116 6b 107 k k 62 6c 108 l l 64 6d 109 m m 97 6e 110 n n 61 6f 111 o o 209 70 112 p p 896 73 115 s s 25 74 116 t t 23 75 117 u u 15 76 118 v v 532 77 119 w w 211 78 120 x x 414 79 121 y y 192 7a 122 z z |
scrape.py
# python scrape.py > data.txt fn = 'comments.txt' FH = open(fn) data = FH.read() FH.close() L = list() i = 0 while True: j = data.find('>',i) j += 1 i=j k = j + 100 if k > len(data): break if not data[k] == '<': continue w = data[j:k] if ' ' in w or '>' in w or '<' in w: continue for c in w: assert c in '0123456789abcdef' L.append(w) if i == 0: break L = list(set(L)) L.sort() for w in L: print w |
import binascii from collections import Counter def load_data(): fn = 'data.txt' FH = open(fn,'r') data = FH.read().strip() FH.close() L = data.split('\n') return L L = load_data() n = 2 cL = list() for line in L: for i in range(0,len(line)-n+1,2): cL.append(line[i:i+n]) C = Counter(cL) for h,n in sorted(C.most_common()): i = int(h, base=16) print '%4d %s %4d %s %s' %\ (n, h, i, chr(i), binascii.unhexlify(h)) |