Python for Bioinformatics: Dissecting RSA keys in Python (2)

The public key from last time was given (in id_rsa.pub) as:

AAAAB3NzaC1yc2EAAAABIwAAAIEA0QgnG+LBQosx
pbkJIU0eWV9Zf7L63fy3BXp6T9RQH84yISs8SMvF
V4J1NYJQtNpGopNFlvzsuFnnxaRaM8ureFIQVa8k
/dRzv3r7QyIuoGyNQLDk3K8Tse/55fjuIxJFEf5j
voJs3Z/jYdPHreg8wzNCB/uMkbCccYsVt6lhkX8=

(It has been broken into 40 character lines). If we look at the first 32 characters of this base64 encoding:

AAAAB3NzaC1yc2EA
AAABIwAAAIEA0Qgn

the decoded version of this in hex is:

0x0  0x0  0x0 0x7  0x73 0x73 0x68 0x2d 0x72 0x73 0x61 0x0
0x0  0x0  0x1 0x23 0x0  0x0  0x0  0x81 0x0  0xd1 0x8  0x27

The first four bytes are

0x0 0x0 0x0 0x7

interpreted as a 32-bit unsigned int that is equal to 7, and indicates that the next 7 bytes are grouped together to form this element of the data (which is 'rsa-ssh', as I mentioned before).

0x73 0x73 0x68 0x2d 0x72 0x73 0x61
   s    s    h    -    r    s    a

The next 4 bytes after that are:

0x0 0x0 0x0 0x1

indicating the following block is a single byte of data: 0x23. In ASCII-encoding this would be the character '#', but it's actually an unsigned int:

>>> int('0x00000023', 16)
35

This is the value of the exponent for the key pair. More on this in a moment. Finally we get to

0x0 0x0 0x0 0x81

>>> int('0x00000081', 16)
129

It turns out there are exactly 129 bytes left in the data. In the code from the first post (from here), we used the struct module to unpack these three segments of data (we labeled them "parts").

According to the docs, the struct module can be used to interpret strings as packed binary data.

The first argument to unpack is a format string. The first character of the format string can be used to indicate the byte order, size and alignment of the packed data

The '>' indicates big-endian data, data laid out in registers in the same order as it is in memory. While Intel x86 (and current Macs) are little-endian, network order is big-endian and (I suppose) the SSH standard respects this. The next format character 'I' indicates the data following are unsigned ints.

import struct

hL = ['0x0','0x0','0x0','0x81']
iL = [int(h,16) for h in hL]
cdata = ''.join([chr(n) for n in iL])
res = struct.unpack('>I', cdata)

>>> print res
(129,)

It returns a tuple and we want the first value.

In the code from yesterday, we simply copied the correct number of bytes in each segment (as "characters"---not ASCII).

In the last part of this example, we need to transform those bytes into numerical data. We'll use struct.unpack again, except this time specifying binary data. For the exponent we had a single byte '\x23'', corresponding to character '#' which is 35 as an unsigned int:

>>> struct.unpack('B','#')
(35,)

Our function f passed the first element of this tuple back to a big list comprehension.

e = eval('0x' + ''.join(['%02X' % f(x) for x in parts[1]]))

In the case of the exponent (e) it operated on only a single value, so I'll rewrite it to clarify what this does. Remembering that f(x) will return 35, we'll get rid of the list stuff and just do:

>>> s = '%02X' % 35
>>> s
'23'
>>> e = eval('0x' + s)
>>> 
>>> e
35

(I'm not too swift with format strings, but you can read about them here).

And it doesn't do much in this case! But suppose we had two of these bytes in a row. Then:

>>> s = ('%02X' % 35) * 2
>>> s
'2323'
>>> eval('0x' + s)
8995
>>> 35*256 + 35
8995

eval will turn our string of bytes into a multi-byte unsigned int. This is news to me. So, if we actually had the string of bytes in hex divided into two sets of bytes, those up to and including '0x81'

['0x0', '0x0', '0x0', '0x7', '0x73', '0x73', '0x68', '0x2d', '0x72', '0x73', '0x61', '0x0', '0x0', '0x0', '0x1', '0x23', '0x0', '0x0', '0x0', '0x81']

and those after:

L = ['0x0', '0xd1', '0x8', '0x27', '0x1b', '0xe2', 
      '0xc1', '0x42', '0x8b', '0x31', '0xa5', '0xb9', '0x9', '0x21', '0x4d', '0x1e', '0x59', 
      '0x5f', '0x59', '0x7f', '0xb2', '0xfa', '0xdd', '0xfc', '0xb7', '0x5', '0x7a', '0x7a', 
      '0x4f', '0xd4', '0x50', '0x1f', '0xce', '0x32', '0x21', '0x2b', '0x3c', '0x48', '0xcb', 
      '0xc5', '0x57', '0x82', '0x75', '0x35', '0x82', '0x50', '0xb4', '0xda', '0x46', '0xa2', 
      '0x93', '0x45', '0x96', '0xfc', '0xec', '0xb8', '0x59', '0xe7', '0xc5', '0xa4', '0x5a', 
      '0x33', '0xcb', '0xab', '0x78', '0x52', '0x10', '0x55', '0xaf', '0x24', '0xfd', '0xd4', 
      '0x73', '0xbf', '0x7a', '0xfb', '0x43', '0x22', '0x2e', '0xa0', '0x6c', '0x8d', '0x40', 
      '0xb0', '0xe4', '0xdc', '0xaf', '0x13', '0xb1', '0xef', '0xf9', '0xe5', '0xf8', '0xee', 
      '0x23', '0x12', '0x45', '0x11', '0xfe', '0x63', '0xbe', '0x82', '0x6c', '0xdd', '0x9f', 
      '0xe3', '0x61', '0xd3', '0xc7', '0xad', '0xe8', '0x3c', '0xc3', '0x33', '0x42', '0x7', 
      '0xfb', '0x8c', '0x91', '0xb0', '0x9c', '0x71', '0x8b', '0x15', '0xb7', '0xa9', '0x61', 
      '0x91', '0x7f', '0x0']

Note: The extra byte at the end here is caused by our having decoded by hand.

Paste it into the interpreter. Then, they have to be padded out...

for i in range(len(L)):
    s = L[i]
    c = s[-1]
    if len(s) < 4:  
         L[i] = '0' + c
    else:  
        L[i] = s[-2:]

Remove the first and last bytes and turn it into one long hex value:

>>> s = '0x' + ''.join(L[1:-1])
>>> eval(s)
146787154640181728129549360865206451974125178122992752499327844555082827713683060312609720092642289502088305950292885968241446393671268243539585900329449529436170858131743078225065287541399805133121852823856839448386268013284889428740476278604926908637093742329548018673369795976229837442944484597101722964351L

That matches what we got yesterday, and what the rsa module gave us for n from the private key.

This use of eval seems to be widely known, but I would never have appreciated it from just reading the docs.

Here is a quick script that shows all the values for each set of four characters in the base64 encoding, first 9 bytes of output first:

> python script2.py 
['A', 'A', 'A', 'A']
[0, 0, 0, 0]
['0b0', '0b0', '0b0', '0b0']
['000000', '000000', '000000', '000000']
['00000000', '00000000', '00000000']
0x0 0x0 0x0
0 0 0 

['B', '3', 'N', 'z']
[1, 55, 13, 51]
['0b1', '0b110111', '0b1101', '0b110011']
['000001', '110111', '001101', '110011']
['00000111', '01110011', '01110011']
0x7 0x73 0x73
7 115 115 

['a', 'C', '1', 'y']
[26, 2, 53, 50]
['0b11010', '0b10', '0b110101', '0b110010']
['011010', '000010', '110101', '110010']
['01101000', '00101101', '01110010']
0x68 0x2d 0x72
104 45 114

from string import *
v = True

cL = list(uppercase + lowercase + digits + '+/')
D = dict(zip(cL,range(len(cL))))
D['='] = 0

key = '''AAAAB3NzaC1yc2EAAAABIwAAAIEA0QgnG+LBQosxpbkJIU0eWV9Zf7L63fy3BXp6T9RQH84yISs8SMvFV4J1NYJQtNpGopNFlvzsuFnnxaRaM8ureFIQVa8k/dRzv3r7QyIuoGyNQLDk3K8Tse/55fjuIxJFEf5jvoJs3Z/jYdPHreg8wzNCB/uMkbCccYsVt6lhkX8='''

L = list(key)

while L:
    s = L[:4]
    L = L[4:]
    if v:  print s
    dL = [D[c] for c in s]
    if v:  print dL
    dL = [bin(c) for c in dL]
    if v:  print dL
    bL = list()
    for b in dL:
        n = 8 - len(b)
        b = n*'0' + b[2:]
        bL.append(b)
    if v:  print bL
    b = ''.join(bL)
    bL = [b[:8], b[8:16], b[16:]]
    if v:  print bL
    iL = [int('0b' + b,2) for b in bL]
    for i in iL:
        print hex(i),
    print
    for i in iL:
        print i,
    print '\n'

Tuesday, August 30, 2011

Dissecting RSA keys in Python (2)