## Sunday, November 15, 2009

### Python bytes

About ten days ago I posted about looking at the passwords stored in `/etc/xgrid/agent/controller-password` and similar files.

The code I put up is rather silly. It uses a hard-coded dict to translate bytes (in string rep) to hexadecimal. This is not worth spending too much time on, especially since Python 3 has a whole 'nother attitude about strings and bytes, but I thought I would at least show a simple and more correct (I hope) Python 2 approach to this issue.

So, of course we have bits and bytes on the machine, and strings exist only on screen or on paper. We can represent bits and bytes as integers, or as chars, and vice versa. I'm sure everyone knows we can go from int to char and back again with `chr` and `ord`:

    >>> chr(78)
    'N'
    >>> ord('P')
    80

My understanding is that we should view integers as the natural intermediate form for conversion of bits and bytes from base 2 to other bases.

    >>> bin(15)
    '0b1111'

In this representation the binary number `1111` is an int (15) or its string representation ('0b1111'). Python also has string reps for hexadecimal and octal:

    >>> hex(15)
    '0xf'
    >>> oct(15)
    '017'

We can go from binary or hex back to int, but we need to specify the base:

    >>> int('0b1111',2)
    15
    >>> int('0xf',16)
    15

We don't actually need the leading '0x' or '0':

    >>> int('f',16)
    15
    >>> int('17',8)
    15

So, the other day I should have just done:

    >>> bin(int('0xf',16))
    '0b1111'
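As a small sketch of the same idea, the conversion can be wrapped in a helper that also pads the result to a full byte. The name `hex_to_bin` and the `width` parameter are my own inventions for illustration, not part of the earlier post:

```python
def hex_to_bin(hex_string, width=8):
    """Return the binary digits of a hex string, zero-padded to `width` bits."""
    n = int(hex_string, 16)   # the int is the intermediate form; '0x' prefix is optional
    digits = bin(n)[2:]       # drop the leading '0b'
    return digits.zfill(width)

print(hex_to_bin('0xf'))   # 00001111
print(hex_to_bin('6e'))    # 01101110
```

Going through `int` means the function never needs a lookup table, which is the point of the exercise.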

When reading data from a file:

    FH = open('script.py','rb')
    data = FH.read(8)
    FH.close()
    print type(data)
    print len(data)
    print data

This prints:

    <type 'str'>
    8
    from bin

Although the file was opened in "binary" mode, the type of the data read is `<type 'str'>`, and when the data are printed, they look like a string. (In Python 2, a `str` is just a sequence of bytes.) Nevertheless, the data do respond well to `b2a_hex`, a function from the `binascii` module that operates on binary data and converts it to a hexadecimal string representation.

    from binascii import b2a_hex
    L = [b2a_hex(b) for b in data]
    print L
    L = [int(h,16) for h in L]
    print L
    print [chr(i) for i in L]

This prints:

    ['66', '72', '6f', '6d', '20', '62', '69', '6e']
    [102, 114, 111, 109, 32, 98, 105, 110]
    ['f', 'r', 'o', 'm', ' ', 'b', 'i', 'n']
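For comparison, here is a hedged sketch of the same per-byte round trip written so that it should run under both Python 2 and Python 3. The literal `b'from bin'` is a stand-in for the 8 bytes read from the file, and the variable names are hypothetical. It leans on `bytearray`, which yields each byte as an int in either version:

```python
from binascii import b2a_hex

data = b'from bin'   # stand-in for the 8 bytes read from the file

# bytearray hands back each byte as an int, so no lookup table is needed
ints = list(bytearray(data))

# two hex digits per byte, the same text b2a_hex gives for a single byte
hexes = ['%02x' % n for n in ints]
chars = [chr(n) for n in ints]

# slicing (rather than indexing) keeps each piece a one-byte string,
# which b2a_hex accepts in both Python 2 and Python 3
check = [b2a_hex(data[i:i+1]).decode('ascii') for i in range(len(data))]

print(ints)
print(hexes)
print(chars)
```

The `check` list is only there to confirm that formatting the int by hand agrees with what `b2a_hex` returns byte by byte.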

The result is rather different if we use the same function on the data as a whole:

    L = b2a_hex(data)
    print len(L)
    print L
    for i in range(0,len(L),2):
        h = L[i:i+2]
        print h,
        print chr(int(h,16))

This prints:

    16
    66726f6d2062696e
    66 f
    72 r
    6f o
    6d m
    20  
    62 b
    69 i
    6e n

In this case, the 8 bytes are converted to a single string of 16 hexadecimal characters, two per byte. To convert back to ints and chars we must read the hexadecimal in two-character chunks.
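The chunking can be sketched as a list comprehension, again with `b'from bin'` standing in for the file data and with variable names of my own choosing. The sketch also uses `a2b_hex`, the inverse of `b2a_hex` in the same `binascii` module, to recover the original bytes in one call:

```python
from binascii import b2a_hex, a2b_hex

data = b'from bin'                       # stand-in for the 8 bytes read from the file
hex_bytes = b2a_hex(data)                # 16 hex digits for 8 bytes
hex_string = hex_bytes.decode('ascii')   # plain text in both Python 2 and 3

# walk the hex string two characters at a time, one pair per original byte
pairs = [hex_string[i:i+2] for i in range(0, len(hex_string), 2)]
ints = [int(p, 16) for p in pairs]

# a2b_hex undoes b2a_hex without any manual chunking
restored = a2b_hex(hex_bytes)

print(pairs)
print(ints)
print(restored == data)
```

If all that is wanted is the original bytes back, `a2b_hex` is the shorter route; the manual chunks are useful when the individual ints are what we are after.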

Does that make sense?