Tuesday, July 21, 2009

Python for Simulation: Ulysses

In the historical remarks for the chapter on Markov Chains in Grinstead and Snell (p. 465), they mention an experiment of Claude Shannon in which a message is generated from a source text by first choosing an initial word in the text, picking (at random) a word which follows that one in the text, and then continuing for a number of cycles. Since each step depends only on the current word, we have a Markov Chain.

I wrote a short Python script to do this. Since I have the text for James Joyce's novel Ulysses (here, from Project Gutenberg), I used that for the source. Some header and footer material was deleted by hand. Here is the output:

i pass on coronation day by the brazen cars
woodshadows floated silently by greed and
behold they wait no she said as i wanted
to win in a cockney accent o thats our hearts
his gaze and squat its loose tobaccoshreds
slanted


Sounds a bit like Joyce, don't you think?

And the full listing:

import random

def loadText():
FH = open('ulysses.mod.txt','r')
text = FH.read()
FH.close()
text = text.strip().replace('\n',' ')
text = text.lower()
alpha = 'abcdefghijklmnopqrstuvwxyz'
s = alpha + ' '
tL = [c for c in text if c in s]
return ''.join(tL).split()

def parse(s):
words = s[:20000]
D = dict()
e1 = words.pop(0).strip()
while words:
e2 = words.pop(0).strip()
try: D[e1].append(e2)
except KeyError: D[e1] = [e2]
e1 = e2
return D

def speak(D,first,N=50):
L = [first]
w1 = first
for i in range(N):
w2 = random.choice(D[w1])
L.append(w2)
w1 = w2
return ' '.join(L)

def pprint(s):
i = 0
n = 40
while True:
j = s.find(' ',i+n)
if j == -1:
print s[i+n:]
break
if i: print s[i+1:j]
else: print s[:j]
i = j

random.seed(1357)
text = loadText()
D = parse(text)
result = speak(D,first='i')
pprint(result)