Tuesday, November 17, 2009

Hidden Markov Models (6)

I've written a Python script to make an HMM for the "occasionally dishonest casino." There is nothing remarkable about the code. Here is a link.

It uses the MarkovHelper module from here, modified a bit. Because long chains yield very low likelihoods, there is a function to convert the probabilities to logs:


def logs(self):
    def log2(x): return log(x)/log(2)
    fL = [log2(n) for n in self.fairProb]
    lL = [log2(n) for n in self.loadProb]
    D = dict()
    for k in self.tranProb:
        D[k] = log2(self.tranProb[k])
    return fL, lL, D
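To see why the conversion matters, here is a quick sketch (my own check, not part of the script) of what happens to the probability of a long chain of rolls. The product of many emission probabilities underflows to zero in floating point, while the sum of their logs stays perfectly usable:

```python
from math import log2

# 500 rolls of a fair die: each emission has probability 1/6, so the
# raw product is 6**-500, far below the smallest representable float,
# while the log-space sum is just a modest negative number.
p = 1.0
logp = 0.0
for _ in range(500):
    p *= 1.0 / 6.0
    logp += log2(1.0 / 6.0)

print(p)     # underflows to 0.0
print(logp)  # about -1292.5
```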


There are two points where you can make edits to get more verbose output. Here is an example of output from a model with both transition probabilities set to 0.2. The top line in each group is the sequence of states, then the observed data, followed by the results from the Viterbi algorithm.


fraction correct: 0.6692
0 .. 50
LLFFFFFFFFFFFFFFLLLLFFFFFFFFFFFLLLLFFFFLFLLFLLLLLF
65466142352514466666314623515236466525163311362664
LLLLLFFFFFFFFFFLLLLLFFFFFFFFFFFLLLLFFFFFFFFFFLLLLF

50 .. 100
LLLLLLLLLFFFFFFFLLLLFFLLFFFLLLLFFFFLLFFFFLLLLLFFFF
43264622635463146366241654162162245146442626631643
FFFFFFFFFFFFFFFFLLLLFFFFFFFFFFFFFFFFFFFFFLLLLLLLLL

100 .. 150
LLLLLLFFLLLLFFFLLLFFFFFFLLLFLLLFFFLLLLFFFLLLLLLLLF
63421525664134456562136112561166465362562126526622
LFFFFFFFFFFFFFFFFFFFFFFFFFFLLLLLLLLLLLLLLLLLLLLLLL

150 .. 200
LLLLLLLLLLLLLLLLFLLLLLLFLLLLLLLFFFFFFFFFFFFLLLLFFF
66632561666263461461635121666361266546114312626611
LLLLLLLLLLLLLLLLLLLLLFFFFFLLLLLLLLLLLLFFFFFFLLLLLL


At first, this looks promising. But remember, we tell the model what all the probabilities are. With both transition probabilities set to 0.2, the two states are equally likely, so guessing at random would give 50% correct, against the model's 0.6692. For transition probabilities of 0.2 and 0.1, the more persistent state occupies about two-thirds of the chain, so always guessing that state scores about 0.67; the HMM scores 0.72, roughly 5 points higher than that baseline. With transition probabilities of 0.2 and 0.01, the model scores 0.952 correct. But the more persistent state now occupies 20/21, or about 0.952, of the chain. At least that is no worse than random!
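The baseline in each case is the long-run fraction of time spent in the more common state, which is the stationary distribution of the two-state switching chain. A quick sketch of that calculation (my own check, with a hypothetical helper name):

```python
def majority_baseline(p_fl, p_lf):
    # Stationary distribution of a two-state chain with
    # P(F -> L) = p_fl and P(L -> F) = p_lf:
    #     pi_F = p_lf / (p_fl + p_lf)
    # Always guessing the more common state scores max(pi_F, pi_L).
    pi_f = p_lf / (p_fl + p_lf)
    return max(pi_f, 1 - pi_f)

print(majority_baseline(0.2, 0.2))              # 0.5
print(round(majority_baseline(0.2, 0.1), 3))    # 0.667
print(round(majority_baseline(0.2, 0.01), 3))   # 0.952
```

So the apparent jump from 0.6692 to 0.952 accuracy mostly tracks the baseline rising with it, which is the point of the closing joke.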
