Saturday, November 14, 2009

Hidden Markov Models (4)



In the last post I described a Python class that we will use later to explore the dynamic programming algorithms that are important for HMMs. But before we continue with the Python, I want to work through an example of the first of these methods, the Viterbi algorithm, named for Andrew Viterbi. It applies when we have settled on a model and its characteristic probabilities (the parameters), and we want to calculate:

• the most likely sequence of states given some observed output

The example I'm going to use is the weather, but with a twist on the previous example of weather by telephone. Here the weather itself is the observed data. The model has two underlying hidden states: G (good weather is likely) and B (bad weather is likely). Both G and B can emit either sun or rain, each with its characteristic emission probability. At each step in the chain, the state can also switch, with a characteristic transition probability (or stay the same, with probability one minus that).
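To make the setup concrete, here is one way to write the model down in Python. The names and dictionary layout are my own choice; the numbers are the ones used in the hand calculation below.

states = ('G', 'B')

# P(observation | state)
emission = {
    'G': {'Sun': 0.9, 'Rain': 0.1},
    'B': {'Sun': 0.2, 'Rain': 0.8},
}

# P(next state | current state); each row sums to 1
transition = {
    'G': {'G': 0.6, 'B': 0.4},
    'B': {'G': 0.3, 'B': 0.7},
}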



Below is a series of observations, together with two candidate state sequences that might have produced it.

[figure: a run of sun/rain observations with two candidate hidden-state sequences]


For a series of n observations in an HMM with 2 states, there are 2**n possible state sequences, each with some probability of having emitted the observed data. Given how skewed the emission probabilities of our current model are, it is pretty obvious that the first state sequence above is more likely than the second. If the chain is short enough, we can calculate the probability of every possible sequence exactly: each one is just a product of emission and transition probabilities taken alternately along the chain.
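Here is a minimal sketch of that product in Python, reusing the emission and transition dictionaries defined above. Like the hand calculation below, it includes no initial-state term.

def seq_prob(state_seq, obs):
    # Probability that this state sequence emits these observations:
    # the emission term for the first symbol, then alternating
    # transition and emission terms along the chain.
    p = emission[state_seq[0]][obs[0]]
    for prev, cur, o in zip(state_seq, state_seq[1:], obs[1:]):
        p *= transition[prev][cur] * emission[cur][o]
    return p

seq_prob(('G','B','G'), ('Sun','Rain','Sun'))  # 0.07776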



For example, suppose we observe Sun-Rain-Sun. For each of the eight state sequences we multiply: the probability of emitting Sun from the first state, the transition probability to the second state, the probability of emitting Rain there, the transition to the third state, and the probability of emitting Sun from it. (There is no initial-state factor; that amounts to assuming the two starting states are equally likely, which does not affect the ranking.) We calculate:



GGG 0.90 * 0.60 * 0.10 * 0.60 * 0.90 = 0.02916
GGB 0.90 * 0.60 * 0.10 * 0.40 * 0.20 = 0.00432
GBG 0.90 * 0.40 * 0.80 * 0.30 * 0.90 = 0.07776
GBB 0.90 * 0.40 * 0.80 * 0.70 * 0.20 = 0.04032
BGG 0.20 * 0.30 * 0.10 * 0.60 * 0.90 = 0.00324
BGB 0.20 * 0.30 * 0.10 * 0.40 * 0.20 = 0.00048
BBG 0.20 * 0.70 * 0.80 * 0.30 * 0.90 = 0.03024
BBB 0.20 * 0.70 * 0.80 * 0.70 * 0.20 = 0.01568
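Those eight products can also be generated by brute force, enumerating every state sequence with itertools.product and reusing the seq_prob sketch above; the winner comes out as GBG, matching the table.

from itertools import product

obs = ('Sun', 'Rain', 'Sun')

# all 2**3 = 8 possible hidden-state sequences
table = {''.join(ss): seq_prob(ss, obs)
         for ss in product(states, repeat=len(obs))}

# print in descending order of probability; GBG comes first
for name in sorted(table, key=table.get, reverse=True):
    print(name, round(table[name], 5))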


So, as one would expect, GBG is most likely. In a real problem there will be more states, the probabilities will not be so skewed, and the chain will be far too long for us to enumerate all 2**n sequences. That calls for the Viterbi algorithm. I'll show it (for this problem) in the next post.