Python for Bioinformatics: MCMC: a basic Python simulation

Sunday, December 5, 2010

MCMC: a basic Python simulation

I'm going to continue with the discussion of phylogenetic methods. I want to try to understand MCMC, and I'll post more about the actual application of it later. For today, we have a demonstration using a Python simulation, showing that the samples from a simple MCMC are appropriately distributed.

In the first part of the code we generate a set of 9 models with different "likelihoods": 0.1, 0.2 .. 0.9. These are assigned names from the letters 'A' through 'I'. For interest, we make the assignments randomly, and then sort the letters by the assigned likelihood for the model each represents.

In the second part, we implement the logic of MCMC. It's a Markov chain, starting at model 'A'. At each step we generate a proposal (here by random choice), and we compare the likelihood q of the proposal to the likelihood p of the current model. If q > p, we move. If q < p, we still move but only with probability q/p, otherwise we re-sample the current model.

[ UPDATE: I left something important out of the first version of this simulation. The proposals are generated by random choice, but only choosing among "nearby" models. For this version, a given model only generates a proposal of one immediately next to it: e.g. 'C' can return either 'B' or 'D'. It makes no difference to the result, so I've used the sample graphic.]

The Counter class from here is used to tally the results, and plot a histogram using matplotlib. The models are sorted by their likelihoods to make it easy to see what has happened.

The models had these likelihoods:

B 0.1
C 0.2
F 0.3
H 0.4
E 0.5
D 0.6
G 0.7
I 0.8
A 0.9

They match the histogram. It's that simple.

from __future__ import division
import random
import numpy as np
import matplotlib.pyplot as plt
import Counter as Counter
random.seed(153)

class Model:
    def __init__(self,s,n):
        self.n = n
        self.s = s
    def __repr__(self):
        return self.s

R = [(n+1)*0.1 for n in range(9)]
random.shuffle(R)
D = dict()
letters = list('ABCDEFGHI')
for s in letters:
    D[s] = Model(s,R.pop())
letters = sorted(letters,key=lambda c: D[c].n)
for s in letters:
    print s, D[s].n
#-----------------------------------
M = len(letters)-1
mD = dict()
mD[0] = (1,M)
mD[M] = (0,M-1)
for i in range(1,M):
    mD[i] = (i-1,i+1)
    
def get_proposal(model):
    i = letters.index(str(model))
    j = random.choice(mD[i])
    return D[letters[j]]
#-----------------------------------
rL = ['A']
N = 100000
for i in range(N):
    current = D[rL[-1]]
    p = current.n
    #next = D[random.choice(letters)]
    next = get_proposal(current)
    q = next.n
    if q > p:
        rL.append(str(next))
    elif q/p > random.random():
        rL.append(str(next))
    else:
        rL.append(str(current))
#-----------------------------------
C = Counter.Counter(rL)
X = np.array(range(len(letters)))
Y = [C[c] for c in letters]
plt.xticks(X + 0.4,list(letters))
plt.bar(X,Y)
plt.savefig('example.png')