Python for Bioinformatics: Poisson simulation

Wednesday, November 3, 2010

Poisson simulation

This is a very simple simulation of the Poisson distribution using Python.

First a function event() is defined which returns either 1 or 0, with 1 occuring at frequency f (here f = 0.1). A list is constructed by calling event() N times.

Then the list is chopped into groups using a method described and discussed previously (here and at Stack Overflow here).

In the example, there are 10 events in each group, so the expectation for the average number of successes is 1. This should generate a Poisson distribution with mean λ = 1.

The Counter class is used to evaluate the results. It is included in Python 2.7 but not in my 2.6 installation. Then we output the results for various k:

The problem I'm having is that the results are not quite correct. We obtain the expected values when n = 9 rather than n = 10. If f is made smaller (say f = 0.01) it looks better, but still the expected distribution is more closely approximated by n = 99 than n = 100.

I'd be grateful for any ideas about what's wrong!

[UPDATE: I think the problem here is the same as I had the other day. The Poisson is an approximation. If we do int(1e7) events in groups of n=1000, it looks as expected. ]

from itertools import izip_longest
import numpy as np
import Counter

def event():
    f = 0.1
    r = np.random.random()
    if r <= f:  return 1
    return 0

N = int(1e5)
L = [event() for i in range(N)]

def groups(iterable, n=3, padvalue=0):
    "groups('abcde', 3, 'x') --> ('a','b','c'), ('d','e','x')"
    return izip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

rL = [sum(g) for g in groups(L,n=10)]
C = Counter.Counter(rL)
for i in range(max(C.keys())+1):
    print str(i).rjust(2), C[i]