Thursday, April 1, 2010

A kick in the pants?

I worked up another problem from Grinstead & Snell, but it's going to be the last. I don't want to give away all the answers., although I may post about content from the later chapters as I get it figured out. The problem for this post involves the classic example of the Poisson distribution.

"In one of the first studies of the Poisson distribution, von Bortkiewicz considered the frequency of deaths from kicks in the Prussian army corps. From the study of 14 corps over a 20-year period, he obtained the data shown in [the] Table. Fit a Poisson distribution to this data and see if you think that the Poisson distribution is appropriate."

deaths number of corps with x deaths
0 144
1 91
2 32
3 11
4 2

We can think of the results for each corps over a single year as a Bernoulli trial with n very large and p very small and the mean near 1. We calculate the mean as (91 + 2*32 + 3*11 + 4*2) / 280 = 0.7. So λ = 0.7.

The code below prints the values for P(0), P(1) etc, as well as the calculated probability given this λ, and the expected and observed values for the number of corps with x deaths. The output is:

$ python 
k p E O
0 0.497 139.04 144
1 0.348 97.33 91
2 0.122 34.07 32
3 0.028 7.95 11
4 0.005 1.39 2

I think the model fits the data pretty well. However, I would like to know the appropriate statistical test for the fit.

import math
import matplotlib.pyplot as plt

def poisson(m):
def f(k):
e = math.e**(-m)
f = math.factorial(k)
g = m**k
return g*e/f
return f

p = poisson(0.7)
data = [144,91,32,11,2]
print 'k p E O'
for i in range (5):
print str(i).ljust(3),
prob = p(i)
print str(round(prob,3)).ljust(5),
print str(round(prob*280,2)).rjust(6),
print str(data[i]).rjust(5)


Khang Tsung Fei said...

Hi Tom!

You have an interesting site, and I had fun checking out the contents.

In this example, the chi-squared test is commonly used to test goodness of fit for count data.

telliott99 said...

Hi Khang,

Thanks for reading!

So, I guess we should just count up
all the categories with count > 0,
and then take df = 5-1 = 4?

Khang Tsung Fei said...

Hi Tom!

You can classify the last two categories as 3+, so that its probability under the Poisson model is 0.033. Thus, the expected count is 9.24, with observed count
13. Since the model has 4 categories, the df is 3.

Khang Tsung Fei said...
This comment has been removed by the author.