## Saturday, April 10, 2010

### Chi-squared revisited

I've posted a few times about the chi-square (or -squared) distribution, for example its use to test the distribution of grades between males and females as described in Grinstead & Snell (here). I was also interested more generally in situations where this distribution comes up, for example, when multiplying two random variables that are normally distributed (here).

Probably the simplest example is of rolling dice. If an n-sided die is rolled many times, then the distribution of the counts for each of the n-sides should take on a chi-squared distribution (with df = n-1). I decided to simulate that in Python, with the results as shown in the graphic above. It looks pretty good to me.

Here is the code. Again, I use the `Counter` class from here.

 `import random, mathfrom random import choiceimport numpy as npimport matplotlib.pyplot as pltimport Counterrandom.seed(157)class OneDie: def __init__(self,nsides=6): self.nsides = 6 self.R = range(1,nsides+1) def nrolls(self,N=100): return [choice(self.R) for i in range(N)] def chisquare_stat(N,D): stat = 0 e = 1.0 * N / len(D.keys()) for k in D: o = D[k] stat += (o-e)**2/e return statn = 6d = OneDie(n)L = list()N = 100 for i in range(10000): C = Counter.Counter(d.nrolls(N)) s = chisquare_stat(N,C) L.append(s)# fractional bins for the histogramwidth = 0.4L = [np.floor(s/width) * width for s in L]C = Counter.Counter(L)ax = plt.axes()maxCount = C.most_common(1)[0][1]maxValue = np.max(C.keys())ax.set_xlim(-1,maxValue+1)ax.set_ylim(0,maxCount+5)# go ahead and plotfor value in C: x = value r = plt.Rectangle((x,0),width=width, height=C[value], facecolor='green') ax.add_patch(r)# chi-square should approximate gamma@np.vectorize def gamma_pdf(x,k=5): return x**(k/2.0-1) * math.e**(-x/2.0)X = np.arange(0,maxValue,0.01)Y = gamma_pdf(X,n-1)plt.plot(X,600*Y,'--', color='r',lw=6)plt.savefig('example.png')`