## Tuesday, July 28, 2009

### Normal approximation to the binomial

I know that the normal can be used as an approximation to the binomial. I was looking for a derivation of this, and I found one via Google on a math forum. Doctor Anthony begins:

> **Derivation of the Normal distribution from the Binomial distribution**
>
> Let a variate take values $0, k, 2k, 3k, \ldots, nk$ with probabilities given by successive terms of $(q + p)^n$.

What's with the k? Well, we're eventually going to want non-integer values. The expansion of $(q + p)^n$ is familiar:

$$ q^n + n\,q^{n-1} p + \frac{n(n-1)}{2}\,q^{n-2} p^2 + \cdots $$

The coefficient of the term in $p^i$ is $C(n,i)$, so that term is $C(n,i)\,p^i q^{n-i}$, for $i = 0, 1, \ldots, n$.

> Then the mean $m = npk$ and the variance $s^2 = npqk^2$

OK. Note the use of the multiplication rule for variance from the other day: multiplying a variate by $k$ multiplies its variance by $k^2$.
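As a quick sanity check on the quoted mean and variance, here is a minimal Python sketch (my own, not part of Doctor Anthony's derivation). The function name `scaled_binomial_moments` and the values of `n`, `p`, and `k` are arbitrary choices for illustration. Exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction
from math import comb


def scaled_binomial_moments(n, p, k):
    """Mean and variance of a variate that takes the value r*k
    with probability C(n, r) p^r q^(n-r), for r = 0..n."""
    q = 1 - p
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * k * pr for r, pr in enumerate(probs))
    var = sum((r * k - mean) ** 2 * pr for r, pr in enumerate(probs))
    return mean, var


# Illustrative values only (assumptions, not from the forum post).
n, p, k = 12, Fraction(1, 3), Fraction(1, 4)
q = 1 - p

mean, var = scaled_binomial_moments(n, p, k)
print(mean == n * p * k)          # compares with m = npk
print(var == n * p * q * k**2)    # compares with s^2 = npqk^2
```

Both comparisons should come out true for any valid `n`, `p`, and `k`, since the scaling by `k` only multiplies the ordinary binomial mean by `k` and the variance by `k**2`.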

> Suppose: $y$ = probability of occurrence of $rk$ = $C(n,r)\,p^r q^{n-r}$
>
> Also let: $y'$ = probability of occurrence of $(r+1)k$ = $C(n,r+1)\,p^{r+1} q^{n-r-1}$
>
> Then:
>
> $$ \begin{aligned} y' - y &= C(n,r+1)\,p^{r+1} q^{n-r-1} - C(n,r)\,p^r q^{n-r} \\ &= \frac{n!\,p^r q^{n-r-1}}{(r+1)!\,(n-r)!}\,\bigl[(n-r)p - (r+1)q\bigr] \end{aligned} $$

Hmm... I know that

$$ \begin{aligned} y &= \frac{n!}{r!\,(n-r)!}\,p^r q^{n-r} \\ y' &= \frac{n!}{(r+1)!\,(n-r-1)!}\,p^{r+1} q^{n-r-1} \\ y' - y &= \;? \end{aligned} $$

Lucie, you got some factoring to do. Let's deal with q and p first.

The left term has $q^{n-r-1}$ and the right term has $q^{n-r}$, so we can factor out $q^{n-r-1}$, leaving a factor of $q$ on the right-hand term in the brackets.

Similarly we can factor out $p^r$ from both sides, leaving a factor of $p$ on the left.

The combination expressions expand as shown above. We can factor out n! from both sides. We can factor out 1/(r+1)! from both sides, if we first multiply top and bottom of the right-hand term by (r+1), leaving (r+1) on the top.

Similarly, we can factor out $1/(n-r)!$ from both sides, if we first multiply top and bottom of the left-hand term by $(n-r)$, leaving $(n-r)$ behind on the top. So everything checks out so far. Next, he wants to divide by $y$:
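The factoring can also be checked by brute force. This is a small sketch of my own, assuming nothing beyond the two definitions of y and y' given earlier; `pmf` and `factored_difference` are hypothetical helper names, and `n` and `p` are arbitrary test values. It compares y' - y with the factored form for every valid r, using exact fractions.

```python
from fractions import Fraction
from math import comb, factorial


def pmf(n, r, p):
    """Binomial probability C(n, r) p^r q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def factored_difference(n, r, p):
    """The factored form of y' - y from the derivation:
    n! p^r q^(n-r-1) / ((r+1)! (n-r)!) * [(n-r)p - (r+1)q]."""
    q = 1 - p
    prefactor = factorial(n) * p**r * q**(n - r - 1)
    prefactor = prefactor / (factorial(r + 1) * factorial(n - r))
    return prefactor * ((n - r) * p - (r + 1) * q)


# Arbitrary test values (assumptions for illustration).
n, p = 10, Fraction(3, 10)

identity_holds = all(
    pmf(n, r + 1, p) - pmf(n, r, p) == factored_difference(n, r, p)
    for r in range(n)  # r runs 0..n-1 so that r+1 is still a valid count
)
print(identity_holds)
```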

> And:
>
> $$ \frac{y' - y}{y} = \frac{1}{(r+1)q}\,\bigl[np - r(p+q) - q\bigr] = \frac{1}{(r+1)q}\,\bigl[np - r - q\bigr] \qquad \text{(Equation 1)} $$

Hmm...again. We're dividing the expression we had above by y.

$$ y = \frac{n!}{r!\,(n-r)!}\,p^r q^{n-r} $$

We have:

$$ y' - y = \frac{n!\,p^r q^{n-r-1}}{(r+1)!\,(n-r)!}\,\bigl[(n-r)p - (r+1)q\bigr] $$

So both the $n!$ and $(n-r)!$ terms cancel. We also cancel $r!$ against $(r+1)!$, leaving a factor of $(r+1)$ on the bottom. The $p^r$ cancels, and $q^{n-r-1}$ over $q^{n-r}$ leaves a factor of $q$ on the bottom. So I get:

$$ \frac{y' - y}{y} = \frac{1}{(r+1)q}\,\bigl[(n-r)p - (r+1)q\bigr] $$

Now we have to figure out how to rearrange the term in brackets:

$$ \bigl[(n-r)p - (r+1)q\bigr] $$

Expand, and then substitute $p + q = 1$:

$$ \begin{aligned} (n-r)p - (r+1)q &= np - rp - rq - q \\ &= np - r(p+q) - q \\ &= np - r - q \end{aligned} $$

It checks out. Doctor Anthony continues:
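For the skeptical, Equation 1 can be confirmed numerically too. The sketch below is mine, not from the forum post; `pmf`, `relative_step`, and `equation_1` are hypothetical helper names and the values of `n` and `p` are arbitrary. It checks that (y' - y)/y equals (np - r - q)/((r+1)q) exactly for every r.

```python
from fractions import Fraction
from math import comb


def pmf(n, r, p):
    """Binomial probability C(n, r) p^r q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def relative_step(n, r, p):
    """Left-hand side of Equation 1: (y' - y) / y."""
    y = pmf(n, r, p)
    y_next = pmf(n, r + 1, p)
    return (y_next - y) / y


def equation_1(n, r, p):
    """Right-hand side of Equation 1: (np - r - q) / ((r+1) q)."""
    q = 1 - p
    return (n * p - r - q) / ((r + 1) * q)


# Arbitrary test values (assumptions for illustration); p must be
# strictly between 0 and 1 so that y is never zero.
n, p = 10, Fraction(3, 10)

equation_1_holds = all(
    relative_step(n, r, p) == equation_1(n, r, p) for r in range(n)
)
print(equation_1_holds)
```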

> Let: $x = rk - npk$, so that $x$ is now the variate measured from the mean.
>
> Then: $r = x/k + np$ and $r + 1 = x/k + np + 1$
>
> Thus:
>
> $$ k(r+1) = x + k + npk, \qquad k^2 (r+1)\,q = (x + k + npk)\,qk $$

So far so good.

> Multiply top and bottom of the right-hand side of Equation 1 by $k^2$. Then:
>
> $$ \frac{y' - y}{y} = \frac{\bigl[(np-r)k - qk\bigr]\,k}{\bigl[x + k + npk\bigr]\,qk} \qquad [\text{note that } (np-r)k = -x] $$

Go back to what we had, and then multiply top and bottom by $k^2$:

$$ \frac{y' - y}{y} = \frac{\bigl[np - r - q\bigr]\,k^2}{(r+1)\,q\,k^2} $$

Hmm... The top is fine, but on the bottom we had

$$ (r+1)\,q\,k^2 $$

We need to get to:

$$ \bigl[x + k + npk\bigr]\,qk $$

He says:

> [note that $(np-r)k = -x$]

OK, so we have:

$$ (r+1)\,q\,k^2 = (rk + k)\,q\,k $$

Since:

$$ (np - r)\,k = -x \quad\Longrightarrow\quad rk = npk + x $$

Substituting:

$$ (x + k + npk)\,q\,k $$

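Moving on to substitute $(np-r)k = -x$ on top and multiplying out on the bottom yields:

$$ \frac{y' - y}{y} = \frac{(-x - kq)\,k}{npqk^2 + (x+k)\,qk} $$

This is presumably the expression that the quoted derivation goes on to call Equation 2.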
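Here is the same kind of exact check for the expression in terms of x and k. Again this is my own sketch with hypothetical helper names (`pmf`, `equation_2_rhs`) and arbitrary values for `n`, `p`, and `k`. It sets x = rk - npk for each r and compares (y' - y)/y with the right-hand side above.

```python
from fractions import Fraction
from math import comb


def pmf(n, r, p):
    """Binomial probability C(n, r) p^r q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


def equation_2_rhs(n, p, k, x):
    """(-x - kq) k / (npqk^2 + (x + k) q k), with x measured from the mean."""
    q = 1 - p
    return (-x - k * q) * k / (n * p * q * k**2 + (x + k) * q * k)


# Arbitrary test values (assumptions for illustration).
n, p, k = 10, Fraction(3, 10), Fraction(1, 5)

equation_2_holds = True
for r in range(n):
    x = r * k - n * p * k  # the variate measured from the mean npk
    lhs = (pmf(n, r + 1, p) - pmf(n, r, p)) / pmf(n, r, p)
    if lhs != equation_2_rhs(n, p, k, x):
        equation_2_holds = False
print(equation_2_holds)
```

Note that the value of `k` drops out of the left-hand side entirely; it only enters through x, which is why the identity holds for any positive `k`.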

> Finally, we now let $k = dx$, so that $y' - y = dy$, and let $n \to \infty$ in such a way that $nk^2$ is finite. Equation 2 can then be written as:
>
> $$ \frac{dy}{y} = \frac{(-x - q\,dx)\,dx}{s^2 + (x + dx)\,q\,dx} $$

The only tricky part here was that we've replaced $npqk^2$ by $s^2$.

Now he says:

> As $dx \to 0$ this becomes:
>
> $$ \frac{dy}{y} = \frac{-x\,dx}{s^2} $$
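To see how quickly the extra terms stop mattering, here is a rough numerical sketch of my own. It holds s^2 fixed (standing in for "n goes to infinity with nk^2 finite") and shrinks the step, comparing the full right-hand side with the limiting form -x dx / s^2. The function names and the values of `x`, `q`, and `s2` are assumptions chosen only for illustration.

```python
def full_rhs(x, dx, q, s2):
    """(-x - q dx) dx / (s^2 + (x + dx) q dx), before taking the limit."""
    return (-x - q * dx) * dx / (s2 + (x + dx) * q * dx)


def limiting_rhs(x, dx, s2):
    """-x dx / s^2, the form claimed as dx -> 0."""
    return -x * dx / s2


# Illustrative values (assumptions): a fixed point x, q = 1/2, s^2 = 1.
x, q, s2 = 0.5, 0.5, 1.0

steps = (0.1, 0.01, 0.001)
ratios = [full_rhs(x, dx, q, s2) / limiting_rhs(x, dx, s2) for dx in steps]

for dx, ratio in zip(steps, ratios):
    print(f"dx = {dx:<6}  full / limiting = {ratio:.5f}")
```

The ratio should drift toward 1 as the step shrinks, which is all the limit is claiming. The comparison only makes sense away from x = 0, where the limiting form itself vanishes.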

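And we're there! If we integrate the left side we get $\ln y$, and the right side gives $-x^2 / 2s^2$ plus a constant of integration. Exponentiating, with the constant absorbed into $A$:

$$ y = A \exp\left\{ -\frac{x^2}{2s^2} \right\} $$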
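As a final check, the result can be compared against actual binomial probabilities. This is a sketch of my own under a few stated assumptions: I take n = 1000, p = 1/2, and k = 1 (arbitrary values), and I take A to be the binomial probability at the mean, so that both sides equal 1 at x = 0. The helper `pmf_ratio` is a hypothetical name; it computes y(r)/y(r0) from a ratio of binomial coefficients to avoid underflow.

```python
from math import comb, exp


def pmf_ratio(n, p, r, r0):
    """y(r) / y(r0) for the binomial, without computing tiny probabilities."""
    q = 1 - p
    return comb(n, r) / comb(n, r0) * (p / q) ** (r - r0)


# Illustrative values (assumptions, not from the forum post).
n, p, k = 1000, 0.5, 1.0
q = 1 - p
r0 = 500                   # r at the mean; np is an integer for these values
s2 = n * p * q * k**2      # s^2 = npqk^2

errors = []
for r in (500, 510, 520, 530):
    x = (r - r0) * k                      # variate measured from the mean
    exact = pmf_ratio(n, p, r, r0)        # binomial y / A
    gauss = exp(-x**2 / (2 * s2))         # exp(-x^2 / 2s^2)
    errors.append(abs(exact - gauss))
    print(f"x = {x:5.1f}   binomial y/A = {exact:.4f}   exp(-x^2/2s^2) = {gauss:.4f}")

max_error = max(errors)
```

For n this large the two columns should agree to about three decimal places near the mean, with the agreement loosening further out in the tails.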