Tuesday, July 28, 2009

Normal approximation to the binomial

I know that the normal can be used as an approximation to the binomial. I was looking for a derivation of this, and I found one via Google on a math forum. Doctor Anthony begins:

Derivation of the Normal distribution from the Binomial distribution

Let a variate take values 0, k, 2k, 3k, ..., nk
with probabilities given by successive terms of
(q + p)^n.

What's with the k? Well, we're eventually going to want non-integer values. The expansion of (q + p)^n is familiar:

q^n + n q^(n-1) p + [n(n-1)/2] q^(n-2) p^2 + ...

Counting from i = 0, the ith term of the expansion is C(n,i) q^(n-i) p^i.

Then the mean m = npk and the variance s^2 = npqk^2

OK. Notice the use of the multiplication rule for variance from the other day: scaling a variate by k multiplies its variance by k^2.
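As a sanity check on the mean and variance quoted above, here is a minimal Python sketch that sums over the values 0, k, 2k, ..., nk directly. The function name and the sample values of n, p and k are my own choices for illustration, not part of Doctor Anthony's post. Exact fractions are used so the comparison isn't blurred by rounding.

```python
from fractions import Fraction
from math import comb

def scaled_binomial_moments(n, p, k):
    """Mean and variance of a variate taking the value r*k with binomial probability."""
    q = 1 - p
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * k * w for r, w in enumerate(probs))
    var = sum((r * k - mean) ** 2 * w for r, w in enumerate(probs))
    return mean, var

# Arbitrary illustrative values (assumptions, not from the post).
n, p, k = 12, Fraction(1, 3), Fraction(1, 4)
q = 1 - p
mean, var = scaled_binomial_moments(n, p, k)
print(mean == n * p * k)          # compares against m = npk
print(var == n * p * q * k**2)    # compares against s^2 = npqk^2
```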


y = probability of occurrence of rk = C(n,r) p^r q^(n-r)

Also let:

y' = probability of occurrence (r+1)k = C(n,r+1)p^(r+1) q^(n-r-1)


y' - y = C(n,r+1)p^(r+1) q^(n-r-1) - C(n,r)p^r q^(n-r)

         n! p^r q^(n-r-1)
       = ---------------- [(n-r)p - (r+1)q]
          (r+1)! (n-r)!
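Before working through the algebra by hand, the factored form can be spot-checked with exact arithmetic. This is only a sketch: the helper names and the values n = 10, p = 2/5 are assumptions picked for the example.

```python
from fractions import Fraction
from math import comb, factorial

def pmf(n, r, p):
    """Binomial term C(n,r) p^r q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

def factored_difference(n, r, p):
    """The claimed factored form of y' - y."""
    q = 1 - p
    prefactor = Fraction(factorial(n), factorial(r + 1) * factorial(n - r))
    return prefactor * p**r * q**(n - r - 1) * ((n - r) * p - (r + 1) * q)

# Arbitrary illustrative values (assumptions, not from the post).
n, p = 10, Fraction(2, 5)
all_match = all(
    pmf(n, r + 1, p) - pmf(n, r, p) == factored_difference(n, r, p)
    for r in range(n)  # r + 1 must not exceed n
)
print(all_match)
```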

Hmm... I know that

y  = [n! / (r! (n-r)!)] p^r q^(n-r)
y' = [n! / ((r+1)! (n-r-1)!)] p^(r+1) q^(n-r-1)
y' - y = ?

Lucie, you got some factoring to do. Let's deal with q and p first.

The left term has q^(n-r-1) and the right term has q^(n-r), so we can factor out
q^(n-r-1), leaving a factor of q on the right-hand term in the brackets.

Similarly, we can factor out p^r from both terms, leaving a factor of p on the left.

The combination expressions expand as shown above. We can factor out n! from both terms. We can factor out 1/(r+1)! from both terms, if we first multiply top and bottom of the right-hand term by (r+1), leaving (r+1) on the top.

Similarly, we can factor out 1/(n-r)! from both terms, if we first multiply top and bottom of the left-hand term by (n-r), leaving (n-r) behind on the top. So everything checks out so far. Next, he wants to divide by y:


y' - y      1                            1
------ = ------ [np - r(p+q) - q]  =  ------ [np - r - q]
  y      (r+1)q                       (r+1)q

(Equation 1)

Hmm...again. We're dividing the expression we had above by y.

y = [n! / (r! (n-r)!)] p^r q^(n-r)

We have:

         n! p^r q^(n-r-1)
y' - y = ---------------- [(n-r)p - (r+1)q]
          (r+1)! (n-r)!

So both the n! and (n-r)! terms cancel. We also cancel r!, leaving a factor of (r+1) on the bottom. The p^r cancels, and the q^(n-r-1) also cancels, leaving a factor of q on the bottom. So I get:

y' - y      1
------ = ------ [(n-r)p - (r+1)q]
  y      (r+1)q

Now we have to figure out how to rearrange the term in brackets:

  [(n-r)p - (r+1)q]

Expand, and then substitute for p + q = 1:

  np - rp - rq - q
  np - r(p+q) - q
  np - r - q
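The same conclusion can be reached numerically. Here's a small sketch that compares (y' - y)/y, computed straight from the binomial terms, with the right-hand side of Equation 1. The function names and the values n = 20, p = 3/10 are assumptions made for the example.

```python
from fractions import Fraction
from math import comb

def relative_step(n, r, p):
    """(y' - y) / y computed directly from the binomial terms."""
    q = 1 - p
    y = comb(n, r) * p**r * q**(n - r)
    y_next = comb(n, r + 1) * p**(r + 1) * q**(n - r - 1)
    return (y_next - y) / y

def equation_1(n, r, p):
    """Right-hand side of Equation 1: [np - r - q] / ((r+1) q)."""
    q = 1 - p
    return (n * p - r - q) / ((r + 1) * q)

# Arbitrary illustrative values (assumptions, not from the post).
n, p = 20, Fraction(3, 10)
agree = all(relative_step(n, r, p) == equation_1(n, r, p) for r in range(n))
print(agree)
```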

It checks out. Doctor Anthony continues:


x = rk - npk, so that x is now the variate measured
from the mean.


r = x/k + np and r+1 = x/k + np + 1


k(r+1) = x + k + npk

k^2 (r+1)q = (x + k + npk)qk

So far so good.

Multiply top and bottom of the righthand side of Equation 1 by k^2. 

y' - y   [(np-r)k - qk]k
------ = ---------------     [note that (np-r)k = -x]
  y      [x + k + npk]qk

Go back to what we had, and then multiply top and bottom by k^2:

y' - y   [np - r - q] k^2
------ = ----------------
  y        (r+1)q k^2

Hmm... The top is fine, but on the bottom we had

(r+1) q k^2

We need to get to:

[x + k + npk]qk

He says:

[note that (np-r)k = -x]

OK, so we have:

(r+1) q k^2
(rk + k) q k

(np - r) k = -x
rk = npk + x

(x + k + npk) q k

Moving on to substitute for (np-r) k = -x on top and multiplying out on the bottom yields:

     (-x - kq)k
= ----------------
  npqk^2 + (x+k)qk

(Equation 2)
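Since this step is just a change of variables, the rewritten right-hand side should equal Equation 1 for any spacing k. The sketch below checks that with exact fractions; the function names and the sample values of n, p and k are assumptions for illustration.

```python
from fractions import Fraction

def equation_1(n, r, p):
    """[np - r - q] / ((r+1) q), in terms of r."""
    q = 1 - p
    return (n * p - r - q) / ((r + 1) * q)

def equation_2(n, r, p, k):
    """(-x - kq) k / (npq k^2 + (x+k) q k), with x = rk - npk."""
    q = 1 - p
    x = r * k - n * p * k
    return (-x - k * q) * k / (n * p * q * k**2 + (x + k) * q * k)

# Arbitrary illustrative values (assumptions, not from the post).
n, p = 15, Fraction(1, 4)
spacings = (Fraction(1, 2), Fraction(3, 7), Fraction(2))
agree = all(
    equation_1(n, r, p) == equation_2(n, r, p, k)
    for r in range(n)
    for k in spacings
)
print(agree)
```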

Finally, we now let k = dx, so that y' - y = dy and 
let n ->infinity in such a way that nk^2 is finite.
Equation 2 can then be written as:

 dy      (-x - q dx)dx
---- = ----------------
 y     s^2 + (x+dx)q dx

The only tricky part here was that we've replaced npqk^2 by s^2.
Now he says:

As dx -> 0 this becomes:

 dy    -x dx
---- = ------
 y      s^2
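To get a feel for the limit, here's a rough numerical sketch. It holds s fixed, picks k so that npqk^2 = s^2, and compares (y' - y)/(y k) at a lattice point near a chosen x with -x/s^2. The function name, the choice p = 0.3, s = 1 and the target x = 0.8 are all assumptions for the example. The ratio y'/y is taken as (n-r)p / ((r+1)q), which follows from the two binomial terms, so no huge factorials are needed.

```python
from math import sqrt

def slope_estimate(n, p, s, x_target):
    """Return (x, (y' - y)/(y k), -x/s^2) at the lattice point nearest x_target."""
    q = 1 - p
    k = s / sqrt(n * p * q)              # chosen so that n p q k^2 = s^2
    r = round(n * p + x_target / k)      # nearest lattice index to x_target
    x = r * k - n * p * k                # variate measured from the mean
    ratio = (n - r) * p / ((r + 1) * q)  # y'/y from the binomial terms
    return x, (ratio - 1) / k, -x / s**2

# Arbitrary illustrative values (assumptions, not from the post).
for n in (100, 10_000, 1_000_000):
    x, discrete, limit = slope_estimate(n, 0.3, 1.0, 0.8)
    print(n, round(x, 4), round(discrete, 4), round(limit, 4))
```

If the derivation is right, the last two columns should draw together as n grows and k shrinks.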

And we're there! If we integrate the left side we get ln(y), and the right side gives
-x^2 / (2s^2)
plus a constant of integration. Exponentiating, the constant becomes the factor A:

y = A exp { -x^2 / (2s^2) }
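To see how good the approximation is, here's a short sketch comparing binomial probabilities with the curve just derived. The derivation above leaves A undetermined; the code assumes the usual normalizing value A = 1/sqrt(2 pi s^2) and takes k = 1, so that s^2 = npq and x = r - np. The function names and the values n = 400, p = 0.5 are my own choices.

```python
from math import comb, exp, pi, sqrt

def binomial_pmf(n, r, p):
    """Binomial term C(n,r) p^r q^(n-r)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def normal_curve(x, s2):
    """A exp(-x^2 / (2 s^2)) with A assumed to be 1 / sqrt(2 pi s^2)."""
    A = 1 / sqrt(2 * pi * s2)
    return A * exp(-(x**2) / (2 * s2))

# Arbitrary illustrative values (assumptions, not from the post).
n, p = 400, 0.5
s2 = n * p * (1 - p)  # variance with k = 1
for r in (180, 190, 200, 210, 220):
    x = r - n * p     # variate measured from the mean
    print(r, binomial_pmf(n, r, p), normal_curve(x, s2))
```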
