I came across a nice introduction to probability that starts from sets (here). It contains a small section extending the binomial distribution to the Poisson, which explains the simplifying assumptions in an intuitive way. I'm going to do something similar here, except that I'm following the derivation in my favorite bioinformatics textbook, Higgs & Attwood.

This will complement a previous post, which explored the approximation of the binomial as the normal distribution.

Say we're carrying out a series of Bernoulli trials, flipping a (possibly unfair) coin, or just throwing a loaded die where the possible results are divided into two classes: success and failure. If the probability of success is the same on any trial, denoted

`p`

(and that of failure is `q = 1-p`

), then the probability of a particular sequence like `SFSSSFSFSS`

containing `k`

successes and `n-k`

failures in a total of `n`

trials isWe can gather all the like terms together to give

We can obtain the total probability for all trials having the same number of successes. Multiply the previous result by the number of combinations, the number of ways of picking

`k`

successes out of `n`

trials, which I will symbolize as _{n}C

_{k}for convenience. This is called "n choose k."

The complete expression for the binomial distribution is:

The Poisson approximation applies to cases where

`p`

is small and `n`

is large enough that `p/n`

, the mean number of successes, stays reasonable (like 1 or 2), even as we make `n`

large and `p`

shrinks closer and closer to zero. We will (eventually, below) use the symbol `λ`

for that ratio.But start by looking at the term due to combinations. By our assumptions

`n`

is large and `p`

is small, so that `k`

is reasonable for the cases we are interested in (near the mean and smaller). Suppose `k = 2`

. Then:So, generalizing to

`k`

, and also remembering the factor of `1/k!`

from the original expression, the complete term for the combinations is approximatelyThe second simplification comes from the probability term for failures. Separating the exponents:

Since

`n`

is large and `p`

is small:(Note: the first approximation will need some justification, that I'll defer to a later post. Also, it's not clear to me why we can ignore p in the second case but not in the first).

From these two approximations we obtain:

Substitute

`λ = np`

to reveal the familiar Poisson distribution:One interesting property of the Poisson distribution is that the variance is equal to the mean, so that the curves spread out as

`λ`

increases.R code: