## Tuesday, March 30, 2010

### Jukes-Cantor (8) - using the Poisson distribution

In a previous post, I showed a derivation of the equations (from Higgs & Attwood) describing the probability of change at a sequence position as a function of time in the Jukes-Cantor model.

 `PXX(t) = 1/4 + 3/4*e-4*α*tPXY(t) = 1/4 - 1/4*e-4*α*t`

These were obtained by starting with this model

 ` t ΔtA => [A,C,G,T] => A`

and then deriving and solving the resulting differential equation. This seems to be a pretty standard treatment.

But in Chapter 11 of Felsenstein, I found an alternative approach based on the Poisson distribution, which I think is quite wonderful.

To begin, he defines rates in terms of u (= 3*λ), so the individual rates for changes from X to Y are u/3. He says:

The easiest way to compute this is to slightly fictionalize the model. Instead of having a rate u of change to one of the three other bases, let us imagine that we instead have a rate 4/3*u of change to a base randomly drawn from all four possibilities. This will be exactly the same process, as there then works out to be a probability of u/3 of change to each of the other three bases. We have also added a rate u/3 of change from a base to itself, which does not matter.

Now we use the Poisson distribution.

 `P(k) = λk / k! * e-λP(0) = e-λ`

For a branch with time t, the

probability of no events at all at a site, when the number expected to occur is 4/3*u*t, is the zero term of the Poisson distribution…the branch of time t consists then of a vast number of tiny segments of time dt, each having the small probability of 4/3*u*dt of an event.

The probability of at least one event is 1 minus the zero term:

 `1 - e-4/3*u*t )`

The probability that it is a particular event (say, A to C) is:

 `1/4*(1 - e-4/3*u*t )`

The probability of change to any one of the three nucleotides is then:

 `3/4*(1 - e-4/3*u*t )`