Friday, February 26, 2010

Jukes-Cantor (2)


The figure illustrates at least part of the reason that we need models of sequence evolution. It comes from a very nice book by Page & Holmes.

What I want to do here is to follow the derivation of the equations for PXX and PXY as a function of time, as developed in Higgs & Atwood. This isn't really necessary from a mathematical viewpoint, since we have already guessed the equations, but it's a fun argument.

Consider the following path: we start with an A at some position at time-zero, and after time t + Δt we observe that it is still A, but realize that at a short time prior to the second observation it might have been any nucleotide (since we weren't looking then):

  t            Δt
A => [A,C,G,T] => A


There are four possible paths to get from A to A, through each of the possible intermediates. We sum over the probabilities... We have:

PAA(t + Δt) = α*Δt*(PAC(t))
+ α*Δt*(PAG(t))
+ α*Δt*(PAT(t))
+ (1 - 3*α*Δt)*(PAA(t))


We can expand the last term to:

PAA(t) - 3*α*Δt*PAA(t)


A wee bit o'calculus. We want to know PAA(t + Δt). Since Δt is small, we can take the value of the function at t and correct it by adding the slope of the function (at t) times Δt. That is:

PAA(t + Δt) = PAA(t) + d/dt PAA(t) * Δt


So we substitute this expression for the left-hand side of the first equation and then notice that we can subtract PAA(t) from both sides, leaving:

d/dt PAA(t) * Δt = α*Δt*(PAC(t) + PAG(t) + PAT(t)) - 3*α*ΔtPAA(t)

Since Δt occurs in each term on both sides it cancels (which is really the whole point of this). Also the sum of the three PAX(t) terms is equal to 1 - PAA(t), and so we have:

d/dt PAA(t) = α*(1 - PAA(t)) - 3*α*PAA(t)
= α - 4*αPAA(t)


The rate of change of PAA(t) is proportional to PAA(t), which is pretty obvious when you think about it, and so the form of the equation is an exponential:


PAA(t) = A*e-4*α*t + B


We need the -4α in the exponent, so that it will come out front when we take the derivative (see here).

We evaluate the constants A and B by considering the boundary conditions, namely, at long times PAA(t) = 1/4, so B = 1/4; and PAA(0) = 1, so A + B = 1 and A = 3/4.


PAA(t) = 1/4 + 3/4*e-4*α*t

Since the other three PAX(t) are all equal and also equal to 1 - PAA(t), we have:

3 * PAX(t) = 1 - PAA(t)
= 3/4 - 3/4*e-4*α*t
PAX(t) = 1/4 - 1/4*e-4*α*t

which is just what we said the other day!