The figure illustrates at least part of the reason that we need models of sequence evolution. It comes from a very nice book by Page & Holmes.
What I want to do here is to follow the derivation of the equations for PXX and PXY as a function of time, as developed in Higgs & Atwood. This isn't really necessary from a mathematical viewpoint, since we have already guessed the equations, but it's a fun argument.
Consider the following path: we start with an A at some position at time-zero, and after time t + Δt we observe that it is still A, but realize that at a short time prior to the second observation it might have been any nucleotide (since we weren't looking then):
There are four possible paths to get from A to A, through each of the possible intermediates. We sum over the probabilities... We have:
We can expand the last term to:
A wee bit o'calculus. We want to know
PAA(t + Δt)
. Since Δt
is small, we can take the value of the function at t and correct it by adding the slope of the function (at t) times Δt
. That is:So we substitute this expression for the left-hand side of the first equation and then notice that we can subtract PAA(t) from both sides, leaving:
Since
Δt
occurs in each term on both sides it cancels (which is really the whole point of this). Also the sum of the three PAX(t) terms is equal to 1 - PAA(t)
, and so we have:The rate of change of
PAA(t)
is proportional to PAA(t)
, which is pretty obvious when you think about it, and so the form of the equation is an exponential:We need the -4α in the exponent, so that it will come out front when we take the derivative (see here).
We evaluate the constants A and B by considering the boundary conditions, namely, at long times
PAA(t) = 1/4
, so B = 1/4
; and PAA(0) = 1
, so A + B = 1
and A = 3/4
.Since the other three
PAX(t)
are all equal and also equal to 1 - PAA(t)
, we have:which is just what we said the other day!