Python for Bioinformatics: Jukes-Cantor (5)

Sunday, February 28, 2010

Jukes-Cantor (5)

I am trying to see how the equations in the Jukes-Cantor model of sequence evolution work and, eventually, to extend this to other models. To test my understanding, I'll want to work out some practical examples. But I'm not there yet.

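What I want to do here is wrap up something from the first post. There we had two differential equations for the rates of change of the probabilities at a particular nucleotide position, where P_XX(t) is the probability that a position starting as X is X at time t, and P_XY(t) is the probability that it is some particular other nucleotide Y: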

d/dt(P_XX(t)) = -3*α*e^(-4αt)
d/dt(P_XY(t)) =    α*e^(-4αt)

We'd like to express these rates in terms of P_XX(t) and P_XY(t) themselves, rather than in terms of t. To do that we use the solutions:

P_XX(t) = 1/4 + 3/4*e^(-4αt)
P_XY(t) = 1/4 - 1/4*e^(-4αt)
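
As a quick sanity check that these solutions really do have the slopes given above, here is a small sketch that compares a finite-difference estimate of each derivative with the formula. The value of α, the step size h, and the function names are my own choices for illustration, not anything from the earlier posts.

```python
import math

ALPHA = 0.1  # assumed illustrative rate; any positive value should do


def p_xx(t, alpha=ALPHA):
    """Closed-form P_XX(t) = 1/4 + 3/4 * e^(-4*alpha*t)."""
    return 0.25 + 0.75 * math.exp(-4 * alpha * t)


def p_xy(t, alpha=ALPHA):
    """Closed-form P_XY(t) = 1/4 - 1/4 * e^(-4*alpha*t)."""
    return 0.25 - 0.25 * math.exp(-4 * alpha * t)


def central_diff(f, t, h=1e-6):
    """Central finite-difference estimate of df/dt at t."""
    return (f(t + h) - f(t - h)) / (2 * h)


for t in (0.5, 1.0, 5.0):
    decay = math.exp(-4 * ALPHA * t)
    # Each line prints: t, numerical slope, slope from the formula
    print(t, central_diff(p_xx, t), -3 * ALPHA * decay)
    print(t, central_diff(p_xy, t), ALPHA * decay)
```

The two numbers on each line should agree to several decimal places.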

Taking the first one, we have

P_XX(t) = 1/4 + 3/4*e^(-4αt)
3*e^(-4αt) = 4*(P_XX(t) - 1/4)
-3*α*e^(-4αt) = -4*α*(P_XX(t) - 1/4)
d/dt(P_XX(t)) = α - 4*α*P_XX(t)

And for the second

P_XY(t) = 1/4 - 1/4*e^(-4αt)
e^(-4αt) = 1 - 4*P_XY(t)
α*e^(-4αt) = α - 4*α*P_XY(t)
d/dt(P_XY(t)) = α - 4*α*P_XY(t)

So each slope is a linear function of its own probability: a constant term α, minus 4*α times the probability. But the most interesting thing is that the form is the same for both P_XX and P_XY! The only difference between the two is the starting value.
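
To convince myself that one equation really does cover both cases, here is a rough sketch that integrates dP/dt = α - 4*α*P with simple Euler steps, once starting from P = 1 and once from P = 0, and compares the results with the two closed-form solutions. The function name, the value of α, the end time, and the number of steps are all assumptions made for the example.

```python
import math


def integrate(p0, alpha, t_end, steps=100_000):
    """Euler-integrate dP/dt = alpha - 4*alpha*P from P(0) = p0 up to t_end."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * (alpha - 4 * alpha * p)
    return p


alpha, t_end = 0.1, 3.0  # assumed illustrative values
decay = math.exp(-4 * alpha * t_end)

# Same equation, different starting points
print(integrate(1.0, alpha, t_end), 0.25 + 0.75 * decay)  # P_XX
print(integrate(0.0, alpha, t_end), 0.25 - 0.25 * decay)  # P_XY
```

Euler's method is crude, so the agreement is only approximate, but it should improve as the number of steps goes up.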

I wasn't expecting this but it makes sense, because at long times we come to equilibrium (the stationary distribution of the Markov chain), where both probabilities approach 1/4 and both rates go to α - 4*α*(1/4) = 0. At time zero we have P_XX = 1 and the rate is α - 4*α = -3*α, while P_XY = 0 and the rate is α, which matches the original equations. I think it's OK.
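
The same end-point checks can be done in a couple of lines. This is only a sketch; the helper name `rate` and the value of α are made up for the example.

```python
def rate(p, alpha):
    """Slope dP/dt = alpha - 4*alpha*P, the form shared by P_XX and P_XY."""
    return alpha - 4 * alpha * p


alpha = 0.1  # assumed illustrative value

print(rate(1.0, alpha))   # P_XX at time zero: should be close to -3*alpha
print(rate(0.0, alpha))   # P_XY at time zero: should be alpha
print(rate(0.25, alpha))  # either one at equilibrium: should be close to 0
```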