Python for Bioinformatics: Models of sequence evolution 3

Tuesday, December 7, 2010

Models of sequence evolution 3

I wrote a couple of long posts about models of sequence evolution (here and here), in which (and it may not be obvious) we solved the problem of how to obtain:

P = e^(Qt)

for any t, by getting eigenvalues and eigenvectors, using both R and Python. That's outstanding.

If you look carefully, you might notice something. There are just two terms in Q for the JC69 model:

-0.01 (on diagonal) and 0.003333 (off diagonal)

P has only two terms as well and they turn out to be (for t = 0.1):

0.999 and 0.000333

That is, the off-diagonal terms of P are apparently just the off-diagonal terms of Q multiplied by t, and naturally the diagonal term is 1 minus 3 times the off-diagonal term (each row of P sums to 1). For a t this small, there was really no need for all the exponentiation of matrices, etc.
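Here is a quick sketch to check that observation numerically. It is not the eigenvalue approach from the earlier posts: it sums a truncated power series for e^(Qt) in plain Python and compares the result with I + Qt. The function names (mat_mul, expm_taylor, linear_approx) and the choice of 20 terms are my own assumptions for illustration.

```python
# JC69 rate matrix from the post: -0.01 on the diagonal, 0.01/3 elsewhere.
N = 4
Q = [[-0.01 if i == j else 0.01 / 3 for j in range(N)] for i in range(N)]

def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm_taylor(Q, t, terms=20):
    """Approximate exp(Qt) with the first `terms` terms of its power series."""
    n = len(Q)
    Qt = [[q * t for q in row] for row in Q]
    # Start from the identity matrix, the k = 0 term of the series.
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (Qt)^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, Qt)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

def linear_approx(Q, t):
    """First-order approximation I + Qt."""
    n = len(Q)
    return [[(1.0 if i == j else 0.0) + Q[i][j] * t for j in range(n)]
            for i in range(n)]

t = 0.1
P_series = expm_taylor(Q, t)
P_linear = linear_approx(Q, t)
max_diff = max(abs(P_series[i][j] - P_linear[i][j])
               for i in range(N) for j in range(N))

print("diagonal:     series", P_series[0][0], "linear", P_linear[0][0])
print("off-diagonal: series", P_series[0][1], "linear", P_linear[0][1])
print("largest difference:", max_diff)
```

For t = 0.1 the two matrices should agree to about six decimal places. Trying a much larger t is a good way to see where the shortcut stops working.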

How does this happen? The formulas for the two terms in JC69 are:

P_NN(t) = 1/4 + 3/4 exp(-4λt)
P_NM(t) = 1/4 - 1/4 exp(-4λt)

But going back to the infinite series expansion for e^x (or just remembering the approximation for small x):

e^x = 1 + x/1! + x²/2! + ...
e^x ≈ 1 + x   (for small x)

So for small t the exponential is approximately 1 - 4λt and we obtain:

P_NN(t) ≈ 1/4 + 3/4 (1 - 4λt)
  = 1 - 3λt
P_NM(t) ≈ 1/4 - 1/4 (1 - 4λt)
  = λt
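To get a feel for how long the shortcut holds, here is a small sketch comparing the exact JC69 formulas above with the linear versions over a range of t. It assumes λ = 0.01/3, the off-diagonal rate quoted earlier; the function names and the list of t values are my own choices.

```python
import math

lam = 0.01 / 3  # off-diagonal rate of Q from the post (0.003333)

def jc69_exact(t, lam):
    """Return (P_NN, P_NM) from the closed-form JC69 formulas."""
    e = math.exp(-4 * lam * t)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def jc69_linear(t, lam):
    """Return (P_NN, P_NM) using the small-t approximation e^x ~ 1 + x."""
    return 1 - 3 * lam * t, lam * t

for t in (0.1, 1.0, 10.0, 100.0):
    nn_exact, nm_exact = jc69_exact(t, lam)
    nn_lin, nm_lin = jc69_linear(t, lam)
    print(f"t={t:>6}: P_NN exact={nn_exact:.6f} linear={nn_lin:.6f}  "
          f"P_NM exact={nm_exact:.6f} linear={nm_lin:.6f}")
```

The two columns should match closely at t = 0.1 and drift apart as 4λt stops being small. By t = 100 the linear P_NN has fallen to roughly zero while the exact value is still heading toward its limit of 1/4.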

Let's take another look at the formulas for these terms in the TN93 model. Look at only the first row of P. We first consider the two exponential terms:

e₂ = exp(-βt)
e₄ = exp[-(π_Yα₁ + π_Rβ)t]

Approximating for small t:

e₂ ≈ 1 - βt
e₄ ≈ 1 - (π_Yα₁ + π_Rβ)t

Look what happens to P₁₃ and P₁₄:

P₁₃ = π_A(1 - e₂)
    ≈ π_Aβt

This is t times Q₁₃. Exactly the same thing happens for P₁₄. P₁₂ is a bit more complicated, but it's just algebra:
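A quick numerical check of the P₁₃ result, as a sketch. The values of π_A, β and t below are made up for illustration and do not come from the earlier posts; Q₁₃ = π_Aβ is taken from the statement above that π_Aβt is t times Q₁₃.

```python
import math

pi_A = 0.3    # hypothetical equilibrium frequency of A
beta = 0.02   # hypothetical value of the beta rate parameter
t = 0.1

Q13 = pi_A * beta                                # rate matrix entry
P13_exact = pi_A * (1 - math.exp(-beta * t))     # P13 = pi_A (1 - e2)
P13_linear = t * Q13                             # small-t approximation

rel_err = abs(P13_exact - P13_linear) / P13_exact
print("exact: ", P13_exact)
print("linear:", P13_linear)
print("relative error:", rel_err)
```

With βt = 0.002 the relative error should be on the order of βt/2, so about a tenth of a percent.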

P₁₂ = π_C + (π_Cπ_R/π_Y)e₂ - (π_C/π_Y)e₄

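The second term is:

(π_Cπ_R/π_Y)e₂ ≈ (π_Cπ_R/π_Y)(1 - βt)
  = π_Cπ_R/π_Y - (π_Cπ_R/π_Y)βt

The third term (including its minus sign) is:

-(π_C/π_Y)e₄ ≈ (π_C/π_Y)(-1 + π_Yα₁t + π_Rβt)
  = -π_C/π_Y + π_Cα₁t + (π_Cπ_R/π_Y)βt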

We can see that (considering both the second and the third term) there are two copies of: (π_Cπ_R/π_Y)βt with opposite sign, leaving:

P₁₂ = π_C + π_Cπ_R/π_Y - π_C/π_Y + π_Cα₁t

The last term is t times Q₁₂. So I'm betting that the rest will go away. We have:

π_C + π_Cπ_R/π_Y - π_C/π_Y

but

π_Y + π_R = 1
π_R - 1 = -π_Y

π_C + π_Cπ_R/π_Y - π_C/π_Y
  = π_C + (π_C/π_Y)(π_R - 1)
  = π_C + (π_C/π_Y)(-π_Y)
  = π_C - π_C
  = 0

P₁₂ ≈ π_Cα₁t = t Q₁₂
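The same kind of check works for P₁₂. This sketch uses made-up base frequencies and rate parameters (they are assumptions for illustration, not values from the earlier posts), with π_Y = π_T + π_C and π_R = π_A + π_G. It evaluates the full expression for P₁₂, the constant part that should vanish, and the linear result π_Cα₁t.

```python
import math

# Hypothetical base frequencies (they sum to 1) and rate parameters.
pi_T, pi_C, pi_A, pi_G = 0.2, 0.3, 0.25, 0.25
alpha1, beta = 0.05, 0.02
t = 0.1

pi_Y = pi_T + pi_C  # pyrimidines
pi_R = pi_A + pi_G  # purines

# The two exponential terms from the TN93 formulas.
e2 = math.exp(-beta * t)
e4 = math.exp(-(pi_Y * alpha1 + pi_R * beta) * t)

# Full expression for P12, and the small-t result t * Q12.
P12_exact = pi_C + (pi_C * pi_R / pi_Y) * e2 - (pi_C / pi_Y) * e4
P12_linear = pi_C * alpha1 * t

# The t-independent part that the algebra says must be zero.
constant_part = pi_C + pi_C * pi_R / pi_Y - pi_C / pi_Y

print("exact:        ", P12_exact)
print("linear:       ", P12_linear)
print("constant part:", constant_part)
```

The constant part should print as zero up to floating-point rounding, and the exact and linear values should differ only by a term of second order in t.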

How about that!