Q is a 4 x 4 matrix with the instantaneous rates for each nucleotide to change to another, or stay the same.
Everybody has a different order for the nucleotides! I usually use ACGT (because then I don't have to think twice about it), wikipedia uses AGCT some places and something else other places. Yang uses TCAG, and we'll stick with that since we're borrowing his whole exposition.
Usually, the columns of Q represent the instantaneous rate of going to a particular state, so row 1 is T => T (TT), TC TA and TG. There are exceptions (wikipedia some places, Dayhoff), but this is pretty standard.
I'm going to start with a more advanced model, called TN93 (after Tamura & Nei 1993 PMID 8336541). For TN93 we have 3 exchangeability parameters (α1, α2, β): these describe the rate at which either of the two transitions, or all transversions, occur.
The model is time-reversible, so the exchangeability parameter for MN applies also to NM.
The equilibrium nucleotide frequencies are πN. These are the frequencies to which a sequence would tend by many sequential applications of P. So here is Q. Notice the symmetry:
For example, the instantaneous rate of change from T to C is Q12, which is α1πC. And that's counterintuitive: it involves πC, even though that's where we're going.
There are two other rules. First, the nucleotide frequencies must sum to one:
Σ πi = 1
. Also the sum of rates in each row equals zero, so we calculate the diagonals accordingly.Now, how do you get from Q to P? Here is the equation, which looks simple but definitely is not:
But we can avoid using it for the moment. Yang gives an analytical solution (i.e. which is explicit in terms of the above parameters) for TN93. That's probably why he chose this example, because it's complex enough to be realistic but also has this kind of solution.
The eigenvalues (remember those?) are:
He defines e values:
Then come the 16 terms for the P matrix..
Naturally, I wrote a simulation in Python. It chooses
α1, α2, and β = 2, 1, 0.2
. I have no idea yet if those are reasonable values.But what you can see from the output is that, if we crank the P matrix a very long time, the nucleotide frequencies approach the values in the π vector, as they should.