These were obtained by starting with this model
and then deriving and solving the resulting differential equation. This seems to be a pretty standard treatment.
But in Chapter 11 of Felsenstein, I found an alternative approach based on the Poisson distribution, which I think is quite wonderful.
To begin, he defines rates in terms of u (= 3*λ), so the individual rates for changes from X to Y are u/3. He says:
The easiest way to compute this is to slightly fictionalize the model. Instead of having a rate u of change to one of the three other bases, let us imagine that we instead have a rate 4/3*u of change to a base randomly drawn from all four possibilities. This will be exactly the same process, as there then works out to be a probability of u/3 of change to each of the other three bases. We have also added a rate u/3 of change from a base to itself, which does not matter.
Now we use the Poisson distribution.
For a branch with time t, the
probability of no events at all at a site, when the number expected to occur is 4/3*u*t, is the zero term of the Poisson distribution…the branch of time t consists then of a vast number of tiny segments of time dt, each having the small probability of 4/3*u*dt of an event.
The probability of at least one event is 1 minus the zero term:
The probability that it is a particular event (say, A to C) is:
The probability of change to any one of the three nucleotides is then: