From last time, we have two equations for sequences changing according to Jukes-Cantor:
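Assuming the usual Jukes-Cantor parametrization, in which α is the rate of substitution to each of the three other nucleotides (so a site leaves its current state at total rate 3α), the two equations take the form below. Here X is the starting nucleotide and Y is any one specific different nucleotide.

$$
P_{XX}(t) = \frac{1}{4} + \frac{3}{4} e^{-4\alpha t}
\qquad
P_{XY}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}
$$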

We can look at this as the probability that a single site will change in this way (from X to Y) over time, but we can also look at it as the fraction of a collection of sites that will change. Since Y can be any one of three nucleotides, the total fraction of sites that differ between the ancestral sequence and a present-day descendant sequence is three times $P_{XY}(t)$, or:
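Written out, assuming $P_{XY}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$ with α as the rate to each specific other nucleotide, this works out to:

$$
3\,P_{XY}(t) = \frac{3}{4}\left(1 - e^{-4\alpha t}\right)
$$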

Two present-day homologs that share a common ancestor are separated by twice the time, because each lineage has evolved independently for time t since they diverged. The proportion of sites that differ between them is:
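Replacing t with 2t in three times $P_{XY}$, and again assuming the parametrization $P_{XY}(t) = \frac{1}{4} - \frac{1}{4} e^{-4\alpha t}$, gives:

$$
p = 3\,P_{XY}(2t) = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)
$$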

The above equation gives what we observe when we compare the sequences. What we want to estimate, however, is the true distance, the actual number of substitutions per site:
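If α is taken to be the rate of substitution to each specific other nucleotide (an assumption about the parametrization), a site accumulates substitutions at total rate 3α, and the two lineages together span a time of 2t, so:

$$
d = 3\alpha \cdot 2t = 6\alpha t
$$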

We usually do not know either α or t individually, but we can say that:
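Taking $p = \frac{3}{4}\left(1 - e^{-8\alpha t}\right)$ and $d = 6\alpha t$ (both of which assume α is the rate per specific substitution), the exponent $8\alpha t$ equals $\frac{4}{3}d$, so α and t drop out and only their product, carried by d, remains:

$$
p = \frac{3}{4}\left(1 - e^{-\frac{4}{3} d}\right)
\qquad\Longleftrightarrow\qquad
d = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\,p\right)
$$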

This is what we've been after. These equations relate the actual evolutionary distance to the observed changes and vice versa.

p = proportion or fraction of sites that are observed to be different

d = distance or actual number of substitutions per site
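The conversion between p and d can be sketched in a few lines of Python. This is a minimal sketch assuming the standard Jukes-Cantor relations $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$ and $p = \frac{3}{4}\left(1 - e^{-\frac{4}{3}d}\right)$; the function names `jc_distance` and `jc_proportion` are made up for illustration.

```python
import math


def jc_distance(p):
    """Estimate d (substitutions per site) from p (observed fraction of differing sites).

    Assumes the Jukes-Cantor model; the correction is only defined for 0 <= p < 0.75.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75 under Jukes-Cantor")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc_proportion(d):
    """Expected p (fraction of differing sites) for a true distance d >= 0."""
    if d < 0.0:
        raise ValueError("d must be non-negative")
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


if __name__ == "__main__":
    # Show how the corrected distance pulls away from the observed proportion.
    for p in (0.05, 0.25, 0.50, 0.70):
        print(f"p = {p:.2f}  ->  d = {jc_distance(p):.4f}")
```

For small p the two quantities are nearly equal, because few sites have been hit more than once. As p approaches 0.75 the estimated d grows without bound, which is why the function rejects p of 0.75 or more.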

Their relationship is plotted at the top.

Plot code: