## Friday, February 26, 2010

### Jukes-Cantor (4)

If we think of evolutionary changes in sequences in a discrete fashion, then we could pick an α that would give, in one unit of evolutionary time, a 1% chance that nucleotide X => nucleotide Y. We can model this by matrix multiplication. We set up an initial Identity matrix and then process it to give the rate matrix T.

 ` A C G TA 0.97 0.01 0.01 0.01 C 0.01 0.97 0.01 0.01 G 0.01 0.01 0.97 0.01 T 0.01 0.01 0.01 0.97`

A starts out as equal to I, but repeated rounds of T•A yield the results below.

This can easily be modified to change the sequence model, although I haven't shown it here.

However, I notice that there is an inconsistency between this model and the simulation of mutation from here. The simulation showed saturation around 200% mutation, whereas this matrix method is clearly saturated at 100 rounds of 1%. I believe we may be missing a factor of 2, but I don't have any idea how that would be. I will think about it in days to come, but if you can spot the answer please let me know.

[ UPDATE: <whacks self on head> The previous simulation was for 1% mutation in each round, while this one has 3% in each round (1% for each nucleotide). That's the source of the difference! ]

 `0 A C G TA 1.0 0.0 0.0 0.0 C 0.0 1.0 0.0 0.0 G 0.0 0.0 1.0 0.0 T 0.0 0.0 0.0 1.0 1 A C G TA 0.97 0.01 0.01 0.01 C 0.01 0.97 0.01 0.01 G 0.01 0.01 0.97 0.01 T 0.01 0.01 0.01 0.97 2 A C G TA 0.9412 0.0196 0.0196 0.0196 C 0.0196 0.9412 0.0196 0.0196 G 0.0196 0.0196 0.9412 0.0196 T 0.0196 0.0196 0.0196 0.9412`

 `import numpy as npnt = 'ACGT'def show(A,x): print x print ' ' + ' '.join(list(nt)) for i,e in enumerate(A): print nt[i] + ' ', L = [str(round(n,4)).ljust(7) for n in e] print ' '.join(L) print I = np.eye(4)A = IT = I * 0.96 + 0.01show(A,0)for i in range(1,101): A = np.dot(T,A) R = range(2,10,2) + range(20,101,10) if i in [1] + R: show(A,i)`

 `0 A C G TA 1.0 0.0 0.0 0.0 C 0.0 1.0 0.0 0.0 G 0.0 0.0 1.0 0.0 T 0.0 0.0 0.0 1.0 1 A C G TA 0.97 0.01 0.01 0.01 C 0.01 0.97 0.01 0.01 G 0.01 0.01 0.97 0.01 T 0.01 0.01 0.01 0.97 2 A C G TA 0.9412 0.0196 0.0196 0.0196 C 0.0196 0.9412 0.0196 0.0196 G 0.0196 0.0196 0.9412 0.0196 T 0.0196 0.0196 0.0196 0.9412 4 A C G TA 0.887 0.0377 0.0377 0.0377 C 0.0377 0.887 0.0377 0.0377 G 0.0377 0.0377 0.887 0.0377 T 0.0377 0.0377 0.0377 0.887 6 A C G TA 0.8371 0.0543 0.0543 0.0543 C 0.0543 0.8371 0.0543 0.0543 G 0.0543 0.0543 0.8371 0.0543 T 0.0543 0.0543 0.0543 0.8371 8 A C G TA 0.791 0.0697 0.0697 0.0697 C 0.0697 0.791 0.0697 0.0697 G 0.0697 0.0697 0.791 0.0697 T 0.0697 0.0697 0.0697 0.791 20 A C G TA 0.5815 0.1395 0.1395 0.1395 C 0.1395 0.5815 0.1395 0.1395 G 0.1395 0.1395 0.5815 0.1395 T 0.1395 0.1395 0.1395 0.5815 30 A C G TA 0.4704 0.1765 0.1765 0.1765 C 0.1765 0.4704 0.1765 0.1765 G 0.1765 0.1765 0.4704 0.1765 T 0.1765 0.1765 0.1765 0.4704 40 A C G TA 0.3965 0.2012 0.2012 0.2012 C 0.2012 0.3965 0.2012 0.2012 G 0.2012 0.2012 0.3965 0.2012 T 0.2012 0.2012 0.2012 0.3965 50 A C G TA 0.3474 0.2175 0.2175 0.2175 C 0.2175 0.3474 0.2175 0.2175 G 0.2175 0.2175 0.3474 0.2175 T 0.2175 0.2175 0.2175 0.3474 60 A C G TA 0.3148 0.2284 0.2284 0.2284 C 0.2284 0.3148 0.2284 0.2284 G 0.2284 0.2284 0.3148 0.2284 T 0.2284 0.2284 0.2284 0.3148 70 A C G TA 0.2931 0.2356 0.2356 0.2356 C 0.2356 0.2931 0.2356 0.2356 G 0.2356 0.2356 0.2931 0.2356 T 0.2356 0.2356 0.2356 0.2931 80 A C G TA 0.2786 0.2405 0.2405 0.2405 C 0.2405 0.2786 0.2405 0.2405 G 0.2405 0.2405 0.2786 0.2405 T 0.2405 0.2405 0.2405 0.2786 90 A C G TA 0.269 0.2437 0.2437 0.2437 C 0.2437 0.269 0.2437 0.2437 G 0.2437 0.2437 0.269 0.2437 T 0.2437 0.2437 0.2437 0.269 100 A C G TA 0.2627 0.2458 0.2458 0.2458 C 0.2458 0.2627 0.2458 0.2458 G 0.2458 0.2458 0.2627 0.2458 T 0.2458 0.2458 0.2458 0.2627`