Friday, December 10, 2010

Tracking evolution of coding sequences 2

I grabbed some sequences (based on the figure in Page & Holmes): ATP synthase alpha subunit, the medium wavelength opsin, terminal transferase, beta globin, and IL6. I got both the human and mouse CDS for each. I retrieved them by hand, and the files for each are included in the zip. I used MUSCLE to do the alignment, and then fixed two obvious problems with the IL6 alignment manually.

Finally, I ran the script evaluate.py, which does what we said we were going to do last time: it counts non-synonymous sites, and synonymous and non-synonymous differences, between each pair of sequences.
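To make the counting concrete, here is a minimal sketch of the idea. It is not the actual evaluate.py from the zip: the function names (classify, nonsyn_sites, count_differences) are hypothetical, it assumes the standard genetic code, and it takes two shortcuts that are assumptions of mine. Codons that differ at more than one position are tallied separately instead of being resolved, and a change to a stop codon is treated as non-synonymous.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify(codon1, codon2):
    """Return 'syn' if both codons encode the same amino acid, else 'non'."""
    return "syn" if CODON_TABLE[codon1] == CODON_TABLE[codon2] else "non"

def nonsyn_sites(codon):
    """Number of non-synonymous sites in one codon (between 0 and 3).

    Each of the 9 possible single-base changes contributes 1/3 of a site
    if it changes the encoded amino acid (stops count as a change here).
    """
    total = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODON_TABLE[mutant] != CODON_TABLE[codon]:
                    total += 1 / 3
    return total

def count_differences(seq1, seq2):
    """Count (synonymous, non-synonymous, multi-hit) codon differences
    between two aligned, in-frame coding sequences."""
    syn = non = multi = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons containing gaps or ambiguous bases.
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        ndiff = sum(a != b for a, b in zip(c1, c2))
        if ndiff == 1:
            if classify(c1, c2) == "syn":
                syn += 1
            else:
                non += 1
        elif ndiff > 1:
            multi += 1  # left unresolved in this sketch
    return syn, non, multi
```

For example, count_differences("ATGTTTGGA", "ATGTTCGCA") sees one synonymous change (TTT to TTC, both Phe) and one non-synonymous change (GGA to GCA, Gly to Ala). Dividing such counts by the corresponding numbers of sites is what turns them into per-site proportions.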

There isn't a lot to say about how it works, except of course that the whole point of the exercise is to notice the huge difference in the non-synonymous changes allowed in IL6, say, compared to the others.

Here are the results:

gene          syn    non    non/syn
ATP_synthase  0.418  0.013  0.031
opsin         0.35   0.066  0.189
TdT           0.539  0.122  0.226
beta_globin   0.379  0.142  0.375
IL6           0.683  0.519  0.76
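
As a quick sanity check, the last column is just the non-synonymous value divided by the synonymous one. A small sketch that recomputes it from the numbers in the table (the variable names are mine, not from evaluate.py):

```python
# (syn, non) values copied from the results table above.
results = {
    "ATP_synthase": (0.418, 0.013),
    "opsin": (0.35, 0.066),
    "TdT": (0.539, 0.122),
    "beta_globin": (0.379, 0.142),
    "IL6": (0.683, 0.519),
}

# Recompute non/syn for each gene, rounded to three decimals.
ratios = {gene: round(non / syn, 3) for gene, (syn, non) in results.items()}

# Print the genes from most to least constrained.
for gene, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{gene:<13} {ratio}")
```

The recomputed ratios match the table, and the ordering makes the point of the exercise plain: ATP synthase tolerates almost no non-synonymous change relative to synonymous change, while IL6 tolerates a great deal.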