Saturday, May 10, 2008

Fun with the Genetic Code

One of the first projects for a beginning Python bioinformatics coder is to construct a dictionary holding the genetic code and use it to translate genes. The other day I posted code for a module defining a class that constructs a Python dictionary holding the genetic code using a list comprehension. The same class makes a "reverse" codon dictionary where the keys are amino acids and the values are lists of synonymous codons. It also makes a third dictionary where the keys are codons and the values are lists of the other synonymous codons, that is, the synonyms excluding the key itself.
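For readers who don't have that module handy, here is a minimal sketch of how the three dictionaries could be built. It is not the code from GeneticCode2: it assumes the standard genetic code, and the names `bases`, `aminos`, `codons` and `sD` are made up for illustration (only `D` and `rD` mirror the attribute names used below).

```python
# Sketch only: assumes the standard genetic code; not the GeneticCode2 module.
bases = 'TCAG'
# Amino acids for the 64 codons enumerated in TCAG order
# (first base varies slowest); '*' marks a stop codon.
aminos = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# All 64 codons, built with a list comprehension.
codons = [a + b + c for a in bases for b in bases for c in bases]

# Forward dictionary: codon -> amino acid.
D = dict(zip(codons, aminos))

# Reverse dictionary: amino acid -> sorted list of its codons.
rD = {}
for codon in codons:
    rD.setdefault(D[codon], []).append(codon)
for aa in rD:
    rD[aa].sort()

# Third dictionary (hypothetical name sD): codon -> synonymous codons,
# leaving out the codon that is the key.
sD = dict((codon, [c for c in rD[D[codon]] if c != codon])
          for codon in codons)

print(rD['A'])    # ['GCA', 'GCC', 'GCG', 'GCT']
print(sD['TGT'])  # ['TGC']
```

The lists are sorted only to make the output predictable; the order of codons in the module's own lists may differ.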

Let's exercise the dictionaries. First, list every codon that differs from a randomly chosen codon by a single nucleotide:

import random, GeneticCode2
GC = GeneticCode2.GeneticCode()

D = GC.D
rD = GC.rD

def singleChanges(codon):
    # return the nine codons that differ from codon at exactly one position
    L = list()
    nts = 'ACGT'
    for i,wt in enumerate(codon):
        for mut in nts:
            if wt == mut: continue
            L.append(codon[:i] + mut + codon[i+1:])
    return L

R = random.Random(137)
for i in range(5):
    codon = R.choice(D.keys())
    print codon
    for syn in singleChanges(codon):
        print syn,
    print

ATC
CTC GTC TTC AAC ACC AGC ATA ATG ATT
TAT
AAT CAT GAT TCT TGT TTT TAA TAC TAG
CGC
AGC GGC TGC CAC CCC CTC CGA CGG CGT
AAT
CAT GAT TAT ACT AGT ATT AAA AAC AAG
AGT
CGT GGT TGT AAT ACT ATT AGA AGC AGG

aaL = set(D.values())
aaL.remove('*')    # drop the stop symbol so only the 20 amino acids remain
aminoacids = ''.join(aaL)
print aminoacids

for aa in aminoacids[:2]:
    print aa
    codons = rD[aa]
    for codon in codons:
        print codon,
        for c in singleChanges(codon):
            print D[c],
        print
print '*'
print 'TAG',
for c in singleChanges('TAG'):
    print D[c],
print

ACEDGFIHKMLNQPSRTWVY
A
GCA T P S E G V A A A
GCC T P S D G V A A A
GCG T P S E G V A A A
GCT T P S D G V A A A
C
TGT S R G Y S F * C W
TGC S R G Y S F * W C
*
TAG K Q E S W L * Y Y

By mutations altering only a single nucleotide, alanine can be replaced by only a limited set of amino acids: T, P, S, E, D, G or V. That's not so surprising, since each codon has only nine single-change neighbors in total, and for every alanine codon three of those nine are synonymous. Seven amino acids (K, Q, E, S, W, L and Y) can change to the amber stop codon (TAG) by a single base substitution. It might be interesting to explore further and ask whether the code is structured so that amino acids whose codons are related through single base changes are more similar to each other than randomly chosen pairs are.
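The two counts above can be checked without reading the printed table by eye. The following is a rough, self-contained sketch: it rebuilds the standard code rather than importing GeneticCode2, and the helper names `single_changes` and `reachable` are invented here. It uses `print()` calls with a single argument so that it runs on Python 3 as well.

```python
# Sketch only: assumes the standard genetic code; helper names are hypothetical.
bases = 'TCAG'
aminos = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codons = [a + b + c for a in bases for b in bases for c in bases]
D = dict(zip(codons, aminos))   # codon -> amino acid, '*' for stop

def single_changes(codon):
    """Return the nine codons differing from codon at exactly one position."""
    return [codon[:i] + mut + codon[i+1:]
            for i, wt in enumerate(codon)
            for mut in 'ACGT' if mut != wt]

def reachable(aa):
    """Symbols reachable from any codon of aa by one base change, minus aa."""
    found = set()
    for codon in codons:
        if D[codon] == aa:
            found.update(D[c] for c in single_changes(codon))
    found.discard(aa)
    return found

# Amino acids that can replace alanine after a single substitution.
print(''.join(sorted(reachable('A'))))   # DEGPSTV

# Amino acids with a codon one substitution away from the amber codon TAG.
to_amber = set(D[c] for c in single_changes('TAG')) - set('*')
print(''.join(sorted(to_amber)))         # EKLQSWY
```

Answering the similarity question would additionally need some measure of how alike two amino acids are, which this sketch does not attempt.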