Saturday, December 5, 2009

PyCogent 3: Translation

I'm continuing with an exploration of PyCogent (first post here). I found a draft cookbook on the web. That will be a big help!

The problem that I had in following the first tutorial is this: the function LoadSeqs creates an object of this class:

<class 'cogent.core.alignment.SequenceCollection'>


What we want is a single DNA sequence. We can get the individual sequences back from the collection like so:

from cogent import LoadSeqs, DNA
fn = 'SThemA.fasta'
s = LoadSeqs(fn,moltype=DNA,aligned=False)
name,seq = s.items()[0]


seq is now a

<class 'cogent.core.sequence.DnaSequence'>


and it supports slicing. We can also construct a similar object directly:

seq2 = DNA.makeSequence("AGTACACTGGT")
seq2.Name = 'X'
print seq2.toFasta()


>X
AGTACACTGGT


We still can't translate a stop codon:

pep = seq.getTranslation()


cogent.core.alphabet.AlphabetError: TAG


But slicing makes it easy to fix that:

pep = seq[:-3].getTranslation()
print type(pep)
print pep[:24]


<class 'cogent.core.sequence.ProteinSequence'>
MTLLALGINHKTAPVSLRERVTFS


Sequences are complex objects:

L = dir(seq)
L = [e for e in L if not e[0] == '_']
for e in L[:10]: print e
print len(L), 'items'


CodonAlphabet
Info
LineWrap
MW
MolType
Name
PROTEIN
addAnnotation
addFeature
annotateFromGff
81 items


Going through the dir reveals the intended method for translating with a terminal stop:


seq = DNA.makeSequence("ATGACACTGGTGTAG")
seq.Name = 'X'
print seq.toFasta()
print seq.hasTerminalStop()
seq2 = seq.withoutTerminalStopCodon()
print seq2.getTranslation()
# reverse complement
print seq.rc().canPair(seq)


>X
ATGACACTGGTGTAG
True
MTLV
True