Wednesday, December 9, 2009

PyCogent 10: basic DNA methods

I'm exploring PyCogent (docs, download page)---first post here.

We'll use this method from the last post (and you'll also need to load the data from your favorite FASTA-formatted sequence file), as shown in that example.


from cogent import Sequence, DNA

# method 1: from a string
seq = Sequence(moltype=DNA,
name=n,
seq=s)
print '1', seq[:24]

print type(seq)
print seq[:36].toFasta()


<class 'cogent.core.sequence.DnaSequence'>
>SThemA
ATGACCCTTTTAGCGCTCGGTATTAACCATAAAACG


print seq.Name
print seq[:36]
print seq[:36].complement()
print seq.MW() # this strand only


SThemA
ATGACCCTTTTAGCGCTCGGTATTAACCATAAAACG
TACTGGGAAAATCGCGAGCCATAATTGGTATTTTGC
365443.89


print seq[:36].rc()  # reverse complement
print seq.canPair(seq.rc())


CGTTTTATGGTTAATACCGAGCGCTAAAAGGGTCAT
True


print seq[-36:].toRna()
print seq.hasTerminalStop()


CUGAAUAUUCUGCGCGACAGCCUCGGGCUGGAGUAG
True


s = seq.withoutTerminalStopCodon()
print s[-33:]
print s.getTranslation()[-11:]
print s.getOrfPositions()


CTGAATATTCTGCGCGACAGCCTCGGGCTGGAG
LNILRDSLGLE
[(0, 1254)]


s = seq.toFasta().split('\n',1)[1]
print s[:36]
print type(s)


ATGACCCTTTTAGCGCTCGGTATTAACCATAAAACG
<type 'str'>


mseq = seq.shuffle()
print seq[:36]
print mseq[:36]
print seq.count('g') == mseq.count('g')
print all([seq.count(n) == mseq.count(n) for n in 'acgt'])


ATGACCCTTTTAGCGCTCGGTATTAACCATAAAACG
GGTCCAATGTCCCAGCGACTGAGCTGTGAACGTTAA
True
True