Sunday, December 6, 2009

PyCogent 7: Muscle

More on PyCogent here. The objective for this time is to use the Application Controller for Muscle, and construct a phylogenetic tree.

As with the previous example using clustal, it's important to be sure the the binary is on my path. I did this from the Desktop:

ln -s /Users/te/bin/muscle3.6_src/muscle ~/bin/muscle
export PATH=/Users/te/bin:$PATH


Now, we expect to run muscle as we did clustal, but there is a slight problem:

from cogent.app.muscle import Muscle

def test1():
app = Muscle()
result = app('opuntia.fasta')
print result
for k in ['StdErr','StdOut']:
print k
print result[k].read()
test1()


{'ExitStatus': 0, 'StdErr': , 'StdOut': }
StdErr

MUSCLE v3.6 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

opuntia 7 seqs, max length 902, avg length 896
00:00:00 10 MB(2%) Iter 1 100.00% K-mer dist pass 1
00:00:00 10 MB(2%) Iter 1 100.00% K-mer dist pass 2
00:00:01 10 MB(2%) Iter 1 100.00% Align node
00:00:01 10 MB(2%) Iter 1 100.00% Root alignment
00:00:01 10 MB(2%) Iter 2 100.00% Refine tree
00:00:01 10 MB(2%) Iter 2 100.00% Root alignment
00:00:01 10 MB(2%) Iter 2 100.00% Root alignment
00:00:02 10 MB(2%) Iter 3 100.00% Refine biparts
00:00:03 10 MB(2%) Iter 4 100.00% Refine biparts

StdOut


The result of the app() call is supposed to be a dictionary-like object with a key 'Align' but that key is not present. Of the two file-like objects that are present, 'StdErr' shows that the program ran, but there aren't any results. This is probably an error, since we're supposed to have a "consistent API."

To work around this issue, we go to an example in the online docs, and try the following instead. I've changed the sequence file to test.fasta, which contains five bacterial 16S rRNA sequences.

from cogent import LoadSeqs, DNA
from cogent.app.muscle import align_unaligned_seqs as malign

def test2():
seqs = LoadSeqs(filename='test.fasta',
aligned=False)
#print seqs
aln = malign(seqs,DNA)
#print aln
aln.writeToFile('test.aln.fasta')
test2()

The file ends up where it should be, with the first two lines:

>Actinomyces_naeslundii
----------TTGATCCTGGCTCAGGACGAACGCTGGCGGCGTGCTTAACACATGCAAGT

The next step is to make a tree.

from cogent.evolve.models import GTR
from cogent.phylo import distance, nj

def test3():
seqs = LoadSeqs(filename='test.aln.fasta',
aligned=True)
model = GTR()
dcalc = distance.EstimateDistances(seqs,model)
dcalc.run(show_progress=True)
d = dcalc.getPairwiseDistances()
tree = nj.nj(d)
print tree.asciiArt()
tree.writeToFile('test.tree')
test3()


Doing [1/10]: Actinomyces_naeslundii <-> Achromobacter_xylosoxidans
Doing [2/10]: Actinomyces_naeslundii <-> Abiotrophia_defectiva
Doing [3/10]: Actinomyces_naeslundii <-> Aeromonas_hydrophila
Doing [4/10]: Actinomyces_naeslundii <-> Acinetobacter_haemolyticus
Doing [5/10]: Achromobacter_xylosoxidans <-> Abiotrophia_defectiva
Doing [6/10]: Achromobacter_xylosoxidans <-> Aeromonas_hydrophila
Doing [7/10]: Achromobacter_xylosoxidans <-> Acinetobacter_haemolyticus
Doing [8/10]: Abiotrophia_defectiva <-> Aeromonas_hydrophila
Doing [9/10]: Abiotrophia_defectiva <-> Acinetobacter_haemolyticus
Doing [10/10]: Aeromonas_hydrophila <-> Acinetobacter_haemolyticus
/-Actinomyces_naeslundii
/edge.0--|
| \-Abiotrophia_defectiva
|
-root----| /-Aeromonas_hydrophila
|-edge.1--|
| \-Acinetobacter_haemolyticus
|
\-Achromobacter_xylosoxidans

For the last part, I downloaded the code from the supplementary files for the PyCogent paper. I won't show the code because, unfortunately, it doesn't work.

    from matplotlib.path import Path
ImportError: No module named path

Although the paper says it uses ReportLab, and that is what I installed the other day, the code is looking for matplotlib, which I don't have. But still, we're making progress.