Sunday, December 6, 2009

PyCogent 7: Muscle

More on PyCogent here. The objective for this time is to use the Application Controller for Muscle, and construct a phylogenetic tree.

As with the previous example using clustal, it's important to be sure the the binary is on my path. I did this from the Desktop:

ln -s /Users/te/bin/muscle3.6_src/muscle ~/bin/muscle
export PATH=/Users/te/bin:$PATH

Now, we expect to run muscle as we did clustal, but there is a slight problem:

from import Muscle

def test1():
app = Muscle()
result = app('opuntia.fasta')
print result
for k in ['StdErr','StdOut']:
print k
print result[k].read()

{'ExitStatus': 0, 'StdErr': , 'StdOut': }

MUSCLE v3.6 by Robert C. Edgar
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

opuntia 7 seqs, max length 902, avg length 896
00:00:00 10 MB(2%) Iter 1 100.00% K-mer dist pass 1
00:00:00 10 MB(2%) Iter 1 100.00% K-mer dist pass 2
00:00:01 10 MB(2%) Iter 1 100.00% Align node
00:00:01 10 MB(2%) Iter 1 100.00% Root alignment
00:00:01 10 MB(2%) Iter 2 100.00% Refine tree
00:00:01 10 MB(2%) Iter 2 100.00% Root alignment
00:00:01 10 MB(2%) Iter 2 100.00% Root alignment
00:00:02 10 MB(2%) Iter 3 100.00% Refine biparts
00:00:03 10 MB(2%) Iter 4 100.00% Refine biparts


The result of the app() call is supposed to be a dictionary-like object with a key 'Align' but that key is not present. Of the two file-like objects that are present, 'StdErr' shows that the program ran, but there aren't any results. This is probably an error, since we're supposed to have a "consistent API."

To work around this issue, we go to an example in the online docs, and try the following instead. I've changed the sequence file to test.fasta, which contains five bacterial 16S rRNA sequences.

from cogent import LoadSeqs, DNA
from import align_unaligned_seqs as malign

def test2():
seqs = LoadSeqs(filename='test.fasta',
#print seqs
aln = malign(seqs,DNA)
#print aln

The file ends up where it should be, with the first two lines:


The next step is to make a tree.

from cogent.evolve.models import GTR
from cogent.phylo import distance, nj

def test3():
seqs = LoadSeqs(filename='test.aln.fasta',
model = GTR()
dcalc = distance.EstimateDistances(seqs,model)
d = dcalc.getPairwiseDistances()
tree = nj.nj(d)
print tree.asciiArt()

Doing [1/10]: Actinomyces_naeslundii <-> Achromobacter_xylosoxidans
Doing [2/10]: Actinomyces_naeslundii <-> Abiotrophia_defectiva
Doing [3/10]: Actinomyces_naeslundii <-> Aeromonas_hydrophila
Doing [4/10]: Actinomyces_naeslundii <-> Acinetobacter_haemolyticus
Doing [5/10]: Achromobacter_xylosoxidans <-> Abiotrophia_defectiva
Doing [6/10]: Achromobacter_xylosoxidans <-> Aeromonas_hydrophila
Doing [7/10]: Achromobacter_xylosoxidans <-> Acinetobacter_haemolyticus
Doing [8/10]: Abiotrophia_defectiva <-> Aeromonas_hydrophila
Doing [9/10]: Abiotrophia_defectiva <-> Acinetobacter_haemolyticus
Doing [10/10]: Aeromonas_hydrophila <-> Acinetobacter_haemolyticus
| \-Abiotrophia_defectiva
-root----| /-Aeromonas_hydrophila
| \-Acinetobacter_haemolyticus

For the last part, I downloaded the code from the supplementary files for the PyCogent paper. I won't show the code because, unfortunately, it doesn't work.

    from matplotlib.path import Path
ImportError: No module named path

Although the paper says it uses ReportLab, and that is what I installed the other day, the code is looking for matplotlib, which I don't have. But still, we're making progress.