Thursday, December 10, 2009

PyCogent 14: FastTree



I'm exploring PyCogent (docs, download page)---first post here.

In browsing the PyCogent docs, I came upon something about an Application Controller for FastTree. Since I didn't know what that was, I checked it out here. It looks very interesting. I downloaded the source and built it with this line (from their documentation):

gcc -lm -O3 -finline-functions -funroll-loops -Wall -o FastTree FastTree.c

Simple enough. Here is a script that exercises it. I start with 20 16S rRNA sequences, some of them are experimentally determined (with prefix DA) and the rest are database sequences. You'll have to substitute your own sequences for this file. We load the sequences, align them with muscle, make a tree with fasttree, and plot the result with matplotlib. It runs in about 15 seconds. That is very fast.

This will probably be my last post about PyCogent for a while. I'm still working on it, but it's going to take time. Also, I got a new machine to play with: a mini running OS X Server.


import sys
from cogent import LoadSeqs,DNA
from cogent.app import fasttree,muscle
from cogent.draw import dendrogram

def get_aln(fn):
f = muscle.align_unaligned_seqs
seqs = LoadSeqs(filename=fn,
aligned=False,
format='fasta')
print seqs.getNumSeqs()
return f(seqs,DNA)

def get_color(node):
if node.isTip():
name = node.getTipNames()[0]
if name[:2] == 'DA': return 'red'
return 'blue'
return 'black'

def get_tree(aln):
f = fasttree.build_tree_from_alignment
return f(aln,moltype=DNA)

def plot_tree(tr):
np = dendrogram.ContemporaneousDendrogram(tr)
np.drawToPDF('temp.pdf',
width=700,
height=max(300,aln.getNumSeqs()*25),
font_size=14,
edge_color_callback=get_color)

fn = 'capno1.fasta'
aln = get_aln(fn)
tr = get_tree(aln)
plot_tree(tr)