Saturday, December 5, 2009

PyCogent 4: Clustalw

I'm continuing with an exploration of PyCogent (first post here). The goal for this post is to run clustalw using an "Application Controller" from PyCogent. The example from the cookbook uses an input file from Biopython (download link), opuntia.fasta, which is on my Desktop.

I ran into two issues that were solved pretty easily. The first is that I don't usually manipulate the $PATH variable from the shell, but rather specify needed paths explicitly or add them to sys.path from within Python. However, PyCogent does not seem to look at sys.path:

import sys
p = '/Users/te/bin/clustalw-2.0.10-macosx'
sys.path.insert(0,p)

When I run my script (below), I get this:

cogent.app.util.ApplicationNotFoundError


So, I did this from the command line before running my test script:

export PATH=/Users/te/bin/clustalw-2.0.10-macosx:$PATH


The other issue is that the Clustal app controller expects clustalw. I have clustalw2. I don't know how to fix the App Controller, so I made a symbolic link in that directory:

ln -s clustalw2 clustalw

Come to think of it, I could have just done that from the Desktop and solved both issues at once. I held my breath, and did this:

from cogent.app.clustalw import Clustalw
app = Clustalw()
result = app('opuntia.fasta')['Align']
print result
print ''.join(result.readlines()[:12])


<open file 'opuntia.aln', mode 'r' at 0x101471690>
CLUSTAL 2.0.10 multiple sequence alignment


gi|6273285|gb|AF191659.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273284|gb|AF191658.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273287|gb|AF191661.1|AF191 TATACATTAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273286|gb|AF191660.1|AF191 TATACATAAAAGAAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273290|gb|AF191664.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273289|gb|AF191663.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
gi|6273291|gb|AF191665.1|AF191 TATACATTAAAGGAGGGGGATGCGGATAAATGGAAAGGCGAAAGAAAGAA
******* **** *************************************


Note: as usual, all code is executed from the command line by doing
python script.py.