Tuesday, December 8, 2009

PyCogent 8: getting matplotlib

I'm exploring PyCogent (docs, download page); the first post is here.

As I mentioned the other day, I was having a little trouble with the drawing code for case study #2 from the paper (PMID 17708774, link). (Note that this part of the example is not in the main text, but in supplementary file #4, link.)

Although PyCogent initially used ReportLab for at least some drawing, I'm told that they have switched over completely to matplotlib. It's a good thing I didn't know that at first. I've struggled with matplotlib in the past. They have a special page about installing on OS X (the "it sucks to be you" kind of page), where they strongly recommend that you install a separate Python.

The senior author on the PyCogent paper is Gavin Huttley, whose group is at the John Curtin School of Medical Research at ANU in Canberra, Australia. I posted to the PyCogent help forum at SourceForge, and Gavin very patiently walked me through the solution to the issues I encountered. Most important, he worked up a wiki page explaining in detail how to build and install matplotlib for OS X Snow Leopard.

Another problem that slowed me down was that I had installed PyCogent first with pip, and subsequently from a Subversion checkout, and somehow screwed things up. I ripped PyCogent out and rebuilt it, and it works fine now. See the help thread for details.
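
When two installs of the same package may be fighting each other, one quick check is to ask Python which file it would actually import. The following is a minimal sketch, not something from the help thread: `module_origin` is a hypothetical helper name, and it uses `importlib.util.find_spec`, which needs a much newer Python (3.4 or later) than the one this post was written against. `cogent` is PyCogent's import name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None.

    Hypothetical helper for illustration only. None means the module
    could not be found on the current import path.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # find_spec can raise for broken or half-removed installs
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Show where each package would be loaded from (or None if missing)
    for name in ("cogent", "matplotlib"):
        print(name, "->", module_origin(name))
```

If the path printed for `cogent` points somewhere other than the copy you meant to use, that is a hint that an older install is still on the import path.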

Now, in the directory holding all the files I got from the supplementary data for the paper, I do:


cd src
python rRNA_display_tree.py


where the script has been modified slightly as described in the thread. A PDF is written to the figs directory. Here is a screenshot of the upper left-hand corner.



As described in the paper:
We display low G+C% to high G+C% on a spectrum from yellow to blue...We note that, in general, closely related taxa exhibit a similar color. However, certain lineages appear to have evolved to low G+C%, quickly raising questions about environmental and/or changes in DNA metabolism that may distinguish these organisms from their sister taxa. Another feature apparent from this tree is a general trend for earlier diverging lineages to be intermediate in G+C%, and for sudden changes toward low G+C% to be more common than sudden changes to high G+C% (there are more yellow branches surrounded by blue or green neighbors than blue branches surrounded by yellow or green neighbors).
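
To make the coloring idea concrete, here is a rough sketch of how a G+C percentage could be computed and mapped onto a yellow-to-blue scale. This is not the code from the supplementary script or from PyCogent. The function names, the 30% and 70% endpoints, and the choice to pass through green at the midpoint are all my assumptions, made only so the example runs on its own.

```python
def gc_percent(seq):
    """Percent G+C among the unambiguous bases (A, C, G, T, U) in seq."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGTU")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G, T or U")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def gc_to_rgb(gc, low=30.0, high=70.0):
    """Map a G+C percentage to an (r, g, b) tuple with components in 0..1.

    Assumed scale: yellow at `low`, green halfway, blue at `high`.
    Values outside [low, high] are clamped to the end colors.
    """
    t = (gc - low) / (high - low)
    t = min(1.0, max(0.0, t))
    if t <= 0.5:
        s = t / 0.5            # yellow (1, 1, 0) -> green (0, 1, 0)
        return (1.0 - s, 1.0, 0.0)
    s = (t - 0.5) / 0.5        # green (0, 1, 0) -> blue (0, 0, 1)
    return (0.0, 1.0 - s, s)


if __name__ == "__main__":
    for seq in ("ATATATGC", "ATGC", "GGCCGGCA"):
        gc = gc_percent(seq)
        print(seq, gc, gc_to_rgb(gc))
```

A tuple like this can be passed as a color to matplotlib, which accepts RGB tuples with components between 0 and 1.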


I don't get much out of knowing the identity of the organism that has gone to low GC recently. But here is a paper (PMID 19001264) from Dan Andersson's group that may be relevant (ultimately) as to mechanism.