Thursday, March 10, 2011

Dental project (4)

This post is one of a series (see dental project here or in the sidebar).

After getting a set of sequences and removing chimeras, the next step is almost anticlimactic. We just copy a modified version of the shell script (from here, without the cd calls) or paste in the commands working from the dental directory (either individually, or all at once):

#!/bin/bash
-i seqs.fna -m uclust -s 0.97 -o otus
-f seqs.fna -i otus/seqs_otus.txt -m most_abundant -o otus/reps.txt
-i otus/reps.txt -m pynast -t ~/data/core.txt -o aln
-i otus/reps.txt -m rdp -o tax
-i aln/reps_aligned.txt -m ~/data/mask.txt -o aln2
-i aln2/reps_aligned_pfiltered.fasta -o figs/tree.tre
-i otus/seqs_otus.txt -t tax/reps_tax_assignments.txt -o figs/otu_table.txt
-i figs/otu_table.txt -o figs/otu_table_Level3.txt -L 3
-i figs/otu_table_Level3.txt -l Phylum -o figs -k white
-i figs/otu_table.txt -o figs

It's all over in a few seconds.

The heatmap QIIME produced is at the top of the post. It is truly a remarkable HTML page, with a graphic where you can reorder the columns or rows by drag and drop, redo the map at different thresholds for the OTUs, and so on. I've never seen anything quite like it. But (and this is just me) it's not pretty enough.

So what I'd like to do from here is to show you how I currently make heatmaps with matplotlib, and we'll get into that next time.

First, I have to extract the data from QIIME. The script is complicated a bit by an additional job: I'm going to organize the rows and columns. (QIIME can do this too; see the tutorial.)

The columns will be in the order they appear in sample_names.txt and the rows as they appear in genera_and_colors.txt. These files are in the same directory. The second one starts like this:

# Bacteria black
# Bacteroidetes green

I just do this:

> python > data.csv
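
For anyone who wants a starting point without downloading the project files, here is a minimal sketch of this kind of extraction. It is not the script from the project. It assumes the classic tab-delimited QIIME OTU table (a "#OTU ID" header row, one column per sample, and the lineage in the last column) and takes the last taxon in each lineage as the row label. The function names, sample names, and counts are all made up for illustration. In real use the two orderings would be read from sample_names.txt and genera_and_colors.txt, whose full format isn't shown here.

```python
import csv
import io
from collections import defaultdict


def parse_otu_table(text):
    """Return (sample_names, {taxon: {sample: count}}) from OTU table text."""
    samples = []
    counts = defaultdict(lambda: defaultdict(int))
    for line in text.splitlines():
        if line.startswith("#OTU ID"):
            # Header row: "#OTU ID", one column per sample, then the lineage.
            samples = line.split("\t")[1:-1]
            continue
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and other comment lines
        fields = line.split("\t")
        # Assumption: the last ';'-separated taxon is the label we want.
        taxon = fields[-1].split(";")[-1].strip()
        for sample, value in zip(samples, fields[1:-1]):
            counts[taxon][sample] += int(float(value))
    return samples, counts


def to_csv(counts, sample_order, taxon_order):
    """Build CSV text with rows in taxon_order and columns in sample_order."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # An empty first cell gives the header row its leading comma.
    writer.writerow([""] + list(sample_order))
    for taxon in taxon_order:
        row = counts.get(taxon, {})
        writer.writerow([taxon] + [row.get(s, 0) for s in sample_order])
    return out.getvalue()


# Made-up example table: two samples (A, B) and three OTUs.
demo = "\n".join([
    "# QIIME OTU table",
    "#OTU ID\tA\tB\tConsensus Lineage",
    "0\t3\t0\tRoot;Bacteria;Firmicutes;Streptococcus",
    "1\t1\t2\tRoot;Bacteria;Firmicutes;Streptococcus",
    "2\t0\t5\tRoot;Bacteria;Bacteroidetes;Prevotella",
])

if __name__ == "__main__":
    samples, counts = parse_otu_table(demo)
    # Columns and rows come out in whatever order the two lists specify.
    print(to_csv(counts, ["B", "A"], ["Prevotella", "Streptococcus"]), end="")
```

Counts for OTUs that share a label are summed, and a taxon in the ordering list that never appears in the table simply gets a row of zeros.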

Here's the first part of data.csv:


The leading comma on line 1 is so the column headers line up properly in a spreadsheet. Speaking of spreadsheets, here is a screenshot after dropping the data.csv file onto Numbers (you could use Excel, of course):

That was painless! Zipped project files in Dropbox (here).