Python for Bioinformatics: Qiime (6) quick version

Thursday, February 24, 2011

Qiime (6) quick version

This example is for the impatient (like me): running part 1 of the QIIME tutorial--checklist version.

[ UPDATE Note: the "workflow script" in the tutorial is actually easier to run than what I have here. It needs only three changes: paths to the Greengenes files (below) and changing the rep set picking method to most_abundant, if desired. This post is about doing the workflow in such a way that the scripts are invoked individually. ]

Prerequisites (as discussed here and here)
Install and test:

Python 2.6
Numpy > 1.3.0
PyCogent
uclust
PyNAST
FastTree
Java RE (for RDP)
RDP Classifier

QIIME

Make sure you have the equivalent of this (some scripts don't seem to look at $PATH):

> cat ~/.qiime_config 
qiime_scripts_dir       /Users/telliott/bin/qiime/bin

and that ~/bin/qiime/bin is on your $PATH

> pick_otus.py -h | head -n 
Usage: pick_otus.py [options] {-i/--input_seqs_filepath INPUT_SEQS_FILEPATH}

From the Desktop, get the Greengenes core set data file (38 MB download link). Also get the Greengenes alignment lanemask file (download link). And get the tutorial files (download link), or do:

curl -O http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.2.0.zip
tar -xf qiime_tutorial-v1.2.0.zip
rm -r qiime_tutorial-v1.2.0.zip

Make some directories:

mkdir qiime_tutorial/gg
mkdir qiime_tutorial/figs
mkdir qiime_tutorial/other

For the next two steps, if you get an error, recheck the file extensions:

mv core_set_aligned.fasta.imputed qiime_tutorial/gg/core.txt
mv lanemask_in_1s_and_0s.txt qiime_tutorial/gg/mask.txt

I prefer short names, so these directory and file names are different than those in the tutorial.

cd qiime_tutorial
mv Fasting_Example.fna data.fna
mv Fasting_Example.qual data.qual
mv Fasting_Map.txt map.txt
mv Fasting_Example.sff.txt other/Fasting_Example.sff.txt
mv Fasting_Example.sff other/Fasting_Example.sff
mv qiime_tutorial_commands_parallel.sh other/parallel.sh
mv qiime_tutorial_commands_serial.sh other/serial.sh
mv README other/README
mv custom_parameters.txt other/custom.txt

> ls
data.fna figs  map.txt
data.qual gg  other

Step 0:

split_libraries.py -f data.fna -m map.txt -q data.qual -o split
mv data.fna other/data.fna
mv data.qual other/data.qual
cp split/seqs.fna seqs.fna

At this point, you could make a backup (including the huge core.txt):

cp -r ../qiime_tutorial ../qiime_tutorial_backup

Analysis:

pick_otus.py -i seqs.fna -m uclust -s 0.97 -o otus
pick_rep_set.py -f seqs.fna -i otus/seqs_otus.txt -m most_abundant -o otus/reps.txt
align_seqs.py -i otus/reps.txt -m pynast -t gg/core.txt -o aln
assign_taxonomy.py -i otus/reps.txt -m rdp -o tax
filter_alignment.py -i aln/reps_aligned.txt -m gg/mask.txt -o aln2
make_phylogeny.py -i aln2/reps_aligned_pfiltered.fasta -o tree.tre
make_otu_table.py -i otus/seqs_otus.txt -t tax/reps_tax_assignments.txt -o figs/otu_table.txt

Rather than do another heatmap, I thought I'd show a summary area chart

plot_taxa_summary.py -i figs/otu_table.txt -l Phylum -o figs -k white

Unfortunately, it's not what we want (the graphic at top seems to be plotting the individual OTUs rather than Phyla). The example in the tutorial uses otu_table_Level3.txt

I don't see an option for controlling the level of output in make_otu_table.py. I'll have to investigate how to do that. Perhaps if we follow the workflow script it will turn out correctly.