Thursday, February 24, 2011

Qiime (6) quick version



This example is for the impatient (like me): running part 1 of the QIIME tutorial--checklist version.

[ UPDATE Note: the "workflow script" in the tutorial is actually easier to run than what I have here. It needs only three changes: paths to the Greengenes files (below) and changing the rep set picking method to most_abundant, if desired. This post is about doing the workflow in such a way that the scripts are invoked individually. ]

Prerequisites (as discussed here and here)
Install and test:


  • Python 2.6
  • Numpy > 1.3.0
  • PyCogent
  • uclust
  • PyNAST
  • FastTree
  • Java RE (for RDP)
  • RDP Classifier
  • QIIME


    Make sure you have the equivalent of this (some scripts don't seem to look at $PATH):


    > cat ~/.qiime_config 
    qiime_scripts_dir /Users/telliott/bin/qiime/bin


    and that ~/bin/qiime/bin is on your $PATH


    > pick_otus.py -h | head -n 
    Usage: pick_otus.py [options] {-i/--input_seqs_filepath INPUT_SEQS_FILEPATH}



    From the Desktop, get the Greengenes core set data file (38 MB download link). Also get the Greengenes alignment lanemask file (download link). And get the tutorial files (download link), or do:


    curl -O http://bmf.colorado.edu/QIIME/qiime_tutorial-v1.2.0.zip
    tar -xf qiime_tutorial-v1.2.0.zip
    rm -r qiime_tutorial-v1.2.0.zip


    Make some directories:


    mkdir qiime_tutorial/gg
    mkdir qiime_tutorial/figs
    mkdir qiime_tutorial/other


    For the next two steps, if you get an error, recheck the file extensions:


    mv core_set_aligned.fasta.imputed qiime_tutorial/gg/core.txt
    mv lanemask_in_1s_and_0s.txt qiime_tutorial/gg/mask.txt


    I prefer short names, so these directory and file names are different than those in the tutorial.


    cd qiime_tutorial
    mv Fasting_Example.fna data.fna
    mv Fasting_Example.qual data.qual
    mv Fasting_Map.txt map.txt
    mv Fasting_Example.sff.txt other/Fasting_Example.sff.txt
    mv Fasting_Example.sff other/Fasting_Example.sff
    mv qiime_tutorial_commands_parallel.sh other/parallel.sh
    mv qiime_tutorial_commands_serial.sh other/serial.sh
    mv README other/README
    mv custom_parameters.txt other/custom.txt

    > ls
    data.fna figs map.txt
    data.qual gg other


    Step 0:

    split_libraries.py -f data.fna -m map.txt -q data.qual -o split
    mv data.fna other/data.fna
    mv data.qual other/data.qual
    cp split/seqs.fna seqs.fna


    At this point, you could make a backup (including the huge core.txt):

    cp -r ../qiime_tutorial ../qiime_tutorial_backup



    Analysis:


    pick_otus.py -i seqs.fna -m uclust -s 0.97 -o otus
    pick_rep_set.py -f seqs.fna -i otus/seqs_otus.txt -m most_abundant -o otus/reps.txt
    align_seqs.py -i otus/reps.txt -m pynast -t gg/core.txt -o aln
    assign_taxonomy.py -i otus/reps.txt -m rdp -o tax
    filter_alignment.py -i aln/reps_aligned.txt -m gg/mask.txt -o aln2
    make_phylogeny.py -i aln2/reps_aligned_pfiltered.fasta -o tree.tre
    make_otu_table.py -i otus/seqs_otus.txt -t tax/reps_tax_assignments.txt -o figs/otu_table.txt


    Rather than do another heatmap, I thought I'd show a summary area chart


    plot_taxa_summary.py -i figs/otu_table.txt -l Phylum -o figs -k white


    Unfortunately, it's not what we want (the graphic at top seems to be plotting the individual OTUs rather than Phyla). The example in the tutorial uses otu_table_Level3.txt

    I don't see an option for controlling the level of output in make_otu_table.py. I'll have to investigate how to do that. Perhaps if we follow the workflow script it will turn out correctly.