Thursday, February 24, 2011

Qiime (6) quick version

This example is for the impatient (like me): running part 1 of the QIIME tutorial, checklist version.

[ UPDATE Note: the "workflow script" in the tutorial is actually easier to run than what I have here. It needs only three changes: the paths to the two Greengenes files (below) and, if desired, changing the rep set picking method to most_abundant. This post is about running the same workflow with each script invoked individually. ]

Prerequisites (as discussed in earlier posts)
Install and test:

  • Python 2.6
  • Numpy > 1.3.0
  • PyCogent
  • uclust
  • PyNAST
  • FastTree
  • Java RE (for RDP)
  • RDP Classifier
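If you want to confirm that the command-line pieces of that list are reachable before starting, a `command -v` loop does it. This is a minimal sketch: the function name `have_tools` is mine, and the executable names you pass it (for example `python`, `uclust`, `FastTree`, `java`) are assumptions about how your installs are named. The Python libraries (Numpy, PyCogent, PyNAST) are not commands, so this doesn't check them.

```shell
#!/bin/sh
# have_tools NAME...: print each command that cannot be found on $PATH
# and return non-zero if any of them are missing.
have_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `have_tools python uclust FastTree java` prints one line per tool it can't find and returns non-zero if anything is missing.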

    Make sure you have the equivalent of this (some scripts don't seem to look at $PATH):

    > cat ~/.qiime_config 
    qiime_scripts_dir /Users/telliott/bin/qiime/bin

    and that ~/bin/qiime/bin is on your $PATH. Then check that a script runs:

    > -h | head -n 
    Usage: [options] {-i/--input_seqs_filepath INPUT_SEQS_FILEPATH}
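To check the config entry and the $PATH at the same time, here is a small sketch. It assumes `~/.qiime_config` holds whitespace-separated `key value` lines like the one above; `check_qiime_dir` is a hypothetical helper, not part of QIIME.

```shell
#!/bin/sh
# check_qiime_dir CONFIG: read the qiime_scripts_dir entry from a
# .qiime_config-style file and report whether that directory is on $PATH.
check_qiime_dir() {
  # Take the value from the first "qiime_scripts_dir <dir>" line.
  dir=$(awk '$1 == "qiime_scripts_dir" { print $2; exit }' "$1")
  if [ -z "$dir" ]; then
    echo "no qiime_scripts_dir entry in $1"
    return 1
  fi
  # Wrap $PATH in colons so the first and last entries match too.
  case ":$PATH:" in
    *":$dir:"*) echo "on PATH: $dir" ;;
    *) echo "NOT on PATH: $dir"; return 1 ;;
  esac
}
```

Run it as `check_qiime_dir ~/.qiime_config`. It compares plain strings, so a $PATH entry written with a trailing slash won't count as a match.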

    Working from the Desktop, download the Greengenes core set data file (38 MB) and the Greengenes alignment lanemask file (see the download links). Then get the tutorial files, either from the download link or with:

    curl -O
    tar -xf
    rm -r

    Make some directories:

    mkdir qiime_tutorial/gg
    mkdir qiime_tutorial/figs
    mkdir qiime_tutorial/other
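If you run through the checklist a second time, plain `mkdir` will complain that the directories already exist. Here's a sketch using `mkdir -p`, which doesn't fail on existing directories (and also creates `qiime_tutorial` itself if it's missing); the function name `make_tutorial_dirs` is hypothetical.

```shell
#!/bin/sh
# make_tutorial_dirs ROOT: create the gg, figs and other subdirectories
# under ROOT. mkdir -p makes this safe to re-run.
make_tutorial_dirs() {
  for d in gg figs other; do
    mkdir -p "$1/$d" || return 1
  done
}
```

`make_tutorial_dirs qiime_tutorial` does the same job as the three commands above.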

    For the next two steps, if you get an error, recheck the file extensions:

    mv core_set_aligned.fasta.imputed qiime_tutorial/gg/core.txt
    mv lanemask_in_1s_and_0s.txt qiime_tutorial/gg/mask.txt
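Since the downloaded names may not match exactly, here's a sketch of a guarded `mv` that, when the source is missing, lists files sharing the same stem so the real extension is easy to spot. The helper name `move_checked` is hypothetical, and it assumes plain file names in the current directory (the stem is everything before the first dot).

```shell
#!/bin/sh
# move_checked SRC DEST: move SRC to DEST. If SRC does not exist, print
# (to stderr) the files whose names start with the same stem and fail.
move_checked() {
  if [ -e "$1" ]; then
    mv "$1" "$2"
    return
  fi
  stem=${1%%.*}
  echo "not found: $1; candidates:" >&2
  # Send the listing to stderr; discard ls errors when nothing matches.
  ls -d "$stem"* >&2 2>/dev/null
  return 1
}
```

For example: `move_checked core_set_aligned.fasta.imputed qiime_tutorial/gg/core.txt`.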

    I prefer short names, so these directory and file names differ from those in the tutorial.

    cd qiime_tutorial
    mv Fasting_Example.fna data.fna
    mv Fasting_Example.qual data.qual
    mv Fasting_Map.txt map.txt
    mv Fasting_Example.sff.txt other/Fasting_Example.sff.txt
    mv Fasting_Example.sff other/Fasting_Example.sff
    mv other/
    mv other/
    mv README other/README
    mv custom_parameters.txt other/custom.txt

    > ls
    data.fna figs map.txt
    data.qual gg other

    Step 0: -f data.fna -m map.txt -q data.qual -o split
    mv data.fna other/data.fna
    mv data.qual other/data.qual
    cp split/seqs.fna seqs.fna
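A quick sanity check after Step 0 is to count the records that made it into `seqs.fna`. This sketch just counts header lines beginning with `>`, so it assumes ordinary FASTA with one header per record; `count_seqs` is a hypothetical name.

```shell
#!/bin/sh
# count_seqs FASTA...: print each file name and its number of records
# (lines starting with '>'), tab-separated.
count_seqs() {
  for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(grep -c '^>' "$f")"
  done
}
```

`count_seqs split/seqs.fna seqs.fna` should report the same number twice, since one file is a copy of the other.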

    At this point, you could make a backup (including the huge core.txt):

    cp -r ../qiime_tutorial ../qiime_tutorial_backup

    Analysis:

    -i seqs.fna -m uclust -s 0.97 -o otus
    -f seqs.fna -i otus/seqs_otus.txt -m most_abundant -o otus/reps.txt
    -i otus/reps.txt -m pynast -t gg/core.txt -o aln
    -i otus/reps.txt -m rdp -o tax
    -i aln/reps_aligned.txt -m gg/mask.txt -o aln2
    -i aln2/reps_aligned_pfiltered.fasta -o tree.tre
    -i otus/seqs_otus.txt -t tax/reps_tax_assignments.txt -o figs/otu_table.txt
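Because each analysis step reads an earlier step's output, it helps to confirm those files exist before moving on. A sketch, assuming you run it from the `qiime_tutorial` directory; the paths are the ones named in the analysis commands above, and `check_outputs` is a hypothetical helper.

```shell
#!/bin/sh
# check_outputs: walk the expected analysis outputs in pipeline order,
# print ok/MISSING for each, and stop at the first file that is missing
# or empty (returning non-zero).
check_outputs() {
  for f in otus/seqs_otus.txt otus/reps.txt aln/reps_aligned.txt \
           tax/reps_tax_assignments.txt aln2/reps_aligned_pfiltered.fasta \
           tree.tre figs/otu_table.txt; do
    if [ -s "$f" ]; then
      echo "ok      $f"
    else
      echo "MISSING $f"
      return 1
    fi
  done
}
```

The first MISSING line points at the step that failed.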

    Rather than making another heatmap, I thought I'd show a summary area chart:

    -i figs/otu_table.txt -l Phylum -o figs -k white

    Unfortunately, it's not what we want: the chart at the top seems to plot the individual OTUs rather than phyla. The example in the tutorial uses otu_table_Level3.txt.

    I don't see an option in that script for controlling the taxonomic level of the output, so I'll have to investigate how to do that. Perhaps if we follow the workflow script it will turn out correctly.
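As a stopgap, an OTU table can be collapsed to a chosen taxonomic level by hand. Here's a rough awk sketch, which is not a QIIME script: it assumes the tab-delimited OTU table from above, with header/comment lines starting with `#`, the OTU ID in the first column, sample counts in the middle columns, and the semicolon-separated lineage in the last column. `collapse_by_level` is a hypothetical name. Which level number is the phylum depends on how the lineage strings start; if they begin with `Root;Bacteria;`, the phylum is the third field, which might explain the tutorial's otu_table_Level3.txt (my guess, not confirmed).

```shell
#!/bin/sh
# collapse_by_level LEVEL TABLE: sum each sample column over all OTUs
# that share the first LEVEL fields of their lineage (last column).
# Output: one tab-separated row per lineage prefix, no header line.
collapse_by_level() {
  awk -F'\t' -v level="$1" '
    /^#/ { next }                        # skip header and comment lines
    {
      n = split($NF, parts, ";")
      key = parts[1]
      for (i = 2; i <= level && i <= n; i++) key = key ";" parts[i]
      if (!(key in seen)) { seen[key] = 1; order[++k] = key }
      for (j = 2; j < NF; j++) sum[key, j] += $j   # count columns only
      ncol = NF
    }
    END {
      for (m = 1; m <= k; m++) {
        line = order[m]
        for (j = 2; j < ncol; j++) line = line "\t" (sum[order[m], j] + 0)
        print line
      }
    }' "$2"
}
```

For example, `collapse_by_level 3 figs/otu_table.txt` prints one row per distinct three-field lineage prefix, in the order first seen. The output has no header, so it's only a quick way to eyeball per-level totals, not a drop-in replacement for the tutorial's otu_table_Level3.txt.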