## Tuesday, February 22, 2011

### Qiime (5) alpha diversity

Continuing with the exploration of QIIME (earlier posts: one two three four).

We're following the overview tutorial (here), which has four main parts. Three of them correspond to the three big sections of `custom_parameters.txt`:

• picking and analysis of species/OTUs
• analysis of alpha diversity
• analysis of beta diversity

plus

• a suite of data visualization programs

Today we're thinking about alpha diversity (within-sample diversity). Goals of diversity analysis (using 16S rRNA sequences from bacteria) include measurement of:

• observed species/OTU richness
• population distribution among species/OTUs
• phylogenetic diversity among OTUs

where OTUs can be defined at different thresholds of sequence identity.
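To make the first two goals concrete, here is a minimal Python sketch (my own, not QIIME code) of two common within-sample measures computed from a list of per-OTU sequence counts: observed species (richness) and the Shannon index, plus Pielou's evenness derived from it. The function names are hypothetical, and log base 2 is just the convention I picked.

```python
import math


def observed_species(counts):
    """Richness: number of OTUs with at least one sequence."""
    return sum(1 for c in counts if c > 0)


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)), summed over nonzero OTUs."""
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def pielou_evenness(counts, base=2):
    """Evenness J = H / log(S): 1.0 when all observed OTUs are equally abundant."""
    s = observed_species(counts)
    if s < 2:
        return 0.0  # evenness is undefined for a single OTU; report 0 here
    return shannon(counts, base) / math.log(s, base)


if __name__ == "__main__":
    even = [25, 25, 25, 25]  # made-up counts: four equally common OTUs
    skewed = [97, 1, 1, 1]   # made-up counts: same four OTUs, one dominant
    for name, counts in [("even", even), ("skewed", skewed)]:
        print(name, observed_species(counts),
              round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

Both made-up samples have the same richness (4 OTUs), but the skewed one scores far lower on Shannon and evenness, which is the "population distribution" point in the list above.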

For this post, I've got screenshots of two graphics produced by the Qiime tutorial's analysis of alpha diversity. Consider the first graphic, which plots observed species in two different samples as a function of the number of sequences examined. Because chance determines which sequences end up in a sample, resampling is used: many random subsamples of the observed data are drawn at each depth and the results averaged to give a rarefaction curve. Perhaps more important, this normalizes for the number of sequences obtained, so samples of different sizes can be compared.

These plots are rarefaction curves.
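Here is a minimal sketch of the resampling idea, assuming a single sample given as a list of per-OTU counts: at each depth, draw that many sequences without replacement, count the OTUs seen, and average over several repetitions. This illustrates rarefaction in general, not QIIME's implementation; `rarefy` and `rarefaction_curve` are hypothetical names.

```python
import random
from statistics import mean


def rarefy(counts, depth, rng):
    """Subsample `depth` sequences without replacement from per-OTU counts."""
    # Expand counts into one OTU label per sequence, then draw without replacement.
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds number of sequences in sample")
    sub = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        sub[otu] += 1
    return sub


def rarefaction_curve(counts, depths, reps=10, seed=0):
    """Mean observed OTUs at each depth, averaged over `reps` random subsamples."""
    rng = random.Random(seed)
    curve = []
    for d in depths:
        richness = [sum(1 for c in rarefy(counts, d, rng) if c > 0)
                    for _ in range(reps)]
        curve.append((d, mean(richness)))
    return curve


if __name__ == "__main__":
    sample = [50, 20, 10, 5, 5, 3, 2, 1, 1, 1, 1, 1]  # made-up: 100 sequences, 12 OTUs
    for depth, richness in rarefaction_curve(sample, [10, 25, 50, 100], reps=20):
        print(depth, richness)
```

At depth 100 the whole made-up sample is drawn, so every repetition sees all 12 OTUs; at smaller depths the rare OTUs are often missed, which is what gives the curve its rising shape.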

The most important questions are probably these two:

(1) If we could sample exhaustively, what would the final species count be (or the phylogenetic diversity, or ...)? In other words, does the curve level off, and what is the asymptotic value?

(2) Can we decide whether two different populations differ significantly even without exhaustive sampling?

I do not know much about this area, so I should probably just be quiet at this point, but I suspect that the first question does not always have a good answer (even if people wish it did). The problem is that the shape of the curve at large numbers of sequences is not necessarily determined by its shape at smaller numbers. It might be, if the population structure is not too skewed, but no one can say in advance whether that is true.
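One widely used answer to question (1) is the Chao1 estimator (chao1 is one of the metrics in the workflow log further down). It extrapolates from the number of OTUs seen exactly once (singletons, F1) and exactly twice (doubletons, F2): Chao1 = S_obs + F1^2 / (2 F2), with the bias-corrected form S_obs + F1 (F1 - 1) / 2 when there are no doubletons. It is usually described as a lower-bound estimate, and because it leans entirely on the rarest counts it doesn't really escape the worry above. This is a sketch of the textbook formula; QIIME's exact variant may differ.

```python
def chao1(counts):
    """Chao1 richness estimate from per-OTU counts for one sample."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    # Bias-corrected form, used here only when there are no doubletons.
    return s_obs + f1 * (f1 - 1) / 2


if __name__ == "__main__":
    sample = [50, 20, 10, 5, 5, 3, 2, 1, 1, 1, 1, 1]  # made-up counts
    print(chao1(sample))  # 12 observed + 5*5/(2*1) = 24.5
```

With five singletons and one doubleton the estimate roughly doubles the observed count, which shows how sensitive the extrapolation is to a handful of rare reads.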

Anyway, we follow the tutorial. First remake (and this time, save) the otu_table:

 `make_otu_table.py -i otus/seqs_otus.txt -t tax/reps_tax_assignments.txt -o figs/otu_table.txt`
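Roughly, `make_otu_table.py` turns the OTU map into a table of sequence counts per OTU per sample (and `-t` adds the taxonomy). A minimal sketch of the counting part, assuming a QIIME 1 style OTU map where each line is an OTU id followed by tab-separated sequence ids of the form SampleID_N; the function name and the sample ids A and B are hypothetical, and the taxonomy column is left out.

```python
from collections import defaultdict


def otu_table_from_map(lines):
    """Build per-sample OTU counts from OTU-map lines.

    Assumed format: otu_id<TAB>seq_id<TAB>seq_id..., where each seq_id is
    SampleID_N. Returns (table, samples), with table[otu_id][sample] equal
    to the number of that sample's sequences assigned to the OTU.
    """
    table = defaultdict(lambda: defaultdict(int))
    samples = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        otu_id, *seq_ids = line.split("\t")
        for seq_id in seq_ids:
            sample = seq_id.rsplit("_", 1)[0]  # strip the trailing _N
            table[otu_id][sample] += 1
            samples.add(sample)
    return table, sorted(samples)


if __name__ == "__main__":
    # Hypothetical OTU map: two OTUs, two samples (A and B).
    otu_map = ["0\tA_1\tA_7\tB_2\n", "1\tB_5\n"]
    table, samples = otu_table_from_map(otu_map)
    print("#OTU ID\t" + "\t".join(samples))
    for otu_id in sorted(table):
        print(otu_id + "\t" + "\t".join(str(table[otu_id][s]) for s in samples))
```

For a real run you would pass `open("otus/seqs_otus.txt")` instead of the list. Splitting on the last underscore keeps sample names that themselves contain underscores intact.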

We'll use the workflow script (after a cursory look, I don't see much of interest in the intermediate steps). The required options are listed with -h (help):

```
> alpha_rarefaction.py -h
..
REQUIRED options:
  The following options must be provided under all circumstances.

  -i OTU_TABLE_FP, --otu_table_fp=OTU_TABLE_FP
                        the input otu table [REQUIRED]
  -m MAPPING_FP, --mapping_fp=MAPPING_FP
                        path to the mapping file [REQUIRED]
  -o OUTPUT_DIR, --output_dir=OUTPUT_DIR
                        the output directory [REQUIRED]
  -p PARAMETER_FP, --parameter_fp=PARAMETER_FP
                        path to the parameter file [REQUIRED]
```

What we actually run is:

 `alpha_rarefaction.py -i figs/otu_table.txt -m map.txt -o rare/ -p custom_parameters.txt -t figs/tree.tre -f`

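The default settings in `custom_parameters.txt` are fine for this. Besides the four required options, `-t` supplies the tree file needed for the phylogenetic metric, and `-f` forces the script to write into an output directory that already exists.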

If you do want to run each of the steps individually, I suggest running the workflow script first; the log file it writes will contain a history of the commands that were executed:

```
# Alpha rarefaction command
python /Users/telliott/bin/qiime/bin/multiple_rarefactions.py -i figs/otu_table.txt -m 10 -x 148 -s 13 -o rare//rarefaction/ --num-reps 5

# Alpha diversity on rarefied OTU tables command
python /Users/telliott/bin/qiime/bin/alpha_diversity.py -i rare//rarefaction/ -o rare//alpha_div/ -t figs/tree.tre --metrics chao1,observed_species,PD_whole_tree

# Collate alpha command
python /Users/telliott/bin/qiime/bin/collate_alpha.py -i rare//alpha_div/ -o rare//alpha_div_collated/

# Rarefaction plot: All metrics command
python /Users/telliott/bin/qiime/bin/make_rarefaction_plots.py -i rare//alpha_div_collated/ -m map.txt -o rare//alpha_rarefaction_plots/ --background_color white --resolution 75 --imagetype png
```
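The first step's flags set the rarefaction depths: `-m 10` is the minimum, `-x 148` the maximum, `-s 13` the step, and `--num-reps 5` the number of random subsamples at each depth. Assuming the depths run upward from the minimum in steps of 13 without exceeding the maximum (my reading of the flags, not checked against QIIME's source), here is a small Python sketch of that schedule, plus a hypothetical `collate` that averages one metric over the replicates at each depth, which is what a point on the plotted curve represents.

```python
from statistics import mean, pstdev


def rarefaction_depths(min_depth, max_depth, step):
    """Assumed depth schedule: min, min+step, ..., never exceeding max."""
    return list(range(min_depth, max_depth + 1, step))


def collate(results):
    """Average one alpha metric over replicate subsamples.

    results maps (depth, rep) -> metric value for a single sample.
    Returns [(depth, mean, population std dev)] sorted by depth.
    """
    by_depth = {}
    for (depth, _rep), value in results.items():
        by_depth.setdefault(depth, []).append(value)
    return [(d, mean(v), pstdev(v)) for d, v in sorted(by_depth.items())]


if __name__ == "__main__":
    depths = rarefaction_depths(10, 148, 13)
    print(depths)  # [10, 23, 36, 49, 62, 75, 88, 101, 114, 127, 140]
    print(len(depths) * 5, "rarefied tables with --num-reps 5")  # 55
    # Made-up observed_species values for one sample, two reps per depth.
    fake = {(10, 0): 6, (10, 1): 8, (23, 0): 11, (23, 1): 11}
    print(collate(fake))
```

Under this assumption the largest depth actually sampled is 140, not 148, since the next step (153) would overshoot the maximum.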

The second graphic shows that considering phylogenetic diversity may reveal significant differences that are not seen when just counting OTUs.
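Why can phylogenetic diversity separate samples that OTU counts don't? A minimal sketch of Faith's PD: the total length of the tree branches connecting a sample's OTUs to the root, counting shared branches once. I'm assuming, as the name PD_whole_tree suggests, that branches back to the root of the full tree are counted; the toy tree and the function name are hypothetical.

```python
def faith_pd(observed_tips, parent):
    """Faith's phylogenetic diversity for one sample.

    parent maps each non-root node to (parent_node, branch_length).
    PD is the total length of the branches connecting the observed tips
    to the root, counting each shared branch once.
    """
    counted = set()
    total = 0.0
    for tip in observed_tips:
        node = tip
        # Walk toward the root, stopping at the root or at a branch already counted.
        while node in parent and node not in counted:
            counted.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total


if __name__ == "__main__":
    # Hypothetical tree: root splits into clades X and Y, each with two tips.
    tree = {
        "X": ("root", 1.0), "Y": ("root", 1.0),
        "a": ("X", 0.1), "b": ("X", 0.1),
        "c": ("Y", 0.1), "d": ("Y", 0.1),
    }
    print(round(faith_pd(["a", "b"], tree), 6))  # 1.2: both OTUs in one clade
    print(round(faith_pd(["a", "c"], tree), 6))  # 2.2: OTUs in different clades
```

Both toy samples contain two OTUs, so observed species can't tell them apart, but PD differs almost twofold because the second sample spans both deep branches.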