(Because the sequences are numbered in order, in the new set the labels A11 and A12 have been reassigned to the previous A13 and A14). The sequences are organized to make a phylogenetic tree, which was written to disk as `'samples.tree'`. I also wrote a simple script to process the titles of the sequences and save them in a file named `'environ.txt'`. The first line is:

Now let's use these two files as input to UniFrac (web interface). We compute the UniFrac metric or distance for each pair of samples. As you can see, we're going to do 1000 randomizations (this requires simple registration with the site):

We evaluate the significance of the results by repeated randomization of the labels. The raw *p-values* need to be corrected by multiplying by the total number of comparisons (Bonferroni correction).
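The correction itself is simple arithmetic, and a small Python sketch makes it concrete. The raw p-values below are hypothetical numbers chosen for illustration, not the values from this run, and the 0.05 cutoff is an assumed threshold. The function name `bonferroni` is my own.

```python
def bonferroni(raw_p, n_comparisons=None):
    """Multiply each raw p-value by the number of comparisons, capping at 1.0."""
    if n_comparisons is None:
        # Default: one comparison per entry supplied
        n_comparisons = len(raw_p)
    return {pair: min(1.0, p * n_comparisons) for pair, p in raw_p.items()}


# Hypothetical raw p-values for the three pairwise comparisons (illustration only)
raw = {('A', 'B'): 0.01, ('A', 'C'): 0.02, ('B', 'C'): 0.5}
corrected = bonferroni(raw)

ALPHA = 0.05  # assumed significance threshold
for pair, p in sorted(corrected.items()):
    verdict = 'significant' if p <= ALPHA else 'not significant'
    print(pair, round(p, 3), verdict)
```

With three samples there are three pairwise comparisons, so each raw value is multiplied by 3. A raw p-value that looks comfortable on its own can end up just over the threshold after correction.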

Notice that only A and B are significantly different, although it was a near-run thing with A and C. We download the actual UniFrac values for comparison with what PyCogent gives us. The text version of the file looks like this:

We also ask for a PCA (Principal *Coordinates* Analysis). Here is the plot of the two principal coordinates:

And here are the eigenvalues and eigenvectors in a data file:

Of course, we can always replot using R:

You can make this as fancy as you like.
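If you're curious what a principal coordinates analysis does with a distance matrix, here is a minimal NumPy sketch of the classical method (double-center the squared distances, then take the eigendecomposition). It is not the code the web interface runs, and the 3 x 3 matrix is a hypothetical stand-in for the downloaded UniFrac values. The function name `pcoa` is my own.

```python
import numpy as np


def pcoa(dist):
    """Classical principal coordinates analysis of a symmetric distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances (Gower's transformation)
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # b is symmetric, so eigh applies; reorder eigenvalues largest first
    evals, evecs = np.linalg.eigh(b)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Scale each eigenvector by the square root of its (non-negative) eigenvalue
    coords = evecs * np.sqrt(np.clip(evals, 0, None))
    return evals, coords


# Hypothetical distances among samples A, B, C (illustration only)
d = np.array([[0.0, 0.6, 0.5],
              [0.6, 0.0, 0.4],
              [0.5, 0.4, 0.0]])
evals, coords = pcoa(d)
print(evals)          # eigenvalues, largest first
print(coords[:, :2])  # one row per sample: the first two principal coordinates
```

With three samples, two coordinates are enough to reproduce the pairwise distances, so the third eigenvalue comes out as essentially zero.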
