Last time, I used a set of simulated sequences and their phylogenetic tree as input to the web interface of Unifrac (the original version). This time, I propose to repeat the analysis on my laptop using PyCogent. (See the PyCogent category in the sidebar for more from me on this, or start with this post). I'm just going to do the analysis here. Next time I want to look the distinction between PCA and PCoA.
Provided that we've done our imports, and loaded both the tree and the environment, the actual analysis runs in a single line. The results are shown first, followed by the code. The commented lines in the code were used in the first implementation based on the usage example, but aren't needed here.
There are a couple of things to notice. As for the rest of PyCogent, objects bearing results act like dicts, and we recover the elements of interest with
result['pcoa']
(for example). Second, the distance matrix (a table of UniFrac metrics for pairwise comparisons between each of the three samples) is slightly different than what we got from the web interface. The comparison of A with C gives exactly the same result, but the other two have changed. Therefore, it is not surprising that the PCoA results are different. It is however, a bit of a shock to see that the small change in the ED (Environmental Distance) matrix has rotated the plot through about 170°.[UPDATE: I forgot to mention that I had altered PyCogent slightly to always do PCoA. I think what you will need to do with this code is to pass an argument to
fast_unifrac
of modes = ['distance_matrix','pcoa']
, but I haven't tested it yet.]