Rather than deal with the clustered OTUs for this post, I just uploaded all 1120 sequences, and carried out the analysis. The first time through (today) I forgot to include the outgroup! So that gives us a chance to see how much difference it makes.
It looks like this:
Here it is:
From the UniFrac FAQ:
My tree was not rooted, but I was able to upload my file and perform an analysis. Are the results valid?
There is no way to tell based on a Newick string alone whether a tree is rooted or not. If an unrooted tree is input, UniFrac will usually assign an arbitrary root and allow you to perform the analysis on that tree. How the tree is rooted can affect the results of both UniFrac tests and the P test. You should redo the analysis with a tree that is rooted with an appropriate outgroup.
It turns out to be easy enough..go back to RDP and browse to find SL7 and then add it to the sequence cart. Repeat the download to
dental_1120+_rdp.fna. Load the last 5 sequences into clustalx.app and look to make sure that SL7 is really properly aligned.
Repeat the PCA. You can look at the data in a spreadsheet app:
Now I plot it in matplotlib. The first image is what I plotted today for the rooted tree. The second is from the paper. Looks pretty good to me. Also, note some minor differences from the previous graphic where the tree we used was unrooted (and UniFrac rooted it for us however it does when it's not properly rooted).