Python for Bioinformatics: How many trees?

Wednesday, March 24, 2010

How many trees?

One of the central facts (and limitations) in phylogenetic analysis is that the number of possible trees is enormous. Felsenstein discusses this issue extensively in Chapter 3. Consider the simplest example, those trees which do not contain directionality (unrooted trees). There is a single tree for 3 species or OTUs (operational taxonomic units). For N = 4, there are three possible trees, as shown above.

There are several ways to see this result for N = 4. One is to just enumerate the possibilities, and recognize that rotation around the central branch does not change the tree. These two trees are the same:

You can think of a tree as a kind of mobile.

Or, we might consider that A has a particular nearest neighbor in the tree, and there are only three choices for this spot. Once that choice is made, the tree is determined.

A third way is to start with the 3 OTU tree and recognize that we can add a new branch at any one of the three existing branches. Again, we can make three trees for N = 4.

We can extend the last idea. Take one particular tree from N = 4. This tree contains five branches, and on any one of these we can add a new OTU. Thus, for each of the three N = 4 trees, we can produce five new ones for N = 5. Each of these new trees is unique, since for each branch we choose, E has a particular nearest neighbor, and in that set of trees there are various unique combinations.

N                   trees      branches
3                      1           3
4      1 x  3          3           5
5      3 x  5         15           7
6     15 x  7        105           9
7    105 x  9        945          11
8    945 x 11     10,395          13

22              ≈ a mole of trees

Can you see the pattern in the table? If we have the N-species tree, the number of new trees we can make for N + 1 is 1 x 3 x 5 x 7 x 9 x ... (2n - 3).