Monday, November 22, 2010

UPGMA in Python

I spent a whole day working on a script to do UPGMA. (It took a lot longer than I thought it should). This first version analyzes the data from the same tree as we constructed in an earlier post (here), because it's simple.

At the end of the run, we have the correct tree, as shown by the first line in the last section of the output:

(((('A', 'B'), 'C'), ('D', 'E')), 'F') 

A {'up': 1.0, 'parent': 'AB'}
C {'up': 2.0, 'parent': 'ABC'}
B {'up': 1.0, 'parent': 'AB'}
E {'up': 2.0, 'parent': 'DE'}
D {'up': 2.0, 'parent': 'DE'}
F {'up': 4.0, 'parent': 'ABCDEF'}
DE {'to_tips': 2.0, 'right': 'E', 'up': 1.0, 'parent': 'ABCDE', 'left': 'D'}
ABCDE {'to_tips': 3.0, 'right': 'DE', 'up': 1.0, 'parent': 'ABCDEF', 'left': 'ABC'}
ABC {'to_tips': 2.0, 'right': 'C', 'up': 1.0, 'parent': 'ABCDE', 'left': 'AB'}
AB {'to_tips': 1.0, 'right': 'B', 'up': 1.0, 'parent': 'ABC', 'left': 'A'}
ABCDEF {'to_tips': 4.0, 'right': 'F', 'left': 'ABCDE'}

I struggled with trying to use the distance information to modify the string representation of the tuple on that first line, but failed. I know I can reconstruct the tree from the data for all the nodes that is in the dictionaries shown. That's for another time.

Once again, I have a better understanding through having coded an (almost) working example in Python. The files are on Dropbox (here).  [Edit:  the files are here].