Thursday, November 25, 2010

My version of neighbor-joining in Python

I finished debugging my version of the neighbor-joining method in Python. The zipped project files are here. I also built an NJ tree from the same data using PyCogent, ape, and Phylip's neighbor. Everybody agrees!

The one thing I haven't done yet is write the code that assembles the tree's representation, the parenthesized string the other programs print. What I actually have is a list of internal nodes, each with the distances to its child nodes. There's always something more to do.

PyCogent:

edge.0 A:1.0;
edge.0 B:4.0;
edge.1 C:2.0;
edge.2 F:4.75;
root D:2.75;
root E:2.25;

edge.2 edge.1 1.25 edge.2 edge.0 2.25
edge.1 edge.0 1.0
root edge.2 0.75 root edge.1 2.0 root edge.0 3.0

((((A:1.0,B:4.0):1.0,C:2.0):1.25,F:4.75):0.75,D:2.75,E:2.25);

Phylip:

Between        And            Length
-------        ---            ------
1              B              4.00000
1              2              1.00000
2              C              2.00000
2              4              1.25000
4              3              0.75000
3              E              2.25000
3              D              2.75000
4              F              4.75000
1              A              1.00000

(B:4.00000,(C:2.00000,((E:2.25000,D:2.75000):0.75000,F:4.75000):1.25000):1.00000,A:1.00000);

ape:

> tr$edge.length
[1] 2.25 2.75 0.75 1.25 1.00 1.00 4.00 2.00 4.75
> tr$edge
     [,1] [,2]
[1,]    7    5
[2,]    7    4
[3,]    7    8
[4,]    8    9
[5,]    9   10
[6,]   10    1
[7,]   10    2
[8,]    9    3
[9,]    8    6
> tr$tip.label
[1] "A" "B" "C" "D" "E" "F"

(E:2.25,D:2.75,(((A:1,B:4):1,C:2):1.25,F:4.75):0.75);
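
The three strings above are written from different starting nodes, so they can't be compared character by character. One way to check that they describe the same unrooted tree is to compare the path length between every pair of tips. The sketch below is not part of the project files; the function names parse and tip_distances are made up for illustration, and the parser only handles simple strings like the ones shown here (no quoted labels or comments).

```python
import re
from itertools import combinations

def parse(newick):
    """Parse a simple Newick string into nested (label, length, children) tuples."""
    s = newick.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if pos < len(s) and s[pos] == "(":
            pos += 1                      # skip "("
            children.append(node())
            while s[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
        # optional label, then optional ":length"
        m = re.match(r"([^:,()]*)(?::([^,()]+))?", s[pos:])
        pos += m.end()
        length = float(m.group(2)) if m.group(2) else 0.0
        return m.group(1), length, children

    return node()

def tip_distances(newick):
    """Map each pair of tip labels to the path length between them."""
    dist = {}

    def walk(n):
        # Returns {tip: distance from the tip to the far end of n's branch}.
        label, length, children = n
        if not children:
            return {label: length}
        below = [walk(c) for c in children]
        # Tips in different subtrees meet at this node.
        for a, b in combinations(below, 2):
            for ta, da in a.items():
                for tb, db in b.items():
                    dist[frozenset((ta, tb))] = round(da + db, 6)
        return {t: d + length for sub in below for t, d in sub.items()}

    walk(parse(newick))
    return dist

pycogent = "((((A:1.0,B:4.0):1.0,C:2.0):1.25,F:4.75):0.75,D:2.75,E:2.25);"
phylip = "(B:4.00000,(C:2.00000,((E:2.25000,D:2.75000):0.75000,F:4.75000):1.25000):1.00000,A:1.00000);"
ape = "(E:2.25,D:2.75,(((A:1,B:4):1,C:2):1.25,F:4.75):0.75);"

same = tip_distances(pycogent) == tip_distances(phylip) == tip_distances(ape)
print("all three agree:", same)
```

Equal tip-to-tip distances are a reasonable test here because all branch lengths are positive; with zero-length branches the comparison would be less conclusive.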

me:

0 : B 4.000 A 1.000
1 : C 2.000 0 1.000
2 : E 2.250 D 2.750
3 : 1 1.250 2 0.750 F 4.75
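
The missing step, turning this list of internal nodes into a parenthesized string, could look something like the sketch below. It is only a sketch under some assumptions: the list is typed in by hand as a dict (the real data structure in the project files may differ), internal nodes are integers and tips are strings, the last node (3) is taken as the top of the tree, and to_newick is a made-up name.

```python
# Internal nodes from the output above: node id -> [(child, distance), ...].
# Integer children are other internal nodes; string children are tips.
nodes = {
    0: [("B", 4.0), ("A", 1.0)],
    1: [("C", 2.0), (0, 1.0)],
    2: [("E", 2.25), ("D", 2.75)],
    3: [(1, 1.25), (2, 0.75), ("F", 4.75)],
}

def to_newick(nodes, top):
    """Build a Newick string by recursing from the chosen top node."""
    def render(child):
        if child in nodes:
            # Internal node: render each child with its branch length.
            inner = ",".join(f"{render(c)}:{d:g}" for c, d in nodes[child])
            return f"({inner})"
        return str(child)  # tip label
    return render(top) + ";"

newick = to_newick(nodes, 3)
print(newick)
# ((C:2,(B:4,A:1):1):1.25,(E:2.25,D:2.75):0.75,F:4.75);
```

The string starts from a different node than the PyCogent, Phylip, and ape versions, but it has the same branches and lengths.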

2 comments:

JK said...

Hi t,

I just encountered a situation where I needed to build an NJ tree directly from a distance matrix, not from sequences. (I'm working with introns, not with sequences directly, and I have a magic function to compute the distance between two genes based on intron position.) It took only 30 lines of Python, which surprised even me. I'll post here if you're interested.

-- JoeK

telliott99 said...

Hi JK,
Why not post it on your blog? I'll link to it here.