It's in a file on disk called 'tree.data.txt'. The next step is to load it back into memory and organize it into two dictionaries, one for external and one for internal nodes. We also make a "reverseTree" dictionary where subnodes point to their parents. At the end of the initial phase, we can print the contents to the screen to verify things are OK.
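As a rough sketch of the "reverseTree" idea (not the code from the listing), here is one way to invert a dictionary of internal nodes so that each subnode points to its parent. The node names and the dictionary name `iD` are made up for illustration; the real layout of 'tree.data.txt' may differ.

```python
# Hypothetical toy data: internal node name -> list of subnode names.
# These names are assumptions, not the contents of tree.data.txt.
iD = {'root': ['A', 'n1'], 'n1': ['B', 'C']}

def make_reverse_tree(internal):
    """Return a dictionary mapping every subnode to its parent."""
    reverse = {}
    for parent, children in internal.items():
        for child in children:
            reverse[child] = parent
    return reverse

reverseTree = make_reverse_tree(iD)

# Print the contents to check that things look OK.
for child in sorted(reverseTree):
    print(child, '->', reverseTree[child])
```

The root is the only node that never shows up as a key in the result, which makes it easy to spot.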
Now, we need to do some work. Each node's x-position is the sum of the x-distances along the path back to the root. For the external nodes, the y-position is just an integer (1-6).
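The x-position calculation can be sketched as a walk up the parent links, adding each branch length along the way. This is only an illustration under assumed names: `dist` (node to its distance from its parent) and the toy values are hypothetical, not taken from the data file.

```python
# Hypothetical toy data: subnode -> parent, and node -> x-distance to its parent.
reverseTree = {'A': 'root', 'n1': 'root', 'B': 'n1', 'C': 'n1'}
dist = {'A': 0.3, 'n1': 0.1, 'B': 0.2, 'C': 0.25}

def x_position(node, parents, lengths):
    """Sum the x-distances from node back to the root."""
    x = 0.0
    while node in parents:       # the root has no parent, so the loop stops there
        x += lengths[node]
        node = parents[node]
    return x

# Compute x for the root and every other node.
xD = {n: x_position(n, reverseTree, dist) for n in ['root'] + list(reverseTree)}

for name in sorted(xD):
    print(name, round(xD[name], 3))
```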
For the internal nodes it's a little trickier. The y-position is the average of the y-positions of its subnodes (or the median, if there are three of them). We also need to remember the values for the extreme y-positions of the subnodes (for the vertical bars in the plot). We save these in a separate dictionary called zD.
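Here is a minimal sketch of that rule, assuming the internal nodes are stored as name to list of subnodes and the external y-positions are already assigned. Only `zD` is a name from the post; `iD`, `yD`, and the toy tree are assumptions for illustration.

```python
from statistics import mean, median

# Hypothetical toy tree: internal node -> subnodes, external node -> y-position.
iD = {'root': ['n1', 'n2'], 'n1': ['A', 'B'], 'n2': ['C', 'D', 'E']}
yD = {'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5}
zD = {}   # internal node -> (min, max) of its subnodes' y-positions

def y_position(node):
    """Return the y-position of node, filling in yD and zD for internal nodes."""
    if node in yD:
        return yD[node]
    ys = [y_position(child) for child in iD[node]]
    # Median for three subnodes, average otherwise.
    yD[node] = median(ys) if len(ys) == 3 else mean(ys)
    zD[node] = (min(ys), max(ys))   # extremes, for the vertical bar in the plot
    return yD[node]

y_position('root')
print(yD)
print(zD)
```

Recursing from the root means each subnode's position is worked out before its parent needs it, so the order of the dictionary entries doesn't matter.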
When we're done, we print out the values for each node to check them over.
We have the node name, parent, x-position and y-position, label (e.g. Escherichia coli), and finally for internal nodes, the min and max of the subnode y positions. Looks good to me.
Before I started working on this problem earlier in the week, I remembered that I had posted about it before. But I couldn't find my code from the last time (and I forgot that I'd posted that as well: another senior moment!). Anyway, this solution is much shorter and quite a bit simpler. The complex part is what we did today, working out the x and y positions of each node, as well as (next time) mapping the lines that connect them, and finally coercing R into doing the plot.
The entire listing: