(Note: This has not been tested very much, although it does work on several examples that I tried. It is just a toy program for use in exploring what trees are about).
The parser works in three phases:
First, we process each character of the input and recognize the five special characters '():,;', generating a list of tokens. Disregarding newlines and other characters that are not valid, we accumulate standard characters in a buffer which is flushed when a special character is encountered.
Here is the first 7 lines of the output for the tree from last time:
In phase two, we make note of names and distances from each external node to its parental (internal) node. The external node information in the tree is replaced by a short name. Here is the output of the second stage:
The tree after this step:
(There is a trick in the second step. Because we are modifying the list of tree elements within this function, we process the external nodes in reverse order. This ensures that all the other indexes remain valid).
In the third and final phase, we break down the structure of the tree into its internal nodes. Each internal node gets a name, a list of its sub-nodes, and the distance to its parental node. The output is shown below. The information for the external and internal nodes will be written as text to disk and then processed further.
After processing the first two internal nodes, the tree looks like this:
The full listing: