The FASTA format is defined here. Long titles can cause several problems: embedded spaces, character limits, and the requirement that names be unique within those limits. For now, I'll just remove most of the title, leaving '>' plus the GenBank id from the end of the line. (If the identifiers are already known, I have a Python script that will retrieve the sequences from GenBank automatically.) Now I have 5 sequences saved in 'strep.txt':
X58303 Streptococcus mutans
AF003928 Streptococcus sanguinis
AF003930 Streptococcus pneumoniae
AB002521 Streptococcus pyogenes
AF003933 Streptococcus parasanguinis
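The title trimming described above could be scripted roughly as follows. This is a minimal sketch, not the script I actually used: the function name `shorten_fasta_titles` and the example titles are made up for illustration, and it assumes the id is the last whitespace-separated token on each title line.

```python
def shorten_fasta_titles(lines):
    """Yield FASTA lines with each title cut down to '>' plus its last token."""
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            tokens = line[1:].split()
            # Assumption: the identifier is the last whitespace-separated token.
            yield ">" + (tokens[-1] if tokens else "")
        else:
            # Sequence lines pass through unchanged.
            yield line


# Hypothetical input; real GenBank titles will look different.
example = [
    ">hypothetical long title with spaces X58303",
    "ACGTACGT",
    ">another hypothetical title AF003928",
    "TTGACC",
]
shortened = list(shorten_fasta_titles(example))
print("\n".join(shortened))
```

After trimming, it is worth checking that the shortened names are still unique, since that is one of the problems that long titles were hiding.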
Next I run Clustal to get a sequence alignment. I got clustalx and clustalw from here. (Click on Download Software.) Load the sequences into ClustalX (2.0.5). Under Alignment => Output Format Options select 'FASTA format', and then select 'Do Complete Alignment'. It looks like this:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhZLFbtIBnNkLjZJk5o2VhKt-VK_1ALBo79yIfybYKqJ7ItwetssrbPP7Mqa-hc1dMBdholesTLScLhmxCA-xeuNUmfvlg8xeuw0btbvpySL2-PUVzeVa1HAV0J6lS129rVCiLxMxSlhPfn/s320/Picture+2.png)
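The same alignment can probably be run without the GUI by calling the command-line clustalw from Python. This is a sketch under assumptions: the executable name `clustalw2` depends on how the program was installed, and the helper names `clustalw_command` and `run_alignment` are mine. The options used (`-INFILE`, `-ALIGN`, `-OUTPUT=FASTA`, `-OUTFILE`) are standard ClustalW 2 command-line options, but check `clustalw2 -help` for your version.

```python
import subprocess


def clustalw_command(infile, outfile, program="clustalw2"):
    """Build the argument list for a complete alignment written in FASTA format."""
    return [
        program,                # assumed executable name; may differ on your system
        "-INFILE=" + infile,    # unaligned input sequences
        "-ALIGN",               # do a complete multiple alignment
        "-OUTPUT=FASTA",        # write the alignment in FASTA format
        "-OUTFILE=" + outfile,  # where to write the alignment
    ]


def run_alignment(infile, outfile, program="clustalw2"):
    """Run clustalw; raises CalledProcessError if the program reports failure."""
    subprocess.run(clustalw_command(infile, outfile, program), check=True)


cmd = clustalw_command("strep.txt", "strep.fasta")
print(" ".join(cmd))
```

Calling `run_alignment("strep.txt", "strep.fasta")` would then do the actual work, provided the program is on the PATH.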
Now, to plot the tree. ClustalX has given us an output file named 'strep.fasta'. I start up R (get it here). I have previously used the Package Manager to install a phylogenetics package called 'ape'. For more documentation, you can search the R web site for 'ape'. This is my interaction with the R interpreter:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhUKrMYsbHzpd-uE6PXp2As0_Si7GY2RnQBm_XOpxw1M1fwkE9crocipS3kDZeiYyrhG2ASGbgi5FnvX0_TvyUyaNEgf1OpZJMFdB9PwTFe475wLySFtaTgqIhpfvEWYJuLQzzs9SMJabR2/s320/Picture+1.png)
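The R commands exist only in the screenshot, so they are not reproduced here. As a rough illustration of what a distance-based tree starts from, here is a Python sketch that reads an aligned FASTA file and computes pairwise p-distances (the fraction of differing sites). This is an assumption about the general approach, not a transcription of the R session: the function names and the toy alignment are invented, and ape may well use a different distance model.

```python
def read_fasta(lines):
    """Parse FASTA lines into a dict mapping name -> sequence string."""
    records = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    return differing / compared if compared else float("nan")


# Toy alignment for illustration only; not the Streptococcus data.
toy = [">seqA", "ACGT-ACGT", ">seqB", "ACGTTACGA", ">seqC", "ACCT-ACGA"]
alignment = read_fasta(toy)
names = sorted(alignment)
distances = {
    (m, n): p_distance(alignment[m], alignment[n])
    for i, m in enumerate(names)
    for n in names[i + 1:]
}
for pair, d in distances.items():
    print(pair, d)
```

To try it on the real data, pass `open("strep.fasta")` to `read_fasta` instead of the toy list.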
And here is the plot of the tree:
![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhkjvGrQRZrN90ToW2iW2QmsxPyDdUYzw0cmGeZSj318LCGM6ukmQDU0V5RmSz8TUOwPl5cxStB06avydwVGagI9o2Nedwjy4YcSC6Xn0TKRG_dyiYe0onwUtDfVv6rUNQ6caHVCBljCrlR/s320/Picture+3.png)