The FASTA format is defined here. Long titles can cause several problems: embedded spaces, character limits, and the requirement that names be unique within those limits. For now, I'll just remove most of the title, leaving '>' plus the GenBank ID from the end of the line. (If the identifiers are already known, I have a Python script that will retrieve the sequences from GenBank automatically.) Now I have 5 sequences saved in 'strep.txt':
X58303 Streptococcus mutans
AF003928 Streptococcus sanguinis
AF003930 Streptococcus pneumoniae
AB002521 Streptococcus pyogenes
AF003933 Streptococcus parasanguinis
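The title trimming described above can be sketched in a few lines of Python. This is only a rough illustration, not the script I mentioned: it assumes the identifier is the last whitespace-separated token of each title line, and the function names and the example title are made up for the demonstration.

```python
def shorten_title(line):
    """Reduce a FASTA title line to '>' plus its last whitespace-separated token."""
    if not line.startswith(">"):
        return line  # sequence lines pass through unchanged
    tokens = line[1:].split()
    # Assumption: the identifier is the final token of the title.
    return ">" + tokens[-1] if tokens else line


def shorten_fasta(text):
    """Apply shorten_title to every line of a FASTA-formatted string."""
    return "\n".join(shorten_title(line) for line in text.splitlines())


# Hypothetical title, used only to show the behaviour.
example = ">some long descriptive title X58303\nACGT\nTTGA"
print(shorten_fasta(example))
```

If the real titles put the identifier somewhere else (for example between '|' characters), the split in `shorten_title` would need to change accordingly.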
Next I run Clustal to get a sequence alignment. I got clustalx and clustalw from here (click on Download Software). Load the sequences into ClustalX (2.0.5). Under Alignment => Output Format Options, select 'FASTA format', and then select 'Do Complete Alignment'. It looks like this:
[Screenshot: the alignment in ClustalX]
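A quick sanity check on an alignment saved in FASTA format is that every sequence, gaps included, has the same length. Here is a minimal sketch in plain Python; `read_fasta` and `is_aligned` are names I made up for illustration, and the toy data stands in for the real output file.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping title -> sequence string."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            # Sequence data may be wrapped over several lines.
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def is_aligned(records):
    """True when every sequence (gap characters included) has the same length."""
    return len({len(seq) for seq in records.values()}) <= 1


# Toy alignment with wrapped lines, for demonstration only.
toy = ">a\nAC-GT\nAA\n>b\nACGGT\nA-\n"
records = read_fasta(toy)
print(is_aligned(records))
```

To check a real file, read its contents first and pass them in, for example `read_fasta(open('strep.fasta').read())`.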
Now, to plot the tree. ClustalX has given us an output file named 'strep.fasta'. I start up R (get it here). I have previously used the Package Manager to install a phylogenetics package called 'ape'. For more documentation, you can search the R web site for 'ape'. This is my interaction with the R interpreter:
[Screenshot: R session using the 'ape' package]
And here is the plot of the tree:
[Image: plot of the tree]
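Distance-based tree methods start from pairwise distances between the aligned sequences. The R session above is shown only as an image, so the following is not a transcription of it. It is just a small Python sketch of one simple distance, the proportion of differing sites (the p-distance), under my own assumption that columns with a gap in either sequence are skipped. The function name and toy sequences are invented for the example.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(s1.upper(), s2.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


# Toy aligned sequences, for demonstration only.
toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "AC-TTCGA"}
for x, y in combinations(sorted(toy), 2):
    print(x, y, round(p_distance(toy[x], toy[y]), 3))
```

A real analysis would normally use a substitution model rather than raw proportions, but the matrix of pairwise values is the same kind of input a distance-based tree is built from.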