def parseArticle(article):
One thing about it is hackish. If a file contains a single article, or several articles that I've copied and pasted from GenBank, there is no proper root element, and ElementTree chokes when I call ElementTree.parse(filename). I catch the exception and handle it by writing the data to a temporary file wrapped in an added root element, then feed the temporary file to ElementTree.
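Here is a minimal sketch of that catch-and-wrap fallback, not the script's actual code. The function name `parse_with_root` and the root tag `PubmedArticleSet` are my assumptions for illustration, and it assumes the pasted data carries no XML declaration or DOCTYPE line (those would not be legal inside a wrapper element).

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_with_root(filename, root_tag="PubmedArticleSet"):
    """Parse filename; if parsing fails, wrap the data in a root element and retry.

    root_tag is a hypothetical default, not necessarily what the script uses.
    """
    try:
        return ET.parse(filename).getroot()
    except ET.ParseError:
        # Typically "junk after document element": several top-level elements.
        with open(filename, encoding="utf-8") as f:
            data = f.read()

    # Write the same data to a temporary file, enclosed in the added root.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".xml", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write("<%s>\n%s\n</%s>" % (root_tag, data, root_tag))

    try:
        # Feed the temporary file to ElementTree, as described above.
        return ET.parse(tmp.name).getroot()
    finally:
        os.remove(tmp.name)  # clean up the temporary file either way
```

Calling `parse_with_root(filename)` returns the root element, whose children can then be iterated article by article. Passing the wrapped string to `ET.fromstring` would avoid the temporary file altogether; the sketch keeps the file only to mirror the approach described.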
I did a PubMed search for my graduate advisor (E.P. Geiduschek) and my post-doctoral mentor (J.R. Roth), copied all 388 records to the clipboard, and saved the XML to a file. The script handles this input. The first part of the printout is:
title Dissection of the Bacteriophage T4 Late Promoter Complex.
authorList Nechaev S, Geiduschek EP
journal J Mol Biol
year 2008
volume None
pages
pmid 18455735
abstract Activated transcription of the bacteriophage T4 la