The column headings and a sample line are as shown, when run on the sequence of the Salmonella typhimurium hemA gene:
In an earlier post I blogged about this and posted some code.
It seemed like a good idea to compare results from the two programs. One problem is the redundancy of restriction enzymes with respect to cleavage sites. Every position reported by silent in EMBOSS has a number of enzymes, some as many as two dozen or so (each one listed as a separate hit).
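To get a feel for that redundancy, one option is to collapse the hits by position before doing anything else. This is only a rough sketch: it assumes the hits have already been parsed into (position, enzyme) pairs, and both the function name group_hits_by_position and the sample positions are made up for illustration, not taken from real silent output.

```python
from collections import defaultdict


def group_hits_by_position(hits):
    """Collapse (position, enzyme) hits into {position: sorted enzyme names}."""
    grouped = defaultdict(set)
    for position, enzyme in hits:
        grouped[position].add(enzyme)
    # Sort by position so the summary reads in sequence order.
    return {pos: sorted(names) for pos, names in sorted(grouped.items())}


# Hypothetical hits, for illustration only (not real silent output).
hits = [(120, "KasI"), (120, "NarI"), (120, "SfoI"), (45, "EcoRI")]

for position, enzymes in group_hits_by_position(hits).items():
    print(position, len(enzymes), ",".join(enzymes))
```

Counting enzymes per position this way makes it easy to see which sites are the crowded ones.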
If you've ever worked with these enzymes you know that they have personalities: some are highly active, some produce overhangs, some are stable, some not so good. Filtering out the redundancy in a smart way is a bit of a pain. That's why one of the optional inputs to the program is a list of enzymes that you like. I ran silent with the default setting (lots of hits), and then for this analysis I made a short list of "good_enzymes."

To repeat what we did previously I copied out Restriction_Dictionary.py from the biopython-1.55 source. Then I went back to the old post and got three files (REnzymes, GeneticCode2, extrasites). I ran REnzymes.py and it looks like it works even though the format of Restriction_Dictionary.py has changed (and is really unrecognizable to me). I modified the old extrasites.py slightly to make the sequence uppercase. Rather than mess with that script any more, I wrote the results to disk as my.results.txt.

So now the problem is to load the two different text files with results and compare the data. A classic everyday bioinformatics problem. My output looks like this:
The original sequence is on the second line and the mutated sequence on the third. The affected codon is set off from the surrounding sequence by a space on each side.
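Reading records in that shape back in could look something like the sketch below. It assumes each record is exactly three non-blank lines, with some kind of header on the first line, and that the codon is the middle of three space-separated chunks on the second and third lines. The function name parse_records, the header contents, and the sample record are all hypothetical.

```python
def parse_records(lines):
    """Group non-blank lines in threes: header, original, mutated."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    records = []
    for i in range(0, len(rows) - 2, 3):
        header, original, mutated = rows[i:i + 3]
        # The affected codon is the middle of three space-separated chunks.
        orig_parts = original.split()
        mut_parts = mutated.split()
        records.append({
            "header": header,
            # None if the line does not split into exactly three chunks,
            # e.g. when the codon sits at the very start or end.
            "original_codon": orig_parts[1] if len(orig_parts) == 3 else None,
            "mutated_codon": mut_parts[1] if len(mut_parts) == 3 else None,
        })
    return records


# Made-up record for illustration; the header format is an assumption.
sample = """\
EcoRI 12
ACGTAC GAG TTCAAA
ACGTAC GAA TTCAAA
"""

for record in parse_records(sample.splitlines()):
    print(record["header"], record["original_codon"], "->", record["mutated_codon"])
```

For a real file, the same function would take the lines from open("my.results.txt"), assuming the file follows this layout.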
The code to analyze the differences is pretty ugly. I listed it below but I hope you don't look at it unless you're stuck. The output, shown next, reveals that for the most part the two sets of results are congruent. EMBOSS results are output on a single line starting with 'E', while my results are output on three lines, the first starting with 'M'. The results were sorted by order of position in the sequence. The EMBOSS Base-Posn has been converted to a codon for consistency.

A few significant differences can be seen. Mostly they involve a single enzyme, KasI. I haven't sorted that out yet. Overall, I think the agreement is very good.
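The conversion and the comparison can be boiled down to a few lines. This is a minimal sketch and not the listing from this post: it assumes Base-Posn is a 1-based base position, that codons are numbered from 1 with the reading frame starting at the first base, and that both result sets have already been reduced to (number, enzyme) pairs. The names base_to_codon and compare, and the sample hits, are hypothetical.

```python
def base_to_codon(base_position):
    """Map a 1-based base position to a 1-based codon number (frame 1)."""
    return (base_position - 1) // 3 + 1


def compare(emboss_hits, my_hits):
    """Return hits shared by both sets, then hits unique to each side.

    emboss_hits: iterable of (base_position, enzyme)
    my_hits:     iterable of (codon_number, enzyme)
    """
    emboss = {(base_to_codon(pos), enz) for pos, enz in emboss_hits}
    mine = set(my_hits)
    return sorted(emboss & mine), sorted(emboss - mine), sorted(mine - emboss)


# Made-up hits for illustration only.
emboss_hits = [(34, "EcoRI"), (100, "KasI")]
my_hits = [(12, "EcoRI")]

shared, only_emboss, only_mine = compare(emboss_hits, my_hits)
print("shared:", shared)
print("EMBOSS only:", only_emboss)
print("mine only:", only_mine)
```

If the two programs count positions differently (0-based versus 1-based, or the base where the site starts versus the base that is mutated), the conversion would need adjusting, and a mismatch of that kind is one thing worth ruling out when a single enzyme accounts for most of the differences.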
output:
code listing: