def chiSquared(s, v=False):
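Here is a minimal sketch of what a function with this signature might compute. It assumes, without confirmation from the post, that s is a DNA segment, v is the verbose flag, and the statistic compares the A, C, G and T counts against equal expected counts of one quarter of the segment length each. That assumption does reproduce the 7.34 reported for the segment at 3794 kb.

```python
from collections import Counter

def chiSquared(s, v=False):
    """Chi-squared statistic for the A/C/G/T composition of segment s.

    Assumption (not from the original post): each base is expected to
    make up one quarter of the segment, i.e. equal base frequencies.
    """
    counts = Counter(s.upper())
    n = sum(counts[b] for b in "ACGT")  # ignore N and other symbols
    if n == 0:
        return 0.0
    expected = n / 4.0
    stat = sum((counts[b] - expected) ** 2 / expected for b in "ACGT")
    if v:
        # verbose: base counts followed by the statistic; the position
        # column in the post's output would come from the caller (assumed)
        print(" ".join(f"{b} {counts[b]}" for b in "ACGT"), f"{stat:.2f}")
    return stat
```

Using equal expected counts keeps the function independent of any genome-wide composition. Comparing against the genome's average base frequencies instead would be an equally reasonable choice.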
We plot the results using R. Notice that segments with high bias rise above the noise, and these segments tend to cluster, as marked by the vertical red line:
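Producing the data for such a plot means scoring the genome one segment at a time. Below is a minimal sketch. It assumes non-overlapping 1-kb windows and the same equal-frequency chi-squared statistic as above. The names scan_genome and composition_chi2 are hypothetical, and the output is a tab-separated table that R can read.

```python
import csv
from collections import Counter

def composition_chi2(seg):
    """Chi-squared of A/C/G/T counts against equal expected counts (assumed)."""
    counts = Counter(seg.upper())
    n = sum(counts[b] for b in "ACGT")
    if n == 0:
        return 0.0
    e = n / 4.0
    return sum((counts[b] - e) ** 2 / e for b in "ACGT")

def scan_genome(genome, window=1000, out=None):
    """Score consecutive non-overlapping windows of a genome sequence.

    window=1000 is an assumption (1-kb segments). If out is a text
    stream, a header plus one 'kb<TAB>chi2' row per window is written.
    Returns the list of (kb, chi2) pairs.
    """
    rows = []
    for start in range(0, len(genome) - window + 1, window):
        kb = start // 1000  # window start coordinate in kilobases
        rows.append((kb, composition_chi2(genome[start:start + window])))
    if out is not None:
        w = csv.writer(out, delimiter="\t", lineterminator="\n")
        w.writerow(["kb", "chi2"])
        w.writerows((kb, f"{x:.2f}") for kb, x in rows)
    return rows
```

For example, `with open("chi2.tsv", "w", newline="") as fh: scan_genome(genome, out=fh)` writes a file that R's `read.delim` can load for plotting. Here `genome` is a hypothetical string holding the chromosome sequence.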
If we take a closer look, we can resolve the particular segments involved.
The region around 3796-3800 kb is involved in lipopolysaccharide (LPS) biosynthesis. This is a screenshot from my GeneFinder app.
Here is a close-up of Fig. 1 from Lawrence and Ochman (1998).
And here is part of the abstract of a paper noting the lack of conservation between E. coli and Salmonella. As far as I can tell, the functions of the genes with very high bias aren't known, but presumably their lack of conservation is related to evolutionary changes in LPS.
If we set the verbose flag (v=True), we can go back to this segment and look at the individual base counts and the resulting statistic:
3794 A 263 C 225 G 278 T 234 7.34
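The counts sum to 263 + 225 + 278 + 234 = 1000. If we assume the statistic compares each count with an equal expected share E = 1000/4 = 250 (an assumption, since the post does not state the expected values), the reported 7.34 follows directly:

```latex
\chi^2 = \sum_{b \in \{A,C,G,T\}} \frac{(O_b - E)^2}{E}, \qquad E = \frac{1000}{4} = 250

\chi^2 = \frac{13^2 + (-25)^2 + 28^2 + (-16)^2}{250}
       = \frac{169 + 625 + 784 + 256}{250}
       = \frac{1834}{250} \approx 7.34
```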
A nice next step would be to filter the output for regions with extreme values, then recover and order all the ORFs in each. I'll leave that as an exercise for the reader. :)
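As a starting point for the filtering half of that exercise, here is a minimal sketch. It assumes the scan output is a list of (kb, chi2) pairs for consecutive 1-kb segments and that "extreme" means at or above a chosen cutoff. Both the function name extreme_regions and the threshold are hypothetical, and recovering the ORFs in each region is not attempted.

```python
def extreme_regions(rows, threshold):
    """Group adjacent high-scoring segments into (start_kb, end_kb, max_chi2).

    rows: iterable of (kb, chi2) pairs, one per 1-kb segment (assumed format).
    threshold: hypothetical cutoff; segments with chi2 >= threshold are kept.
    """
    regions = []
    for kb, x in sorted(rows):
        if x < threshold:
            continue
        if regions and kb == regions[-1][1] + 1:
            # this segment extends the current run of adjacent extreme segments
            start, _, best = regions[-1]
            regions[-1] = (start, kb, max(best, x))
        else:
            regions.append((kb, kb, x))
    # report the strongest regions first
    regions.sort(key=lambda r: r[2], reverse=True)
    return regions
```

Merging adjacent segments reflects the clustering seen in the plot: a run of high-bias segments, like the one near 3796-3800 kb, comes out as a single region rather than several separate hits.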