![](https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjkMRACKqmgnG7i3fjkMYmpqtIdqj1e0PEo1D8TPTj9hN-7bwJg2a_msNrihHJEmCnkWA18ZTvMy5VUeEhPI1SAqNNPremPDEnerLqdmFzlrxIDxk73uk2iRm9MGqTHlGA9PRseQyCa1exW/s320/oligos.png)
We can look at the frequencies of longer oligos in the genome using Python. In this example, I look at the genome of Haemophilus influenzae because I know there is something interesting in it. The sequence is GenBank L42023. On average, we'd expect an individual 12-mer oligo (in a 50% GC genome) to be present once in about 16.8 Mbp (4**12 = 16,777,216), so in a genome of roughly 1.8 Mbp most 12-mers should not show up at all.
seq = open('Hinf.genome.txt','r').read().strip()
The script prints a list of oligos; all of them except the first are related sequences.
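As a rough illustration of this kind of count, here is a minimal sketch that tallies overlapping k-mers with `collections.Counter`. It is not necessarily the script used for the results described here. The function names `count_kmers` and `expected_count` are hypothetical, the code looks at one strand only (no reverse complements), and the expectation assumes equal base frequencies, as in the 50% GC estimate.

```python
from collections import Counter

def count_kmers(seq, k=12):
    """Count every overlapping k-mer in seq (one strand only).

    Windows containing anything other than A, C, G, T are skipped.
    """
    seq = seq.upper()
    valid = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= valid:
            counts[kmer] += 1
    return counts

def expected_count(genome_length, k=12):
    """Expected occurrences of one particular k-mer on one strand,
    assuming all four bases are equally frequent (an assumption)."""
    return (genome_length - k + 1) / 4 ** k

# Toy sequence for illustration; not real genome data.
toy = "ACGTACGTACGTAAAA"
for kmer, n in count_kmers(toy, k=4).most_common(3):
    print(kmer, n)
```

For a real genome, the same call would take the sequence string read from the file (with any newlines removed) and `k=12`; oligos whose counts are far above `expected_count(len(seq))` are the ones worth a closer look.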
We plot the results using R:
setwd('Desktop')