Python for Bioinformatics: Handling large sequence sets (7) Encore

Saturday, March 5, 2011


There is one more thing to say about the HITS project. Using bowtie worked well enough that I feel comfortable posting a plot similar to Fig 2 of the paper. Each point represents one of the 1148 genes. The x-axis is saturation (the fraction of TA sites in that gene that were hit by the transposon in the library), and the y-axis is the selection index (the ratio of total HITS in the lung to total HITS in the library).

It doesn't look exactly like the figure in the paper, but it's pretty close.

I couldn't get matplotlib to do the semi-log plot, so I used R:

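# the relative path assumes R was started in the home directory
setwd('Desktop')
# no header row; column 2 is saturation, column 3 is selection index
data = read.table('plot_data.txt',header=FALSE)
# default color, then mark genes with a low selection index in red
color.list = rep('steelblue',nrow(data))
sel = data[,3] < 0.15
color.list[sel] = 'red'
# low-saturation genes are grayed out, overriding red
sel = data[,2] < 0.4
color.list[sel] = 'lightgray'
# log='y' gives the semi-log plot
plot(data[,2],data[,3],
  col=color.list,log='y',pch=16,
  xlab='saturation',ylab='selection index')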
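
For comparison, matplotlib can put an axis on a log scale with set_yscale('log'), so the same plot should be possible in Python. What follows is only a minimal sketch, not what I actually ran. It assumes plot_data.txt is whitespace-separated with no header, with saturation in the second column and selection index in the third, as in the R code. The function names point_colors, load_columns and plot_selection are made up for this example.

```python
def point_colors(saturation, selection_index):
    """Apply the same color rules as the R code above.

    Gray (saturation < 0.4) overrides red (selection index < 0.15),
    and everything else is steelblue.
    """
    colors = []
    for sat, sel in zip(saturation, selection_index):
        if sat < 0.4:
            colors.append('lightgray')
        elif sel < 0.15:
            colors.append('red')
        else:
            colors.append('steelblue')
    return colors


def load_columns(path):
    """Read columns 2 and 3 of a whitespace-separated file (assumed layout)."""
    saturation, selection_index = [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 3:
                continue  # skip blank or short lines
            saturation.append(float(fields[1]))
            selection_index.append(float(fields[2]))
    return saturation, selection_index


def plot_selection(path='plot_data.txt'):
    """Draw the semi-log scatter plot; needs matplotlib installed."""
    # imported here so the helpers above work without matplotlib
    import matplotlib.pyplot as plt

    saturation, selection_index = load_columns(path)
    fig, ax = plt.subplots()
    ax.scatter(saturation, selection_index,
               c=point_colors(saturation, selection_index), s=20)
    ax.set_yscale('log')  # log scale on the y-axis only
    ax.set_xlabel('saturation')
    ax.set_ylabel('selection index')
    plt.show()
```

Calling plot_selection() from the directory containing plot_data.txt should bring up the figure. As in R, a selection index of zero cannot be shown on a log axis, so any such points will be missing from the plot.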

And that really is it for this. Here is hemR again: