Python for Bioinformatics: Bioconductor: multiple plots

Friday, August 19, 2011

Bioconductor: multiple plots

Actually, I have a bit more to do with Bioconductor ExpressionSets, as long as I'm thinking about it. When I first played with the ALL example, I tried other pairs besides ('ALL1/AF4','BCR/ABL')---I'd have to look now, but they are probably in the original paper too. Anyway, in the first talk I gave about this stuff, I made plots of all the pairs. Some are really nice. So that's what I'd like to show now.

The difference is that then, I wrote the code in R, and I did not like the experience.

In Python, we do the standard imports. We factor out the R code into several functions, then load it from a text file in Python, and execute the code in R-space. In Python, we pick pairs to compare and call a plotting function in R.

The R code is a little hackish (I'm thinking of color.map), but this is a good start.

R code:

color.map <- function(g) {
colors = c('red','green','blue',
'cyan','magenta','maroon')
if (g == 'ALL1/AF4') i=1
if (g == 'BCR/ABL') i=2
if (g == 'E2A/PBX1') i=3
if (g == 'NEG') i=4
if (g == 'NUP-98') i=5
if (g == 'p15/p16') i=6
colors[i]
}

library('limma')
library('annaffy')

get.genenames <- function(eset) {
probeids <- featureNames(eset) 
symbols <- aafSymbol(probeids, 'hgu95av2') 
getText(symbols)
}

compare.factors <- function(eset,L) {
eset.sub = eset[,eset$mol.biol %in% L]
f <- factor(as.character(eset.sub$mol.biol))
design <- model.matrix(~f)
fit <- eBayes(lmFit(eset.sub,design))
pv = fit$p.value[,2]
sel <- p.adjust(pv) < 0.001
eset.sel <- eset.sub[sel,]
}

plot.eset <-function(eset,...) {
patient.colors <- unlist(lapply(
eset$mol.biol,color.map))
genenames = get.genenames(eset)
heatmap(exprs(eset),
col=topo.colors(100),
ColSideColors=patient.colors,
labRow=genenames)
}

The R code is in a file Rcode.txt. Python code:

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr 
import bioc.biobase as biobase

r = robjects.r
g = robjects.globalenv
gd = importr('grDevices')

importr('ALL')
r('data("ALL")')
ALL = g['ALL']
eset = biobase.ExpressionSet(ALL)
#--------------------------------------------

FH = open('Rcode.txt','r')
s = FH.read()
FH.close()
r(s)
compare_factors = r['compare.factors']
plot_eset = r['plot.eset']
#--------------------------------------------
targets = [ ['ALL1/AF4','NEG'],
['ALL1/AF4','BCR/ABL'],
['E2A/PBX1','BCR/ABL'] ]

def get_ofn(t):
#print t, len(t)
u,v = t
s = u + '_' + v
ofn = s.replace('/','-') + '.pdf'
return ofn

def plot(eset_sub,t):
ofn = get_ofn(t)
gd.pdf(ofn)
plot_eset(eset_sub,ofn)
gd.dev_off()

for t in targets:
eset_sub = compare_factors(eset,t)
plot(eset_sub,t)

The first plot is at the top, and the second and third are below. It's pretty clear that there are significant commonalities according to mol.biol. BTW, I adjusted the target p-value to keep the number of hits down.

I also added code to substitute the gene names in the plots. That's pretty cool.

There are problems with the alternatives. That's for another time.