Friday, August 19, 2011

Bioconductor: multiple plots

Actually, I have a bit more to do with Bioconductor ExpressionSets, as long as I'm thinking about it. When I first played with the ALL example, I tried other pairs besides ('ALL1/AF4','BCR/ABL')---I'd have to look now, but they are probably in the original paper too. Anyway, in the first talk I gave about this stuff, I made plots of all the pairs. Some are really nice. So that's what I'd like to show now.

The difference is that then, I wrote the code in R, and I did not like the experience.

In Python, we do the standard imports. We factor out the R code into several functions, then load it from a text file in Python, and execute the code in R-space. In Python, we pick pairs to compare and call a plotting function in R.

The R code is a little hackish (I'm thinking of, but this is a good start.

R code: <- function(g) {
colors = c('red','green','blue',
if (g == 'ALL1/AF4') i=1
if (g == 'BCR/ABL') i=2
if (g == 'E2A/PBX1') i=3
if (g == 'NEG') i=4
if (g == 'NUP-98') i=5
if (g == 'p15/p16') i=6


get.genenames <- function(eset) {
probeids <- featureNames(eset) 
symbols <- aafSymbol(probeids, 'hgu95av2') 

compare.factors <- function(eset,L) {
eset.sub = eset[,eset$mol.biol %in% L]
f <- factor(as.character(eset.sub$mol.biol))
design <- model.matrix(~f)
fit <- eBayes(lmFit(eset.sub,design))
pv = fit$p.value[,2]
sel <- p.adjust(pv) < 0.001
eset.sel <- eset.sub[sel,]

plot.eset <-function(eset,...) {
patient.colors <- unlist(lapply(
genenames = get.genenames(eset)
The R code is in a file Rcode.txt. Python code:
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr 
import bioc.biobase as biobase

r = robjects.r
g = robjects.globalenv
gd = importr('grDevices')

ALL = g['ALL']
eset = biobase.ExpressionSet(ALL)

FH = open('Rcode.txt','r')
s =
compare_factors = r['compare.factors']
plot_eset = r['plot.eset']
targets = [ ['ALL1/AF4','NEG'],
['E2A/PBX1','BCR/ABL'] ]

def get_ofn(t):
#print t, len(t)
u,v = t
s = u + '_' + v
ofn = s.replace('/','-') + '.pdf'
return ofn

def plot(eset_sub,t):
ofn = get_ofn(t)

for t in targets:
eset_sub = compare_factors(eset,t)

The first plot is at the top, and the second and third are below. It's pretty clear that there are significant commonalities according to mol.biol. BTW, I adjusted the target p-value to keep the number of hits down.

I also added code to substitute the gene names in the plots. That's pretty cool.

There are problems with the alternatives. That's for another time.