Actually, I have a bit more to do with Bioconductor ExpressionSets, as long as I'm thinking about it. When I first played with the ALL example, I tried other pairs besides ('ALL1/AF4','BCR/ABL')---I'd have to look now, but they are probably in the original paper too. Anyway, in the first talk I gave about this stuff, I made plots of all the pairs. Some are really nice. So that's what I'd like to show now.
The difference is that then, I wrote the code in R, and I did not like the experience.
In Python, we do the standard imports. We factor out the R code into several functions, then load it from a text file in Python, and execute the code in R-space. In Python, we pick pairs to compare and call a plotting function in R.
The R code is a little hackish (I'm thinking of color.map), but this is a good start.
R code:
color.map <- function(g) { colors = c('red','green','blue', 'cyan','magenta','maroon') if (g == 'ALL1/AF4') i=1 if (g == 'BCR/ABL') i=2 if (g == 'E2A/PBX1') i=3 if (g == 'NEG') i=4 if (g == 'NUP-98') i=5 if (g == 'p15/p16') i=6 colors[i] } library('limma') library('annaffy') get.genenames <- function(eset) { probeids <- featureNames(eset) symbols <- aafSymbol(probeids, 'hgu95av2') getText(symbols) } compare.factors <- function(eset,L) { eset.sub = eset[,eset$mol.biol %in% L] f <- factor(as.character(eset.sub$mol.biol)) design <- model.matrix(~f) fit <- eBayes(lmFit(eset.sub,design)) pv = fit$p.value[,2] sel <- p.adjust(pv) < 0.001 eset.sel <- eset.sub[sel,] } plot.eset <-function(eset,...) { patient.colors <- unlist(lapply( eset$mol.biol,color.map)) genenames = get.genenames(eset) heatmap(exprs(eset), col=topo.colors(100), ColSideColors=patient.colors, labRow=genenames) } |
Rcode.txt
.
Python code:
import rpy2.robjects as robjects from rpy2.robjects.packages import importr import bioc.biobase as biobase r = robjects.r g = robjects.globalenv gd = importr('grDevices') importr('ALL') r('data("ALL")') ALL = g['ALL'] eset = biobase.ExpressionSet(ALL) #-------------------------------------------- FH = open('Rcode.txt','r') s = FH.read() FH.close() r(s) compare_factors = r['compare.factors'] plot_eset = r['plot.eset'] #-------------------------------------------- targets = [ ['ALL1/AF4','NEG'], ['ALL1/AF4','BCR/ABL'], ['E2A/PBX1','BCR/ABL'] ] def get_ofn(t): #print t, len(t) u,v = t s = u + '_' + v ofn = s.replace('/','-') + '.pdf' return ofn def plot(eset_sub,t): ofn = get_ofn(t) gd.pdf(ofn) plot_eset(eset_sub,ofn) gd.dev_off() for t in targets: eset_sub = compare_factors(eset,t) plot(eset_sub,t) |
The first plot is at the top, and the second and third are below. It's pretty clear that there are significant commonalities according to mol.biol. BTW, I adjusted the target p-value to keep the number of hits down.
I also added code to substitute the gene names in the plots. That's pretty cool.
There are problems with the alternatives. That's for another time.