Here is the result of my experiment with the rpy2-bioconductor-extensions-0.2.
import rpy2.robjects as robjects
from rpy2.robjects.packages import importr
import bioc.biobase as biobase
r = robjects.r
g = robjects.globalenv
importr('ALL')
r('data("ALL")')
data = g['ALL']
eset = biobase.ExpressionSet(data)
print eset
Run as a script.
> python script.py
First (above) section output:
ExpressionSet (storageMode: lockedEnvironment)
assayData: 12625 features, 128 samples
element names: exprs
protocolData: none
phenoData
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
pubMedIds: 14684422 16243790
Annotation: hgu95av2
Part 2:
expr_data = eset.exprs
print len(expr_data)
L = tuple(expr_data[:5])
print [round(t,3) for t in L]
output:
1616000
[7.597, 5.046, 3.9, 5.904, 5.925]
Part 3:
print list(eset.slotnames())
phenoData = eset.do_slot('phenoData')
print 'phenoData', id(phenoData)
print phenoData
output:
['experimentData', 'assayData', 'phenoData', 'featureData', 'annotation', 'protocolData', '.__classVersion__']
phenoData 0x10bbc35a8
An object of class "AnnotatedDataFrame"
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription
Part 4:
pd = eset.get_phenodata()
print 'pd', id(pd)
print pd
print list(pd.slotnames())
print
output:
pd 0x10bbc3488
An object of class "AnnotatedDataFrame"
sampleNames: 01005 01010 ... LAL4 (128 total)
varLabels: cod diagnosis ... date last seen (21 total)
varMetadata: labelDescription
['varMetadata', 'data', 'dimLabels', '.__classVersion__']
Part 5:
data = pd.do_slot('data')
print 'data'
print type(data)
mb = pd.pdata.rx2('mol.biol')
print list(mb)[:10]
levels = r['levels']
print levels(mb)
print list(data.rx2('mol.biol'))[:10]
output:
data
<class 'rpy2.robjects.vectors.DataFrame'>
[2, 4, 2, 1, 4, 4, 4, 4, 4, 2]
[1] "ALL1/AF4" "BCR/ABL" "E2A/PBX1" "NEG" "NUP-98" "p15/p16"
[2, 4, 2, 1, 4, 4, 4, 4, 4, 2]
Part 6:
fd = eset.get_featuredata
print fd
# fd() gives AttributeError, no _featureData
#
featureNames = robjects.r['featureNames']
fn = featureNames(eset)
print len(fn), fn[0]
output:
<bound method ExpressionSet.get_featuredata of >
12625 1000_at
In the last part, the "new" way doesn't seem to work yet. So I did as before.
Well, that's it for Bioconductor and Rpy2 for now. The presentation could be a lot better organized, but at least we figured out how to do most things.