Thursday, August 18, 2011

Learning to use the RPy2 module (1)

Before we tackle turning the Bioconductor example into Python code, I need to review some basic usage of RPy2 as given in the docs. I've posted about this before, but my exploration wasn't systematic, and those posts contain problems that I only solved (and described) in later posts.

This exercises the first half of what's on the linked page. Each Python snippet is followed by its output.

import rpy2.robjects as robjects
from rpy2.robjects.packages import importr

r = robjects.r
g = robjects.globalenv
print type(r)

<class 'rpy2.robjects.R'>



# 'pi' comes from R's base package; r['name'] looks a name up
# starting in .GlobalEnv and continuing along R's search path
pi = r['pi']
print pi[0]
print type(pi)

3.14159265359
<class 'rpy2.robjects.vectors.FloatVector'>
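Why 3.14159265359 rather than R's usual 3.141593? R displays doubles with 7 significant digits by default, while Python 2's print calls str(), which rounds a float to 12 significant digits. A quick pure-Python check of both formats (no R needed):

```python
import math

# R's default display: 7 significant digits
r_style = "%.7g" % math.pi
# Python 2's str() for floats: 12 significant digits
py2_style = "%.12g" % math.pi

print(r_style)          # 3.141593
print(py2_style)        # 3.14159265359
print(repr(math.pi))    # full precision: 3.141592653589793
```

Under Python 3, print uses the shortest repr instead, so the same value would show as 3.141592653589793.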



# r is also callable with a string of R code, which is evaluated and its
# value returned (the assignment also creates piplus in .GlobalEnv)
piplus = r('piplus = pi + 1')
print type(piplus)
print piplus
print piplus[0]

<class 'rpy2.robjects.vectors.FloatVector'>
[1] 4.141593

4.14159265359



# explicitly get from .globalEnv
piplus_from_g = g['piplus']
print piplus_from_g
print piplus == piplus_from_g
print piplus[0] == piplus_from_g[0]

[1] 4.141593

False
True
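Why False and then True? My guess (I haven't checked the rpy2 source) is that these vector classes don't define an elementwise __eq__, so Python falls back to comparing identity, and piplus and piplus_from_g are two separate Python wrappers even though they hold the same value. Indexing pulls out plain Python floats, which do compare by value. Here is the same effect with a hypothetical stand-in class:

```python
class Wrapper(object):
    """Hypothetical stand-in for a proxy object with no __eq__ of its own."""

    def __init__(self, values):
        self.values = list(values)

    def __getitem__(self, i):
        return self.values[i]


a = Wrapper([4.141592653589793])
b = Wrapper([4.141592653589793])

print(a == b)         # False: default == compares object identity
print(a[0] == b[0])   # True: plain floats compare by value
```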



# define an R function and call it
r('''
f <- function(start = 1, stop = 5) {
    n = 0
    for (j in start:stop) { n = n + j }
    print(n)
}
''')

# it's available in .globalEnv
f = g['f']
f()


[1] 15



# r['f'] finds it too, because the lookup starts in .GlobalEnv
f = r['f']
f(4)

[1] 9
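As a sanity check, here is the R function f translated into plain Python. One detail worth keeping: R's start:stop counts down when start > stop, so in R f(6) sums 6:5, and the sketch mirrors that. Unlike the R version, it returns n instead of printing it.

```python
def f(start=1, stop=5):
    # R's start:stop is inclusive and runs downward when start > stop
    step = 1 if start <= stop else -1
    n = 0
    for j in range(start, stop + step, step):
        n = n + j
    return n

print(f())    # 15, same as the R call f()
print(f(4))   # 9, i.e. 4 + 5
print(f(6))   # 11, i.e. 6 + 5, because 6:5 counts down
```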



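# interpolating an R object into code; note that rpy2 slices with
# Python's 0-based indexing, so letters[1:6] is b through f, not a through f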
letters = robjects.r['letters']
s = letters[1:6].r_repr()
rcode = 'paste(%s, collapse="-")' %(s)
res = robjects.r(rcode)
print(res)

[1] "b-c-d-e-f"
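To see what string actually gets handed to R, here is the same interpolation done in plain Python. to_r_charvec is a hypothetical helper that builds the kind of c(...) literal I'd expect r_repr() to produce for a character vector; the exact spacing of the real r_repr() output is an assumption.

```python
import string

# Python slicing: 0-based, end-exclusive, so [1:6] gives items 1..5
letters = list(string.ascii_lowercase)
subset = letters[1:6]

def to_r_charvec(items):
    # hypothetical helper: an R character-vector literal such as c("b", "c")
    return "c(%s)" % ", ".join('"%s"' % s for s in items)

rcode = 'paste(%s, collapse="-")' % to_r_charvec(subset)
print(rcode)              # paste(c("b", "c", "d", "e", "f"), collapse="-")
print("-".join(subset))   # b-c-d-e-f, the Python analogue of paste(..., collapse="-")
```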



# more on calling R functions
rsum = r['sum']
print rsum(robjects.IntVector([1,2,3]))[0]

6



# with a keyword argument (Python's True is passed to R as TRUE)
rsort = r['sort']
iv = robjects.IntVector([3,1,2])
res = rsort(iv, decreasing=True)
print res.r_repr()

c(3L, 2L, 1L)
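The L suffix in that output is R's marker for integer literals (3L is an integer, 3 is a double), so r_repr() shows that the result of sort() is still an integer vector. A pure-Python mirror, with int_r_repr as a hypothetical helper that formats a list the same way:

```python
def int_r_repr(values):
    # hypothetical helper: format ints as an R integer-vector literal
    return "c(%s)" % ", ".join("%dL" % v for v in values)

# sorted(..., reverse=True) plays the role of R's sort(x, decreasing=TRUE)
res = sorted([3, 1, 2], reverse=True)
print(int_r_repr(res))   # c(3L, 2L, 1L)
```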



# plotting
gd = importr('grDevices')
ofn = '/Users/telliott_admin/Desktop/plot.pdf'
gd.pdf(ofn)

x = robjects.IntVector(range(10))
y = r.rnorm(10)
r.layout(r.matrix(robjects.IntVector([1,2,3,2]), nrow=2, ncol=2))
r.plot(r.runif(10), y, xlab="runif", ylab="foo/bar", col="red")
gd.dev_off()
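The layout() call is easier to read once you remember that R's matrix() fills column by column by default. This pure-Python sketch (r_matrix is a hypothetical helper, not part of rpy2) rebuilds the layout matrix from c(1, 2, 3, 2):

```python
def r_matrix(values, nrow, ncol):
    # hypothetical helper: fill column by column, like R's matrix(..., byrow=FALSE)
    return [[values[j * nrow + i] for j in range(ncol)] for i in range(nrow)]

layout = r_matrix([1, 2, 3, 2], nrow=2, ncol=2)
for row in layout:
    print(row)
# [1, 3]
# [2, 2]
```

So figure region 1 is top left, 3 is top right, and 2 spans the whole bottom row. Since there is only one plot() call, the plot lands in region 1 and the other two regions stay empty.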



Finally, there are R's special operators like %in% and %*%. I haven't yet found out how to call these directly from RPy2, so let's try wrapping each one in an R function:

r('''
is_in <- function(value, container) {
    value %in% container
}
''')

is_in = g['is_in']
L = robjects.StrVector(list('abcde'))
print is_in('a',L)
print is_in('f',L)

[1] TRUE

[1] FALSE
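One thing the wrapper hides: R's %in% is vectorized over its left operand, so c('a', 'f') %in% L returns TRUE FALSE, one logical per element. A pure-Python sketch of those semantics (in_vec is a hypothetical name):

```python
def in_vec(values, container):
    # like R's value %in% container: one membership test per element of values
    lookup = set(container)
    return [v in lookup for v in values]

L = list('abcde')
print(in_vec(['a'], L))        # [True]
print(in_vec(['a', 'f'], L))   # [True, False]
```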





r('''m_mult <- function(m1,m0) { m1 %*% m0 }''')
m_mult = g['m_mult']
r('''mtoi <- function(m) { as.integer(m) }''')
mtoi = g['mtoi']

# an old friend: powers of this matrix generate the Fibonacci numbers
m0 = r.matrix(robjects.IntVector([0,1,1,1]), nrow=2)
m = m0
for i in range(10):
    print i + 3,
    m = m_mult(m, m0)
    print mtoi(m)

3 [1] 1 1 1 2

4 [1] 1 2 2 3

5 [1] 2 3 3 5

6 [1] 3 5 5 8

7 [1] 5 8 8 13

8 [1] 8 13 13 21

9 [1] 13 21 21 34

10 [1] 21 34 34 55

11 [1] 34 55 55 89

12 [1] 55 89 89 144
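Why the label is i + 3: with m0 = [[0, 1], [1, 1]], the power m0^k holds F(k-1), F(k), F(k), F(k+1) (with F(1) = F(2) = 1), and after pass i the loop holds m0^(i+2), so the last entry is F(i+3). The same loop in pure Python, without R, reproduces the table. The helpers are hypothetical; column_major mimics how as.integer() flattens an R matrix column by column (the matrix here is symmetric, so the order doesn't change anything):

```python
def mat_mult(a, b):
    # 2x2 matrix product, the job R's %*% does in m_mult
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def column_major(m):
    # flatten column by column, as as.integer(m) does for an R matrix
    return [m[i][j] for j in range(2) for i in range(2)]

m0 = [[0, 1], [1, 1]]
m = m0
rows = []
for i in range(10):
    m = mat_mult(m, m0)
    rows.append((i + 3, column_major(m)))

for label, flat in rows:
    print("%d %s" % (label, flat))   # e.g. "3 [1, 1, 1, 2]"
```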