## Friday, February 5, 2010

### Comparing R and Python sequences

This is a post about elementary sequence operations in R and Python. It's as much for me as for you.

The most obvious difference between sequences in R and Python is Python's use of 0-based indexing (R counts from 1):

R:

    > 1:5
    [1] 1 2 3 4 5
    > A = seq(1,10,by=2)
    > A
    [1] 1 3 5 7 9
    > A[2]
    [1] 3

Python:

    >>> range(1,6)
    [1, 2, 3, 4, 5]
    >>> A = range(1,11,2)
    >>> A
    [1, 3, 5, 7, 9]
    >>> A[1]
    3
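The Python session above prints a list straight from `range()`, which is Python 2 behavior. In Python 3, `range()` is lazy, so a minimal sketch of the same comparison (assuming Python 3) wraps it in `list()`:

```python
# Python 3: range() is a lazy sequence, so list() is needed to see the values.
A = list(range(1, 11, 2))

first = A[0]    # corresponds to A[1] in R
second = A[1]   # corresponds to A[2] in R
```

Indexing works the same on the bare `range` object; only the printing differs.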

Another difference is that in R, but not in Python, one can assign to an index outside the initial range:

R:

    > m = 1:2
    > m[6] = 35
    > m
    [1]  1  2 NA NA NA 35

Python:

    >>> m = range(1,3)
    >>> m[6] = 35
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: list assignment index out of range
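If the R behavior is what you want, it can be imitated by hand. Here is a rough sketch, where `assign_padded` is a hypothetical helper of my own (not a built-in) that grows the list with `None` in place of R's `NA`:

```python
def assign_padded(seq, index, value, fill=None):
    """Hypothetical helper: set seq[index], growing the list with `fill` if needed.

    Assumes a non-negative, 0-based index.
    """
    if index >= len(seq):
        seq.extend([fill] * (index + 1 - len(seq)))
    seq[index] = value
    return seq


m = list(range(1, 3))
assign_padded(m, 5, 35)   # R's m[6] is position 5 when counting from 0
```

After the call, `m` holds `[1, 2, None, None, None, 35]`.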

In R, but not in plain Python, the increment can be a non-integral value:

R:

    > A = seq(0,20,by=0.1)
    > A[1]
    [1] 0
    > length(A)
    [1] 201
    > A[length(A)]
    [1] 20

We can use NumPy to get around this restriction:

Python:

    >>> import numpy as np
    >>> A = np.arange(0,20.1,0.1)
    >>> A[0]
    0.0
    >>> len(A)
    201
    >>> A[-1]
    20.0
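One caveat: with a non-integer step, the length of an `np.arange` result depends on floating-point rounding, which is why the stop value above is nudged to 20.1. A sketch of two alternatives that avoid guessing the stop value (both are my suggestion, not part of the session above):

```python
import numpy as np

# Build integer steps first, then scale: the length is exactly 201.
A = np.arange(0, 201) / 10.0

# Or state the number of points explicitly.
B = np.linspace(0, 20, 201)
```

Both give 201 values running from 0.0 to 20.0.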

It's sometimes more convenient to specify how many numbers we want to obtain (evenly spaced in some interval):

R:

    > A = seq(0,2,length=6)
    > A
    [1] 0.0 0.4 0.8 1.2 1.6 2.0

Python:

    >>> A = np.linspace(0,2,6)
    >>> A
    array([ 0. ,  0.4,  0.8,  1.2,  1.6,  2. ])
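`np.linspace` takes a couple of optional arguments that are handy here. A small sketch, assuming a reasonably recent NumPy: `retstep=True` also returns the spacing, and `endpoint=False` leaves out the right end of the interval.

```python
import numpy as np

# Six points from 0 to 2 inclusive, plus the spacing between them.
A, step = np.linspace(0, 2, 6, retstep=True)

# Five points starting at 0 with the right endpoint (2) excluded.
no_end = np.linspace(0, 2, 5, endpoint=False)
```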

Vectorized operations, here the mean over rows, over columns, and over the whole matrix:

R:

    > m = 1:9
    > dim(m) = c(3,3)
    > m
         [,1] [,2] [,3]
    [1,]    1    4    7
    [2,]    2    5    8
    [3,]    3    6    9
    > m = t(m)
    > m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    > apply(m,1,mean)
    [1] 2 5 8
    > apply(m,2,mean)
    [1] 4 5 6
    > mean(m)
    [1] 5

Python:

    >>> m = np.arange(1,10)
    >>> m.shape = (3,3)
    >>> m
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
    >>> np.mean(m, axis=0)
    array([ 4.,  5.,  6.])
    >>> np.mean(m, axis=1)
    array([ 2.,  5.,  8.])
    >>> np.mean(m)
    5.0
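Two details are easy to trip over: R fills a matrix column by column while NumPy fills row by row, and the margin/axis numbers do not line up the way the 0-versus-1 offset might suggest. A short sketch that spells out the correspondence (the variable names are mine):

```python
import numpy as np

# NumPy fills row by row (C order), so no transpose is needed here.
m = np.arange(1, 10).reshape(3, 3)

# order="F" fills column by column, like R's dim(m) = c(3,3) before t(m).
m_r_style = np.arange(1, 10).reshape(3, 3, order="F")

row_means = m.mean(axis=1)   # like apply(m, 1, mean) in R
col_means = m.mean(axis=0)   # like apply(m, 2, mean) in R
```

So R's margin 1 (rows) matches NumPy's `axis=1`, and margin 2 (columns) matches `axis=0`.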

Here is an example of fancy indexing in which we rearrange rows and columns at the same time:

R:

    > m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    > m[c(2,3,1),c(3,2,1)]
         [,1] [,2] [,3]
    [1,]    6    5    4
    [2,]    9    8    7
    [3,]    3    2    1

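The naive translation to Python gives something different from R (though useful): NumPy pairs the two index lists element by element, so it returns `m[1,2]`, `m[2,1]`, and `m[0,0]`. Indexing the rows first and then the columns reproduces the R result: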

Python:

    >>> m
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
    >>> m[[1,2,0],[2,1,0]]
    array([6, 8, 1])
    >>> m[[1,2,0],:][:,[2,1,0]]
    array([[6, 5, 4],
           [9, 8, 7],
           [3, 2, 1]])
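NumPy also has `np.ix_`, which builds an open mesh from the two index lists so that every row index is combined with every column index, the way R does it. A minimal sketch:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

# np.ix_ combines each row index with each column index (R-style),
# instead of pairing them element by element.
rearranged = m[np.ix_([1, 2, 0], [2, 1, 0])]

# The element-by-element pairing, for comparison.
paired = m[[1, 2, 0], [2, 1, 0]]
```

Here `rearranged` matches the two-step result, and `paired` is the three-element array from the naive attempt.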

R has a few more indexing tricks for which I don't know if there is a Python equivalent:

R:

    > m = 1:9
    > dim(m) = c(3,3)
    > m = t(m)
    > m[-1,]
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    7    8    9
    > sel = m[1,] > 2
    > sel
    [1] FALSE FALSE  TRUE
    > m[sel]
    [1] 7 8 9
    > y = -5:5
    > y
     [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
    > y[y < 0] <- -y[y < 0]
    > y
     [1] 5 4 3 2 1 0 1 2 3 4 5
    > y <- abs(y)
    > y
     [1] 5 4 3 2 1 0 1 2 3 4 5
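For what it's worth, NumPy seems to have close counterparts for each of these. The sketch below is my own approximation, not an exact translation. In particular, R recycles the logical vector over the matrix in column-major order, whereas NumPy applies a one-dimensional boolean mask along the first axis. Both pick out 7 8 9 here, but the shape and ordering of the result can differ in other cases.

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

# Drop the first row, like m[-1,] in R (a negative index means "exclude" in R only).
without_first_row = m[1:]
also_without_first = np.delete(m, 0, axis=0)

# Boolean mask built from the first row, like sel = m[1,] > 2.
sel = m[0] > 2
rows = m[sel]        # rows where sel is True, kept as a 2-D array
cols = m[:, sel]     # columns where sel is True

# Masked assignment, like y[y < 0] <- -y[y < 0].
y = np.arange(-5, 6)
y[y < 0] = -y[y < 0]
```

`np.abs(y)` is the direct counterpart of R's `abs(y)`.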

But here, finally, is an example we can do in both:

R:

    > m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    > sel = array(c(1:3,3:1), dim=c(3,2))
    > sel
         [,1] [,2]
    [1,]    1    3
    [2,]    2    2
    [3,]    3    1
    > m[sel] = 0
    > m
         [,1] [,2] [,3]
    [1,]    1    2    0
    [2,]    4    0    6
    [3,]    0    8    9

Python:

    >>> m
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])
    >>> m[[0,1,2],[2,1,0]] = 0
    >>> m
    array([[1, 2, 0],
           [4, 0, 6],
           [0, 8, 9]])
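To stay closer to the R version, where the positions live in a two-column matrix of (row, column) pairs, the same assignment can be sketched with a NumPy array of 0-based pairs split into its two columns:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

# One (row, column) pair per line, 0-based; R's sel holds the same pairs 1-based.
sel = np.array([[0, 2],
                [1, 1],
                [2, 0]])

# Split the pairs into a row-index array and a column-index array.
m[sel[:, 0], sel[:, 1]] = 0
```

This zeroes the anti-diagonal, just like the R code.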