## Friday, March 12, 2010

### Selecting array elements in R and numpy

I was a bit confused about the behavior of vectorize in numpy (last post), so I posted a question to Stack Overflow (SO). Of course, they knew the answer immediately, as I described.

But one more thing came up, and it has to do with selecting elements from an array. I wasn't aware that numpy lets you index an array with a test that returns a boolean vector, but it does. In R, this is a common idiom. Here is a matrix m. We can leave a row out with a negative index ("-"):

    > m = 1:9
    > dim(m) = c(3,3)
    > m = t(m)
    > m
         [,1] [,2] [,3]
    [1,]    1    2    3
    [2,]    4    5    6
    [3,]    7    8    9
    > m[-1,]
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    7    8    9

The test `m[,1] > 2` checks, for each row, whether the value in the first column is greater than 2:

    > m[,1]
    [1] 1 4 7
    > sel = m[,1] > 2
    > sel
    [1] FALSE  TRUE  TRUE

Note: I screwed this up in the original example. We use the selector to get those rows:

    > m[sel,]
         [,1] [,2] [,3]
    [1,]    4    5    6
    [2,]    7    8    9

How would you do this in Python with numpy?

    import numpy as np
    A = np.arange(1,10)
    A.shape = (3,3)
    print(A)

    [[1 2 3]
     [4 5 6]
     [7 8 9]]

    B = A[A[:,0]>2]
    print(B)

    [[4 5 6]
     [7 8 9]]

    sel = A[:,0]>2
    print(sel)

    [False  True  True]

    print(A[sel,:])

    [[4 5 6]
     [7 8 9]]

It works!
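For completeness, here is a small sketch that goes slightly beyond the examples above. It rebuilds the same 3x3 array and shows a few variations on the boolean selector. The names `rows`, `middle`, `rest`, and `no_first` are mine, chosen just for illustration. One thing to watch: as far as I can tell, numpy has no direct counterpart to R's `m[-1,]`, because a negative index in numpy counts from the end rather than leaving a row out, so slicing or `np.delete` is used here instead.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# Boolean mask: one True/False entry per row, from a test on the first column.
sel = A[:, 0] > 2
rows = A[sel]                # same rows as A[sel, :]

# Masks combine element-wise with & (and), | (or), ~ (not).
# Each test needs its own parentheses.
middle = A[(A[:, 0] > 2) & (A[:, 0] < 7)]

# Inverting the mask keeps the rows that fail the test.
rest = A[~sel]

# A[-1] is the LAST row in numpy, not "everything but the first" as in R.
# To drop the first row, slice it off or use np.delete.
no_first = np.delete(A, 0, axis=0)   # same result as A[1:]

print(rows)
print(middle)
print(rest)
print(no_first)
```

Note that `np.delete` returns a new array and leaves `A` unchanged, which matches how `m[-1,]` behaves in R.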