PyCogent uses NumPy extensively, and I'm not really familiar with it either, so I've been fiddling with it today. A goal for the future is to explore how one would use Python (with NumPy and SciPy) for processing microarray data. I know that Bioconductor has become the standard; in fact, I gave a talk about it. But R is ... ugh.

So, the first thing you want to do with the data from a microarray experiment is to *put it into an array* :), and then, likely, normalize the data. That leads to the question of how to deal with missing values, which show up as nan (not a number). The problem is that if you run numpy.mean or numpy.std on a 1D array containing a nan value, you get back nan.
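To make the problem concrete, here is a tiny sketch with a made-up row of four values (the numbers are invented for illustration, not taken from any real microarray). It shows nan propagating through numpy.mean and numpy.std, and one simple workaround: filter the nan values out with a boolean mask before computing anything.

```python
import numpy as np

# A made-up row with one missing measurement
row = np.array([1.0, 2.0, np.nan, 4.0])

# Any nan in the input makes the result nan
print(np.mean(row))   # nan
print(np.std(row))    # nan

# Workaround: keep only the entries that are not nan
valid = row[~np.isnan(row)]
print(np.mean(valid))  # mean of 1, 2 and 4, about 2.33
print(np.std(valid))   # population standard deviation, about 1.25
```

Newer NumPy releases also provide np.nanmean and np.nanstd, which skip nan values directly.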

I couldn't find a function that computes the mean and standard deviation while skipping the nan values, so I decided to roll my own for fun. It doesn't have to be super efficient because you only do this once or twice, and Python is already plenty fast. And please, if anybody knows how to do this better, let me know.
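The general idea can be sketched in a few lines. This is only an illustration, separate from the full code at the bottom of the post: the function name zscore_ignoring_nan and the small data array are made up for the example, and I'm assuming one gene per row, the population standard deviation (ddof=0), and that a row with no spread should get z-scores of 0.

```python
import numpy as np

def zscore_ignoring_nan(row):
    """Z-score a 1D array using only its non-nan entries.

    Hypothetical helper for illustration; nan entries stay nan.
    """
    row = np.asarray(row, dtype=float)
    valid = ~np.isnan(row)
    out = np.full(row.shape, np.nan)
    if not valid.any():
        return out                      # nothing to normalize
    values = row[valid]
    std = values.std()                  # population std (ddof=0)
    if std > 0:
        out[valid] = (values - values.mean()) / std
    else:
        out[valid] = 0.0                # constant row: assumed choice
    return out

# Made-up data: two rows, one with a missing value
data = np.array([[1.0, 2.0, np.nan, 4.0],
                 [3.0, 5.0, 7.0, 9.0]])

# Normalize each row independently
z = np.apply_along_axis(zscore_ignoring_nan, 1, data)
print(z)
```

After normalizing, the non-nan entries of each row should have mean 0 and standard deviation 1, which is an easy sanity check.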

At the bottom of the post is the code. This is the output:

In the two blocks above, we print two randomly chosen rows, with their mean and standard deviation. Then we manually calculate the z-scores and compare them to the output from the code. In the second part of the output, we find the first two rows that contain NaNs and check that they are handled correctly.

## 1 comment:

NumPy has a module for masked arrays that is made for this: numpy.ma. You can mask invalid values, such as np.nan, and then use the numpy.ma functions, or use numpy.ma.compressed to return an array with the masked values removed.
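A minimal sketch of that suggestion, using a made-up row of values. It relies on numpy.ma.masked_invalid to mask the nan entry, after which the masked array's own mean and std skip it.

```python
import numpy as np

# A made-up row with one missing measurement
row = np.array([1.0, 2.0, np.nan, 4.0])

# Mask nan (and inf) entries
masked = np.ma.masked_invalid(row)

# Statistics on a masked array ignore the masked entries
print(masked.mean())
print(masked.std())

# compressed() returns a plain array with masked values removed
print(masked.compressed())

# Z-scores stay masked where the input was nan;
# filled(np.nan) converts back to an ordinary array
z = (masked - masked.mean()) / masked.std()
print(z.filled(np.nan))
```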
