[UPDATE: I'm currently (November 2010) in the middle of a series of posts about Student's *t*-test. The first six posts are here, here, here, here, here, and here.]

I'd like to do my teaching in Python as much as possible. In keeping with that, I plan to explore methods for analyzing microarray data using Python rather than R and Bioconductor. As a preliminary matter, I'm going to use Student's *t*-test (two-sample) as implemented in PyCogent. A simple-minded approach, in a situation where the samples have already been classified, would be to filter them by the t-test statistic.

Here is a simple test. We draw a bunch of numbers from the normal distribution and keep them in a numpy array. We slide a window along the array, pulling out two small sets of n numbers at a time, and do a two-sample t-test on them. We record the p-value. Since all the numbers come from the same normal distribution, the null hypothesis is true for every pair, so we expect that, on average, p < 0.05 about 5% of the time.
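A minimal sketch of that test follows. It is not the original code from this post: it uses `scipy.stats.ttest_ind` as a stand-in for PyCogent's two-sample t-test, and the function name `sliding_pvalues`, the seed, the array length, and the choice to move the window by 2n so that samples never overlap are all assumptions made for illustration.

```python
import numpy as np
from scipy import stats  # stand-in for the PyCogent t-test used in the post


def sliding_pvalues(data, n):
    """Return t-test p-values for consecutive pairs of n-element samples.

    The window advances by 2*n each step (an assumption), so every
    number in the array is used in at most one test.
    """
    pvalues = []
    for start in range(0, len(data) - 2 * n + 1, 2 * n):
        a = data[start:start + n]
        b = data[start + n:start + 2 * n]
        result = stats.ttest_ind(a, b)  # equal-variance Student's t-test
        pvalues.append(result.pvalue)
    return np.array(pvalues)


rng = np.random.default_rng(0)       # arbitrary seed, for repeatability
data = rng.normal(size=6000)         # every value from the same N(0, 1)
pvalues = sliding_pvalues(data, n=3)
fraction = np.mean(pvalues < 0.05)
print(f"{len(pvalues)} tests, fraction with p < 0.05: {fraction:.3f}")
```

With 6000 numbers and n = 3 this performs 1000 tests, and the printed fraction should land near 0.05, though it varies with the seed.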

Now, we run the trial a bunch of times. For this run, I've chosen the sample size to be 3.
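Repeating the trial might look like the sketch below. Again, this is an illustration rather than the code originally posted here: it relies on SciPy instead of PyCogent, and the number of trials, the number of tests per trial, and the seed are assumed values. Only the sample size of 3 comes from the text. For brevity, each trial draws its samples directly as a three-dimensional array and tests them all in one vectorized call, instead of slicing a window along a flat array.

```python
import numpy as np
from scipy import stats  # stand-in for the PyCogent t-test used in the post

n = 3                    # sample size, as chosen in the text
tests_per_trial = 1000   # assumed
n_trials = 20            # assumed

rng = np.random.default_rng(1)  # arbitrary seed
fractions = []
for _ in range(n_trials):
    # Shape (tests, 2, n): each row holds the two samples for one t-test.
    draws = rng.normal(size=(tests_per_trial, 2, n))
    result = stats.ttest_ind(draws[:, 0, :], draws[:, 1, :], axis=1)
    fractions.append(np.mean(result.pvalue < 0.05))

fractions = np.array(fractions)
print("fraction with p < 0.05, per trial:", np.round(fractions, 3))
print("mean over trials:", round(float(fractions.mean()), 4))
```

Each per-trial fraction should scatter around 0.05, and the mean over trials should sit closer to it than most individual trials do.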


## 1 comment:

Thanks, this was very helpful both for understanding the t-test and implementing it on data!
