While looking at the code in PyCogent I noticed a reference to "Cephes." The Cephes Mathematical Library (here)
It contains lots and lots of functions (like the t distribution and the incomplete beta and so on). The analogous functions in PyCogent appear to be just straight translations of the C code.
I found a Python module that makes the Cephes library available directly. It was put together by Michiel de Hoon (here, old home page here), when he was at the University of Tokyo. I've also looked at his clustering software before (here).
According to the last page, he's currently at Columbia University. Just two seconds ago, I saw this, some kind of manual he's written for doing statistics in Python. Whoaah! Looks like it should be very interesting!
Anyhow, I downloaded the source and did
The config file looks like this
/* config.h file created by setup.py script Fri Nov 5 15:55:52 2010
* run on darwin */
#define UNK 1
#define BIGENDIAN UNKNOWN /* replace with 0 or 1 */
I modified it appropriately and then did as the instructions say:
test by:
I figured the C version should be a lot faster, and it is:
But ... it doesn't seem to be the slow step in the t test code.
Calculating the cdf and testing the result
I looked up the definitions for the t distribution's pdf and cdf in wikipedia (here).
I translated the x-dependent part into Python and "vectorized" and used it to constuct a pdf. It's important to set really wide limits for the vector in this step, otherwise numerous small probability values are discarded, which then leads to small errors in the eventual product.
Since the normalization constants (gamma functions) depend on df but not on x, I was able to normalize the pdf simply by doing:
pdf /= sum(pdf)
I also used a very small step size in the Numpy array. The second section of the code listing ends by printing values from the distributions for whole numbers between -5 and 5:
As a test, I show that I get very similar values using
stdtr
from either PyCogent or Cephes:And that all matches what I get from R. So I'm pretty confident that I have the distributions correct and the functions are working as they should.
code listing: