## Friday, November 5, 2010

### One Sample t-test

This is the fourth in a series of posts about Student's t test. The previous posts are here, here, and here.

There are several different versions of the t-test, but the simplest is the one-sample test: we have a single sample with a (relatively) small number of values. We calculate the mean, the variance, and then the standard deviation of the sample values. Importantly, the variance and standard deviation are the unbiased versions, in which the sum of squared deviations is divided by n-1 rather than n.

It's easy to see that we should not use numpy's var for this test:

```python
>>> import numpy as np
>>> A = [1,2,3]
>>> np.var(A)
0.66666666666666663
```

As `help(np.var)` indicates:

```
Notes
-----
The variance is the average of the squared deviations from the
mean, i.e., var = mean(abs(x - x.mean())**2). The computed
variance is biased, i.e., the mean is computed by dividing by
the number of elements, N, rather than by N-1.
```
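NumPy can also compute the unbiased version directly: `np.var` takes a `ddof` ("delta degrees of freedom") argument, and `ddof=1` switches the divisor from n to n-1. A quick sketch:

```python
import numpy as np

A = np.array([1, 2, 3])
v_biased = np.var(A)            # divides by n:   0.666...
v_unbiased = np.var(A, ddof=1)  # divides by n-1: 1.0
print(v_biased, v_unbiased)
```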

In the code below we do:

```python
z = np.mean(A) - mu
z /= unbiased_std(A)
```

So, at least conceptually, at this point we have a z-score for the sample mean. With a standard z-score, we would compare it to the normal distribution. Here we do two things differently: we multiply by √n (the square root of the sample size), and we look up the resulting t statistic in the t distribution.
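Concretely, for the first sample below (A with mu = 3), those steps can be sketched like this; here I use `np.var(..., ddof=1)` in place of the post's `unbiased_std` helper:

```python
import numpy as np

A = np.array([3.1, 2.3, 2.1, 1.7])
mu = 3
n = len(A)
s = np.sqrt(np.var(A, ddof=1))  # unbiased standard deviation
z = (np.mean(A) - mu) / s       # z-like score for the sample mean
t = z * np.sqrt(n)              # the t statistic
print(round(t, 3))
```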

```
t-statistic p-value
-2.378      0.049
```

The result indicates that, at a significance level of 0.05 chosen in advance (for this one-sided test), the null hypothesis that the true mean of the sample values is equal to 3 is rejected (just barely) in favor of the alternative that it is less than 3.
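The p-value is just the value of the t distribution's CDF at the t statistic. The code below imports `stdtr` from a `transcendental` module (from an earlier post in this series); SciPy ships the same function, with the same `(df, t)` signature, as `scipy.special.stdtr`. A sketch of the lookup:

```python
from scipy.special import stdtr  # CDF of Student's t distribution

df = 3            # degrees of freedom: n - 1
t = -2.3778       # t statistic for sample A against mu = 3
p = stdtr(df, t)  # P(T <= t): the one-sided 'less' p-value
print(round(p, 3))
```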

You can read much more about the background of the test here.

The second set of values in the code below gives:

```
t-statistic p-value
-6.455      0.004
```

It's reassuring that a One Sample t-test in R gives a similar result. I'll have more to say about "tails" and sidedness in another post.

```
> A = c(3.1,2.3,2.1,1.7)
> t.test(A, alternative='less', mu=3)

        One Sample t-test

data:  A
t = -2.3778, df = 3, p-value = 0.04891
alternative hypothesis: true mean is less than 3
95 percent confidence interval:
     -Inf 2.992811
sample estimates:
mean of x
      2.3

> B = c(2.1,1.8,2.7,2.4)
> t.test(B, alternative='less', mu=3.5)

        One Sample t-test

data:  B
t = -6.455, df = 3, p-value = 0.003771
alternative hypothesis: true mean is less than 3.5
95 percent confidence interval:
     -Inf 2.705727
sample estimates:
mean of x
     2.25
```
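As another cross-check (not in the original post), `scipy.stats.ttest_1samp` reports a two-sided p-value; since t is negative here, halving it recovers R's one-sided 'less' result:

```python
import numpy as np
from scipy import stats

A = np.array([3.1, 2.3, 2.1, 1.7])
t, p_two = stats.ttest_1samp(A, 3.0)  # p_two is two-sided
p_less = p_two / 2                    # t < 0, so 'less' p-value = two-sided / 2
print(round(t, 3), round(p_less, 3))
```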

Obviously, we could use a lot more testing.

code listing:

```python
from __future__ import division
import numpy as np
from transcendental import stdtr
from utils import unbiased_std

def one_sample_t(A, mu):
    n = len(A)
    df = n - 1
    z = np.mean(A) - mu
    z /= unbiased_std(A)
    t = z * np.sqrt(n)
    return t, stdtr(df, t)

def test(A, mu):
    result = one_sample_t(A, mu)
    print 't-statistic p-value'
    print '%5.3f %5.3f' % result

if __name__ == '__main__':
    A = np.array([3.1,2.3,2.1,1.7])
    B = np.array([2.1,1.8,2.7,2.4])
    test(A, mu=3)
    test(B, mu=3.5)

'''
R code:
A = c(3.1,2.3,2.1,1.7)
t.test(A, alternative='less', mu=3)
B = c(2.1,1.8,2.7,2.4)
t.test(B, alternative='less', mu=3.5)
'''
```

`utils.py`

```python
from __future__ import division
import numpy as np

def unbiased_var(X):
    n = len(X)
    sample_SS = sum(X**2) - sum(X)**2 / n
    return sample_SS / (n-1)

def unbiased_std(X):
    return np.sqrt(unbiased_var(X))
```
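In the spirit of "more testing": `unbiased_var` uses the shortcut sum-of-squares formula, Σx² - (Σx)²/n, which should agree with the definitional sum of squared deviations from the mean and with `np.var(..., ddof=1)`. A quick check (written for Python 3, where true division is the default):

```python
import numpy as np

def unbiased_var(X):
    n = len(X)
    sample_SS = sum(X**2) - sum(X)**2 / n  # shortcut sum-of-squares formula
    return sample_SS / (n - 1)

X = np.array([3.1, 2.3, 2.1, 1.7])
definitional = sum((X - X.mean())**2) / (len(X) - 1)
print(unbiased_var(X), definitional, np.var(X, ddof=1))
```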