Thursday, August 13, 2009

Student's t test 2

In this post, I want to work through a manual calculation of the t test.

set.seed(157)
v = rnorm(10000,5,1)
e = sample(v,9)
round(e,2)
mean(e)
sd(e)


> round(e,2)
[1] 5.34 4.80 5.56 4.96 4.73
[6] 3.24 4.86 5.14 5.66
> mean(e)
[1] 4.923185
> sd(e)
[1] 0.7122304


Remember that for sd (and var), R uses the sample standard deviation with its correction of n / n-1.

From these values, we can get the standard error of the mean, sem = sd(e)/ √n. Since n = 9, we calculate sem = 0.71/3 = 0.2374

If the sample size were not an issue, we would calculate a confidence interval for a two-sided test of:

> x = 1.96 * 0.2374
> x
[1] 0.465304
> 4.923 - x
[1] 4.457696
> 4.923 + x
[1] 5.388304


And for the one-sided test, since we specified that the null hypothesis has μ >= 6, we multiply by the value (from the last post) using p = 95 %:

> x = 1.64 * 0.2374
> x
[1] 0.389336
> 4.923 + x
[1] 5.312336


The small sample correction for the t test means that instead of using values from the standard normal distribution in the calculation above, we get them from the t-distribution, in this case for df (degrees of freedom) = n-1 = 9.

So, instead of using 1.96 (or 1.64) as the multiplier, we use 2.26 (two-sided test, df = 8).

> x = 2.30*0.2374
> x
[1] 0.54602
> 4.923 - x
[1] 4.37698
> 4.923 + x
[1] 5.46902


or, for a one-sided test we multiply by the 95 % level of the t-distribution with df = 8 (1.85):

> x = 1.85*0.2374
> x
[1] 0.43919
> 4.923 + x
[1] 5.36219


Since 5.36 is definitely < 6, we can say the result is significant at p = 0.05.

R gives this. Notice they find the p-value at which the distribution becomes > 6. I'm not sure this is kosher, but you see it a lot:

v = c(5.34,4.80,5.56,4.96,4.73, 
3.24,4.86,5.14,5.66)

t.test(v,alt='less',mu=6)


> t.test(v,alt='less',mu=6)

One Sample t-test

data: v
t = -4.5433, df = 8, p-value = 0.0009454
alternative hypothesis: true mean is less than 6
95 percent confidence interval:
-Inf 5.362691
sample estimates:
mean of x
4.921111