## Wednesday, August 12, 2009

### Student's t test

This post is a simple demo of using R to carry out Student's t test.

Let's look at a population of values with a normal distribution, mean = 5 and standard deviation = 1.

 `set.seed(157)`

 `v = rnorm(10000,5,1)`

We draw a sample of 4 values without replacement:

 `e1 = sample(v,4)`

 `> round(e1,2)`

 `[1] 5.34 4.80 5.56 4.96`

A t test for one sample tests the null hypothesis that the mean μ of the population from which the sample is drawn is equal to μ0. For example, it could be that we have many observations of untreated cells (from which we get μ0), and now we wish to test whether the mean value for treated cells is detectably different.

 `result = t.test(e1,mu=6)`

The argument alternative = 'two.sided' is the default, so we don't need to specify it here.

 `> result`

 `One Sample t-test`

 `data: e1`

 `t = -4.7613, df = 3, p-value = 0.01759`

 `alternative hypothesis: true mean is not equal to 6`

 `95 percent confidence interval:`

 `4.608507 5.723439`

 `sample estimates:`

 `mean of x`

 `5.165973`
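To see where these numbers come from, here is a minimal sketch that rebuilds the one-sample statistic by hand: t is the difference between the sample mean and μ0 divided by the standard error, with n - 1 degrees of freedom. It assumes the four values of `e1` as printed above, rounded to two decimals, so the results will be close to the output shown but not identical. The names `t_manual`, `p_manual` and `ci_manual` are just illustrative.

```r
# Values of e1 as printed above, rounded to two decimals (an approximation
# of the actual sample, so results differ slightly from the post's output).
e1 <- c(5.34, 4.80, 5.56, 4.96)
mu0 <- 6

n  <- length(e1)
se <- sd(e1) / sqrt(n)              # standard error of the mean
t_manual  <- (mean(e1) - mu0) / se  # t statistic
df_manual <- n - 1                  # degrees of freedom

# Two-sided p-value: probability of a |t| at least this large under H0
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# 95 percent confidence interval for the mean
ci_manual <- mean(e1) + c(-1, 1) * qt(0.975, df_manual) * se

# Compare with the built-in test
res <- t.test(e1, mu = mu0)
c(manual = t_manual, builtin = unname(res$statistic))
c(manual = p_manual, builtin = res$p.value)
```

The manual and built-in values should agree, which is a handy check that we understand what `t.test` is reporting.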

Even with only four observations and a true difference in means of 1 (6 versus 5, about 17% of μ0), the t test lets us reject the null hypothesis that μ = μ0 = 6, with p = 0.018.

Now, it might have been the case that before we saw the data (and that proviso is crucial), we expected from the nature of the treatment that the mean of the treated population would be less than that of the untreated population. In that case, we would be justified in specifying a one-sided test:

 `result = t.test(e1,mu=6,alternative='less')`

We note the p-value is:

 `p-value = 0.008796`
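The one-sided p-value is half the two-sided one here because the observed t is negative, which is the direction the alternative 'less' predicts. A small sketch, plugging in the t statistic and degrees of freedom reported by the two-sided test (the variable names are only for illustration):

```r
t_obs <- -4.7613   # t statistic reported by the two-sided test
df    <- 3         # degrees of freedom reported by the two-sided test

p_two     <- 2 * pt(-abs(t_obs), df)            # alternative = 'two.sided'
p_less    <- pt(t_obs, df)                      # alternative = 'less'
p_greater <- pt(t_obs, df, lower.tail = FALSE)  # alternative = 'greater'

round(c(two.sided = p_two, less = p_less, greater = p_greater), 6)
```

Had we guessed the wrong direction and asked for 'greater', the p-value would be close to 1, which is one more reason the choice has to be made before looking at the data.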

In reality, because of biological variation (as well as unintended variation in experimental conditions), we would always include a control group for such an experiment.

 `w = rnorm(10000,6,1)`

 `e2 = sample(w,4)`

 `result = t.test(e1,e2,alternative='less')`

 `> result`

 `Welch Two Sample t-test`

 `data: e1 and e2`

 `t = -1.3385, df = 3.416, p-value = 0.1314`

 `alternative hypothesis: true difference in means is less than 0`

 `95 percent confidence interval:`

 `-Inf 0.6194634`

 `sample estimates:`

 `mean of x mean of y`

 `5.165973 6.084764`
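Note the fractional degrees of freedom: by default `t.test` does not assume equal variances in the two groups and uses the Welch approximation. Here is a rough sketch of that calculation using the Welch-Satterthwaite formula for the degrees of freedom. Since the values of `e2` are not printed above, it uses freshly simulated stand-in samples `x` and `y` with an arbitrary seed, so the numbers will not match the output shown.

```r
set.seed(1)                  # arbitrary seed, not the one used in the post
x <- rnorm(4, 5, 1)          # stand-in for the treated sample
y <- rnorm(4, 6, 1)          # stand-in for the control sample

vx <- var(x) / length(x)     # squared standard error of mean(x)
vy <- var(y) / length(y)     # squared standard error of mean(y)

t_welch  <- (mean(x) - mean(y)) / sqrt(vx + vy)
# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (vx + vy)^2 /
  (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
p_welch  <- pt(t_welch, df_welch)   # one-sided, alternative = 'less'

# Compare with the built-in test
res <- t.test(x, y, alternative = 'less')
c(t = t_welch, df = df_welch, p = p_welch)
```

The standard error now combines the uncertainty of two small samples instead of one, which is a large part of why the comparison against a measured control is less sensitive than the comparison against a fixed μ0.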

Now it is much harder to obtain a significant result. If there are 25 observations in the control group, we can detect a difference:

 `result = t.test(e1, rnorm(25,6,1),alternative='less')`

 `> result`

 `Welch Two Sample t-test`

 `data: e1 and rnorm(25, 6, 1)`

 `t = -3.4116, df = 11.547, p-value = 0.002717`

 `alternative hypothesis: true difference in means is less than 0`

 `95 percent confidence interval:`

 `-Inf -0.4127716`

 `sample estimates:`

 `mean of x mean of y`

 `5.165973 6.033374`
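A single draw can be lucky or unlucky, so it is worth asking how often each design would give a significant result. The following is a simple simulation sketch, not part of the analysis above: it repeats the experiment many times, drawing directly from the two normal distributions rather than from `v` and `w`, and records the fraction of runs with p < 0.05. The function name `estimate_power`, the seed, and the number of repetitions are arbitrary choices.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Fraction of simulated experiments in which the one-sided Welch test
# gives p < alpha, for n_treated treated and n_control control values.
estimate_power <- function(n_control, n_treated = 4, reps = 2000, alpha = 0.05) {
  p <- replicate(reps, {
    treated <- rnorm(n_treated, 5, 1)
    control <- rnorm(n_control, 6, 1)
    t.test(treated, control, alternative = 'less')$p.value
  })
  mean(p < alpha)
}

power_4  <- estimate_power(4)    # 4 treated vs 4 control
power_25 <- estimate_power(25)   # 4 treated vs 25 control
c(control_4 = power_4, control_25 = power_25)
```

The larger control group should detect the difference more often, though with only 4 treated values neither design should be expected to succeed every time.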