We can turn this into an illustration of statistical significance. How high must a score be before one can dismiss the null hypothesis (no knowledge of the subject) with a p-value of 0.05?
We have a Bernouli trial with p = 0.2.
The expected value or mean is np. There is a beautifully simple derivation of the mean and variance in the wikipedia article, the variance is np(1-p). For our example, the mean is 20 and the variance is 100 x 0.2 x 0.8, so the standard deviation is √16 = 4.
For a large class (N = 100) of randomly answering students, the distribution appraches normal, and we expect that 2.5% of such students will score at or above two standard deviations above the mean. To make our rate for a type I error less than 0.05, we should require a score of 28 or higher.
I wrote a short script to simulate this. Each student takes a test with 100 questions, and each trial consists of a class of 100 students. We count the number of students with scores at or above a threshold T. The results for 10 trials are printed, and the results for 100 trials are averaged.
A few quick points: the mean appears to fall between 20 and 21. I'm sure there's a simple explanation but it eludes me at the moment. The score that 2.5% of students regularly exceed is 28.
This is a great example of the danger of multiple hypothesis testing!
One last thing is that the code is not very smart. What I should have done was extract the statistics for each value of T for each trial, rather than repeating the trial for each value of T. But it runs fast enough, and this way I don't need a data structure to remember the results.
> python script.py 20: 56 43 54 57 57 54 57 46 45 64 55.0 21: 39 48 42 44 39 44 48 48 50 48 43.5 22: 43 32 39 30 38 35 36 42 33 41 34.6 23: 40 29 35 31 27 19 30 20 31 25 26.0 24: 19 18 24 26 20 22 21 17 15 24 19.2 25: 11 14 8 8 13 11 12 15 17 13 12.9 26: 12 6 6 6 11 4 8 7 10 6 9.2 27: 7 4 6 5 8 7 5 7 7 7 5.9 28: 3 2 4 2 3 1 2 3 5 1 3.6 29: 4 4 2 2 2 1 3 1 0 1 2.2 30: 1 1 1 0 1 1 1 0 0 1 0.9 31: 4 0 1 0 0 0 0 0 0 0 0.6 32: 0 0 0 0 1 0 0 1 0 0 0.3 33: 0 0 0 0 0 1 0 0 0 1 0.1 34: 0 0 0 0 0 0 0 0 1 0 0.1 |
script.py
import random def student(N=100, p=0.2): L = [random.random() <= p for i in range(N)] return sum(L) def trial(T): L = [student() for i in range(100)] return sum([1 for n in L if n >= T]) def mean(L): return 1.0*sum(L)/len(L) for T in range(20,35): L = list() print '%2d: ' % T, for i in range(100): L.append(trial(T)) if i < 10: print '%2d' % L[-1], print '%4.1f' % mean(L) |