## Saturday, October 30, 2010

I've learned that

"for the sum or difference of independent random variables, variances add"

(source)

Here is a Python simulation with numbers drawn from a normal distribution with mean = 0 and std = 2. As expected, the actual variance observed for a sample is not exactly equal to 4, though it approaches this as N increases. What was a surprise to me is that the statement "variances add" is only approximately true for actual data. (That is, unless I've screwed up somehow). Looking back, I saw this before but ignored it.

Here are trials for various N, showing the mean, variance and standard deviations of the samples, the sums of the actual variances, and the variances of the sum and difference.

[UPDATE: I handled the printout of N's value poorly. For example, in the interpreter:
 `>>> 10e21000.0`

for N = 100, when what I actually did in code was `10**2`. ]

 ` mean var stdN = 10e2 -0.009 3.957 1.989 data1 0.339 3.192 1.787 data2 0.330 7.149 3.776 totaladd 0.330 7.101 2.665 combosub -0.348 7.197 2.683 combo------------------------------N = 10e3 0.066 3.727 1.931 data1 0.009 4.027 2.007 data2 0.076 7.754 3.937 totaladd 0.076 7.753 2.784 combosub 0.057 7.755 2.785 combo------------------------------N = 10e4 -0.025 4.007 2.002 data1 0.041 4.089 2.022 data2 0.015 8.097 4.024 totaladd 0.015 8.087 2.844 combosub -0.066 8.106 2.847 combo------------------------------N = 10e7 0.000 4.001 2.000 data1 0.001 3.999 2.000 data2 0.001 8.000 4.000 totaladd 0.001 8.001 2.829 combosub -0.000 7.998 2.828 combo------------------------------`

 `import numpy as npdef show(stats,name=None): L = [' %3.3f ' % e for e in stats] print ''.join([e.rjust(10) for e in L]), if name: print name else: print def get_stats(data): funcs = [np.mean,np.var,np.std] return np.array([f(data) for f in funcs])def init(N): total = np.zeros(3) f = np.random.normal rL = list() for i in range(2): rL.append(f(loc=0,scale=2,size=N)) stats = get_stats(rL[-1]) show(stats,'data'+str(i+1)) total += stats show(total,'total') return rL def add(A1,A2): return A1 + A2def sub(A1,A2): return A1 - A2#---------------------------------------for e in [2,3,4,7]: N = 10**e print 'N = 10e' + str(e) data1, data2 = init(N) for f in [add,sub]: print f.func_name combo = f(data1,data2) show(get_stats(combo),'combo') print '-'*30`