Sunday, July 26, 2009

Statistical doodling: variance

In Bolstad, Chapter 5, there is a proof of the following statement about the variance of independent random variables X and Y.

Var(X + Y) = Var(X) + Var(Y)


There is a lot more discussion here. The post calls this the "Pythagorean Theorem of Statistics", since an equivalent formulation is:

SD(X + Y)^2 = SD(X)^2 + SD(Y)^2
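To see why the Pythagorean name fits, here is a minimal sketch in R. The seed, the sample size, and the standard deviations of 3 and 4 are my own choices for illustration (they make a 3-4-5 triangle); none of them come from Bolstad or the linked post.

```r
set.seed(1357)               # arbitrary seed, for reproducibility only
n <- 100000                  # sample size chosen for illustration
x <- rnorm(n, 0, 3)          # "leg" with sd 3
y <- rnorm(n, 0, 4)          # "leg" with sd 4, drawn independently of x

sd_sum <- sd(x + y)          # simulated "hypotenuse"
hyp <- sqrt(3^2 + 4^2)       # theoretical value from the SD formula

print(c(simulated = sd_sum, theory = hyp))
```

The simulated value should land close to the theoretical one, though not exactly on it, since x and y are finite samples.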


I don't want to detail the proof, but I did fool around a bit in R to explore this:

set.seed(1357)
u = rnorm(10000,5,1)
var(u)
var(u + 7)
var(u-250)
var(3*u)
var(u/5)


Here is what it prints:

> var(u)
[1] 0.9962947
> var(u + 7)
[1] 0.9962947
> var(u-250)
[1] 0.9962947
> var(3*u)
[1] 8.966652
> var(u/5)
[1] 0.03985179


So, if we add or subtract a constant C, the variance is unchanged. But if we multiply by C, the variance is multiplied by C^2; and if we divide by C, the variance is divided by C^2.
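These rules hold exactly for the sample variance (up to floating-point error), not just approximately, so they can be checked with all.equal. A minimal sketch, reusing the seed and parameters from above; the constant C = 3 is an arbitrary choice of mine:

```r
set.seed(1357)
u <- rnorm(10000, 5, 1)
C <- 3  # arbitrary constant chosen for illustration

# Shifting by a constant leaves the variance unchanged
shift_ok <- isTRUE(all.equal(var(u + C), var(u)))
# Multiplying by C multiplies the variance by C^2
mult_ok <- isTRUE(all.equal(var(C * u), C^2 * var(u)))
# Dividing by C divides the variance by C^2
div_ok <- isTRUE(all.equal(var(u / C), var(u) / C^2))

print(c(shift = shift_ok, mult = mult_ok, div = div_ok))
```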

Now consider two new vectors of numbers from rnorm. The first vector has a mean of 5 and sd of 2 (variance of 4), while the second has a mean of 4 and sd of 3 (variance of 9).

u = rnorm(1000,5,2)
v = rnorm(1000,4,3)
var(u)
var(v)
var(u+v)
var(u-v)


> u = rnorm(1000,5,2)
> v = rnorm(1000,4,3)
> var(u)
[1] 3.99337
> var(v)
[1] 9.470766
> var(u+v)
[1] 12.76349
> var(u-v)
[1] 13.65514


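Our simulation is roughly consistent with the rule that the variances add, and this holds for the difference as well as the sum: both var(u+v) and var(u-v) come out close to 4 + 9 = 13. The match is not exact because, in a finite sample, the sample covariance of u and v is not exactly zero.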
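The gap between the simulated values and 13 can be traced to the covariance term in the general identity Var(X ± Y) = Var(X) + Var(Y) ± 2 Cov(X, Y). A minimal sketch in R; the seed is my own assumption (the post does not reseed at this point), so the numbers will not reproduce the output above:

```r
set.seed(1357)  # assumed seed, for reproducibility only
u <- rnorm(1000, 5, 2)
v <- rnorm(1000, 4, 3)

# Sample identity for the sum: the covariance term is added
sum_lhs <- var(u + v)
sum_rhs <- var(u) + var(v) + 2 * cov(u, v)

# For the difference, the covariance term changes sign
diff_lhs <- var(u - v)
diff_rhs <- var(u) + var(v) - 2 * cov(u, v)

print(c(sum_lhs, sum_rhs, diff_lhs, diff_rhs))
print(cov(u, v))  # near 0 for independent draws, but not exactly 0
```

Each left-hand side should match its right-hand side up to floating-point error, whatever the sample covariance turns out to be.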

And finally, look at multiplication:

u = rnorm(1000,0,1)
v = rnorm(1000,0,1)
var(u*v)
var((u+1)*v)
var((u+2)*v)
var((u+3)*v)
var((u+2)*(v+2))


The variance of the product depends on the means of the two distributions. Here, the variances of u and v (as well as of the shifted versions u + 1, u + 2, u + 3 and v + 2) are always 1. For means (first factor, second factor) of:

0,0: var = 1.0
1,0: var = 2.2
2,0: var = 5.5
3,0: var = 10.9
2,2: var = 9.4


I found an expression here (it holds for independent X and Y):

Var(XY) = Var(X)*Var(Y) + Var(X)*E[Y]^2 + E[X]^2*Var(Y)


That is:

v(X*Y) = vX*vY + vX*mY^2 + mX^2*vY


In the cases above (both variances are unchanged and equal to 1) we have, for means mX,mY:

0,0: v(X*Y) = 1*1 + 1*0^2 + 0^2*1 = 1
1,0: v(X*Y) = 1*1 + 1*0^2 + 1^2*1 = 2
2,0: v(X*Y) = 1*1 + 1*0^2 + 2^2*1 = 5
3,0: v(X*Y) = 1*1 + 1*0^2 + 3^2*1 = 10
2,2: v(X*Y) = 1*1 + 1*2^2 + 2^2*1 = 9


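Looks correct: the simulated values (1.0, 2.2, 5.5, 10.9, 9.4) are close to the predicted ones (1, 2, 5, 10, 9), which is about what one would hope for with only 1000 draws.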
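To make the comparison less dependent on sampling noise, the five cases can be rerun with a larger sample. This is only a sketch: var_product is a hypothetical helper name of mine that wraps the expression above, and the seed and sample size are my own assumptions.

```r
# Hypothetical helper (not from the post): theoretical variance of X*Y
# for independent X and Y, given their means and variances.
var_product <- function(mX, vX, mY, vY) {
  vX * vY + vX * mY^2 + mX^2 * vY
}

set.seed(1357)  # assumed seed, for reproducibility only
n <- 100000     # larger than the post's 1000, to reduce sampling noise
u <- rnorm(n, 0, 1)
v <- rnorm(n, 0, 1)

# Means (mX, mY) for the five cases considered above
cases <- rbind(c(0, 0), c(1, 0), c(2, 0), c(3, 0), c(2, 2))

# Shifting u and v changes their means but leaves both variances at 1
theory <- apply(cases, 1, function(m) var_product(m[1], 1, m[2], 1))
simulated <- apply(cases, 1, function(m) var((u + m[1]) * (v + m[2])))

print(cbind(mX = cases[, 1], mY = cases[, 2], theory, simulated))
```

With more draws, the simulated column should sit closer to the theoretical one than the n = 1000 results did.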