## Thursday, July 9, 2009

### Eigenvalues, part 3

In this post, I use R to explore a simple PCA analysis of a set of observations in 2-dimensions. We start by grabbing some numbers in the form of an elongated shape:

 `x = rnorm(1000,0,1)y = rnorm(1000,0,4)M = cbind(x,y)X = c(-12,12) # plot limitsplot(M, xlim=X,ylim=X,main='M', xlab='x',ylab='y')`

To make things more interesting, we rotate the matrix of values by 45 degrees:

 `s = 1/sqrt(2)T = matrix(c(s,s,-s,s),nrow=2)T [,1] [,2][1,] 0.7071068 -0.7071068[2,] 0.7071068 0.7071068M2 = M %*% Tplot(M2, xlim=X,ylim=X,main='M2', xlab='x',ylab='y')`

Using the eigen and cov functions in R we find:

 `E = eigen(cov(M2))E\$values[1] 16.5603206 0.9731755\$vectors [,1] [,2][1,] 0.7067687 -0.7074447[2,] 0.7074447 0.7067687`

If we use the E\$vectors to rotate the matrix we analyzed (M2), we get:

 `M3 = M2 %*% E\$vectorsplot(M3, xlim=X,ylim=X,main='M3', xlab='x',ylab='y')`

So, I hope you can see what PCA does. It takes the ellipsoid shape of M2's points, and rotates them so that the maximum variance is along the x axis.

The first eigenvector of the covariance matrix constructed from M2 identifies the direction in which the variance is the maximum, and the first eigenvalue (E\$values[1]) is this variance.

Finally, we use R's implementation of principal component analysis:

 `princomp(M2)Call:princomp(x = M2)Standard deviations: Comp.1 Comp.2 4.145496 1.005096 2 variables and 1000 observations.`

Notice that the standard deviations match what we started with.