Thursday, July 9, 2009

Eigenvalues, part 3

In this post, I use R to explore a simple PCA analysis of a set of observations in 2-dimensions. We start by grabbing some numbers in the form of an elongated shape:

x = rnorm(1000,0,1)
y = rnorm(1000,0,4)
M = cbind(x,y)
X = c(-12,12) # plot limits

plot(M,
xlim=X,ylim=X,main='M',
xlab='x',ylab='y')




To make things more interesting, we rotate the matrix of values by 45 degrees:

s = 1/sqrt(2)
T = matrix(c(s,s,-s,s),nrow=2)
T

[,1] [,2]
[1,] 0.7071068 -0.7071068
[2,] 0.7071068 0.7071068

M2 = M %*% T
plot(M2,
xlim=X,ylim=X,main='M2',
xlab='x',ylab='y')




Using the eigen and cov functions in R we find:

E = eigen(cov(M2))
E

$values
[1] 16.5603206 0.9731755

$vectors
[,1] [,2]
[1,] 0.7067687 -0.7074447
[2,] 0.7074447 0.7067687


If we use the E$vectors to rotate the matrix we analyzed (M2), we get:

M3 = M2 %*% E$vectors
plot(M3,
xlim=X,ylim=X,main='M3',
xlab='x',ylab='y')




So, I hope you can see what PCA does. It takes the ellipsoid shape of M2's points, and rotates them so that the maximum variance is along the x axis.

The first eigenvector of the covariance matrix constructed from M2 identifies the direction in which the variance is the maximum, and the first eigenvalue (E$values[1]) is this variance.

Finally, we use R's implementation of principal component analysis:


princomp(M2)

Call:
princomp(x = M2)

Standard deviations:
Comp.1 Comp.2
4.145496 1.005096

2 variables and 1000 observations.


Notice that the standard deviations match what we started with.