Thursday, February 10, 2011

Strang's Linear Algebra: homework (recap)



I've been working on linear algebra again. My goal is to understand how eigenvalues and eigenvectors work, but there is also a beauty to the subject that I find fascinating. It's at once geometrical, but scales easily to n-dimensions. The basic facts are (so far) almost trivial geometrically, but writing vector and matrix equations is still tricky.

Although I posted about projections and least squares error almost a year ago (here and here), I had some trouble with the formulas and example for projections onto subspaces and least squares. So this is a recap.

The first example is 2D. You can refer to the figure from the book, above. We have two vectors a and b, where b points away from a, and we seek the projection p of b onto a:


     p = xa 


(where x should really have a hat). It's a number which multiplies a to put it at exactly the correct position p so that


     p + e = b
e = b - p

e is perpendicular to a (and p)


and the dot product (which Prof. Strang does as aT e) is zero:


     aT e = 0 = aT (b - p) = aT(b - xa).  


For the 2D problem both terms are numbers, so we can divide in rearranging to give:


     x =  aTb / aTa

p = xa = (aTb / aTa) a


In the example we have b = (3,4,4) and a = (2,2,1) so aTb is just the dot product of a • b and


     x = (2,2,1) • (3,4,4) / (2,2,1) • (2,2,1) = 18/9 = 2

p = xa = 2 (2,2,1) = (4,4,2)

e = b - p = (3,4,4) - (4,4,2) = (-1,0,2)


and


     e • a = (-1,0,2) • (2,2,1) = 0 


as required.

Now we add a second vector to the problem:


     a* = (1,0,0).  


Then we have the plane formed by


     a = (2,2,1) and a* = (1,0,0) 


and still, b (3,4,4) is not in the plane. We seek the projection of b onto this plane. Note that the error which is


     e = b - p


as before, is perpendicular to every vector in the plane. (It is some z n where n is the normal vector).

So what Strang does is say that we have a matrix A:


     A = [ a  a* ] = [ 2  1 ]
[ 2 0 ]
[ 1 0 ]


Putting the x on the right side of A, we have:


     p = Ax
e = b - p


p is some combination of the independent columns of A, where x is a vector now.

The critical insight is that we can combine the individual dot products


     a  • e = 0
a* • e = 0


into a matrix multiplication:


     ATe = 0 = ATb - ATp = ATb - ATAx


Continuing:


     ATb = ATAx


We can't just divide like before, but instead we use the inverse:


     x = (ATA)-1 ATb


Now, since we have x on the right of A, to get p we do:


     p = Ax = A (ATA)-1 AT b


We can re-group the terms as A (ATA)-1 AT and call this a projection matrix P that multiplies b to give p:


     A (ATA)-1 AT 


There is a temptation to say:


     (ATA)-1 = A-1 (AT)-1


and then everything in the previous equation cancels. But here's the thing!


     A = [ a  a* ] = [ 2  1 ]
[ 2 0 ]
[ 1 0 ]


A is not invertible! So we can't find A-1. And supposing A were invertible, and we could do that multiplication and then get the identity matrix. The projection of b into the column space of A would be equal to b. Multiplying by I would be exactly right!

So, this is the crazy formula that we have to compute:


     p = Ax = A (ATA)-1 AT b


The first part is easy:


AT A =

[ 2 1 ]
[ 2 0 ]
[ 1 0 ]

[ 2 2 1 ] [ 9 2 ]
[ 1 0 0 ] [ 2 1 ]


I like to put the second matrix above the result. Then it's easy to remember the rows and columns.

Use the 2 x 2 trick to get the inverse (switch a and d, multiply b and c by -1, and divide everything by the determinant); then continuing with A (ATA)-1


              [  1/5  -2/5 ]
[ -2/5 9/5 ]

[ 2 1 ] [ 0 1 ]
[ 2 0 ] [ 2/5 -4/5 ]
[ 1 0 ] [ 1/5 -2/5 ]


And finally, multiplying by AT to get P


                      [  2   2    1  ]
[ 1 0 0 ]

[ 0 1 ] [ 1 0 0 ]
[ 2/5 -4/5 ] [ 0 4/5 2/5 ]
[ 1/5 -2/5 ] [ 0 2/5 1/5 ]


Confirm this is the same answer we got from numpy (here).

Notice two key properties of P:


PT = P2 = P



                        [  1   0    0  ]
[ 0 4/5 2/5 ]
[ 0 2/5 1/5 ]

[ 1 0 0 ] [ 1 0 0 ]
[ 0 4/5 2/5 ] [ 0 4/5 2/5 ]
[ 0 2/5 1/5 ] [ 0 2/5 1/5 ]


Now, multiply P times b (3,4,4) to get p:


                        [ 3 ]
[ 4 ]
[ 4 ]

[ 1 0 0 ] [ 3 ]
[ 0 4/5 2/5 ] [ 24/5 ]
[ 0 2/5 1/5 ] [ 12/5 ]


And:


     e = b - p = (3,4,4) - (3,24/5,12/5)
= (0,-4/5,8/5)


Confirm:


     e • a  = (0,-4/5,8/5) • (2,2,1) = 0
e • a* = (0,-4/5,8/5) • (1,0,0) = 0


Another way of handling this does the steps in a different order:


     p = Ax = A (ATA)-1 AT b


We first get (ATA)-1 as before:


AT A =  [ 9 2 ]   
[ 2 1 ]

(ATA)-1 = 1/5 [ 1 -2 ]
[ -2 9 ]


Then we do:


AT b =

[ 3 ]
[ 4 ]
[ 4 ]

[ 2 2 1 ] [ 18 ]
[ 1 0 0 ] [ 3 ]


So


     x = (ATA)-1 AT b

[ 18 ]
[ 3 ]

1/5 [ 1 -2 ] [ 12/5 ]
[ -2 9 ] [ -9/5 ]


p = Ax

[ 12/5 ]
[ -9/5 ]

[ 2 1 ] [ 3 ]
[ 2 0 ] [ 24/5 ]
[ 1 0 ] [ 12/5 ]

e = b - p = (3,4,4) - (3,24/5,12/5)
= (0,-4/5,8/5)


It's reassuring to get the same result.

Check out lecture 15 on MIT's ocw (here). It's a work of genius.