I've been working on linear algebra again. My goal is to understand how eigenvalues and eigenvectors work, but there is also a beauty to the subject that I find fascinating. It's at once geometrical, but scales easily to n-dimensions. The basic facts are (so far) almost trivial geometrically, but writing vector and matrix equations is still tricky.
Although I posted about projections and least squares error almost a year ago (here and here), I had some trouble with the formulas and example for projections onto subspaces and least squares. So this is a recap.
The first example is 2D. You can refer to the figure from the book, above. We have two vectors a and b, where b points away from a, and we seek the projection p of b onto a:
(where x should really have a hat). It's a number which multiplies a to put it at exactly the correct position p so that
and the dot product (which Prof. Strang does as aT e) is zero:
For the 2D problem both terms are numbers, so we can divide in rearranging to give:
In the example we have b = (3,4,4) and a = (2,2,1) so aTb is just the dot product of a • b and
and
as required.
Now we add a second vector to the problem:
Then we have the plane formed by
and still, b (3,4,4) is not in the plane. We seek the projection of b onto this plane. Note that the error which is
as before, is perpendicular to every vector in the plane. (It is some z n where n is the normal vector).
So what Strang does is say that we have a matrix A:
Putting the x on the right side of A, we have:
p is some combination of the independent columns of A, where x is a vector now.
The critical insight is that we can combine the individual dot products
into a matrix multiplication:
Continuing:
We can't just divide like before, but instead we use the inverse:
Now, since we have x on the right of A, to get p we do:
We can re-group the terms as
A (ATA)-1 AT
and call this a projection matrix P that multiplies b to give p:There is a temptation to say:
and then everything in the previous equation cancels. But here's the thing!
A is not invertible! So we can't find A-1. And supposing A were invertible, and we could do that multiplication and then get the identity matrix. The projection of b into the column space of A would be equal to b. Multiplying by
I
would be exactly right!So, this is the crazy formula that we have to compute:
The first part is easy:
I like to put the second matrix above the result. Then it's easy to remember the rows and columns.
Use the 2 x 2 trick to get the inverse (switch a and d, multiply b and c by -1, and divide everything by the determinant); then continuing with A (ATA)-1
And finally, multiplying by AT to get P
Confirm this is the same answer we got from numpy (here).
Notice two key properties of P:
Now, multiply P times b (3,4,4) to get p:
And:
Confirm:
Another way of handling this does the steps in a different order:
We first get (ATA)-1 as before:
Then we do:
So
It's reassuring to get the same result.
Check out lecture 15 on MIT's ocw (here). It's a work of genius.