I've been working on linear algebra again. My goal is to understand how eigenvalues and eigenvectors work, but there is also a beauty to the subject that I find fascinating. It's at once geometrical, but scales easily to n-dimensions. The basic facts are (so far) almost trivial geometrically, but writing vector and matrix equations is still tricky.

Although I posted about projections and least squares error almost a year ago (here and here), I had some trouble with the formulas and example for projections onto subspaces and least squares. So this is a recap.

The first example is 2D. You can refer to the figure from the book, above. We have two vectors a and b, where b points away from a, and we seek the projection p of b onto a:

(where x should really have a hat). It's a number which multiplies a to put it at exactly the correct position p so that

and the dot product (which Prof. Strang does as a

^{T}e) is zero:

For the 2D problem both terms are numbers, so we can divide in rearranging to give:

In the example we have b = (3,4,4) and a = (2,2,1) so a

^{T}b is just the dot product of a • b and

and

as required.

Now we add a second vector to the problem:

Then we have the plane formed by

and still, b (3,4,4) is not in the plane. We seek the projection of b onto this plane. Note that the error which is

as before, is perpendicular to every vector in the plane. (It is some z n where n is the normal vector).

So what Strang does is say that we have a matrix A:

Putting the x on the right side of A, we have:

p is some combination of the independent columns of A, where x is a vector now.

The critical insight is that we can combine the individual dot products

into a matrix multiplication:

Continuing:

We can't just divide like before, but instead we use the inverse:

Now, since we have x on the right of A, to get p we do:

We can re-group the terms as

`A (A`^{T}A)^{-1} A^{T}

and call this a projection matrix P that multiplies b to give p:There is a temptation to say:

and then everything in the previous equation cancels. But here's the thing!

A is not invertible! So we can't find A

^{-1}. And supposing A were invertible, and we could do that multiplication and then get the identity matrix. The projection of b into the column space of A would be equal to b. Multiplying by

`I`

would be exactly right!So, this is the crazy formula that we have to compute:

The first part is easy:

I like to put the second matrix above the result. Then it's easy to remember the rows and columns.

Use the 2 x 2 trick to get the inverse (switch a and d, multiply b and c by -1, and divide everything by the determinant); then continuing with A (A

^{T}A)

^{-1}

And finally, multiplying by A

^{T}to get P

Confirm this is the same answer we got from numpy (here).

Notice two key properties of P:

Now, multiply P times b (3,4,4) to get p:

And:

Confirm:

Another way of handling this does the steps in a different order:

We first get (A

^{T}A)

^{-1}as before:

Then we do:

So

It's reassuring to get the same result.

Check out lecture 15 on MIT's ocw (here). It's a work of genius.