Python for Bioinformatics: Strang's Linear Algebra: homework (recap)

I've been working on linear algebra again. My goal is to understand how eigenvalues and eigenvectors work, but there is also a beauty to the subject that I find fascinating. It's at once geometrical, but scales easily to n-dimensions. The basic facts are (so far) almost trivial geometrically, but writing vector and matrix equations is still tricky.

Although I posted about projections and least squares error almost a year ago (here and here), I had some trouble with the formulas and example for projections onto subspaces and least squares. So this is a recap.

The first example is 2D. You can refer to the figure from the book, above. We have two vectors a and b, where b points away from a, and we seek the projection p of b onto a:

     p = xa

(where x should really have a hat). It's a number which multiplies a to put it at exactly the correct position p so that

     p + e = b
     e = b - p

     e is perpendicular to a (and p)

and the dot product (which Prof. Strang does as a^T e) is zero:

     a^T e = 0 = a^T (b - p) = a^T(b - xa).

For the 2D problem both terms are numbers, so we can divide in rearranging to give:

     x =  a^Tb / a^Ta

     p = xa = (a^Tb / a^Ta) a

In the example we have b = (3,4,4) and a = (2,2,1) so a^Tb is just the dot product of a • b and

     x = (2,2,1) • (3,4,4) / (2,2,1) • (2,2,1) = 18/9 = 2

     p = xa = 2 (2,2,1) = (4,4,2)

     e = b - p = (3,4,4) - (4,4,2) = (-1,0,2)

and

     e • a = (-1,0,2) • (2,2,1) = 0

as required.

Now we add a second vector to the problem:

     a* = (1,0,0).

Then we have the plane formed by

     a = (2,2,1) and a* = (1,0,0)

and still, b (3,4,4) is not in the plane. We seek the projection of b onto this plane. Note that the error which is

     e = b - p

as before, is perpendicular to every vector in the plane. (It is some z n where n is the normal vector).

So what Strang does is say that we have a matrix A:

     A = [ a  a* ] = [ 2  1 ]
                     [ 2  0 ]
                     [ 1  0 ]

Putting the x on the right side of A, we have:

     p = Ax
     e = b - p

p is some combination of the independent columns of A, where x is a vector now.

The critical insight is that we can combine the individual dot products

     a  • e = 0
     a* • e = 0

into a matrix multiplication:

     A^Te = 0 = A^Tb - A^Tp = A^Tb - A^TAx

Continuing:

     A^Tb = A^TAx

We can't just divide like before, but instead we use the inverse:

     x = (A^TA)^-1 A^Tb

Now, since we have x on the right of A, to get p we do:

     p = Ax = A (A^TA)^-1 A^T b

We can re-group the terms as A (A^TA)^-1 A^T and call this a projection matrix P that multiplies b to give p:

     A (A^TA)^-1 A^T

There is a temptation to say:

     (A^TA)^-1 = A^-1 (A^T^)-1

and then everything in the previous equation cancels. But here's the thing!

     A = [ a  a* ] = [ 2  1 ]
                     [ 2  0 ]
                     [ 1  0 ]

A is not invertible! So we can't find A^-1. And supposing A were invertible, and we could do that multiplication and then get the identity matrix. The projection of b into the column space of A would be equal to b. Multiplying by I would be exactly right!

So, this is the crazy formula that we have to compute:

     p = Ax = A (A^TA)^-1 A^T b

The first part is easy:

A^T A =
                   
                [ 2 1 ]
                [ 2 0 ]
                [ 1 0 ]

     [ 2 2 1 ]  [ 9 2 ]
     [ 1 0 0 ]  [ 2 1 ]

I like to put the second matrix above the result. Then it's easy to remember the rows and columns.

Use the 2 x 2 trick to get the inverse (switch a and d, multiply b and c by -1, and divide everything by the determinant); then continuing with A (A^TA)^-1

              [  1/5  -2/5 ]
              [ -2/5   9/5 ]

     [ 2 1 ]  [   0     1  ] 
     [ 2 0 ]  [  2/5  -4/5 ]
     [ 1 0 ]  [  1/5  -2/5 ]

And finally, multiplying by A^T to get P

                      [  2   2    1  ]
                      [  1   0    0  ]

     [   0     1  ]   [  1   0    0  ]
     [  2/5  -4/5 ]   [  0  4/5  2/5 ]
     [  1/5  -2/5 ]   [  0  2/5  1/5 ]

Confirm this is the same answer we got from numpy (here).

Notice two key properties of P:

P^T = P² = P

                        [  1   0    0  ]
                        [  0  4/5  2/5 ]
                        [  0  2/5  1/5 ]

     [  1   0    0  ]   [  1   0    0  ]
     [  0  4/5  2/5 ]   [  0  4/5  2/5 ]
     [  0  2/5  1/5 ]   [  0  2/5  1/5 ]

Now, multiply P times b (3,4,4) to get p:

                        [ 3 ]
                        [ 4 ]
                        [ 4 ]

     [ 1   0    0  ]  [   3  ]
     [ 0  4/5  2/5 ]  [ 24/5 ]
     [ 0  2/5  1/5 ]  [ 12/5 ]

And:

     e = b - p = (3,4,4) - (3,24/5,12/5)
       = (0,-4/5,8/5)

Confirm:

     e • a  = (0,-4/5,8/5) • (2,2,1) = 0
     e • a* = (0,-4/5,8/5) • (1,0,0) = 0

Another way of handling this does the steps in a different order:

     p = Ax = A (A^TA)^-1 A^T b

We first get (A^TA)^-1 as before:

A^T A =  [ 9 2 ]   
        [ 2 1 ]

(A^TA)^-1 =  1/5 [  1 -2 ]
               [ -2  9 ]

Then we do:

A^T b =
            
                [  3 ]
                [  4 ]
                [  4 ]

     [ 2 2 1 ]  [ 18 ]
     [ 1 0 0 ]  [  3 ]

     x = (A^TA)^-1 A^T b

                    [ 18 ]
      [  3 ]

     1/5 [  1 -2 ]  [ 12/5 ]
         [ -2  9 ]  [ -9/5 ]


     p = Ax 

                [ 12/5 ]
                [ -9/5 ]

     [ 2 1 ]    [   3  ]
     [ 2 0 ]    [ 24/5 ]
     [ 1 0 ]    [ 12/5 ]

     e = b - p = (3,4,4) - (3,24/5,12/5)
       = (0,-4/5,8/5)

It's reassuring to get the same result.

Check out lecture 15 on MIT's ocw (here). It's a work of genius.

Thursday, February 10, 2011

Strang's Linear Algebra: homework (recap)