## Subgroups Generated by Shears

Okay, when I introduced elementary matrices I was a bit vague on the subgroup that the shears generate. I mean to partially rectify that now that we’ve got elementary row operations to work with.

I assert that the upper shears — those elementary matrices with one nonzero entry *above* the diagonal — generate the group of “upper unipotent” matrices. A matrix is unipotent if it is the identity plus a nilpotent transformation, so all of its eigenvalues are . Specifically, we want those that are also upper-triangular. Thus the matrices we’re talking about have everywhere below the diagonal and all on the diagonal, as

What I’m going to do, now, is turn this matrix into the identity by using elementary row operations. Specifically, I’ll only ever add a multiple of one row to another row, which can be effected by multiplying the matrix on the left by shear matrices. And I’ll only add multiples of one row to rows *above* it, which can be effected by using *upper* shears.

So first, let’s add times the th row to the th.

We’ve cleared out the last entry in the next-to-last row in the matrix. Keep going, clearing out all the rest of the last column

Now we can use the next-to-last row — which has only a single nonzero entry left — to clear out the rest of the next-to-last column

Keep going, clearing out the columns from right to left

and we’ve got the identity matrix! So that means that this big product of all these upper shears on the left is actually the *inverse* to the matrix that we started with. Now we just multiply the inverse of each of these shears together in reverse order to find

So any upper-unipotent matrix can be written as a product of upper shears. Similarly, any lower-unipotent matrix (all above the diagonal and all on the diagonal) can be written as a product of lower shears. If we add in scalings, we can adjust the diagonal entries too. Given an invertible upper-triangular matrix, first factor it into the product of a diagonal matrix and an upper-unipotent matrix by dividing each row by its diagonal entry. Then the diagonal part can be built from scalings, while the upper-unipotent part can be built from shears. And, of course, scalings and lower shears together generate the subgroup of invertible lower-triangular matrices.

## Elementary Row and Column Operations

Here’s a topic that might be familiar all the way back to high school mathematics classes. We’re going to use the elementary matrices to manipulate a matrix. Rather than work out abstract formulas, I’m going to take the specific matrix

and perform example manipulations on it to draw conclusions. The effects of elementary matrices are so, well, *elementary* that it will be clear how they generalize.

So how do we use the matrices to manipulate matrices? Well, we’re using the elementary matrices to change the input or output bases of the linear transformation represented by the matrix. So to change the output basis we’ll multiply on the left by an elementary matrix, while to change the input basis we’ll multiply on the right by the inverse of an elementary matrix — which itself is an elementary matrix of the same kind.

Let’s start with the swaps. On the left we might have

while on the right we might have

On the left, the action of a swap is to swap two rows, while on the right the action is to swap two columns of the matrix.

Next come the scalings. On the left

and on the right

On the left, the action of a scaling is to multiply a row by the scaling factor, while on the right the effect is to multiply a column by the scaling factor.

Finally, the shears. On the left

and on the right

On the left, the shear adds times the th row to the th row, while on the right, the shear adds times the th column to the th column.

So in general we see that acting on the left manipulates the rows of a matrix, while acting on the right manipulates the columns. We call these the “elementary row operations” and “elementary column operations”, respectively. Any manipulation of the form of a matrix we can effect by these operations can be seen as the result of applying a change of basis matrix on the left (output) or right (input) side. And so any two matrices related by these operations can be seen as representing “the same” transformation in two different bases.

## Elementary Matrices

Today we’ll write down three different collections of matrices that together provide us all the tools we need to modify bases.

First, and least important, are the swaps. A swap is a matrix that looks like the identity, but has two of its nonzero entries in reversed columns.

where the two swapped columns (or, equivalently, rows) are and . The swaps generate a subgroup of isomorphic to the symmetric group . In fact, these are the image of the usual generators of under the permutation representation. They just rearrange the order of the basis elements.

Next are the scalings. A scaling is a matrix that looks like the identity, but one of its nonzero entries isn’t the identity.

where the entry is in the th row and column. The scalings generate the subgroup of diagonal matrices, which is isomorphic to — independent copies of the group of nonzero elements of under multiplication. They stretch, squeeze, or reverse individual basis elements.

Finally come the shears. A shear is a matrix that looks like the identity, but one of its off-diagonal entries is nonzero.

where the entry is in the th row and th column. If , then the extra nonzero entry falls above the diagonal and we call it an “upper shear”. On the other hand, if then the extra nonzero entry falls below the diagonal, and we call it a “lower shear”. The shears also generate useful subgroups, but the proof of this fact is more complicated, and I’ll save it for its own post.

Now I said that the swaps are the least important of the three elementary transformations, and I should explain myself. It turns out that swaps aren’t really elementary. Indeed, consider the following calculation

So we can build a swap from three shears and a scaling. It should be clear how to generalize this to build *any* swap from three shears and a scaling. But it’s often simpler to just thing of swapping two basis elements as a single basic operation rather than as a composition of shears and scalings.

On the other hand, we can tell that we can’t build any shears from scalings, since the product of scalings is always diagonal. We also can’t build any scalings from shears, since the determinant of any shear is always , and so the product of a bunch of shears also has determinant . Meanwhile, the determinant of a scaling is always the scaling factor .

## Decompositions Past and Future

Okay, let’s review some of the manipulations we’ve established.

When we’re looking at an endomorphism , we have found that we can pick a basis so that the matrix of is in Jordan normal form, which is almost diagonal. That is, we can find an invertible transformation and a “Jordan transformation” so that .

If we work with a normal transformation on a complex inner product space we can pick orthonormal basis of eigenvectors. That is, we can find a unitary transformation and a diagonal transformation so that .

Similarly, if we work with a self-adjoint transformation on a real inner product space we can pick orthonormal basis of eigenvectors. That is, we can find an orthogonal transformation and a diagonal transformation so that .

Then we generalized: if is any linear transformation between two inner product spaces we can find orthonormal bases giving the singular value decomposition. There are two unitary (or orthogonal) transformations and and a “diagonal” transformation so we can write .

We used this to show in particular if is an endomorphism on an inner product space we can write where is unitary and is positive-semidefinite. That is, if we can choose the “output basis” separately from the “input basis”, we can put into a nice form.

Now we want to continue in this direction of choosing input and output bases separately. It’s obvious we have to do this when the source and target spaces are different, but even for endomorphisms we’ll move them separately. But now we’ll move back away from inner product spaces to *any* vector spaces over *any* fields. Just like for the singular value decomposition, what we’ll end up with is essentially captured in the first isomorphism theorem, but we’ll be able to be a lot more explicit about how to find the right bases to simplify the matrix of our transformation.

So here’s the central question for the last chuck of linear algebra (for now): Given a linear transformation between any two vector spaces, how can we pick invertible transformations and so that and is in “as simple a form as possible” (and what does this mean?). To be even more particular, we’ll start with arbitrary bases of and — so already has a matrix — and we’ll look at how to modify the matrix by modifying the two bases.

## The Uniqueness of Polar Decomposition

I asserted in a comment that the polar decomposition of a transformation has certain uniqueness properties. First, we can always uniquely recover the positive-semidefinite part, and then if that’s actually positive-definite we can recover the unitary part. And it turns out our running analogy with complex numbers is the perfect way to get at these facts.

Recall that if we take a complex number in polar form we can recover its length as

Similarly, using the right polar decomposition we can write

where the radical denotes the unique positive-semidefinite square root any positive-semidefinite transformation has.

Now if is nonsingular then so is , since we can write and is definitely nonsingular. So in this case we can write and is uniquely determined. On the other hand, if is singular then we can’t invert . We then have some freedom in our choice of , corresponding to this nontrivial cokernel of .

We could also use the left polar decomposition to write

So the positive-semidefinite part is again uniquely determined, and again the unitary part will be uniquely determined so long as is nonsingular.

## Square Roots

Here’s a neat thing we can do with the spectral theorems: we can take square roots of positive-semidefinite transformations. And this makes sense, since positive-semidefinite transformations are analogous to nonnegative real numbers, and every nonnegative real number has a unique nonnegative real square root. So we expect that every positive-semidefinite transformation will have a unique positive-semidefinite square root.

So we start by writing down the spectral decomposition

Where is diagonal. And since is positive-semidefinite, every diagonal entry of — every eigenvalue of — is a nonnegative real number. We can arrange them nonincreasing order, with the largest eigenvalue in the upper left, and so on down to the lowest eigenvalue (maybe ) in the lower right corner. Since the eigenvalues are uniquely determined, with this arrangement is uniquely determined. If there are repeated eigenvalues, might not be completely determined, since we have some freedom in picking the basis for degenerate eigenspaces.

Anyhow, since each entry in is a nonnegative real number, we can replace each one with its unique nonnegative square root. We call this new matrix , and observe that . Now we can define , and calculate

So is a square root of . Since the eigenvalues of (the diagonal entries of ) are nonnegative real numbers, is positive-semidefinite.

On the other hand, what if we have some other positive-semidefinite square root . Saying that it’s a square root of means that

That is, we must have

The matrix is diagonal, and its entries — the squares of the diagonal entries of — must be the eigenvalues of . And so the entries of are the same as those of , though possibly in a different order. The rearrangement, then, is the content of conjugating by . That is, we must have

and so

And so we really have the exact same square root again. This establishes the uniqueness of the positive-semidefinite square root.

## Polar Decomposition

Okay, let’s take the singular value decomposition and do something really neat with it. Specifically, we’ll start with an endomorphism and we’ll write down its singular value decomposition

where and are in the unitary (orthogonal) group of . So, as it turns out, is also unitary. And is positive-semidefinite (since is). And, of course, since , we can write

That is, any endomorphism can be written as the product of a unitary transformation and a positive-semidefinite one.

Remember that unitary transformations are like unit complex numbers, while positive-semidefinite transformations are like nonnegative real numbers. And so this “polar decomposition” is like the polar form of a complex number, where we write a complex number as the product of a unit complex number and a nonnegative real number.

We can recover the analogy like we did before, by taking determinants. We find

since the determinant of a unitary transformation is a unit complex number, and the determinant of a positive-semidefinite transformation is a nonnegative real number. If is nonsingular, so is actually positive-definite, then will be strictly positive, so the determinant of will be nonzero.

We could also define , so . This is the left polar decomposition (writing the positive-definite part on the left), where the previous form is the right polar decomposition

## The Meaning of the SVD

We spent a lot of time yesterday working out how to write down the singular value decomposition of a transformation , writing

where and are unitary transformations on and , respectively, and is a “diagonal” transformation, in the sense that its matrix looks like

where really *is* a nonsingular diagonal matrix.

So what’s it good for?

Well, it’s a very concrete representation of the first isomorphism theorem. Every transformation is decomposed into a projection, an isomorphism, and an inclusion. But here the projection and the inclusion are built up into unitary transformations (as we verified is always possible), and the isomorphism is the part of .

Incidentally, this means that we can read off the rank of from the number of rows in , while the nullity is the number of zero columns in .

More heuristically, this is saying we can break any transformation into three parts. First, picks out an orthonormal basis of “canonical input vectors”. Then handles the actual transformation, scaling the components in these directions, or killing them off entirely (for the zero columns). Finally, takes us out of the orthonormal basis of “canonical output vectors”. It tells us that if we’re allowed to pick the input and output bases separately, we kill off one subspace (the kernel) and can diagonalize the action on the remaining subspace.

The SVD also comes in handy for solving systems of linear equations. Let’s say we have a system written down as the matrix equation

where is the matrix of the system. If is the zero vector we have a homogeneous system, and otherwise we have an inhomogeneous system. So let’s use the singular value decomposition for :

and then we find

So we can check ourselves by calculating . If this extends into the zero rows of there’s no possible way to satisfy the equation. That is, we can quickly see if the system is unsolvable. On the other hand, if lies completely within the nonzero rows of , it’s straightforward to solve this equation. We first write down the new transformation

where it’s not quite apparent from this block form, but we’ve also taken a transpose. That is, there are as many columns in as there are rows in , and vice versa. The upshot is that is a transformation which kills off the same kernel as does, but is otherwise the identity. Thus we can proceed

This “undoes” the scaling from . We can also replace the lower rows of with variables, since applying will kill them off anyway. Finally, we find

and, actually, a whole family of solutions for the variables we could introduce in the previous step. But this will at least give *one* solution, and then all the others differ from this one by a vector in , as usual.

## The Singular Value Decomposition

Now the real and complex spectral theorems give nice decompositions of self-adjoint and normal transformations, respectively. Each one is of a similar form

where is orthogonal, is unitary, and (in either case) is diagonal. What we want is a similar decomposition for *any* transformation. And, in fact, we’ll get one that even works for transformations between different inner prouct spaces.

So let’s say we’ve got a transformation (we’re going to want to save and to denote transformations). We also have its adjoint . Then is positive-semidefinite (and thus self-adjoint and normal), and so the spectral theorem applies. There must be a unitary transformation (orthogonal, if we’re working with real vector spaces) so that

where is a diagonal matrix with strictly positive entries.

That is, we can break up as the direct sum . The diagonal transformation is positive-*definite*, while the restriction of to is the zero transformation. We will restrict to each of these subspaces, giving and , along with their adjoints and . Then we can write

From this we conclude both that and that . We define , where we get the last matrix by just taking the inverse of the square root of each of the diagonal entries of (this is part of why diagonal transformations are so nice to work with). Then we can calculate

This is good, but we don’t yet have unitary matrices in our decomposition. We do know that , and we can check that

Now we know that we can use to “fill out” to give the unitary transformation . That is, (as we just noted), (similarly), and are both the appropriate zero transformation, and . Notice that these are exactly stating that the adjoints and are the projection operators corresponding to the inclusions and in a direct sum representation of as . It’s clear from general principles that there must be *some* projections, but it’s the unitarity of that makes the projections be exactly the adjoints of the inclusions.

What we need to do now is to similarly fill out by supplying a corresponding that will similarly “fill out” a unitary transformation . But we know that we can do this! Pick an orthonormal basis of and hit it with to get a bunch of orthonormal vectors in (orthonormal because . Then fill these out to an orthonormal basis of all of . Just set to be the span of all the new basis vectors, which is the orthogonal complement of the image of , and let be the inclusion of into . We can then combine to get a unitary transformation

Finally, we define

where there are as many zero rows in as we needed to add to fill out the basis of (the dimension of ). I say that is our desired decomposition. Indeed, we can calculate

where and are unitary on and , respectively, and is a “diagonal” transformation (not strictly speaking in the case where and have different dimensions).

## The Real Spectral Theorem

Let’s take the last couple lemmas we’ve proven and throw them together to prove the real analogue of the complex spectral theorem. We start with a self-adjoint transformation on a finite-dimensional real inner-product space .

First off, since is self-adjoint, we know that it has an eigenvector , which we can pick to have unit length (how?). The subspace is then invariant under the action of . But then the orthogonal complement is also invariant under . So we can restrict it to a transformation .

It’s not too hard to see that is *also* self-adjoint, and so *it* must have an eigenvector , which will also be an eigenvector of . And we’ll get an orthogonal complement , and so on. Since every step we take reduces the dimension of the vector space we’re looking at by one, we must eventually bottom out. At that point, we have an orthonormal basis of eigenvectors for our original space . Each eigenvector was picked to have unit length, and each one is in the orthogonal complement of those that came before, so they’re all orthogonal to each other.

Just like in the complex case, if we have a basis and a matrix already floating around for , we can use this new basis to perform a change of basis, which will be orthogonal (not unitary in this case). That is, we can write the matrix of any self-adjoint transformation as , where is an orthogonal matrix and is diagonal. Alternately, since , we can think of this as , in case we’re considering our transformation as representing a bilinear form (which self-adjoint transformations often are).

What if we’ve got this sort of representation? A transformation with a matrix of the form must be self-adjoint. Indeed, we can take its adjoint to find

but since is diagonal, it’s automatically symmetric, and thus represents a self-adjoint transformation. Thus if a real transformation has an orthonormal basis of eigenvectors, it must be self-adjoint.

Notice that this is a somewhat simpler characterization than in the complex case. This hinges on the fact that for real transformations taking the adjoint corresponds to simple matrix transposition, and every diagonal matrix is automatically symmetric. For complex transformations, taking the adjoint corresponds to *conjugate* transposition, and not all diagonal matrices are Hermitian. That’s why we had to expand to the broader class of normal transformations.