The Unapologetic Mathematician

Mathematics for the interested outsider

Subgroups Generated by Shears

Okay, when I introduced elementary matrices I was a bit vague on the subgroup that the shears generate. I mean to partially rectify that now that we’ve got elementary row operations to work with.

I assert that the upper shears — those elementary matrices with one nonzero entry above the diagonal — generate the group of “upper unipotent” matrices. A matrix is unipotent if it is the identity plus a nilpotent transformation, so all of its eigenvalues are {1}. Specifically, we want those that are also upper-triangular. Thus the matrices we’re talking about have {0} everywhere below the diagonal and all {1} on the diagonal, as

\displaystyle A=\begin{pmatrix}1&a_{1,2}&\cdots&&a_{1,n}\\{0}&1&&&\\\vdots&&\ddots&&\vdots\\&&&1&a_{n-1,n}\\{0}&&\cdots&0&1\end{pmatrix}

What I’m going to do, now, is turn this matrix into the identity by using elementary row operations. Specifically, I’ll only ever add a multiple of one row to another row, which can be effected by multiplying the matrix on the left by shear matrices. And I’ll only add multiples of one row to rows above it, which can be effected by using upper shears.

So first, let’s add -a_{n-1,n} times the nth row to the n-1th.


We’ve cleared out the last entry in the next-to-last row in the matrix. Keep going, clearing out all the rest of the last column

\displaystyle\left(H_{1,n,-a_{1,n}}\dots H_{n-1,n,-a_{n-1,n}}\right)A=\begin{pmatrix}1&a_{1,2}&&\cdots&&0\\{0}&1&&&&\\&&\ddots&&&\vdots\\\vdots&&&1&a_{n-2,n-1}&0\\&&&&1&0\\{0}&&&\cdots&0&1\end{pmatrix}

Now we can use the next-to-last row — which has only a single nonzero entry left — to clear out the rest of the next-to-last column

\displaystyle\begin{aligned}\left(\left(H_{1,n-1,-a_{1,n-1}}\dots H_{n-2,n-1,-a_{n-2,n-1}}\right)\left(H_{1,n,-a_{1,n}}\dots H_{n-1,n,-a_{n-1,n}}\right)\right)A&\\=\begin{pmatrix}1&a_{1,2}&&\cdots&&0\\{0}&1&&&&\\&&\ddots&&&\vdots\\\vdots&&&1&0&0\\&&&&1&0\\{0}&&&\cdots&0&1\end{pmatrix}&\end{aligned}

Keep going, clearing out the columns from right to left

\displaystyle\left(\left(H_{1,2,-a_{1,2}}\right)\dots\left(H_{1,n,-a_{1,n}}\dots H_{n-1,n,-a_{n-1,n}}\right)\right)A=\begin{pmatrix}1&0&&\cdots&&0\\{0}&1&&&&\\&&\ddots&&&\vdots\\\vdots&&&1&0&0\\&&&&1&0\\{0}&&&\cdots&0&1\end{pmatrix}

and we’ve got the identity matrix! So that means that this big product of all these upper shears on the left is actually the inverse to the matrix A that we started with. Now we just multiply the inverse of each of these shears together in reverse order to find

\displaystyle A=\left(\left(H_{n-1,n,a_{n-1,n}}\dots H_{1,n,a_{1,n}}\right)\dots\left(H_{1,2,a_{1,2}}\right)\right)

So any upper-unipotent matrix can be written as a product of upper shears. Similarly, any lower-unipotent matrix (all {0} above the diagonal and all {1} on the diagonal) can be written as a product of lower shears. If we add in scalings, we can adjust the diagonal entries too. Given an invertible upper-triangular matrix, first factor it into the product of a diagonal matrix and an upper-unipotent matrix by dividing each row by its diagonal entry. Then the diagonal part can be built from scalings, while the upper-unipotent part can be built from shears. And, of course, scalings and lower shears together generate the subgroup of invertible lower-triangular matrices.


August 28, 2009 Posted by | Algebra, Linear Algebra | 1 Comment

Elementary Row and Column Operations

Here’s a topic that might be familiar all the way back to high school mathematics classes. We’re going to use the elementary matrices to manipulate a matrix. Rather than work out abstract formulas, I’m going to take the specific matrix


and perform example manipulations on it to draw conclusions. The effects of elementary matrices are so, well, elementary that it will be clear how they generalize.

So how do we use the matrices to manipulate matrices? Well, we’re using the elementary matrices to change the input or output bases of the linear transformation represented by the matrix. So to change the output basis we’ll multiply on the left by an elementary matrix, while to change the input basis we’ll multiply on the right by the inverse of an elementary matrix — which itself is an elementary matrix of the same kind.

Let’s start with the swaps. On the left we might have


while on the right we might have


On the left, the action of a swap is to swap two rows, while on the right the action is to swap two columns of the matrix.

Next come the scalings. On the left


and on the right


On the left, the action of a scaling is to multiply a row by the scaling factor, while on the right the effect is to multiply a column by the scaling factor.

Finally, the shears. On the left


and on the right


On the left, the shear H_{i,j,c} adds c times the jth row to the ith row, while on the right, the shear adds c times the ith column to the jth column.

So in general we see that acting on the left manipulates the rows of a matrix, while acting on the right manipulates the columns. We call these the “elementary row operations” and “elementary column operations”, respectively. Any manipulation of the form of a matrix we can effect by these operations can be seen as the result of applying a change of basis matrix on the left (output) or right (input) side. And so any two matrices related by these operations can be seen as representing “the same” transformation in two different bases.

August 27, 2009 Posted by | Algebra, Linear Algebra | 8 Comments

Elementary Matrices

Today we’ll write down three different collections of matrices that together provide us all the tools we need to modify bases.

First, and least important, are the swaps. A swap is a matrix that looks like the identity, but has two of its nonzero entries in reversed columns.

\displaystyle W_{i,j}\begin{pmatrix}1&&&&&&0\\&\ddots&&&&&\\&&0&\cdots&1&&\\&&\vdots&\ddots&\vdots&&\\&&1&\cdots&0&&\\&&&&&\ddots&\\{0}&&&&&&1\end{pmatrix}

where the two swapped columns (or, equivalently, rows) are i and j. The swaps generate a subgroup of \mathrm{GL}(n,\mathbb{F}) isomorphic to the symmetric group S_n. In fact, these are the image of the usual generators of S_n under the permutation representation. They just rearrange the order of the basis elements.

Next are the scalings. A scaling is a matrix that looks like the identity, but one of its nonzero entries isn’t the identity.

\displaystyle C_{i,c}=\begin{pmatrix}1&&&&&&0\\&\ddots&&&&&\\&&1&&&&\\&&&c&&&\\&&&&1&&\\&&&&&\ddots&\\{0}&&&&&&1\end{pmatrix}

where the entry c is in the ith row and column. The scalings generate the subgroup of diagonal matrices, which is isomorphic to \left(\mathbb{F}^\times\right)^nn independent copies of the group of nonzero elements of \mathbb{F} under multiplication. They stretch, squeeze, or reverse individual basis elements.

Finally come the shears. A shear is a matrix that looks like the identity, but one of its off-diagonal entries is nonzero.

\displaystyle H_{i,j,c}=\begin{pmatrix}1&&&&&&0\\&\ddots&&&&&\\&&1&&c&&\\&&&\ddots&&&\\&&&&1&&\\&&&&&\ddots&\\{0}&&&&&&1\end{pmatrix}

where the entry c is in the ith row and jth column. If i<j, then the extra nonzero entry falls above the diagonal and we call it an “upper shear”. On the other hand, if i>j then the extra nonzero entry falls below the diagonal, and we call it a “lower shear”. The shears also generate useful subgroups, but the proof of this fact is more complicated, and I’ll save it for its own post.

Now I said that the swaps are the least important of the three elementary transformations, and I should explain myself. It turns out that swaps aren’t really elementary. Indeed, consider the following calculation


So we can build a swap from three shears and a scaling. It should be clear how to generalize this to build any swap from three shears and a scaling. But it’s often simpler to just thing of swapping two basis elements as a single basic operation rather than as a composition of shears and scalings.

On the other hand, we can tell that we can’t build any shears from scalings, since the product of scalings is always diagonal. We also can’t build any scalings from shears, since the determinant of any shear is always {1}, and so the product of a bunch of shears also has determinant {1}. Meanwhile, the determinant of a scaling C_{i,c} is always the scaling factor c\neq1.

August 26, 2009 Posted by | Algebra, Linear Algebra | 5 Comments

Decompositions Past and Future

Okay, let’s review some of the manipulations we’ve established.

When we’re looking at an endomorphism T:V\rightarrow V, we have found that we can pick a basis so that the matrix of T is in Jordan normal form, which is almost diagonal. That is, we can find an invertible transformation S and a “Jordan transformation” J so that T=SJS^{-1}.

If we work with a normal transformation N on a complex inner product space we can pick orthonormal basis of eigenvectors. That is, we can find a unitary transformation U and a diagonal transformation \Sigma so that N=U\Sigma U^{-1}.

Similarly, if we work with a self-adjoint transformation S on a real inner product space we can pick orthonormal basis of eigenvectors. That is, we can find an orthogonal transformation O and a diagonal transformation \Sigma so that S=O\Sigma O^{-1}.

Then we generalized: if M:A\rightarrow B is any linear transformation between two inner product spaces we can find orthonormal bases giving the singular value decomposition. There are two unitary (or orthogonal) transformations U:B\rightarrow B and V:A\rightarrow A and a “diagonal” transformation \Sigma:A\rightarrow B so we can write M=U\Sigma V^{-1}.

We used this to show in particular if T is an endomorphism on an inner product space we can write T=UP where U is unitary and P is positive-semidefinite. That is, if we can choose the “output basis” separately from the “input basis”, we can put T into a nice form.

Now we want to continue in this direction of choosing input and output bases separately. It’s obvious we have to do this when the source and target spaces are different, but even for endomorphisms we’ll move them separately. But now we’ll move back away from inner product spaces to any vector spaces over any fields. Just like for the singular value decomposition, what we’ll end up with is essentially captured in the first isomorphism theorem, but we’ll be able to be a lot more explicit about how to find the right bases to simplify the matrix of our transformation.

So here’s the central question for the last chuck of linear algebra (for now): Given a linear transformation T:A\rightarrow B between any two vector spaces, how can we pick invertible transformations X\in\mathrm{GL}(A) and Y\in\mathrm{GL}(B) so that T=Y\Sigma X and \Sigma is in “as simple a form as possible” (and what does this mean?). To be even more particular, we’ll start with arbitrary bases of A and B — so T already has a matrix — and we’ll look at how to modify the matrix by modifying the two bases.

August 24, 2009 Posted by | Algebra, Linear Algebra | 5 Comments

The Uniqueness of Polar Decomposition

I asserted in a comment that the polar decomposition of a transformation has certain uniqueness properties. First, we can always uniquely recover the positive-semidefinite part, and then if that’s actually positive-definite we can recover the unitary part. And it turns out our running analogy with complex numbers is the perfect way to get at these facts.

Recall that if we take a complex number in polar form z=re^{i\theta} we can recover its length as

\displaystyle\lvert z\rvert=\sqrt{\bar{z}z}=\sqrt{re^{-i\theta}re^{i\theta}}=\sqrt{r^2}=r

Similarly, using the right polar decomposition we can write


where the radical denotes the unique positive-semidefinite square root any positive-semidefinite transformation has.

Now if T is nonsingular then so is P, since we can write P=U^*T and U^* is definitely nonsingular. So in this case we can write U=TP^{-1} and U is uniquely determined. On the other hand, if T is singular then we can’t invert P. We then have some freedom in our choice of U, corresponding to this nontrivial cokernel of P.

We could also use the left polar decomposition to write


So the positive-semidefinite part is again uniquely determined, and again the unitary part will be uniquely determined so long as T is nonsingular.

August 21, 2009 Posted by | Algebra, Linear Algebra | Leave a comment

Square Roots

Here’s a neat thing we can do with the spectral theorems: we can take square roots of positive-semidefinite transformations. And this makes sense, since positive-semidefinite transformations are analogous to nonnegative real numbers, and every nonnegative real number has a unique nonnegative real square root. So we expect that every positive-semidefinite transformation will have a unique positive-semidefinite square root.

So we start by writing down the spectral decomposition

\displaystyle P=U\Lambda U^*

Where \Lambda is diagonal. And since P is positive-semidefinite, every diagonal entry of \Lambda — every eigenvalue of P — is a nonnegative real number. We can arrange them nonincreasing order, with the largest eigenvalue in the upper left, and so on down to the lowest eigenvalue (maybe {0}) in the lower right corner. Since the eigenvalues are uniquely determined, with this arrangement \Lambda is uniquely determined. If there are repeated eigenvalues, U might not be completely determined, since we have some freedom in picking the basis for degenerate eigenspaces.

Anyhow, since each entry in \Lambda is a nonnegative real number, we can replace each one with its unique nonnegative square root. We call this new matrix \Sigma, and observe that \Sigma^2=\Lambda. Now we can define S=U\Sigma U^*, and calculate

\displaystyle S^2=U\Sigma U^*U\Sigma U^*=U\Sigma^2U^*=U\Lambda U^*=P

So S is a square root of P. Since the eigenvalues of S (the diagonal entries of \Sigma) are nonnegative real numbers, S is positive-semidefinite.

On the other hand, what if we have some other positive-semidefinite square root S'=U'\Sigma'U'^*. Saying that it’s a square root of P means that

\displaystyle S'^2=U'\Sigma'U'^*U'\Sigma'U'^*=U'\Sigma'^2U'^*=U\Lambda U^*=P

That is, we must have


The matrix \Sigma'^2 is diagonal, and its entries — the squares of the diagonal entries of \Sigma — must be the eigenvalues of \Lambda. And so the entries of \Sigma' are the same as those of \Sigma, though possibly in a different order. The rearrangement, then, is the content of conjugating by U^*U'. That is, we must have


and so

\displaystyle U\Sigma U^*=U'\Sigma'U'^*

And so we really have the exact same square root again. This establishes the uniqueness of the positive-semidefinite square root.

August 20, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Polar Decomposition

Okay, let’s take the singular value decomposition and do something really neat with it. Specifically, we’ll start with an endomorphism T:A\rightarrow A and we’ll write down its singular value decomposition

\displaystyle T=V\Sigma W^*

where V and W are in the unitary (orthogonal) group of A. So, as it turns out, U=VW^* is also unitary. And P=W\Sigma W^* is positive-semidefinite (since \Sigma is). And, of course, since W^*W=1_A, we can write

\displaystyle T=V\Sigma W^*=VW^*W\Sigma W^*=UP

That is, any endomorphism can be written as the product of a unitary transformation and a positive-semidefinite one.

Remember that unitary transformations are like unit complex numbers, while positive-semidefinite transformations are like nonnegative real numbers. And so this “polar decomposition” is like the polar form of a complex number, where we write a complex number as the product of a unit complex number and a nonnegative real number.

We can recover the analogy like we did before, by taking determinants. We find


since the determinant of a unitary transformation is a unit complex number, and the determinant of a positive-semidefinite transformation is a nonnegative real number. If T is nonsingular, so P is actually positive-definite, then \det(P) will be strictly positive, so the determinant of T will be nonzero.

We could also define P'=W\Sigma W^*, so T=P'U. This is the left polar decomposition (writing the positive-definite part on the left), where the previous form is the right polar decomposition

August 19, 2009 Posted by | Algebra, Linear Algebra | 7 Comments

The Meaning of the SVD

We spent a lot of time yesterday working out how to write down the singular value decomposition of a transformation M:A\rightarrow B, writing

\displaystyle M=U\Sigma V^*

where U and V are unitary transformations on B and A, respectively, and \Sigma:A\rightarrow B is a “diagonal” transformation, in the sense that its matrix looks like


where D really is a nonsingular diagonal matrix.

So what’s it good for?

Well, it’s a very concrete representation of the first isomorphism theorem. Every transformation is decomposed into a projection, an isomorphism, and an inclusion. But here the projection and the inclusion are built up into unitary transformations (as we verified is always possible), and the isomorphism is the D part of \Sigma.

Incidentally, this means that we can read off the rank of M from the number of rows in D, while the nullity is the number of zero columns in \Sigma.

More heuristically, this is saying we can break any transformation into three parts. First, V^* picks out an orthonormal basis of “canonical input vectors”. Then \Sigma handles the actual transformation, scaling the components in these directions, or killing them off entirely (for the zero columns). Finally, U takes us out of the orthonormal basis of “canonical output vectors”. It tells us that if we’re allowed to pick the input and output bases separately, we kill off one subspace (the kernel) and can diagonalize the action on the remaining subspace.

The SVD also comes in handy for solving systems of linear equations. Let’s say we have a system written down as the matrix equation

\displaystyle Mx+b=0

where M is the matrix of the system. If b\in B is the zero vector we have a homogeneous system, and otherwise we have an inhomogeneous system. So let’s use the singular value decomposition for M:

\displaystyle U\Sigma V^*x+b=0

and then we find

\displaystyle\Sigma V^*x=-U^*b

So we can check ourselves by calculating -U^*b. If this extends into the zero rows of \Sigma there’s no possible way to satisfy the equation. That is, we can quickly see if the system is unsolvable. On the other hand, if -U^*b lies completely within the nonzero rows of \Sigma, it’s straightforward to solve this equation. We first write down the new transformation


where it’s not quite apparent from this block form, but we’ve also taken a transpose. That is, there are as many columns in \Sigma^+ as there are rows in \Sigma, and vice versa. The upshot is that \Sigma^+\Sigma:A\rightarrow A is a transformation which kills off the same kernel as \Sigma does, but is otherwise the identity. Thus we can proceed

\displaystyle\Sigma^+\Sigma V^*x=-\Sigma^+U^*b

This \Sigma^+ “undoes” the scaling from \Sigma. We can also replace the lower rows of -\Sigma^+U^*b with variables, since applying \Sigma will kill them off anyway. Finally, we find

\displaystyle x=-V\Sigma^+U^*b

and, actually, a whole family of solutions for the variables we could introduce in the previous step. But this will at least give one solution, and then all the others differ from this one by a vector in \mathrm{Ker}(M), as usual.

August 18, 2009 Posted by | Algebra, Linear Algebra | Leave a comment

The Singular Value Decomposition

Now the real and complex spectral theorems give nice decompositions of self-adjoint and normal transformations, respectively. Each one is of a similar form

\displaystyle\begin{aligned}S&=O\Lambda O^*\\H&=U\Lambda U^*\end{aligned}

where O is orthogonal, U is unitary, and \Lambda (in either case) is diagonal. What we want is a similar decomposition for any transformation. And, in fact, we’ll get one that even works for transformations between different inner prouct spaces.

So let’s say we’ve got a transformation M:A\rightarrow B (we’re going to want to save U and V to denote transformations). We also have its adjoint M^*:B\rightarrow A. Then M^*M:A\rightarrow A is positive-semidefinite (and thus self-adjoint and normal), and so the spectral theorem applies. There must be a unitary transformation V:A\rightarrow A (orthogonal, if we’re working with real vector spaces) so that

\displaystyle V^*M^*MV=\begin{pmatrix}D&0\\{0}&0\end{pmatrix}

where D is a diagonal matrix with strictly positive entries.

That is, we can break A up as the direct sum A=A_1\oplus A_2. The diagonal transformation D:A_1\rightarrow A_1 is positive-definite, while the restriction of V^*M^*MV to A_2 is the zero transformation. We will restrict V to each of these subspaces, giving V_1:A_1\rightarrow A and V_2:A_2\rightarrow A, along with their adjoints V_1^*:A\rightarrow A_1 and V_2^*:A\rightarrow A_2. Then we can write


From this we conclude both that V_1^*M^*MV_1=D and that MV_2=0. We define U_1=MV_1D^{-\frac{1}{2}}:A_1\rightarrow B, where we get the last matrix by just taking the inverse of the square root of each of the diagonal entries of D (this is part of why diagonal transformations are so nice to work with). Then we can calculate


This is good, but we don’t yet have unitary matrices in our decomposition. We do know that V_1^*V_1=1_{A_1}, and we can check that


Now we know that we can use V_2:A_2\rightarrow A to “fill out” V_1 to give the unitary transformation V. That is, V_1^*V_1=1_{A_1} (as we just noted), V_2^*V_2=1_{A_2} (similarly), V_1^*V_2 and V_2^*V_1 are both the appropriate zero transformation, and V_1V_1^*+V_2V_2^*=1_A. Notice that these are exactly stating that the adjoints V_1^* and V_2^* are the projection operators corresponding to the inclusions V_1 and V_2 in a direct sum representation of A as A_1\oplus A_2. It’s clear from general principles that there must be some projections, but it’s the unitarity of V that makes the projections be exactly the adjoints of the inclusions.

What we need to do now is to similarly fill out U_1 by supplying a corresponding U_2:B_2\rightarrow B that will similarly “fill out” a unitary transformation U. But we know that we can do this! Pick an orthonormal basis of A_1 and hit it with U_1 to get a bunch of orthonormal vectors in B (orthonormal because U_1^*U_1=1_{A_1}. Then fill these out to an orthonormal basis of all of B. Just set B_2 to be the span of all the new basis vectors, which is the orthogonal complement of the image of U_1, and let U_2 be the inclusion of B_2 into B. We can then combine to get a unitary transformation

\displaystyle U=\begin{pmatrix}U_1&U_2\end{pmatrix}

Finally, we define


where there are as many zero rows in \Sigma as we needed to add to fill out the basis of B (the dimension of B_2). I say that U\Sigma V^* is our desired decomposition. Indeed, we can calculate

\displaystyle\begin{aligned}U\Sigma V^*&=\begin{pmatrix}U_1&U_2\end{pmatrix}\begin{pmatrix}D^\frac{1}{2}&0\\{0}&0\end{pmatrix}\begin{pmatrix}V_1^*\\V_2^*\end{pmatrix}\\&=\begin{pmatrix}U_1D^\frac{1}{2}&0\end{pmatrix}\begin{pmatrix}V_1^*\\V_2^*\end{pmatrix}\\&=U_1D^\frac{1}{2}V_1^*=M\end{aligned}

where U and V are unitary on B and A, respectively, and \Sigma is a “diagonal” transformation (not strictly speaking in the case where A and B have different dimensions).

August 17, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

The Real Spectral Theorem

Let’s take the last couple lemmas we’ve proven and throw them together to prove the real analogue of the complex spectral theorem. We start with a self-adjoint transformation S:V\rightarrow V on a finite-dimensional real inner-product space V.

First off, since S is self-adjoint, we know that it has an eigenvector e_1, which we can pick to have unit length (how?). The subspace \mathbb{R}e_1 is then invariant under the action of S. But then the orthogonal complement V_1=\left(\mathbb{R}e_1\right)^\perp is also invariant under S. So we can restrict it to a transformation S_1:V_1\rightarrow V_1.

It’s not too hard to see that S_1 is also self-adjoint, and so it must have an eigenvector e_2, which will also be an eigenvector of S. And we’ll get an orthogonal complement V_2, and so on. Since every step we take reduces the dimension of the vector space we’re looking at by one, we must eventually bottom out. At that point, we have an orthonormal basis of eigenvectors for our original space V. Each eigenvector was picked to have unit length, and each one is in the orthogonal complement of those that came before, so they’re all orthogonal to each other.

Just like in the complex case, if we have a basis and a matrix already floating around for S, we can use this new basis to perform a change of basis, which will be orthogonal (not unitary in this case). That is, we can write the matrix of any self-adjoint transformation S as O\Lambda O^{-1}, where O is an orthogonal matrix and \Lambda is diagonal. Alternately, since O^{-1}=O^*, we can think of this as O\Lambda O^*, in case we’re considering our transformation as representing a bilinear form (which self-adjoint transformations often are).

What if we’ve got this sort of representation? A transformation with a matrix of the form O\Lambda O^* must be self-adjoint. Indeed, we can take its adjoint to find

\displaystyle\left(O\Lambda O^*\right)^*=\left(O^*\right)^*\Lambda^*O^*=O\Lambda^*O^*

but since \Lambda is diagonal, it’s automatically symmetric, and thus represents a self-adjoint transformation. Thus if a real transformation has an orthonormal basis of eigenvectors, it must be self-adjoint.

Notice that this is a somewhat simpler characterization than in the complex case. This hinges on the fact that for real transformations taking the adjoint corresponds to simple matrix transposition, and every diagonal matrix is automatically symmetric. For complex transformations, taking the adjoint corresponds to conjugate transposition, and not all diagonal matrices are Hermitian. That’s why we had to expand to the broader class of normal transformations.

August 14, 2009 Posted by | Algebra, Linear Algebra | 10 Comments