## So that’s why they stuck me here

Taken from the parking lot of the hotel:

At least it won’t be hard to find it tomorrow morning.

## How Many Generators?

Okay, one last post to fill out the week.

The shears alone generate the special linear group. Can we strip them down any further? And, with this in mind, how many generators does it take to build up the whole general linear group?

It turns out that we don’t even need all the shears. We can just use neighboring shears to build all the others. Indeed:

In terms of elementary row operations, first we add the third row to the second. Then we add the second to the first, effectively adding the third row to the first as well. Then we subtract the third row from the second, undoing that first step. Finally, we subtract the second row (alone now) from the first, undoing the extra addition of the second row to the first. At the end of the whole process we’ve added the third row to the first. We could modify this by adding a multiple of the third row to the second, and subtracting the same multiple later. Check to see what result that has. And we have similar results using neighboring lower shears

So we can generate the special linear group using only the neighboring shears. If we have an matrix in we can take its determinant . Then we can write . Here we’ve factored out a scaling by the determinant in the first row and we’re left with a matrix in , which can then be written in terms of neighboring shears. So we need (families of) generators here.

These are the best I can do, and I don’t see a way of improving. Roughly, upper shears can’t build up lower shears, any collection of neighboring shears can only affect the rows they cover in sequence and so can’t build up a new neighboring shear, and no shears can handle that one scaling. So it seems there’s no way to pare down *this* collection of generators. But there might be a completely different approach that leads to fewer families of generators. If someone has one, I’d be glad to see it.

## What’s Next?

Last May we started talking about linear algebra, with a little aside into complex numbers and another into power series along the way. Before that all, long long ago, we were talking about single variable calculus. Specifically, we were studying functions which took a real number in and gave a real number back out, and the two main aspects to this study: differentiation and integration.

The first part studied how a function changed as its input varied near a fixed point by coming up with the best linear approximation of the function near that point. Now that we’ve got an understanding of linear functions between higher-dimensional real vector spaces, we can work towards extending this idea of differential calculus into multivariable functions.

The second part studied how to “add up” a continuously-varying collection of values, each with its own (infinitesimal) weight. Again, our new understanding of higher-dimensional analogues of linear spaces and functions will help us find the right way to generalize the integral calculus.

I’m trying to get access to some of my references again, since I no longer have even as much of a mathematical library down the hall as Western Kentucky University provided. So I’ll pick up when I can.

## Shears Generate the Special Linear Group

We established that if we restrict to upper shears we can generate all upper-unipotent matrices. On the other hand if we use all shears and scalings we can generate any invertible matrix we want (since swaps can be built from shears and scalings). We clearly can’t build any matrix whatsoever from shears alone, since every shear has determinant and so must any product of shears. But it turns out that we *can* use shears to generate *any* matrix of determinant — those in the special linear group.

First of all, let’s consider the following matrix equations, which should be easy to verify

These show that we can always pull a scaling to the left past a shear. In the first two cases, the scaling and the shear commute if the row and column the scaling acts on are uninvolved in the shear. In the last two cases, we have to modify the shear in the process, but we end up with the scaling written to the left of a shear instead of to the right. We can use these toy examples to see that we can always pull a scaling from the right to the left of a shear, possibly changing the shear in the process.

What does this mean? When we take a matrix and write it out in terms of elementary matrices, we can always modify this expression so that all the scalings are to the left of all the shears. Then we have a diagonal matrix to the left of a long product of shears, since the product of a bunch of scalings is a diagonal matrix. But now the determinant of each shear is , and the determinant of the diagonal matrix must be the product of the diagonal entries, which are the scaling factors. And so the product of the scaling factors is the determinant of our original matrix.

We’re specifically concerned with matrices of determinant , meaning the product of all the diagonal entries must come out to be . I’m going to use this fact to write the diagonal matrix as a product of scalings in a very particular way. Let’s say the diagonal entry in row is . Then I’m going to start by writing down

I’ve scaled the first row by the right amount, and then scaled the second row by the inverse amount so the product of the two scaling factors is . Then I write down

The product of the two scalings of the second row ends up scaling it by , and we scale the third row to compensate. We continue this way, scaling each row to the right amount, and the next one by the inverse factor. Once we scale the next-to-last row we’re done, since the scaling factor for the last row must be exactly what we need to make the total product of all the scaling factors come out to . That is, as long as the total scaling factor is , we can write the diagonal matrix as the product of these pairs of scalings with inverse scaling factors.

Now let’s take four shears, alternating upper and lower, since two upper shears in a row are the same as a single upper shear, and similarly for lower shears. We want it to come out to one of these pairs of scalings.

This gives us four equations to solve

These quickly simplify to

Which can be solved to find

So we could pick and for any scaling factor write

And so we can write such a pair of scalings with inverse scaling factors as a product of four shears. Since in the case at hand we can write the diagonal part of our elementary matrix decomposition with such pairs of scalings, we can translate them all into shears. And at the end of the day, we can write any special linear transformation as a product of a bunch of shears.

## The Special Linear Group (and others)

We’ve got down the notion of the general linear group of a vector space , including the particular case of the matrix group of the space . We also have defined the orthogonal group of matrices over whose transpose and inverse are the same, which is related to the orthogonal group of orthogonal transformations of the real vector space preserving a specified bilinear form . Lastly, we’ve defined the group of unitary transformations on — complex matrices whose conjugate transpose and inverse are the same.

For all of these matrix groups — which are all subgroups of some appropriate — we have a homomorphism to the multiplicative group of given by the determinant. We originally defined the determinant on itself, but we can easily restrict it to any subgroup. We actually know that for unitary and orthogonal transformations the image of this homomorphism must lie in a particular subgroup of . But in any case, the homomorphism must have a kernel, and this kernel turns out to be important.

In the case of the general linear group , the kernel of the determinant homomorphism consists of the automorphisms of with determinant . We call this subgroup of the “special linear group” , and transformations in this subgroup are sometimes called “special linear transformations”. Of course, we also have the particular special linear group . When we take the kernel of any of the other groups, we prepend the adjective “special” and an to the notation. Thus we have the special orthogonal groups and and the special unitary group .

In a sense, all the interesting part of the general linear group is contained in the special linear subgroup. Outside of that, what remains is “just” a scaling. It’s a little more complicated than it seems on the surface, but not much.

## Elementary Matrices Generate the General Linear Group

Okay, so we can use elementary row operations to put any matrix into its (unique) reduced row echelon form. As we stated last time, this consists of building up a basis for the image of the transformation the matrix describes by walking through a basis for the domain space and either adding a new, independent basis vector or writing the image of a domain basis vector in terms of the existing image basis vectors.

So let’s say we’ve got a transformation in . Given a basis, we get an invertible matrix (which we’ll also call ). Then we can use elementary row operations to put this matrix into its reduced row echelon form. But now every basis vector gets sent to a vector that’s linearly independent of all the others, or else the transformation wouldn’t be invertible! That is, the reduced row echelon form of the matrix must be the identity matrix.

But remember that every one of our elementary row operations is the result of multiplying on the left by an elementary matrix. So we can take the matrices corresponding to the list of all the elementary row operations and write

which tells us that applying all these elementary row operations one after another leads us to the identity matrix. But this means that the product of all the elementary matrices on the right is . And since we can also apply this to the transformation , we can find a list of elementary matrices whose product is . That is, any invertible linear transformation can be written as the product of a finite list of elementary matrices, and thus the elementary matrices generate the general linear group.

## Reduced Row Echelon Form

Let’s take row echelon form and push it a little further into a form that *is* unique: *reduced* row echelon form.

When we were doing Gaussian elimination we saved some steps by leaving pivoted rows alone. This was fine for the purposes of solving equations, where the point is just to eliminate progressively more and more variables in each equation. We also saved steps by leaving the pivots as whatever values they started with. Both of these led to a lot of variation in the possibilities for row echelon forms. We’ll tweak the method of Gaussian elimination to give a new algorithm called “Gauss-Jordan elimination”, which will take more steps but will remedy these problems.

First of all, when we pick a pivot and swap it into place, we scale that row by the inverse of the pivot value. This will set the pivot value itself to (which, incidentally, helps with the shears that come next). Then we shear to eliminate not only the nonzero entries below the pivot, but the nonzero entries *above* the pivot as well. That leaves the pivot as the only nonzero entry in its column.

Let’s see what this looks like in terms of our usual example:

We use the in the upper-left as a pivot and scale the top row by

Next, we clear out the first column with shears. Note that the shear value is now just the negative of the entry in the first column.

So far it’s almost the same as before, except for normalizing the pivot. But now when we choose the in the second column as the pivot we scale its row by

and use shears to clear the column above and below this pivot

Now the only choice for a pivot in the third column is . Again we normalize it and use shears to clear the column

This matrix is in reduced row echelon form, and I say that it is unique. Indeed, we are not allowed to alter the basis of the input space (since that would involve elementary *column* operations), so we can view this as a process of creating a basis for the output space in terms of the given basis of the input space.

What we do is walk down the input basis, vector by vector. At each step, we ask if the image of this basis vector is linearly independent of the images of those that came before. If so, this image is a new basis vector. If not, then we can write the image (uniquely) in terms of the output basis vectors we’ve already written down. In our example above, each of the first three input basis vectors gives a new output basis vector, and the image of the last input basis vector can be written as the column vector in the last column.

The only possibility of nonuniqueness is if we run out of input basis vectors before spanning the output space. But in that case the last rows of the matrix will consist of all zeroes anyway, and so it doesn’t matter what output basis vectors we choose for those rows! And so the reduced row echelon form of the matrix is uniquely determined by the input basis we started with.

## Solving Equations with Gaussian Elimination

The row echelon form of a matrix isn’t unique, but it’s still useful for solving systems of linear equations. If we have a system written in matrix form as

then applying elementary row operations builds up a new matrix that we multiply on the left of both sides

If the new matrix is in row echelon form, the resulting system of equations will be simple to solve.

A little more explicitly, this works because each of the equations is linear, and so each of the elementary row operations corresponds to some manipulation of the equations which gives rise to an equivalent system. for example, the order of the equations doesn’t matter, so swapping two of them (swapping two rows) doesn’t change the solution. Similarly, multiplying both sides of one equation by a nonzero number doesn’t change the solution to that equation, and so doesn’t change the solution of the system. Finally, adding two equations gives another valid equation. Since the three equations “overdetermine” the solution (they’re linearly dependent), we can drop one of the first two. The result is the same as a shear.

Since we’re just doing the same thing to the rows of the column vector as we are to the matrix , we may as well stack them beside each other. In this way, we get the “augmented matrix” of the system of equations. For example, we might have the system

with the augmented matrix

Then we can follow along our elimination from yesterday, applying the same steps to the system of equations. Foe example, if at this point we add the first and third equations we get , “eliminating” the variable from the equation. We replace the third equation with this new, simpler equation, giving the effect of a shear just like we did yesterday.

At the end, we found the row echelon form

which corresponds with the system

Now the equations are arranged like a ladder for us to climb up. First, we solve the bottom equation to find . Then we can plug this into the next equation up to find , which is easily solved to find . Then we plug both of these answers into the next equation up to find , which is easily solved to find , and we’re done. In fact, this is why we use the word “echelon” — French for a rung of a ladder — to describe this form.

What about our other example from yesterday? This corresponds to a system of *four* equations with three unknowns, since we’re interpreting the last column as the column vector on the right side of the equation. The system is

We convert this to an augmented matrix, put this matrix into row echelon form, and return it to the form of a system of equations to find

The empty row on the bottom is fine, since it just tells us the tautology that . Unfortunately, the line above that contains the contradiction that . That is, whenever the row echelon form of a system of equations has a leading coefficient in the last column, that system is unsolvable. We don’t even need to climb the ladder to know that.

On the other hand, if we started with the system

and put it in row echelon form (do it yourself) we get

Now we have no contradictions, but the system is underdetermined. Indeed, the rank is , so the kernel has dimension , and thus we’ll have one free parameter in our solution. It’s still easy to solve, though. We first find that . Plugging this into the first equation we find , which is easily solved to find .

So whether the system is exactly solvable, underdetermined, or contradictory, we can find whatever solutions may exist explicitly using a row echelon form.

## Row Echelon Form

For now, I only want to focus on elementary row operations. That is, transformations of matrices that can be effected by multiplying by elementary matrices on the left, not on the right.

Row echelon form is a (non-unique) simplification of the form of a matrix that can be reached by elementary row operations. A matrix is in row echelon form if all the empty (all-zero) rows are below all the nonzero rows, and the leftmost nonzero entry in a given row is to the right of the leftmost nonzero entry in the row above it. For example, an upper-triangular matrix is in row echelon form. We put a matrix into row echelon form by a method called “Gaussian elimination”, which always moves a matrix closer and closer to row echelon form.

First, we look at the first column. If it’s empty we leave it alone and move on the next column. If there are any nonzero entries, there must be only one of them at the top of the column, or the matrix can’t be in row echelon form. So, if this isn’t the case then we pick one of the nonzero entries and call it our “pivot”. Swap its whole row up to the top row, and then use shears to eliminate all the other nonzero entries in the column before moving on to the second column.

Before we continue, I’ll point out that the pivot is now the leading entry on a nonzero row at the top of the matrix. We’ll regard it as fixed, so whenever I say “all rows” I mean all the rows that we haven’t locked down yet.

Now that we’ve moved on to the next column, we again look for any nonzero entries. If there are none, we can move on. But if there are, we choose one to be a pivot, swap it to the top (below the previously locked rows) and use shears to remove all the other nonzero entries in the column (other than the locked rows). We continue like this, choosing pivots where there are nonzero entries to deal with, moving them to the top, and eliminating all the other nonzero entries below them, until we’ve locked down all the rows or run out of columns.

Let’s look at this method applied to the matrix

in the first column we have the nonzero value in the first row, so we may as well leave it where it is. We add times the first row to the second to eliminate the there, and add the first row to the third to eliminate the there. At this point our matrix looks like

Now in the second row we have at the top (since the very top row is finished). We leave it where it is and add times the second row to the third to eliminate the there. Now the matrix looks like

now there are no empty rows, so “all” of them are below all the nonzero rows. Further, within the nonzero rows the leading entries move from left to right as we move down the rows. The matrix is now in row echelon form.

Now let’s try this one:

Again, we’ll start with using the upper-left as our first pivot. We use shears to eliminate the rest of the entries in the first column and get

Now there are nonzero entries in the second column, but none of them are at the top. So we swap the second and third rows to bring a pivot to the top

and eliminate using a shear

the third column has no nonzero entries (other than the two locked-in entries at the top), so we skip it. In the fourth column we use as a pivot and eliminate, getting

Now the empty row is on the bottom, and the leading entries of the other rows move from left to right as we move down the rows. Thus, the matrix is now in row echelon form.