## Nilpotent Transformations I

For today I want to consider a single transformation whose only eigenvalue is . That is, for sufficiently large powers . We’ve seen that is sufficiently large a power to take to check if is nilpotent. We’ve also seen that has an upper-triangular matrix with all zeroes along the diagonal — the single eigenvalue with multiplicity . This sort of “nil-potent” transformation (because a power of it is the null transformation) is especially interesting because if we take any transformation, restrict it to a generalized eigenspace, and subtract the eigenvalue times the identity transformation, what’s left is nilpotent. This will be important soon.

The essential thing about is that its kernel gets bigger and bigger as we raise it to higher and higher powers, until it swallows up the whole vector space. This much we knew, but we’re going to follow it closely, so we can describe exactly how it slips away. What we need first is a lemma: Let be the power where first equals the zero transformation. If is a subspace that intersects the kernel of trivially — so doesn’t kill off anything in until the th (and last) iteration — then we can build the subspace . This subspace is invariant under , since applying just pushes everything down the line. The lemma will assert that there is another invariant subspace so that

We’ll proceed by induction on . If then itself is the zero transformation. Every subspace of is invariant under N$, and so we can always find such a .

Now, let’s say that , and let be a subspace so that . Then some of is taken up by , and some by the kernel (with no overlap). We can find another subspace so that , and write .

We can see that , since applications of are sufficient to kill off every vector. We can also see that because if anything in were killed off by applications of it would be in , contradicting the direct sum above. Now if we restrict to , we’re all set to invoke the inductive hypothesis. In this case, plays the role of and plays the role of . The inductive hypothesis gives us a subspace that’s invariant under and satisfying

We can set

to get a subspace of that is invariant under (since and the rest of the sum are each separately invariant). This will be the subspace we need, which we must now check. First, note that

but is the sum of and the rest direct? We need to show that their intersection is trivial.

Take elements through of , elements through of , and , and ask that

Applying to each side of this equation we find — everything else gets killed off — which can only happen if . Then we’re left with

which would imply

contradicting the directness of the sum in the above decomposition of unless everything in sight is zero.

So there it is. It’s sort of messy, but at the end of the day we can start with a subspace that doesn’t disappear for as long as possible and use to march it around until it dies. Then the rest of can be made up by another invariant subspace. Tomorrow we’ll see what we can do with this.

## Generalized Eigenspaces are Disjoint

We know that eigenspaces of distinct eigenvalues are disjoint. That is, no vector is an eigenvector of two different eigenvalues. This extends to generalized eigenvectors as well. If we have a nonzero then we know for sure that for .

To see this, let’s say that we have a that’s in two generalized eigenspaces, and start hitting it with over and over again. Eventually, it disappears, but before it does, it’s a nonzero eigenvector with eigenvalue . I say that this (not generalized) eigenvector is *also* in the generalized eigenspace of . Indeed, each time we hit with we write it as the sum . Since , the same is true of the scalar multiple . And since is a polynomial in , its kernel is invariant. Thus we have as well. And thus is invariant under the transformation , and we can assume (without loss of generality) that is an actual eigenvector with eigenvalue , as well as a generalized eigenvector with eigenvalue .

But now let’s start hitting with . Applying it once we find . So no matter how many times we apply the transformation, we just keep multiplying by , and we’ve assumed that this is nonzero. So cannot possibly be in the generalized eigenspace of . And so generalized eigenspaces corresponding to distinct eigenvalues can only intersect trivially.

## Kernels of Polynomials of Transformations

When we considered the representation theory of the algebra of polynomials, we saw that all it takes to specify such a representation is choosing a single endomorphism . That is, once we pick a transformation we get a whole algebra of transformations , corresponding to polynomials in one variable over the base field . Today, I want to outline one useful fact about these: that their kernels are invariant subspaces under the action of .

First, let’s remember what it means for a subspace to be invariant. This means that if we take a vector then its image is again in . This generalizes the nice situation about eigenspaces: that we have some control (if not as complete) over the image of a vector.

So, we need to show that if then , too. But since this is a representation, we can use the fact that , because the polynomial algebra is commutative. Then we calculate

Thus if is a linear transformation which is built by evaluating a polynomial at , then its kernel is an invariant subspace for .

## The Multiplicity of an Eigenvalue

We would like to define the multiplicity of an eigenvalue of a linear transformation as the number of independent associated eigenvectors. That is, as the dimension of the kernel of . Unfortunately, we saw that when we have repeated eigenvalues, sometimes this doesn’t quite capture the right notion. In that example, the -eigenspace has dimension , but it seems from the upper-triangular matrix that the eigenvalue should have multiplicity .

Indeed, we saw that if the entries along the diagonal of an upper-triangular matrix are , then the characteristic polynomial is

Then we can use our definition of multiplicity for roots of polynomials to see that a given value of has multiplicity equal to the number of times it shows up on the diagonal of an upper-triangular matrix.

It turns out that generalized eigenspaces *do* capture this notion, and we have a way of calculating them to boot! That is, I’m asserting that the multiplicity of an eigenvalue is both the number of times that shows up on the diagonal of any upper-triangular matrix for , and the number of independent generalized eigenvectors with eigenvalue — which is .

So, let’s fix a vector space of finite dimension over an algebraically closed field . Pick a linear transformation and a basis with respect to which the basis of is upper-triangular. We know such a matrix will exist because we’re working over an algebraically closed base field. I’ll prove the assertion for the eigenvalue — that the number of copies of on the diagonal of the matrix is the dimension of the kernel of — since for other eigenvalues we just replace with and do the exact same thing.

We’ll prove this statement by induction on the dimension of . The base case is easy: if then the kernel of has dimension if the upper triangular matrix is , and has dimension for anything else.

For the inductive step, we’re interested in the subspace spanned by the basis vectors through . Let’s call this subspace . Now we can write out the matrix of :

We can see that every vector in — linear combinations of through — lands back in . Meanwhile , where the components of are given in the last column. The fact that is invariant under the action of means that we can restrict to that subspace, getting the transformation . Its matrix with respect to the obvious basis is

The dimension of is less than that of , so we can use our inductive hypothesis to conclude that shows up on the diagonal of this matrix times. But we saw yesterday that the sequence of kernels of powers of has stabilized by this point (since has dimension ), so this is also . The last diagonal entry of is either or not. If , we want to show that

On the other hand, if , we want to show that

The inclusion-exclusion principle tells us that

Since this dimension of a subspace has to be less than or equal to , the difference in dimensions on the right can only be either zero or one. And we also see that

So if we need to show that every vector in actually lies in , so the difference in dimensions is zero. On the other hand, if we need to find a vector in that’s *not* in , so the difference in dimensions has to be one.

The first case is easier. Any vector in but not in can be written uniquely as for some nonzero scalar and some vector . When we apply the transformation , we get . Since , the coefficient of is again nonzero. No matter how many times we apply , we’ll still have a nonzero vector left. Thus the kernel of is completely contained in , and so we conclude .

In the second case, let’s look for a vector of the form . We want to choose so that . At the first application of we find . Thus

But the dimension of is , and so by this point the sequence of images of powers of has stabilized! That is,

and so we *can* find a so that . This gives us a vector in the kernel of that doesn’t lie in , and the inductive step is complete.

As a final remark, notice that the only place we really used the fact that is algebraically closed is when we picked a basis that would make upper-triangular. Everything still goes through as long as we have an upper-triangular matrix, but a given linear transformation may have no such matrix.

## Images of Powers of Transformations

For some technical points, it’s going to be useful to have a sort of dual to the increasing chain of subspaces we found yesterday. Instead of kernels, we’ll deal with images.

Specifically, if then I say . Indeed, the first statement asserts that there is some so that . But then , and so it’s the image of under as well. So we have a decreasing sequence

Just like last time, these stabilize by the time we get to the th power, where . Instead of repeating everything, let’s just use the rank-nullity theorem, which says for each power that . Now if then we calculate

where in the second line we used the stability of the sequence of kernels from yesterday. This tells us that for all these higher powers of .

## Determining Generalized Eigenvectors

Our definition of a generalized eigenvector looks a lot like the one for an eigenvector. But finding them may not be as straightforward as our method for finding eigenvectors. In particular, we’re asking that the vector be in the kernel of not one transformation, but some transformation in a whole infinite list of them. We can’t just go ahead and apply each transformation, hoping to find one that sends our vector to zero.

Maybe the form of this list can help us. We’re really just taking the one transformation and applying it over and over again. So we could start with and calculate , and then , and so on, until we end up with . That’s all well and good if *is* a generalized eigenvector, but what if it isn’t? At what point do we stop and say we’re never going to get to zero?

The first thing we have to notice is that as we go along the list of transformations, the kernel never shrinks. That is, if then

Thus, we have an increasing sequence of subspaces

Next we have to recognize that this sequence is *strictly* increasing until it levels out. That is, if then . Then, of course, we can use an inductive argument to see that all the kernels from that point on are the same. But why does this happen? Well, let’s say that implies that (the other implication we’ve already taken care of above). Then we can see that implies that by rewriting them:

where we have used the assumed implication between the second and third lines.

So once this sequence stops growing at one step, it never grows again. That is, if the kernels ever stabilize the sequence looks like

and is as large as it ever gets.

So does the sequence top out? Of course, it has to! Indeed, each step before it stops growing raises the dimension by at least one, so if it didn’t stop by step it would get bigger than the space itself, which is absurd because these are all *sub*spaces of . So is the largest of these kernels.

Where does this leave us? We’ve established that if is in the kernel of *any* power of it will be in . Thus the space of generalized eigenvectors with eigenvalue is exactly this kernel. Now finding generalized eigenvectors is just as easy as finding eigenvectors.

## Generalized Eigenvectors

Sorry for the delay, but exam time is upon us, or at least my college algebra class.

Anyhow, we’ve established that distinct eigenvalues allow us to diagonalize a matrix, but repeated eigenvalues cause us problems. We need to generalize the concept of eigenvectors somewhat.

First of all, since an eigenspace generalizes a kernel, let’s consider a situation where we repeat the eigenvalue :

This kills off the vector right away. But the vector gets sent to , where it can be killed by a *second* application of the matrix. So while there may not be two independent eigenvectors with eigenvalue , there can be another vector that is *eventually* killed off by repeated applications of the matrix.

More generally, consider a *strictly* upper-triangular matrix, all of whose diagonal entries are zero as well:

That is, for all . What happens as we compose this matrix with itself? I say that for we’ll find the entry to be zero for all . Indeed, we can calculate it as a sum of terms like . For each of these factors to be nonzero we need and . That is, , or else the matrix entry must be zero. Similarly, every additional factor of pushes the nonzero matrix entries one step further from the diagonal, and eventually they must fall off the upper-right corner. That is, *some* power of must give the zero matrix. The vectors may not have been killed by the transformation , so they may not all have been in the kernel, but they will all be in the kernel of some power of .

Similarly, let’s take a linear transformation and a vector . If we said that is an eigenvector of with eigenvalue . Now we’ll extend this by saying that if for some , then is a *generalized* eigenvector of with eigenvalue .

## Repeated Eigenvalues

So we’ve got a linear transformation on a vector space of finite dimension . We take its characteristic polynomial to find the eigenvalues. If all of its roots are distinct (and there are of them, as there must be if we’re working over an algebraically closed field) then we can pick a basis of eigenvectors and get a diagonal matrix for .

But what if a root is repeated? Say the characteristic polynomial has a factor of . We might think to pick two linearly independent eigenvectors with eigenvalue , but unfortunately that doesn’t always work. Consider the transformation given by the matrix

This matrix is upper-triangular, and so we can just read off its eigenvalues from the diagonal: two copies of the eigenvalue . We can easily calculate its action on a vector

So the eigenvector equation says that and . The second is trivial, and the first tells us that . But we can only pick one linearly independent vector of this form. So we can’t find a basis of eigenvectors, and we can’t diagonalize this matrix. We’re going to need another approach.

## Transformations with All Eigenvalues Distinct

When will a linear transformation have a diagonal matrix? We know that this happens exactly when we can find a basis of our vector space so that each basis vector is an eigenvector . But when does *this* happen? The full answer can be complicated, but one sufficient condition we can describe right away.

First, remember that we can always calculate the characteristic polynomial of . This will be a polynomial of degree — the dimension of . Further, we know that a field element is an eigenvalue of if and only if is a root of the characteristic polynomial. Since the degree of the polynomial is , we can expect to find no more than distinct roots of the polynomial — no more than distinct eigenvalues of . I want to consider what happens in this generic case.

Now we have field elements , and for each one we can pick a vector so that . I say that they form a basis of . If we can show that they are linearly independent, then they span some subspace of . But since there are distinct vectors here, they span a subspace of dimension , which must be all of . And we know, by an earlier lemma that a collection of eigenvectors corresponding to distinct eigenvalues must be linearly independent! Thus, the form a basis for consisting of eigenvectors of . With respect to this basis, the matrix of is diagonal.

This is good, but notice that there are still plenty of things that can go wrong. It’s entirely possible that two (or more!) of the eigenvalues are not distinct. Worse, we could be working over a field that isn’t algebraically closed, so there may not be roots at all, even counting duplicates. But still, in the generic case we’ve got a diagonal matrix with respect to a well-chosen basis.

## Diagonal Matrices

Even better than upper-triangular matrices are diagonal matrices. These ones look like

Each basis vector is an eigenvector, with the eigenvalues listed down the diagonal. It’s straightforward to show that the sum and product of two diagonal matrices are themselves diagonal. Thus, diagonal matrices form a further subalgebra inside the algebra of upper-triangular matrices. This algebra is just a direct sum of copies of , with multiplication defined component-by-component.

Diagonal matrices are especially nice because it’s really easy to see how they act on vectors. Given a diagonal matrix , break a vector into its components . Multiply each component by the corresponding eigenvalue . And you’re done! Composing a diagonal matrix with another matrix is also easy. To find , just multiply each row of by the corresponding eigenvalue. To find , multiply each column of by the corresponding diagonal.

So, if we can find a basis for our vector space consisting only of eigenvectors for the transformation , then with respect to that basis the matrix of is diagonal. This is as good as we can hope for, and a lot of linear algebra comes down to determining when we can do this.