Nilpotent Transformations I
For today I want to consider a single transformation $N$ whose only eigenvalue is $0$. That is, $N^k = 0$ for sufficiently large powers $k$. We’ve seen that $d = \dim V$ is sufficiently large a power to take to check if $N$ is nilpotent. We’ve also seen that $N$ has an upper-triangular matrix with all zeroes along the diagonal — the single eigenvalue $0$ with multiplicity $d$. This sort of “nil-potent” transformation (because a power of it is the null transformation) is especially interesting because if we take any transformation, restrict it to a generalized eigenspace, and subtract the eigenvalue times the identity transformation, what’s left is nilpotent. This will be important soon.
The essential thing about $N$ is that its kernel gets bigger and bigger as we raise it to higher and higher powers, until it swallows up the whole vector space. This much we knew, but we’re going to follow it closely, so we can describe exactly how it slips away. What we need first is a lemma: Let $k$ be the power where $N^k$ first equals the zero transformation. If $U \subseteq V$ is a subspace that intersects the kernel of $N^{k-1}$ trivially — so $N$ doesn’t kill off anything in $U$ until the $k$th (and last) iteration — then we can build the subspace $U + N(U) + N^2(U) + \cdots + N^{k-1}(U)$. This subspace is invariant under $N$, since applying $N$ just pushes everything down the line. The lemma will assert that there is another invariant subspace $W$ so that

$$V = U \oplus N(U) \oplus N^2(U) \oplus \cdots \oplus N^{k-1}(U) \oplus W$$
We’ll proceed by induction on $k$. If $k = 1$ then $N$ itself is the zero transformation. Every subspace of $V$ is invariant under $N$, and so we can always find such a $W$: just take any complement of $U$ in $V$.
Now, let’s say that $k > 1$, and let $U$ be a subspace so that $U \cap \ker\left(N^{k-1}\right) = \{0\}$. Then some of $V$ is taken up by $U$, and some by the kernel (with no overlap). We can find $Y$, another subspace, so that $V = U \oplus Y \oplus \ker\left(N^{k-1}\right)$, and write $\hat{U} = U \oplus Y$.

We can see that $N(\hat{U}) \subseteq \ker\left(N^{k-1}\right)$, since $k$ applications of $N$ are sufficient to kill off every vector. We can also see that $N(\hat{U})$ intersects $\ker\left(N^{k-2}\right)$ trivially, because if anything in $\hat{U}$ were killed off by $k-1$ applications of $N$ it would be in $\ker\left(N^{k-1}\right)$, contradicting the direct sum above unless it were zero to begin with. Now if we restrict $N$ to $\ker\left(N^{k-1}\right)$, we’re all set to invoke the inductive hypothesis. In this case, $\ker\left(N^{k-1}\right)$ plays the role of $V$ and $N(\hat{U})$ plays the role of $U$. The inductive hypothesis gives us a subspace $X$ that’s invariant under $N$ and satisfying

$$\ker\left(N^{k-1}\right) = N(\hat{U}) \oplus N^2(\hat{U}) \oplus \cdots \oplus N^{k-1}(\hat{U}) \oplus X$$
We can set

$$W = Y + N(Y) + \cdots + N^{k-1}(Y) + X$$

to get a subspace of $V$ that is invariant under $N$ (since $X$ and the rest of the sum are each separately invariant). This will be the subspace we need, which we must now check. First, note that

$$V = U + N(U) + \cdots + N^{k-1}(U) + W$$

since every summand of $V = U \oplus Y \oplus \ker\left(N^{k-1}\right)$ shows up in the sum on the right, using $N^j(\hat{U}) \subseteq N^j(U) + N^j(Y)$ for the pieces of $\ker\left(N^{k-1}\right)$. But is the sum of $U + N(U) + \cdots + N^{k-1}(U)$ and the rest direct? We need to show that the only way to write zero as a sum of vectors, one from each of these subspaces, is to take every one of them to be zero.
Take elements $u_0$ through $u_{k-1}$ of $U$, elements $y_0$ through $y_{k-1}$ of $Y$, and $x \in X$, and ask that

$$u_0 + N(u_1) + \cdots + N^{k-1}(u_{k-1}) + y_0 + N(y_1) + \cdots + N^{k-1}(y_{k-1}) + x = 0$$

Applying $N^{k-1}$ to each side of this equation we find $N^{k-1}(u_0 + y_0) = 0$ — everything else gets killed off — which can only happen if $u_0 + y_0 = 0$, since $u_0 + y_0$ lies in $\hat{U}$, which meets $\ker\left(N^{k-1}\right)$ trivially; and then $u_0 = y_0 = 0$ because $U$ and $Y$ only share the zero vector. Then we’re left with

$$N(u_1) + \cdots + N^{k-1}(u_{k-1}) + N(y_1) + \cdots + N^{k-1}(y_{k-1}) + x = 0$$

which would imply

$$N(u_1 + y_1) + N^2(u_2 + y_2) + \cdots + N^{k-1}(u_{k-1} + y_{k-1}) + x = 0$$

contradicting the directness of the sum in the above decomposition of $\ker\left(N^{k-1}\right)$ unless everything in sight is zero. And since each $N^j$ with $j \leq k-1$ is injective on $\hat{U}$ (anything in $\hat{U}$ killed off by fewer than $k$ applications of $N$ would land in $\ker\left(N^{k-1}\right)$), the vanishing of $N^j(u_j + y_j)$ forces $u_j + y_j = 0$, and so $u_j = y_j = 0$ for every $j$, along with $x = 0$.
So there it is. It’s sort of messy, but at the end of the day we can start with a subspace $U$ that doesn’t disappear for as long as possible and use $N$ to march it around until it dies. Then the rest of $V$ can be made up by another invariant subspace. Tomorrow we’ll see what we can do with this.
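To see the directness claim in action, here is a minimal numerical sketch (the matrix, the vector $u$, and every name below are my own illustrative choices, assuming NumPy; nothing here comes from the lemma itself): take a concrete nilpotent $N$, pick a one-dimensional $U$ spanned by a vector that survives $k-1$ applications of $N$, and check that the chain $u, N(u), \ldots, N^{k-1}(u)$ is linearly independent. In this particular example the chain already fills up all of $V$, so the invariant complement $W$ is trivial.

```python
import numpy as np

# A concrete nilpotent transformation: a strictly upper-triangular matrix.
N = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [0., 0., 0., 0.]])

# The first power k with N^k = 0.
k = next(p for p in range(1, N.shape[0] + 1)
         if not np.any(np.linalg.matrix_power(N, p)))
print(k)  # 4

# A vector u outside ker(N^{k-1}), so U = span{u} meets ker(N^{k-1}) trivially.
u = np.array([0., 0., 0., 1.])
assert np.any(np.linalg.matrix_power(N, k - 1) @ u)

# March U around with N: the chain u, N(u), ..., N^{k-1}(u), as columns.
chain = np.column_stack([np.linalg.matrix_power(N, j) @ u for j in range(k)])

# Directness for this U: the chain vectors are linearly independent.
print(np.linalg.matrix_rank(chain))  # 4, the full dimension
```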
Generalized Eigenspaces are Disjoint
We know that eigenspaces of distinct eigenvalues are disjoint. That is, no vector is an eigenvector of two different eigenvalues. This extends to generalized eigenvectors as well. If we have a nonzero $v \in \ker\left((T - \lambda_1 1_V)^{\dim V}\right)$ then we know for sure that $v \notin \ker\left((T - \lambda_2 1_V)^{\dim V}\right)$ for $\lambda_2 \neq \lambda_1$.
To see this, let’s say that we have a nonzero $v$ that’s in two generalized eigenspaces, and start hitting it with $T - \lambda_1 1_V$ over and over again. Eventually, it disappears, but before it does, it’s a nonzero eigenvector with eigenvalue $\lambda_1$. I say that this (not generalized) eigenvector is also in the generalized eigenspace of $\lambda_2$. Indeed, each time we hit $v$ with $T - \lambda_1 1_V$ we write the result as the sum $T(v) - \lambda_1 v$. Since $v$ lies in the generalized eigenspace of $\lambda_2$, which is the kernel of $(T - \lambda_2 1_V)^{\dim V}$, the same is true of the scalar multiple $\lambda_1 v$. And since $(T - \lambda_2 1_V)^{\dim V}$ is a polynomial in $T$, its kernel is invariant. Thus we have $T(v)$ in this kernel as well. And thus the generalized eigenspace of $\lambda_2$ is invariant under the transformation $T - \lambda_1 1_V$, and we can assume (without loss of generality) that $v$ is an actual eigenvector with eigenvalue $\lambda_1$, as well as a generalized eigenvector with eigenvalue $\lambda_2$.
But now let’s start hitting $v$ with $T - \lambda_2 1_V$. Applying it once we find $(T - \lambda_2 1_V)(v) = T(v) - \lambda_2 v = (\lambda_1 - \lambda_2)v$. So no matter how many times we apply the transformation, we just keep multiplying by $\lambda_1 - \lambda_2$, and we’ve assumed that this is nonzero. So $v$ cannot possibly be in the generalized eigenspace of $\lambda_2$. And so generalized eigenspaces corresponding to distinct eigenvalues can only intersect trivially.
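Here is a quick numerical rendition of the heart of that argument (the matrix and eigenvalues below are sample choices of mine, assuming NumPy): an honest eigenvector with eigenvalue $\lambda_1$ only ever gets rescaled by $\lambda_1 - \lambda_2$ when we hit it with $T - \lambda_2 1_V$, so no power of that transformation can ever kill it.

```python
import numpy as np

# Sample matrix with eigenvalues 1 (repeated) and 4; v is an honest eigenvector
# for lam1 = 1.  Hitting it with (T - lam2 I) just rescales it by (lam1 - lam2).
lam1, lam2 = 1.0, 4.0
T = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 4.]])
v = np.array([1., 0., 0.])          # T v = lam1 * v

w = v.copy()
for n in range(1, 4):
    w = (T - lam2 * np.eye(3)) @ w
    print(np.allclose(w, (lam1 - lam2) ** n * v))   # True: never reaches zero
```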
Kernels of Polynomials of Transformations
When we considered the representation theory of the algebra of polynomials, we saw that all it takes to specify such a representation is choosing a single endomorphism $T: V \to V$. That is, once we pick a transformation $T$ we get a whole algebra of transformations $p(T)$, corresponding to polynomials $p$ in one variable over the base field $\mathbb{F}$. Today, I want to outline one useful fact about these: that their kernels are invariant subspaces under the action of $T$.
First, let’s remember what it means for a subspace $U \subseteq V$ to be invariant. This means that if we take a vector $u \in U$ then its image $T(u)$ is again in $U$. This generalizes the nice situation about eigenspaces: that we have some control (if not as complete) over the image of a vector.
So, we need to show that if $p(T)(v) = 0$ then $p(T)\left(T(v)\right) = 0$, too. But since this is a representation, we can use the fact that $p(T)T = Tp(T)$, because the polynomial algebra is commutative. Then we calculate

$$p(T)\left(T(v)\right) = T\left(p(T)(v)\right) = T(0) = 0$$

Thus if $p(T)$ is a linear transformation which is built by evaluating a polynomial at $T$, then its kernel is an invariant subspace for $T$.
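A small numerical sanity check of this fact, under my own choices of $T$ and $p$ (nothing below is forced by the argument; I’m assuming NumPy and reading the kernel off a singular value decomposition): applying $T$ to a basis of $\ker p(T)$ lands back inside $\ker p(T)$.

```python
import numpy as np

# A sample transformation T and the polynomial p(x) = x^2 - 3x + 2 evaluated at T.
T = np.array([[1., 1., 0.],
              [0., 2., 0.],
              [0., 0., 5.]])
pT = T @ T - 3 * T + 2 * np.eye(3)

# A basis of ker p(T): right singular vectors belonging to zero singular values.
_, s, Vt = np.linalg.svd(pT)
kernel = Vt[np.sum(s > 1e-10):].T          # columns span ker p(T)

# Invariance: p(T) still kills T(v) for every v in the kernel.
print(np.allclose(pT @ (T @ kernel), 0))   # True
```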
The Multiplicity of an Eigenvalue
We would like to define the multiplicity of an eigenvalue $\lambda$ of a linear transformation $T$ as the number of independent associated eigenvectors. That is, as the dimension of the kernel of $T - \lambda 1_V$. Unfortunately, we saw that when we have repeated eigenvalues, sometimes this doesn’t quite capture the right notion. In that example, the $1$-eigenspace has dimension $1$, but it seems from the upper-triangular matrix that the eigenvalue $1$ should have multiplicity $2$.
Indeed, we saw that if the entries along the diagonal of an upper-triangular matrix are $\lambda_1, \ldots, \lambda_d$, then the characteristic polynomial is

$$(\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_d)$$

Then we can use our definition of multiplicity for roots of polynomials to see that a given value of $\lambda$ has multiplicity equal to the number of times it shows up on the diagonal of an upper-triangular matrix.
It turns out that generalized eigenspaces do capture this notion, and we have a way of calculating them to boot! That is, I’m asserting that the multiplicity of an eigenvalue $\lambda$ is both the number of times that $\lambda$ shows up on the diagonal of any upper-triangular matrix for $T$, and the number of independent generalized eigenvectors with eigenvalue $\lambda$ — which is $\dim\ker\left((T - \lambda 1_V)^{\dim V}\right)$.
So, let’s fix a vector space $V$ of finite dimension $d$ over an algebraically closed field $\mathbb{F}$. Pick a linear transformation $T$ and a basis $\{e_1, \ldots, e_d\}$ with respect to which the matrix of $T$ is upper-triangular. We know such a matrix will exist because we’re working over an algebraically closed base field. I’ll prove the assertion for the eigenvalue $0$ — that the number of copies of $0$ on the diagonal of the matrix is the dimension of the kernel of $T^d$ — since for other eigenvalues we just replace $T$ with $T - \lambda 1_V$ and do the exact same thing.
We’ll prove this statement by induction on the dimension of $V$. The base case is easy: if $d = 1$ then the kernel of $T^d = T$ has dimension $1$ if the upper-triangular matrix is $\begin{pmatrix}0\end{pmatrix}$, and has dimension $0$ for anything else.
For the inductive step, we’re interested in the subspace spanned by the basis vectors $e_1$ through $e_{d-1}$. Let’s call this subspace $U$. Now we can write out the matrix of $T$:

$$\begin{pmatrix}a_{1,1}&\cdots&a_{1,d-1}&a_{1,d}\\\vdots&\ddots&\vdots&\vdots\\0&\cdots&a_{d-1,d-1}&a_{d-1,d}\\0&\cdots&0&a_{d,d}\end{pmatrix}$$

We can see that every vector in $U$ — linear combinations of $e_1$ through $e_{d-1}$ — lands back in $U$. Meanwhile $T(e_d) = u' + a_{d,d}e_d$, where the components of $u' \in U$ are given in the last column. The fact that $U$ is invariant under the action of $T$ means that we can restrict $T$ to that subspace, getting the transformation $T|_U$. Its matrix with respect to the obvious basis is

$$\begin{pmatrix}a_{1,1}&\cdots&a_{1,d-1}\\\vdots&\ddots&\vdots\\0&\cdots&a_{d-1,d-1}\end{pmatrix}$$
The dimension of $U$ is less than that of $V$, so we can use our inductive hypothesis to conclude that $0$ shows up on the diagonal of this matrix $\dim\ker\left((T|_U)^{d-1}\right)$ times. But we saw yesterday that the sequence of kernels of powers of $T|_U$ has stabilized by this point (since $U$ has dimension $d-1$), so this is also $\dim\ker\left((T|_U)^d\right)$. The last diagonal entry of the matrix, $a_{d,d}$, is either $0$ or not. If $a_{d,d} \neq 0$, we want to show that

$$\dim\ker\left(T^d\right) = \dim\ker\left((T|_U)^d\right)$$

On the other hand, if $a_{d,d} = 0$, we want to show that

$$\dim\ker\left(T^d\right) = \dim\ker\left((T|_U)^d\right) + 1$$
The inclusion-exclusion principle tells us that

$$\dim\ker\left(T^d\right) - \dim\left(\ker\left(T^d\right)\cap U\right) = \dim\left(\ker\left(T^d\right) + U\right) - \dim(U)$$

Since this dimension of a subspace of $V$ has to be less than or equal to $d$ (and it’s at least $\dim(U) = d-1$, since the subspace contains $U$), the difference in dimensions on the right can only be either zero or one. And we also see that

$$\ker\left((T|_U)^d\right) = \ker\left(T^d\right)\cap U$$

So if $a_{d,d} \neq 0$ we need to show that every vector in $\ker\left(T^d\right)$ actually lies in $U$, so the difference in dimensions is zero. On the other hand, if $a_{d,d} = 0$ we need to find a vector in $\ker\left(T^d\right)$ that’s not in $U$, so the difference in dimensions has to be one.
The first case is easier. Any vector in $V$ but not in $U$ can be written uniquely as $u + ce_d$ for some nonzero scalar $c$ and some vector $u \in U$. When we apply the transformation $T$, we get $T(u + ce_d) = \left(T(u) + cu'\right) + ca_{d,d}e_d$. Since $a_{d,d} \neq 0$, the coefficient of $e_d$ is again nonzero. No matter how many times we apply $T$, we’ll still have a nonzero vector left. Thus the kernel of $T^d$ is completely contained in $U$, and so we conclude $\dim\ker\left(T^d\right) = \dim\ker\left((T|_U)^d\right)$.
In the second case, let’s look for a vector of the form $u + e_d$. We want to choose $u \in U$ so that $T^d(u + e_d) = 0$. At the first application of $T$ we find $T(u + e_d) = T(u) + u'$, since now $a_{d,d} = 0$. Thus

$$T^d(u + e_d) = T^{d-1}\left(T(u) + u'\right) = (T|_U)^d(u) + (T|_U)^{d-1}\left(u'\right)$$

But the dimension of $U$ is $d - 1$, and so by this point the sequence of images of powers of $T|_U$ has stabilized! That is,

$$\mathrm{im}\left((T|_U)^{d-1}\right) = \mathrm{im}\left((T|_U)^d\right)$$

and so we can find a $u$ so that $(T|_U)^d(u) = -(T|_U)^{d-1}\left(u'\right)$. This gives us a vector in the kernel of $T^d$ that doesn’t lie in $U$, and the inductive step is complete.
As a final remark, notice that the only place we really used the fact that $\mathbb{F}$ is algebraically closed is when we picked a basis that would make the matrix of $T$ upper-triangular. Everything still goes through as long as we have an upper-triangular matrix, but a given linear transformation may have no such matrix.
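As a sanity check that the two counts agree, here is a small sketch in NumPy (the matrix and the helper name are my own, purely for illustration): for an upper-triangular $T$ we can count how many times an eigenvalue appears on the diagonal and compare that with $\dim\ker\left((T - \lambda 1_V)^d\right)$, computed from the rank of that power.

```python
import numpy as np

# An upper-triangular matrix with the eigenvalue 2 on the diagonal three times
# and the eigenvalue 5 once.
T = np.array([[2., 1., 0., 7.],
              [0., 2., 1., 0.],
              [0., 0., 2., 3.],
              [0., 0., 0., 5.]])
d = T.shape[0]

def generalized_multiplicity(T, lam):
    """dim ker (T - lam I)^d, computed as d minus the rank of that power."""
    power = np.linalg.matrix_power(T - lam * np.eye(d), d)
    return d - np.linalg.matrix_rank(power)

for lam in (2.0, 5.0):
    on_diagonal = int(np.sum(np.isclose(np.diag(T), lam)))
    print(lam, on_diagonal, generalized_multiplicity(T, lam))
    # 2.0: 3 and 3;  5.0: 1 and 1
```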
Images of Powers of Transformations
For some technical points, it’s going to be useful to have a sort of dual to the increasing chain of subspaces we found yesterday. Instead of kernels, we’ll deal with images.
Specifically, if $w \in \mathrm{im}\left(T^{n+1}\right)$ then I say $w \in \mathrm{im}\left(T^n\right)$. Indeed, the first statement asserts that there is some $v$ so that $w = T^{n+1}(v)$. But then $w = T^n\left(T(v)\right)$, and so it’s the image of $T(v)$ under $T^n$ as well. So we have a decreasing sequence

$$\mathrm{im}(T) \supseteq \mathrm{im}\left(T^2\right) \supseteq \mathrm{im}\left(T^3\right) \supseteq \cdots$$
Just like last time, these stabilize by the time we get to the $d$th power, where $d = \dim(V)$. Instead of repeating everything, let’s just use the rank-nullity theorem, which says for each power $n$ that $\dim\mathrm{im}\left(T^n\right) + \dim\ker\left(T^n\right) = d$. Now if $n \geq d$ then we calculate

$$\begin{aligned}\dim\mathrm{im}\left(T^n\right) &= d - \dim\ker\left(T^n\right)\\&= d - \dim\ker\left(T^d\right)\\&= \dim\mathrm{im}\left(T^d\right)\end{aligned}$$

where in the second line we used the stability of the sequence of kernels from yesterday. This tells us that $\mathrm{im}\left(T^n\right) = \mathrm{im}\left(T^d\right)$ for all these higher powers of $T$.
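Here is the dual picture computed numerically for a sample matrix of my own choosing (assuming NumPy): the ranks of the powers decrease and then stabilize, and rank plus nullity comes out to $d$ at every step.

```python
import numpy as np

# A sample transformation whose powers lose rank for a while and then stabilize.
T = np.array([[0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 0.],
              [0., 0., 0., 2.]])
d = T.shape[0]

for n in range(1, d + 2):
    rank = np.linalg.matrix_rank(np.linalg.matrix_power(T, n))
    nullity = d - rank
    print(n, rank, nullity, rank + nullity == d)
# ranks come out 3, 2, 1, 1, 1: decreasing, and unchanged from the d-th power on
```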
Determining Generalized Eigenvectors
Our definition of a generalized eigenvector looks a lot like the one for an eigenvector. But finding them may not be as straightforward as our method for finding eigenvectors. In particular, we’re asking that the vector be in the kernel of not one transformation, but some transformation in a whole infinite list of them. We can’t just go ahead and apply each transformation, hoping to find one that sends our vector to zero.
Maybe the form of this list can help us. We’re really just taking the one transformation $T - \lambda 1_V$ and applying it over and over again. So we could start with $v$ and calculate $(T - \lambda 1_V)(v)$, and then $(T - \lambda 1_V)^2(v)$, and so on, until we end up with $0$. That’s all well and good if $v$ is a generalized eigenvector, but what if it isn’t? At what point do we stop and say we’re never going to get to zero?
The first thing we have to notice is that as we go along the list of transformations, the kernel never shrinks. That is, if $(T - \lambda 1_V)^n(v) = 0$ then

$$(T - \lambda 1_V)^{n+1}(v) = (T - \lambda 1_V)\left((T - \lambda 1_V)^n(v)\right) = (T - \lambda 1_V)(0) = 0$$

Thus, we have an increasing sequence of subspaces

$$\ker(T - \lambda 1_V) \subseteq \ker\left((T - \lambda 1_V)^2\right) \subseteq \ker\left((T - \lambda 1_V)^3\right) \subseteq \cdots$$
Next we have to recognize that this sequence is strictly increasing until it levels out. That is, if $\ker\left((T - \lambda 1_V)^n\right) = \ker\left((T - \lambda 1_V)^{n+1}\right)$ then $\ker\left((T - \lambda 1_V)^{n+1}\right) = \ker\left((T - \lambda 1_V)^{n+2}\right)$. Then, of course, we can use an inductive argument to see that all the kernels from that point on are the same. But why does this happen? Well, let’s say that $(T - \lambda 1_V)^{n+1}(v) = 0$ implies that $(T - \lambda 1_V)^n(v) = 0$ (the other implication we’ve already taken care of above). Then we can see that $(T - \lambda 1_V)^{n+2}(v) = 0$ implies that $(T - \lambda 1_V)^{n+1}(v) = 0$ by rewriting them:

$$\begin{aligned}0 &= (T - \lambda 1_V)^{n+2}(v)\\&= (T - \lambda 1_V)^{n+1}\left((T - \lambda 1_V)(v)\right)\\0 &= (T - \lambda 1_V)^n\left((T - \lambda 1_V)(v)\right)\\&= (T - \lambda 1_V)^{n+1}(v)\end{aligned}$$

where we have used the assumed implication between the second and third lines.
So once this sequence stops growing at one step, it never grows again. That is, if the kernels ever stabilize the sequence looks like

$$\ker(T - \lambda 1_V) \subsetneq \ker\left((T - \lambda 1_V)^2\right) \subsetneq \cdots \subsetneq \ker\left((T - \lambda 1_V)^n\right) = \ker\left((T - \lambda 1_V)^{n+1}\right) = \cdots$$

and $\ker\left((T - \lambda 1_V)^n\right)$ is as large as it ever gets.
So does the sequence top out? Of course, it has to! Indeed, each step before it stops growing raises the dimension by at least one, so if it didn’t stop by step $d = \dim(V)$ it would get bigger than the space $V$ itself, which is absurd because these are all subspaces of $V$. So $\ker\left((T - \lambda 1_V)^d\right)$ is the largest of these kernels.
Where does this leave us? We’ve established that if $v$ is in the kernel of any power of $T - \lambda 1_V$ it will be in $\ker\left((T - \lambda 1_V)^d\right)$. Thus the space of generalized eigenvectors with eigenvalue $\lambda$ is exactly this kernel. Now finding generalized eigenvectors is just as easy as finding eigenvectors.
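Concretely, then, the whole generalized eigenspace comes from a single kernel computation. Here is a sketch of that recipe (the helper name and sample matrix are my own, assuming NumPy; the kernel is read off a singular value decomposition):

```python
import numpy as np

def generalized_eigenspace(T, lam, tol=1e-9):
    """A basis (as columns) of ker (T - lam I)^d, where d = dim V."""
    d = T.shape[0]
    power = np.linalg.matrix_power(T - lam * np.eye(d), d)
    _, s, Vt = np.linalg.svd(power)
    return Vt[np.sum(s > tol):].T

# The eigenvalue 1 here has a one-dimensional eigenspace but a two-dimensional
# generalized eigenspace.
T = np.array([[1., 1., 0.],
              [0., 1., 0.],
              [0., 0., 4.]])
print(np.linalg.matrix_rank(T - np.eye(3)))      # 2, so dim ker(T - I) = 1
print(generalized_eigenspace(T, 1.0).shape[1])   # 2
```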
Generalized Eigenvectors
Sorry for the delay, but exam time is upon us, or at least my college algebra class.
Anyhow, we’ve established that distinct eigenvalues allow us to diagonalize a matrix, but repeated eigenvalues cause us problems. We need to generalize the concept of eigenvectors somewhat.
First of all, since an eigenspace generalizes a kernel, let’s consider a situation where we repeat the eigenvalue $0$:

$$\begin{pmatrix}0&1\\0&0\end{pmatrix}$$

This kills off the vector $\begin{pmatrix}1\\0\end{pmatrix}$ right away. But the vector $\begin{pmatrix}0\\1\end{pmatrix}$ gets sent to $\begin{pmatrix}1\\0\end{pmatrix}$, where it can be killed by a second application of the matrix. So while there may not be two independent eigenvectors with eigenvalue $0$, there can be another vector that is eventually killed off by repeated applications of the matrix.
More generally, consider a strictly upper-triangular matrix, all of whose diagonal entries are zero as well:

$$A = \begin{pmatrix}0&a_{1,2}&\cdots&a_{1,d}\\&\ddots&\ddots&\vdots\\&&0&a_{d-1,d}\\0&&&0\end{pmatrix}$$

That is, $a_{i,j} = 0$ for all $j \leq i$. What happens as we compose this matrix with itself? I say that for $A^2$ we’ll find the $(i,j)$ entry to be zero for all $j \leq i + 1$. Indeed, we can calculate it as a sum of terms like $a_{i,k}a_{k,j}$. For each of these factors to be nonzero we need $k > i$ and $j > k$. That is, $j > i + 1$, or else the matrix entry must be zero. Similarly, every additional factor of $A$ pushes the nonzero matrix entries one step further from the diagonal, and eventually they must fall off the upper-right corner. That is, some power of $A$ must give the zero matrix. The vectors may not have been killed by the transformation $A$ itself, so they may not all have been in the kernel, but they will all be in the kernel of some power of $A$.
Similarly, let’s take a linear transformation $T$ and a vector $v$. If $(T - \lambda 1_V)(v) = 0$ we said that $v$ is an eigenvector of $T$ with eigenvalue $\lambda$. Now we’ll extend this by saying that if $(T - \lambda 1_V)^n(v) = 0$ for some $n$, then $v$ is a generalized eigenvector of $T$ with eigenvalue $\lambda$.
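Both observations are easy to spot-check numerically (the sample matrices below are mine, assuming NumPy): the $2\times 2$ matrix above kills one basis vector immediately and the other after two applications, and a randomly filled strictly upper-triangular matrix is wiped out entirely by its $d$th power.

```python
import numpy as np

# The 2x2 example: eigenvalue 0 repeated, only one independent eigenvector.
A = np.array([[0., 1.],
              [0., 0.]])
e1, e2 = np.eye(2)
print(A @ e1)          # [0. 0.]  killed right away
print(A @ e2)          # [1. 0.]  sent to e1 ...
print(A @ (A @ e2))    # [0. 0.]  ... and killed on the second application

# Each extra factor of a strictly upper-triangular matrix pushes its nonzero
# entries one step further from the diagonal, so the d-th power is zero.
B = np.triu(np.random.default_rng(0).normal(size=(5, 5)), k=1)
print(np.allclose(np.linalg.matrix_power(B, 5), 0))   # True
```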
Repeated Eigenvalues
So we’ve got a linear transformation $T$ on a vector space $V$ of finite dimension $d$. We take its characteristic polynomial to find the eigenvalues. If all of its roots are distinct (and there are $d$ of them, as there must be if we’re working over an algebraically closed field) then we can pick a basis of eigenvectors and get a diagonal matrix for $T$.
But what if a root is repeated? Say the characteristic polynomial has a factor of $(\lambda - 1)^2$. We might think to pick two linearly independent eigenvectors with eigenvalue $1$, but unfortunately that doesn’t always work. Consider the transformation given by the matrix

$$\begin{pmatrix}1&1\\0&1\end{pmatrix}$$

This matrix is upper-triangular, and so we can just read off its eigenvalues from the diagonal: two copies of the eigenvalue $1$. We can easily calculate its action on a vector

$$\begin{pmatrix}1&1\\0&1\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix} = \begin{pmatrix}x+y\\y\end{pmatrix}$$

So the eigenvector equation says that $x + y = x$ and $y = y$. The second is trivial, and the first tells us that $y = 0$. But we can only pick one linearly independent vector of this form. So we can’t find a basis of eigenvectors, and we can’t diagonalize this matrix. We’re going to need another approach.
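For what it’s worth, the failure is easy to see numerically too (this is just the same matrix again, assuming NumPy): the eigenvectors with eigenvalue $1$ form the kernel of the matrix minus the identity, and that kernel is only one-dimensional.

```python
import numpy as np

A = np.array([[1., 1.],
              [0., 1.]])

# Eigenvectors with eigenvalue 1 make up the kernel of A - I, which is only
# one-dimensional since A - I has rank 1.
print(np.linalg.matrix_rank(A - np.eye(2)))   # 1

# Concretely, (A - I) sends (x, y) to (y, 0), which vanishes only when y = 0.
print((A - np.eye(2)) @ np.array([3., 5.]))   # [5. 0.]
```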
Transformations with All Eigenvalues Distinct
When will a linear transformation $T$ have a diagonal matrix? We know that this happens exactly when we can find a basis $\{e_i\}$ of our vector space $V$ so that each basis vector is an eigenvector: $T(e_i) = \lambda_i e_i$. But when does this happen? The full answer can be complicated, but one sufficient condition we can describe right away.
First, remember that we can always calculate the characteristic polynomial of $T$. This will be a polynomial of degree $d$ — the dimension of $V$. Further, we know that a field element $\lambda$ is an eigenvalue of $T$ if and only if $\lambda$ is a root of the characteristic polynomial. Since the degree of the polynomial is $d$, we can expect to find no more than $d$ distinct roots of the polynomial — no more than $d$ distinct eigenvalues of $T$. I want to consider what happens in the generic case, when there are exactly $d$ distinct eigenvalues.
Now we have $d$ field elements $\lambda_1, \ldots, \lambda_d$, and for each one we can pick a vector $e_i$ so that $T(e_i) = \lambda_i e_i$. I say that they form a basis of $V$. If we can show that they are linearly independent, then they span some subspace of $V$. But since there are $d$ distinct vectors here, they span a subspace of dimension $d$, which must be all of $V$. And we know, by an earlier lemma, that a collection of eigenvectors corresponding to distinct eigenvalues must be linearly independent! Thus, the $e_i$ form a basis for $V$ consisting of eigenvectors of $T$. With respect to this basis, the matrix of $T$ is diagonal.
This is good, but notice that there are still plenty of things that can go wrong. It’s entirely possible that two (or more!) of the eigenvalues are not distinct. Worse, we could be working over a field that isn’t algebraically closed, so there may not be $d$ roots at all, even counting duplicates. But still, in the generic case we’ve got a diagonal matrix with respect to a well-chosen basis.
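In the generic case the whole story can also be checked mechanically; here is a sketch with NumPy (the sample matrix is my own choice): the computed eigenvectors of a matrix with $d$ distinct eigenvalues are independent, and changing to that basis produces a diagonal matrix.

```python
import numpy as np

# A transformation with three distinct eigenvalues (read off the triangular diagonal).
T = np.array([[1., 2., 3.],
              [0., 4., 5.],
              [0., 0., 6.]])

vals, vecs = np.linalg.eig(T)         # columns of vecs are eigenvectors
print(np.linalg.matrix_rank(vecs))    # 3: the eigenvectors are linearly independent

# Changing to the eigenvector basis diagonalizes T, eigenvalues on the diagonal.
D = np.linalg.inv(vecs) @ T @ vecs
print(np.allclose(D, np.diag(vals)))  # True
```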
Diagonal Matrices
Even better than upper-triangular matrices are diagonal matrices. These ones look like

$$\begin{pmatrix}\lambda_1&&0\\&\ddots&\\0&&\lambda_d\end{pmatrix}$$

Each basis vector is an eigenvector, with the eigenvalues listed down the diagonal. It’s straightforward to show that the sum and product of two diagonal matrices are themselves diagonal. Thus, diagonal matrices form a further subalgebra inside the algebra of upper-triangular matrices. This algebra is just a direct sum of $d$ copies of $\mathbb{F}$ (one for each basis vector), with multiplication defined component-by-component.
Diagonal matrices are especially nice because it’s really easy to see how they act on vectors. Given a diagonal matrix $D$, break a vector $v$ into its components $v_i$. Multiply each component by the corresponding eigenvalue to get $\lambda_i v_i$. And you’re done! Composing a diagonal matrix with another matrix $A$ is also easy. To find $DA$, just multiply each row of $A$ by the corresponding eigenvalue. To find $AD$, multiply each column of $A$ by the corresponding diagonal entry.
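These componentwise descriptions are easy to spot-check numerically (a sketch of my own, assuming NumPy): acting on a vector multiplies each component by the matching eigenvalue, $DA$ scales the rows of $A$, and $AD$ scales its columns.

```python
import numpy as np

rng = np.random.default_rng(1)
lams = np.array([2., 3., 5.])
D = np.diag(lams)
A = rng.normal(size=(3, 3))
v = rng.normal(size=3)

# Acting on a vector: multiply each component by the corresponding eigenvalue.
print(np.allclose(D @ v, lams * v))               # True

# D A multiplies each row of A by the corresponding eigenvalue ...
print(np.allclose(D @ A, lams[:, None] * A))      # True
# ... while A D multiplies each column of A by the corresponding eigenvalue.
print(np.allclose(A @ D, A * lams[None, :]))      # True
```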
So, if we can find a basis for our vector space consisting only of eigenvectors for the transformation $T$, then with respect to that basis the matrix of $T$ is diagonal. This is as good as we can hope for, and a lot of linear algebra comes down to determining when we can do this.