The Unapologetic Mathematician

Mathematics for the interested outsider

Nilpotent Transformations I

For today I want to consider a single transformation N:V\rightarrow V whose only eigenvalue is {0}. That is, N^n=0 for sufficiently large powers n. We’ve seen that d=\dim(V) is sufficiently large a power to take to check if N is nilpotent. We’ve also seen that N has an upper-triangular matrix with all zeroes along the diagonal — the single eigenvalue with multiplicity d. This sort of “nil-potent” transformation (because a power of it is the null transformation) is especially interesting because if we take any transformation, restrict it to a generalized eigenspace, and subtract the eigenvalue times the identity transformation, what’s left is nilpotent. This will be important soon.

The essential thing about N is that its kernel gets bigger and bigger as we raise it to higher and higher powers, until it swallows up the whole vector space. This much we knew, but we’re going to follow it closely, so we can describe exactly how it slips away. What we need first is a lemma: Let k be the power where N^k first equals the zero transformation. If U\subseteq V is a subspace that intersects the kernel of N^{k-1} trivially — so N doesn’t kill off anything in U until the kth (and last) iteration — then we can build the subspace U+NU+N^2U+...+N^{k-1}U. This subspace is invariant under N, since applying N just pushes everything down the line. The lemma will assert that there is another invariant subspace W so that

\displaystyle V=(U+NU+...+N^{k-1}U)\oplus W

We’ll proceed by induction on k. If k=1 then N itself is the zero transformation. Every subspace of V is invariant under N$, and so we can always find such a W.

Now, let’s say that k>1, and let U\subseteq V be a subspace so that U\cap\mathrm{Ker}(N^{k-1})=0. Then some of V is taken up by U, and some by the kernel (with no overlap). We can find U'\subseteq V another subspace so that V=U'\oplus U\oplus\mathrm{Ker}(N^{k-1}), and write U''=U'\oplus U.

We can see that NU''\subseteq\mathrm{Ker}(N^{k-1}), since k applications of N are sufficient to kill off every vector. We can also see that NU''\cap\mathrm{Ker}(N^{k-2})=0 because if anything in U'' were killed off by 1+(k-2) applications of N it would be in \mathrm{Ker}(N^{k-1}), contradicting the direct sum above. Now if we restrict to \mathrm{Ker}(N^{k-1}), we’re all set to invoke the inductive hypothesis. In this case, \mathrm{Ker}(N^{k-1}) plays the role of V and NU'' plays the role of U. The inductive hypothesis gives us a subspace W'\subseteq\mathrm{Ker}(N^{k-1}) that’s invariant under N and satisfying

\displaystyle\mathrm{Ker}(N^{k-1})=(NU''+N^2U''+...+N^{k-1}U'')\oplus W'

We can set

\displaystyle W=W'+U'+NU'+...+N^{k-1}U'

to get a subspace of V that is invariant under N (since W' and the rest of the sum are each separately invariant). This will be the subspace we need, which we must now check. First, note that


but is the sum of W and the rest direct? We need to show that their intersection is trivial.

Take elements u_0 through u_{k-1} of U, elements u'_0 through u'_{k-1} of U', and w'\in W', and ask that

\displaystyle u_0+Nu_1+...+N^{k-1}u_{k-1}=u'_0+Nu'_1+...+N^{k-1}u'_{k-1}+w'

Applying N^{k-1} to each side of this equation we find N^{k-1}(u_0-u'_0)=0 — everything else gets killed off — which can only happen if u_0=u'_0=0. Then we’re left with

\displaystyle Nu_1+...+N^{k-1}u_{k-1}=Nu'_1+...+N^{k-1}u'_{k-1}+w'

which would imply

\displaystyle N(u_1-u'_1)+...+N^{k-1}(u_{k-1}-u'_{k-1})=w'

contradicting the directness of the sum in the above decomposition of \mathrm{Ker}(N^{k-1}) unless everything in sight is zero.

So there it is. It’s sort of messy, but at the end of the day we can start with a subspace U that doesn’t disappear for as long as possible and use N to march it around until it dies. Then the rest of V can be made up by another invariant subspace. Tomorrow we’ll see what we can do with this.

February 26, 2009 Posted by | Algebra, Linear Algebra | 6 Comments

Generalized Eigenspaces are Disjoint

We know that eigenspaces of distinct eigenvalues are disjoint. That is, no vector is an eigenvector of two different eigenvalues. This extends to generalized eigenvectors as well. If we have a nonzero v\in\mathrm{Ker}\left((T-\lambda_11_V)^d\right) then we know for sure that v\notin\mathrm{Ker}\left((T-\lambda_21_V)^d\right) for \lambda_2\neq\lambda_1.

To see this, let’s say that we have a v that’s in two generalized eigenspaces, and start hitting it with T-\lambda_11_V over and over again. Eventually, it disappears, but before it does, it’s a nonzero eigenvector with eigenvalue \lambda_1. I say that this (not generalized) eigenvector is also in the generalized eigenspace of \lambda_2. Indeed, each time we hit v with T-\lambda_11_V we write it as the sum T(v)-\lambda_1v. Since v\in\mathrm{Ker}\left((T-\lambda_21_V)^d\right), the same is true of the scalar multiple \lambda_1v\in\mathrm{Ker}\left((T-\lambda_21_V)^d\right). And since (T-\lambda_2)^d is a polynomial in T, its kernel is invariant. Thus we have T(v)\in\mathrm{Ker}\left((T-\lambda_21_V)^d\right) as well. And thus \mathrm{Ker}\left((T-\lambda_21_V)^d\right) is invariant under the transformation T-\lambda_11_V, and we can assume (without loss of generality) that v is an actual eigenvector with eigenvalue \lambda_1, as well as a generalized eigenvector with eigenvalue \lambda_2.

But now let’s start hitting v with T-\lambda_21_V. Applying it once we find T(v)-\lambda_2v=(\lambda_1-\lambda_2)v. So no matter how many times we apply the transformation, we just keep multiplying by \lambda_1-\lambda_2, and we’ve assumed that this is nonzero. So v cannot possibly be in the generalized eigenspace of \lambda_2. And so generalized eigenspaces corresponding to distinct eigenvalues can only intersect trivially.

February 25, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

Kernels of Polynomials of Transformations

When we considered the representation theory of the algebra of polynomials, we saw that all it takes to specify such a representation is choosing a single endomorphism T:V\rightarrow V. That is, once we pick a transformation T we get a whole algebra of transformations p(T), corresponding to polynomials p in one variable over the base field \mathbb{F}. Today, I want to outline one useful fact about these: that their kernels are invariant subspaces under the action of T.

First, let’s remember what it means for a subspace U\subseteq V to be invariant. This means that if we take a vector u\in U then its image T(u) is again in U. This generalizes the nice situation about eigenspaces: that we have some control (if not as complete) over the image of a vector.

So, we need to show that if \left[p(T)\right](u)=0 then \left[p(T)\right]\left(T(u)\right)=0, too. But since this is a representation, we can use the fact that p(X)X=Xp(X), because the polynomial algebra is commutative. Then we calculate


Thus if p(T) is a linear transformation which is built by evaluating a polynomial at T, then its kernel is an invariant subspace for T.

February 24, 2009 Posted by | Algebra, Linear Algebra | 6 Comments

The Multiplicity of an Eigenvalue

We would like to define the multiplicity of an eigenvalue \lambda of a linear transformation T as the number of independent associated eigenvectors. That is, as the dimension of the kernel of T-\lambda1_V. Unfortunately, we saw that when we have repeated eigenvalues, sometimes this doesn’t quite capture the right notion. In that example, the {1}-eigenspace has dimension {1}, but it seems from the upper-triangular matrix that the eigenvalue {1} should have multiplicity {2}.

Indeed, we saw that if the entries along the diagonal of an upper-triangular matrix are \lambda_1,...,\lambda_d, then the characteristic polynomial is


Then we can use our definition of multiplicity for roots of polynomials to see that a given value of \lambda has multiplicity equal to the number of times it shows up on the diagonal of an upper-triangular matrix.

It turns out that generalized eigenspaces do capture this notion, and we have a way of calculating them to boot! That is, I’m asserting that the multiplicity of an eigenvalue \lambda is both the number of times that \lambda shows up on the diagonal of any upper-triangular matrix for T, and the number of independent generalized eigenvectors with eigenvalue \lambda — which is \dim\mathrm{Ker}\left((T-\lambda1_V)^{\dim(V)}\right).

So, let’s fix a vector space V of finite dimension D over an algebraically closed field \mathbb{F}. Pick a linear transformation T:V\rightarrow V and a basis \left\{e_i\right\} with respect to which the basis of T is upper-triangular. We know such a matrix will exist because we’re working over an algebraically closed base field. I’ll prove the assertion for the eigenvalue \lambda=0 — that the number of copies of {0} on the diagonal of the matrix is the dimension of the kernel of T^d — since for other eigenvalues we just replace T with T-\lambda1_V and do the exact same thing.

We’ll prove this statement by induction on the dimension of V. The base case is easy: if \dim(V)=1 then the kernel of T has dimension {1} if the upper triangular matrix is \begin{pmatrix}{0}\end{pmatrix}, and has dimension {0} for anything else.

For the inductive step, we’re interested in the subspace spanned by the basis vectors e_1 through e_{d-1}. Let’s call this subspace U. Now we can write out the matrix of T:


We can see that every vector in U — linear combinations of e_1 through e_{d-1} — lands back in U. Meanwhile T(e_d)=\lambda_de_d+\bar{u}, where the components of \bar{u}\in U are given in the last column. The fact that U is invariant under the action of T means that we can restrict T to that subspace, getting the transformation T|_U:U\rightarrow U. Its matrix with respect to the obvious basis is


The dimension of U is less than that of V, so we can use our inductive hypothesis to conclude that {0} shows up on the diagonal of this matrix \dim\left(\mathrm{Ker}\left((T|_U)^{d-1}\right)\right) times. But we saw yesterday that the sequence of kernels of powers of T|_U has stabilized by this point (since U has dimension d-1), so this is also \dim\left(\mathrm{Ker}\left((T|_U)^d\right)\right). The last diagonal entry of T is either {0} or not. If \lambda_d\neq0, we want to show that


On the other hand, if \lambda_d=0, we want to show that


The inclusion-exclusion principle tells us that


Since this dimension of a subspace has to be less than or equal to d, the difference in dimensions on the right can only be either zero or one. And we also see that


So if \lambda\neq0 we need to show that every vector in \mathrm{Ker}(T^d) actually lies in U, so the difference in dimensions is zero. On the other hand, if \lambda=0 we need to find a vector in \mathrm{Ker}(T^d) that’s not in U, so the difference in dimensions has to be one.

The first case is easier. Any vector in V but not in U can be written uniquely as u+xe_d for some nonzero scalar x\in\mathbb{F} and some vector u\in U. When we apply the transformation T, we get T(u)+x\bar{u}+x\lambda_de_d. Since \lambda_d\neq0, the coefficient of e_d is again nonzero. No matter how many times we apply T, we’ll still have a nonzero vector left. Thus the kernel of T^d is completely contained in U, and so we conclude \mathrm{Ker}(T^d)=\mathrm{Ker}\left((T|_U)^d\right).

In the second case, let’s look for a vector of the form u-e_d. We want to choose u\in U so that T^d(u)=T^d(e_d). At the first application of T we find T(e_d)=\bar{u}\in U. Thus

\displaystyle T^d(e_d)=T^{d-1}(\bar{u})\in\mathrm{Im}\left(\left(T|_U\right)^{d-1}\right)

But the dimension of U is d-1, and so by this point the sequence of images of powers of T|_U has stabilized! That is,

\displaystyle T^d(e_d)\in\mathrm{Im}\left(\left(T|_U\right)^{d-1}\right)\subseteq\mathrm{Im}\left(\left(T|_U\right)^d\right)

and so we can find a u so that T^d(u)=T^d(e_d). This gives us a vector in the kernel of T that doesn’t lie in U, and the inductive step is complete.

As a final remark, notice that the only place we really used the fact that \mathbb{F} is algebraically closed is when we picked a basis that would make T upper-triangular. Everything still goes through as long as we have an upper-triangular matrix, but a given linear transformation may have no such matrix.

February 19, 2009 Posted by | Algebra, Linear Algebra | 10 Comments

Images of Powers of Transformations

For some technical points, it’s going to be useful to have a sort of dual to the increasing chain of subspaces we found yesterday. Instead of kernels, we’ll deal with images.

Specifically, if w\in\mathrm{Im}(T^{k+1}) then I say w\in\mathrm{Im}(T^k). Indeed, the first statement asserts that there is some v so that w=T^{k+1}(v). But then w=T^k(T(v)), and so it’s the image of T(v) under T^k as well. So we have a decreasing sequence

\displaystyle V=\mathrm{Im}(T^0)\supseteq\mathrm{Im}(T^1)\supseteq\mathrm{Im}(T^2)\supseteq...

Just like last time, these stabilize by the time we get to the dth power, where d=\dim(V). Instead of repeating everything, let’s just use the rank-nullity theorem, which says for each power n that d=\dim\left(\mathrm{Ker}(T^n)\right)+\dim\left(\mathrm{Im}(T^n)\right). Now if m>d=\dim(V) then we calculate


where in the second line we used the stability of the sequence of kernels from yesterday. This tells us that \mathrm{Im}(T^m)=\mathrm{Im}(T^d) for all these higher powers of T.

February 18, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Determining Generalized Eigenvectors

Our definition of a generalized eigenvector looks a lot like the one for an eigenvector. But finding them may not be as straightforward as our method for finding eigenvectors. In particular, we’re asking that the vector be in the kernel of not one transformation, but some transformation in a whole infinite list of them. We can’t just go ahead and apply each transformation, hoping to find one that sends our vector to zero.

Maybe the form of this list can help us. We’re really just taking the one transformation T-\lambda1_V and applying it over and over again. So we could start with v and calculate (T-\lambda1_V)v, and then (T-\lambda1_V)^2v, and so on, until we end up with (T-\lambda1_V)^nv=0. That’s all well and good if v is a generalized eigenvector, but what if it isn’t? At what point do we stop and say we’re never going to get to zero?

The first thing we have to notice is that as we go along the list of transformations, the kernel never shrinks. That is, if (T-\lambda1_V)^iv=0 then


Thus, we have an increasing sequence of subspaces

\displaystyle 0=\mathrm{Ker}\left((T-\lambda1_V)^0\right)\subseteq\mathrm{Ker}\left((T-\lambda1_V)^1\right)\subseteq\mathrm{Ker}\left((T-\lambda1_V)^2\right)\subseteq...

Next we have to recognize that this sequence is strictly increasing until it levels out. That is, if \mathrm{Ker}\left((T-\lambda1_V)^{i-1}\right)=\mathrm{Ker}\left((T-\lambda1_V)^i\right) then \mathrm{Ker}\left((T-\lambda1_V)^i\right)=\mathrm{Ker}\left((T-\lambda1_V)^{i+1}\right). Then, of course, we can use an inductive argument to see that all the kernels from that point on are the same. But why does this happen? Well, let’s say that (T-\lambda1_V)^iv=0 implies that (T-\lambda1_V)^{i-1}v=0 (the other implication we’ve already taken care of above). Then we can see that (T-\lambda1_V)^{i+1}v=0 implies that (T-\lambda1_V)^iv=0 by rewriting them:


where we have used the assumed implication between the second and third lines.

So once this sequence stops growing at one step, it never grows again. That is, if the kernels ever stabilize the sequence looks like


and \mathrm{Ker}\left((T-\lambda1_V)^n\right) is as large as it ever gets.

So does the sequence top out? Of course, it has to! Indeed, each step before it stops growing raises the dimension by at least one, so if it didn’t stop by step d=\dim(V) it would get bigger than the space V itself, which is absurd because these are all subspaces of V. So \mathrm{Ker}\left((T-\lambda1_V)^d\right) is the largest of these kernels.

Where does this leave us? We’ve established that if v is in the kernel of any power of T-\lambda1_V it will be in \mathrm{Ker}\left((T-\lambda1_V)^d\right). Thus the space of generalized eigenvectors with eigenvalue \lambda is exactly this kernel. Now finding generalized eigenvectors is just as easy as finding eigenvectors.

February 17, 2009 Posted by | Algebra, Linear Algebra | 5 Comments

Generalized Eigenvectors

Sorry for the delay, but exam time is upon us, or at least my college algebra class.

Anyhow, we’ve established that distinct eigenvalues allow us to diagonalize a matrix, but repeated eigenvalues cause us problems. We need to generalize the concept of eigenvectors somewhat.

First of all, since an eigenspace generalizes a kernel, let’s consider a situation where we repeat the eigenvalue {0}:


This kills off the vector \begin{pmatrix}1\\{0}\end{pmatrix} right away. But the vector \begin{pmatrix}{0}\\1\end{pmatrix} gets sent to \begin{pmatrix}1\\{0}\end{pmatrix}, where it can be killed by a second application of the matrix. So while there may not be two independent eigenvectors with eigenvalue {0}, there can be another vector that is eventually killed off by repeated applications of the matrix.

More generally, consider a strictly upper-triangular matrix, all of whose diagonal entries are zero as well:


That is, t_i^j=0 for all i\geq j. What happens as we compose this matrix with itself? I say that for T^2 we’ll find the (i,k) entry to be zero for all i\geq {k}+1. Indeed, we can calculate it as a sum of terms like t_i^jtj^k. For each of these factors to be nonzero we need i\leq j-1 and j\leq k-1. That is, i\leq k-2, or else the matrix entry must be zero. Similarly, every additional factor of T pushes the nonzero matrix entries one step further from the diagonal, and eventually they must fall off the upper-right corner. That is, some power of T must give the zero matrix. The vectors may not have been killed by the transformation T, so they may not all have been in the kernel, but they will all be in the kernel of some power of T.

Similarly, let’s take a linear transformation T and a vector v. If v\in\mathrm{Ker}(T-\lambda1_V) we said that v is an eigenvector of T with eigenvalue \lambda. Now we’ll extend this by saying that if v\in\mathrm{Ker}(T-\lambda1_V)^n for some n, then v is a generalized eigenvector of T with eigenvalue \lambda.

February 16, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

Repeated Eigenvalues

So we’ve got a linear transformation T on a vector space V of finite dimension d. We take its characteristic polynomial to find the eigenvalues. If all of its roots are distinct (and there are d of them, as there must be if we’re working over an algebraically closed field) then we can pick a basis of eigenvectors and get a diagonal matrix for T.

But what if a root is repeated? Say the characteristic polynomial has a factor of (\lambda-1)^2. We might think to pick two linearly independent eigenvectors with eigenvalue {1}, but unfortunately that doesn’t always work. Consider the transformation given by the matrix


This matrix is upper-triangular, and so we can just read off its eigenvalues from the diagonal: two copies of the eigenvalue {1}. We can easily calculate its action on a vector


So the eigenvector equation says that x=x+y and y=y. The second is trivial, and the first tells us that y=0. But we can only pick one linearly independent vector of this form. So we can’t find a basis of eigenvectors, and we can’t diagonalize this matrix. We’re going to need another approach.

February 11, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Transformations with All Eigenvalues Distinct

When will a linear transformation T have a diagonal matrix? We know that this happens exactly when we can find a basis \{e_i\}_{i=1}^d of our vector space V so that each basis vector is an eigenvector T(e_i)=\lambda_ie_i. But when does this happen? The full answer can be complicated, but one sufficient condition we can describe right away.

First, remember that we can always calculate the characteristic polynomial of T. This will be a polynomial of degree d — the dimension of V. Further, we know that a field element \lambda is an eigenvalue of T if and only if \lambda is a root of the characteristic polynomial. Since the degree of the polynomial is d, we can expect to find no more than d distinct roots of the polynomial — no more than d distinct eigenvalues of T. I want to consider what happens in this generic case.

Now we have d field elements \lambda_i, and for each one we can pick a vector e_i so that T(e_i)=\lambda_ie_i. I say that they form a basis of V. If we can show that they are linearly independent, then they span some subspace of V. But since there are d distinct vectors here, they span a subspace of dimension d, which must be all of V. And we know, by an earlier lemma that a collection of eigenvectors corresponding to distinct eigenvalues must be linearly independent! Thus, the e_i form a basis for V consisting of eigenvectors of T. With respect to this basis, the matrix of T is diagonal.

This is good, but notice that there are still plenty of things that can go wrong. It’s entirely possible that two (or more!) of the eigenvalues are not distinct. Worse, we could be working over a field that isn’t algebraically closed, so there may not be d roots at all, even counting duplicates. But still, in the generic case we’ve got a diagonal matrix with respect to a well-chosen basis.

February 10, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Diagonal Matrices

Even better than upper-triangular matrices are diagonal matrices. These ones look like


Each basis vector is an eigenvector, with the eigenvalues listed down the diagonal. It’s straightforward to show that the sum and product of two diagonal matrices are themselves diagonal. Thus, diagonal matrices form a further subalgebra inside the algebra of upper-triangular matrices. This algebra is just a direct sum of copies of \mathbb{F}, with multiplication defined component-by-component.

Diagonal matrices are especially nice because it’s really easy to see how they act on vectors. Given a diagonal matrix D, break a vector v into its components v=\sum v^ie_i. Multiply each component by the corresponding eigenvalue D(v)=\sum(\lambda_iv^i)e_i. And you’re done! Composing a diagonal matrix with another matrix is also easy. To find DT, just multiply each row of T by the corresponding eigenvalue. To find TD, multiply each column of T by the corresponding diagonal.

So, if we can find a basis for our vector space consisting only of eigenvectors for the transformation T, then with respect to that basis the matrix of T is diagonal. This is as good as we can hope for, and a lot of linear algebra comes down to determining when we can do this.

February 9, 2009 Posted by | Algebra, Linear Algebra | 4 Comments