The Unapologetic Mathematician

Mathematics for the interested outsider

Decomposing Real Linear Transformations

Finally, we come to the analogue of Jordan normal form over the real numbers.

Given a linear transformation T:V\rightarrow V on a real vector space V of dimension d, we can find its characteristic polynomial. We can factor a real polynomial into the product of linear terms (X-\lambda_i) and irreducible quadratic terms (X^2-\tau_jX+\delta_j) with \tau_j^2<4\delta_j. These give us a list of eigenvalues and eigenpairs for T.
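
As a quick numerical illustration (a hedged sketch with a made-up matrix, reading the factorization off of NumPy's eigenvalue routine rather than factoring the polynomial by hand): the real eigenvalues give the linear factors, and each complex-conjugate pair a\pm bi gives the eigenpair (\tau,\delta)=(2a,a^2+b^2).

```python
import numpy as np

# Hypothetical 4x4 example: the eigenvalue 2 with multiplicity two, plus the
# block [[1, -2], [2, 1]], whose characteristic polynomial X^2 - 2X + 5 has
# no real roots.
T = np.array([[2.0, 0.0, 0.0, 0.0],
              [0.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, -2.0],
              [0.0, 0.0, 2.0, 1.0]])

eigs = np.linalg.eigvals(T)
real_eigs = sorted(e.real for e in eigs if abs(e.imag) < 1e-9)
# A conjugate pair a +/- bi corresponds to the eigenpair (tau, delta) = (2a, a^2 + b^2),
# i.e. the irreducible factor X^2 - tau X + delta.
eigenpairs = [(2 * e.real, abs(e) ** 2) for e in eigs if e.imag > 1e-9]

print(real_eigs)     # approximately [2.0, 2.0]
print(eigenpairs)    # approximately [(2.0, 5.0)]
```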

For each distinct eigenvalue \lambda_i we get a subspace U_i=\mathrm{Ker}\left((T-\lambda_iI_V)^d\right)\subseteq V of generalized eigenvectors, with m distinct eigenvalues in total. Similarly, for each distinct eigenpair (\tau_j,\delta_j) we get a subspace V_j=\mathrm{Ker}\left((T^2-\tau_jT+\delta_jI_V)^d\right)\subseteq V of generalized eigenvectors, with n distinct eigenpairs in total.

We know that these subspaces are mutually disjoint. We also know that the dimension of U_i is equal to the multiplicity of \lambda_i, which is the number of factors of (X-\lambda_i) in the characteristic polynomial. Similarly, the dimension of V_j is twice the multiplicity of (\tau_j,\delta_j), which is the number of factors of (X^2-\tau_jX+\delta_j) in the characteristic polynomial. Since each linear factor contributes {1} to the degree of the polynomial, while each irreducible quadratic contributes {2}, we can see that the sum of the dimensions of the U_i and V_j is equal to the degree of the characteristic polynomial, which is the dimension of V itself.

That is, we have a decomposition of V as a direct sum of invariant subspaces

\displaystyle V=\bigoplus\limits_{i=1}^mU_i\oplus\bigoplus\limits_{j=1}^nV_j

Further, we know that the restrictions (T-\lambda_iI_V)|_{U_i} and (T^2-\tau_jT+\delta_jI_V)|_{V_j} are nilpotent transformations.

I’ll leave it to you to work out what this last property implies for the matrix on the generalized eigenspace of an eigenpair, in analogy with a Jordan block for an eigenvalue.
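
Without spoiling the exercise, here is a hedged numerical sketch of one natural candidate shape (an assumption for illustration, not a claim that this is the unique normal form): copies of a 2\times2 block C down the diagonal with 2\times2 identity blocks just above, on which T^2-\tau T+\delta I_V is visibly nilpotent.

```python
import numpy as np

# One candidate shape (an assumption for illustration): copies of the 2x2 block
# C = [[1, -2], [2, 1]] (trace 2, determinant 5) down the diagonal, with 2x2
# identity blocks just above them.
C = np.array([[1.0, -2.0], [2.0, 1.0]])
J = np.block([[C, np.eye(2)],
              [np.zeros((2, 2)), C]])

tau, delta = 2.0, 5.0
N = J @ J - tau * J + delta * np.eye(4)   # the restriction of T^2 - tau T + delta I
print(not np.allclose(N, 0))   # True: N is not zero...
print(np.allclose(N @ N, 0))   # True: ...but it squares to zero, so it is nilpotent
```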


April 9, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

The Multiplicity of an Eigenpair

As usual, let T:V\rightarrow V be a linear transformation on a real vector space V of dimension d. We know that T can be put into an almost upper-triangular form

\displaystyle\begin{pmatrix}A_1&&*\\&\ddots&\\{0}&&A_m\end{pmatrix}

where each block A_j is either a 1\times1 matrix or a 2\times2 matrix with no eigenvalues. Now I assert that

  • If \lambda is a real number, then exactly \dim\left(\mathrm{Ker}\left((T-\lambda I_V)^d\right)\right) of the blocks are the 1\times1 matrix whose single entry is \lambda.
  • If (\tau,\delta) is a pair of real numbers with \tau^2<4\delta, then exactly

    \displaystyle\frac{\dim\left(\mathrm{Ker}\left((T^2-\tau T+\delta I_V)^d\right)\right)}{2}

    of the blocks are 2\times2 matrices with characteristic polynomial X^2-\tau X+\delta. Incidentally, this shows that the dimension of this generalized eigenspace is even.
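
Here is a hedged numerical check of both counts on a made-up block matrix (the helper `generalized_kernel_dim` is mine, not a standard routine):

```python
import numpy as np

def generalized_kernel_dim(M, power):
    """dim Ker(M^power), computed as the dimension minus the rank of the power."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(np.linalg.matrix_power(M, power))

# Hypothetical 6x6 almost upper-triangular example: two 1x1 blocks with entry 3
# and two copies of C = [[1, -2], [2, 1]], whose characteristic polynomial is
# X^2 - 2X + 5.
C = np.array([[1.0, -2.0], [2.0, 1.0]])
T = np.zeros((6, 6))
T[0, 0] = T[1, 1] = 3.0
T[0, 1] = 1.0            # an arbitrary entry above the diagonal
T[2:4, 2:4] = C
T[4:6, 4:6] = C

d = 6
I = np.eye(d)
dim_lambda = generalized_kernel_dim(T - 3.0 * I, d)
dim_pair = generalized_kernel_dim(T @ T - 2.0 * T + 5.0 * I, d)
print(dim_lambda)        # 2: two blocks are the 1x1 matrix (3)
print(dim_pair // 2)     # 2: two blocks have characteristic polynomial X^2 - 2X + 5
```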

This statement is parallel to the one about multiplicities of eigenvalues over algebraically closed fields. And we’ll use a similar proof. First, let’s define the polynomial p(X) to be X-\lambda if we’re trying to prove the first part, and to be X^2-\tau X+\delta if we’re trying to prove the second part, and let n be the degree of p. We’ll proceed by induction on the number m of blocks along the diagonal.

If m=1 then V is either one- or two-dimensional. Then the statement reduces to what we worked out by cases earlier. So from here we’ll assume that m>1, and that the statement holds for all matrices with m-1 blocks.

Let’s define U\subseteq V to be the subspace spanned by the first m-1 blocks, and \hat{U} to be the subspace spanned by the last block. That is, U consists of all but the last one or two rows and columns of the matrix, depending on whether A_m is 1\times1 or 2\times2. Clearly U is invariant under T, and the restriction T|_U of T to U has matrix

\displaystyle\begin{pmatrix}A_1&&*\\&\ddots&\\{0}&&A_{m-1}\end{pmatrix}

with m-1 blocks. Thus the inductive hypothesis tells us that exactly

\displaystyle\frac{\dim\left(\mathrm{Ker}\left(p\left(T|_U\right)^d\right)\right)}{n}

of the blocks from A_1 to A_{m-1} have characteristic polynomial p.

We’ll also define S:\hat{U}\rightarrow\hat{U} to be the linear transformation acting by the block A_m. This is essentially the action of T on the quotient space V/U, but we’re viewing \hat{U} as giving representatives in V for vectors in the quotient space. This way, if v\in\hat{U} is a vector in this subspace of representatives we can write Tv=Sv+u_1 for some u_1\in U. Further, T^2v=S^2v+u_2 for some other vector u_2\in U. No matter which form of p we’re using, we can see that p(T)v=p(S)v+u_3 for some u_3\in U, and further that p(T)^dv=p(S)^dv+u_4 for some u_4\in U.

Now, either A_m has characteristic polynomial p or not. If not, then I say that \mathrm{Ker}\left(p(T)^d\right)\subseteq U. This implies that

\displaystyle\mathrm{Ker}\left(p(T)^d\right)=\mathrm{Ker}\left(p\left(T|_U\right)^d\right)

and thus that both the dimension of this kernel and the number of blocks with characteristic polynomial p are the same as for T|_U.

So let’s assume v\in\mathrm{Ker}\left(p(T)^d\right) and write v=u+\hat{u} with u\in U and \hat{u}\in\hat{U}. Then

\displaystyle0=p(T)^dv=p(T)^du+p(T)^d\hat{u}=p(S)^d\hat{u}+u_5

for some u_5\in U. This implies that p(S)^d\hat{u}=0, but since p is not the characteristic polynomial of S, the transformation p(S) is invertible on \hat{U}. Thus \hat{u}=0 and v\in\mathrm{Ker}\left(p\left(T|_U\right)^d\right).

On the other hand, if the characteristic polynomial of A_m is p, then we want to show that

\displaystyle\dim\left(\mathrm{Ker}\left(p(T)^d\right)\right)=\dim\left(\mathrm{Ker}\left(p\left(T|_U\right)^d\right)\right)+n

The inclusion-exclusion principle tells us that

\displaystyle\dim\left(U+\mathrm{Ker}\left(p(T)^d\right)\right)=\dim(U)+\dim\left(\mathrm{Ker}\left(p(T)^d\right)\right)-\dim\left(U\cap\mathrm{Ker}\left(p(T)^d\right)\right)
We’ll show that U+\mathrm{Ker}\left(p(T)^d\right)=V, so its dimension is d. Since U\cap\mathrm{Ker}\left(p(T)^d\right)=\mathrm{Ker}\left(p\left(T|_U\right)^d\right) and \dim(U)=d-n, this gives exactly the result we want.

So take \hat{u}\in\hat{U}. Because the characteristic polynomial of S is p, we know that p(S)=0. Thus p(T)\hat{u}\in U. Then

\displaystyle p(T)^d\hat{u}=p(T)^{d-1}\left(p(T)\hat{u}\right)\in\mathrm{Im}\left(p\left(T|_U\right)^{d-1}\right)=\mathrm{Im}\left(p\left(T|_U\right)^d\right)

where the last equality holds because the dimension of U is d-n, and so the image has stabilized by this point. Thus we can choose u\in U so that p(T)^d\hat{u}=p\left(T|_U\right)^du. And so

\displaystyle p(T)^d(\hat{u}-u)=p(T)^d\hat{u}-p\left(T|_U\right)^du=0

which shows that \hat{u}-u\in\mathrm{Ker}\left(p(T)^d\right). And thus \hat{u}=u+(\hat{u}-u) is in U+\mathrm{Ker}\left(p(T)^d\right). Since \hat{u} was arbitrary, the whole subspace \hat{U}\subseteq U+\mathrm{Ker}\left(p(T)^d\right), which shows that V=U+\hat{U}\subseteq U+\mathrm{Ker}\left(p(T)^d\right)\subseteq V, which completes our proof.

With all of that handled, we can calculate the characteristic polynomial of T and find that it’s the product of the characteristic polynomials of all the blocks A_j. That is, we will have \dim\left(\mathrm{Ker}\left((T-\lambda I_V)^d\right)\right) factors of (X-\lambda) and \frac{1}{2}\dim\left(\mathrm{Ker}\left((T^2-\tau T+\delta I_V)^d\right)\right) factors of (X^2-\tau X+\delta). We can thus define this half-dimension to be the multiplicity of the eigenpair. Like the multiplicity of an eigenvalue, it counts both the number of times the corresponding factor shows up in the characteristic polynomial of T, and the number of blocks on the diagonal of an almost upper-triangular matrix for T that have this characteristic polynomial.

April 8, 2009 Posted by | Algebra, Linear Algebra | Leave a comment

Generalized Eigenspaces are Still Invariant and Disjoint

When working over an algebraically closed field we found that generalized eigenspaces are invariant and disjoint from each other. The same holds now that we’re allowing eigenpairs for transformations on real vector spaces.

First off, the generalized eigenspace of an eigenpair (\tau,\delta) is the kernel \mathrm{Ker}\left((T^2-\tau T+\delta I_V)^d\right) of a polynomial in T. Just like before, this kernel is automatically invariant under T, just like the generalized eigenspace \mathrm{Ker}\left((T-\lambda I_V)^d\right) is.

Generalized eigenspaces of distinct eigenvalues are disjoint, just as before. So now let \lambda be an eigenvalue, (\tau,\delta) be an eigenpair, and v be a vector in both generalized eigenspaces. The invariance of \mathrm{Ker}\left((T^2-\tau T+\delta I_V)^d\right) under T shows that if u is a generalized eigenvector of (\tau,\delta), then so is Tu-\lambda u. Just like we did before, we can keep hitting v with T-\lambda I_V until one step before it vanishes (which it eventually must, since it’s a generalized eigenvector of \lambda). So without loss of generality we can assume that v is an actual eigenvector of \lambda and a generalized eigenvector of (\tau,\delta).

Now we can use the generalized eigenvector property to write

\displaystyle(T^2-\tau T+\delta I_V)^dv=0

but since v is an eigenvector with eigenvalue \lambda, this says

\displaystyle(\lambda^2-\tau\lambda+\delta)^dv=0
If v is nonzero, this can only be true if \lambda is a root of X^2-\tau X+\delta, which we assumed not to be the case.

Finally we consider two distinct eigenpairs (\tau_1,\delta_1) and (\tau_2,\delta_2), and a generalized eigenvector v of both. Another argument like that above shows that without loss of generality we can assume v is an actual eigenvector of (\tau_2,\delta_2). This eigenspace is the kernel \mathrm{Ker}(T^2-\tau_2T+\delta_2I_V), which is thus invariant, and another argument like before lets us assume that v is an actual eigenvector of both eigenpairs. Thus we have

\displaystyle\begin{aligned}(T^2-\tau_1T+\delta_1I_V)v&=0\\(T^2-\tau_2T+\delta_2I_V)v&=0\end{aligned}
Subtracting, we find

\displaystyle -(\tau_1-\tau_2)Tv+(\delta_1-\delta_2)v=0

If \tau_1-\tau_2\neq0, this makes v an eigenvector with eigenvalue \frac{\delta_1-\delta_2}{\tau_1-\tau_2}. But then this eigenvalue would be a real root of X^2-\tau_1X+\delta_1, which has none, and so v=0. On the other hand, if \tau_1=\tau_2 then \delta_1\neq\delta_2, and we conclude that v=0.

At the end of the day, no nonzero vector can be a generalized eigenvector of more than one eigenvalue or eigenpair.
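
Here is a hedged numerical spot-check of this disjointness on a made-up 4\times4 matrix, computing the two generalized eigenspaces and the dimension of their intersection:

```python
import numpy as np
from scipy.linalg import null_space

# Made-up 4x4 example: the generalized eigenspace of the eigenvalue 3 and that
# of the eigenpair (2, 5) should intersect only in the zero vector.
C = np.array([[1.0, -2.0], [2.0, 1.0]])   # trace 2, determinant 5
T = np.zeros((4, 4))
T[0, 0] = T[1, 1] = 3.0
T[0, 1] = 1.0
T[2:4, 2:4] = C

d = 4
I = np.eye(d)
U = null_space(np.linalg.matrix_power(T - 3.0 * I, d))
V = null_space(np.linalg.matrix_power(T @ T - 2.0 * T + 5.0 * I, d))
# dim of intersection = dim U + dim V - dim(U + V), with dim(U + V) = rank [U | V]
intersection_dim = U.shape[1] + V.shape[1] - np.linalg.matrix_rank(np.hstack([U, V]))
print(intersection_dim)   # 0
```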

April 7, 2009 Posted by | Algebra, Linear Algebra | 1 Comment

Generalized Eigenvectors of an Eigenpair

Just as we saw when dealing with eigenvalues, eigenvectors alone won’t cut it. We want to consider the kernel not just of one transformation, but of its powers. Specifically, we will say that v is a generalized eigenvector of the eigenpair (\tau,\delta) if for some power n we have

\displaystyle(T^2-\tau T+\delta I_V)^nv=0

The same argument as before tells us that the kernel will stabilize by the time we take d=\dim(V) powers of an operator, so we define the generalized eigenspace of an eigenpair (\tau,\delta) to be

\displaystyle\mathrm{Ker}\left((T^2-\tau T+\delta I_V)^d\right)

Let’s look at these subspaces a little more closely, along with the older ones of the form \mathrm{Ker}\left((T-\lambda I_V)^d\right), just to make sure they’re as well-behaved as our earlier generalized eigenspaces are. First, let V be one-dimensional, so T must be multiplication by \lambda_0. Then the kernel of T-\lambda I_V is all of V if \lambda=\lambda_0, and is trivial otherwise. On the other hand, what happens with an eigenpair (\tau,\delta)? Well, one application of the operator gives

\displaystyle(T^2-\tau T+\delta I_V)v=(\lambda_0^2-\tau\lambda_0+\delta)v

for any nonzero v. But \lambda_0^2-\tau\lambda_0+\delta is never zero, since we’re assuming that the polynomial X^2-\tau X+\delta has no roots. Thus the generalized eigenspace of (\tau,\delta) will be trivial.

Next, if V is two-dimensional, either T has an eigenvalue or it doesn’t. If it does, then this gives a one-dimensional invariant subspace. The argument above shows that the generalized eigenspace of any eigenpair (\tau,\delta) is again trivial. But if T has no eigenvalues, then the generalized eigenspace of any eigenvalue \lambda is trivial. On the other hand we’ve seen that the kernel of T^2-\tau T+\delta I_V is either the whole of V or nothing, and the former case happens exactly when \tau is the trace of T and \delta is its determinant.
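
A quick numerical sketch of this two-dimensional dichotomy, using the 90-degree rotation as a hypothetical example: the kernel of T^2-\tau T+\delta I_V is everything when (\tau,\delta) is the trace and determinant, and trivial for any other irreducible pair.

```python
import numpy as np

# The 90-degree rotation has trace 0 and determinant 1, and no real eigenvalues.
T = np.array([[0.0, -1.0], [1.0, 0.0]])

def p_of_T(tau, delta):
    # evaluate the monic quadratic X^2 - tau X + delta at T
    return T @ T - tau * T + delta * np.eye(2)

print(np.allclose(p_of_T(0.0, 1.0), 0))          # True: the trace/determinant pair kills all of V
print(np.linalg.matrix_rank(p_of_T(1.0, 1.0)))   # 2: any other irreducible pair gives an invertible map
```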

Now if V is a real vector space of any finite dimension d we know we can find an almost upper-triangular form. This form is highly non-unique, but there are some patterns we can exploit as we move forward.

April 6, 2009 Posted by | Algebra, Linear Algebra | Leave a comment

Eigenvectors of an Eigenpair

An eigenvalue \lambda of a linear transformation T is the same thing as a root of the characteristic polynomial of T. That is, the characteristic polynomial has a factor (X-\lambda). We can evaluate this polynomial at T to get the linear transformation T-\lambda I_V. Vectors in the kernel of this transformation are the eigenvectors corresponding to the eigenvalue \lambda.

Now we want to do the same thing with an eigenpair (\tau,\delta). This corresponds to an irreducible quadratic factor (X^2-\tau X+\delta) in the characteristic polynomial of T. Evaluating this polynomial at T we get the transformation T^2-\tau T+\delta I_V, which I assert has a nontrivial kernel. Specifically, I want to focus in on some two-dimensional invariant subspace on which T has no eigenvalues. This corresponds to a 2\times2 block in an almost upper-triangular representation of T. So we’ll just assume for the moment that V has dimension 2.

What I assert is this: if p is a monic polynomial (with leading coefficient {1}) of degree two, then either p is the characteristic polynomial of T or it’s not. If it is, then p(T)=0, and its kernel is the whole of V. If not, then the kernel is trivial, and p(T) is invertible.

In the first case we can just pick a basis, find a matrix, and crank out the calculation. If the matrix of T is

\displaystyle\begin{pmatrix}a&b\\c&d\end{pmatrix}

then the characteristic polynomial is X^2-(a+d)X+(ad-bc). We substitute the matrix into this polynomial to find

\displaystyle\begin{pmatrix}a&b\\c&d\end{pmatrix}^2-(a+d)\begin{pmatrix}a&b\\c&d\end{pmatrix}+(ad-bc)\begin{pmatrix}1&0\\{0}&1\end{pmatrix}=\begin{pmatrix}0&0\\{0}&0\end{pmatrix}
On the other hand, if q is the characteristic polynomial and p is any other monic polynomial of degree two, then q(T)=0, as we just showed. Then we can calculate

\displaystyle p(T)=p(T)-q(T)=aT+bI_V

for some constants a and b, at least one of which must be nonzero. If a=0, then p(T) is a nonzero multiple of the identity, which is invertible as claimed. On the other hand, if a\neq0, then

\displaystyle p(T)=a\left(T+\frac{b}{a}I_V\right)

which must be invertible since we assumed that T has no eigenvalues.

So for any 2\times2 block, the action of an irreducible quadratic polynomial in T either kills off the whole block or has a trivial kernel. This makes it reasonable to define an eigenvector of the eigenpair (\tau,\delta) to be a vector in the kernel \mathrm{Ker}\left(T^2-\tau T+\delta I_V\right), in analogy with the definition of an eigenvector of a given eigenvalue.
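
As a hedged sketch, we can compute such an eigenspace numerically on a made-up block-diagonal matrix: the kernel of T^2-\tau T+\delta I_V is exactly the two dimensions of the block with that characteristic polynomial.

```python
import numpy as np
from scipy.linalg import null_space

# Made-up 4x4 example with eigenvalues 3 and 4 plus the eigenpair (2, 5):
C = np.array([[1.0, -2.0], [2.0, 1.0]])   # trace 2, determinant 5, no real eigenvalues
T = np.zeros((4, 4))
T[0, 0], T[1, 1] = 3.0, 4.0
T[2:4, 2:4] = C

# Eigenvectors of the eigenpair (2, 5) span the kernel of T^2 - 2T + 5I.
K = null_space(T @ T - 2.0 * T + 5.0 * np.eye(4))
print(K.shape[1])   # 2: the block contributes a two-dimensional eigenspace
```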

April 3, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

The Characteristic Polynomial of a Real Linear Transformation

We continue working over the field \mathbb{R} of real numbers. Again, let T:V\rightarrow V be a linear transformation from a real vector space V of dimension d to itself. We want to find the characteristic polynomial of this linear transformation.

When we had an algebraically closed field, this was easy. We took an upper-triangular matrix, and then the determinant was just the product down the diagonal. This gave one factor of the form (\lambda-\lambda_j) for each diagonal entry \lambda_j, which established that the diagonal entries of an upper-triangular matrix were exactly the eigenvalues of the linear transformation.

Now we don’t always have an upper-triangular matrix, but we can always find a matrix that’s almost upper-triangular. That is, one that looks like

\displaystyle\begin{pmatrix}A_1&&*\\&\ddots&\\{0}&&A_m\end{pmatrix}
where the blocks A_j are all either 1\times1 matrices

\displaystyle A_j=\begin{pmatrix}\lambda_j\end{pmatrix}

or 2\times2 matrices

\displaystyle A_j=\begin{pmatrix}a_j&b_j\\c_j&d_j\end{pmatrix}

In this latter case, we define \tau_j to be the trace a_j+d_j, and \delta_j to be the determinant a_jd_j-b_jc_j. We must find that \tau_j^2<4\delta_j, otherwise we can find another basis which breaks A_j up into two 1\times1 blocks. Let’s go a step further and insist that all the 2\times2 blocks show up first, followed by all the 1\times1 blocks.

Now we can start calculating the determinant of \lambda I_V-T, summing over permutations. Just like we saw with an upper-triangular matrix, if we have a 1\times1 block in the lower-right we have to choose the rightmost entry in the bottom row, or the whole term will be zero. So we start racking up factors (\lambda-\lambda_j) just like before. Each 1\times1 block, then, gives us a root of the characteristic polynomial, which is an eigenvalue. So far everything is the same as in the upper-triangular case.

Once we get to the 2\times2 blocks we have to be a bit more careful. We have two choices of a nonzero entry in the lowest row: \lambda-d_j or -c_j. But if we choose \lambda-d_j then we can only choose \lambda-a_j on the next row up to have a chance of a nonzero term. On the other hand, if we choose -c_j on the lowest row we are forced to choose -b_j next. The choice between these two is independent of any other choices we might make in calculating the determinant. The first always gives a factor of (\lambda-a_j)(\lambda-d_j) to the term corresponding to that permutation, while the second always gives a factor of (-b_j)(-c_j) to its term. These permutations (no matter what other choices we might make) differ by exactly one swap, and so they enter the determinant with opposite signs.

Now we can collect together all the permutations where we make one choice in block A_j, and all the permutations where we make the other choice. From the first collection we can factor out (\lambda-a_j)(\lambda-d_j), and from the second we can factor out (-b_j)(-c_j). What remains after we pull these factors out is the same in either case, so the upshot is that the 2\times2 block A_j contributes a factor of (\lambda-a_j)(\lambda-d_j)-(-b_j)(-c_j) to the determinant. Some calculation simplifies this:

\displaystyle(\lambda-a_j)(\lambda-d_j)-(-b_j)(-c_j)=\lambda^2-(a_j+d_j)\lambda+(a_jd_j-b_jc_j)=\lambda^2-\tau_j\lambda+\delta_j
which is a quadratic factor with no real roots (since we assumed that \tau_j^2<4\delta_j).

But a factor of this form in the characteristic polynomial is exactly what we defined to be an eigenpair. That is, just as eigenvalues (roots of the characteristic polynomial) correspond to one-dimensional invariant subspaces, so too do eigenpairs (irreducible quadratic factors of the characteristic polynomial) correspond to two-dimensional invariant subspaces. The 2\times2 blocks that show up along the diagonal of the almost upper-triangular matrix give rise to the eigenpairs of T.
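
A hedged numerical check of this factorization on a made-up almost upper-triangular matrix: the characteristic polynomial should be the product of one linear factor per 1\times1 block and one irreducible quadratic per 2\times2 block.

```python
import numpy as np

# Hypothetical 5x5 almost upper-triangular matrix: one 2x2 block with
# (tau, delta) = (2, 5), then 1x1 blocks 3, -1, 4, with arbitrary entries
# above the diagonal.
T = np.array([[1.0, -2.0, 7.0, 1.0, 0.0],
              [2.0,  1.0, 2.0, 0.0, 3.0],
              [0.0,  0.0, 3.0, 5.0, 1.0],
              [0.0,  0.0, 0.0, -1.0, 2.0],
              [0.0,  0.0, 0.0, 0.0, 4.0]])

char_T = np.poly(T)                  # coefficients of det(lambda I - T), highest degree first
expected = np.array([1.0, -2.0, 5.0])   # X^2 - 2X + 5 from the 2x2 block
for lam in (3.0, -1.0, 4.0):            # one factor (X - lambda) per 1x1 block
    expected = np.polymul(expected, [1.0, -lam])
print(np.allclose(char_T, expected))    # True
```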

April 2, 2009 Posted by | Algebra, Linear Algebra | 4 Comments

Almost Upper-Triangular Matrices

Over an algebraically closed field we can always find an upper-triangular matrix for any linear endomorphism. Over the real numbers we’re not quite so lucky, but we can come close.

Let T:V\rightarrow V be a linear transformation from a real vector space V of dimension d to itself. We might not be able to find an eigenvector — a one-dimensional invariant subspace — but we know that we can find either a one-dimensional or a two-dimensional invariant subspace U_1\subseteq V. Just like before we get an action of T on the quotient space V/U_1. Why? Because if we have two representatives v and w of the same vector in the quotient space, then we can write w=v+u. Acting by T, we find Tw=Tv+Tu. And since Tu\in U_1, the vectors Tv and Tw are again equivalent in the quotient space.

Now we can find a subspace \hat{U}_2\subseteq V/U_1 which is invariant under this action of T. Is this an invariant subspace of V? No, it’s not even a subspace of V. But we could pick some U_2\subseteq V containing a unique representative for each vector in \hat{U}_2. For instance, we could pick a basis of \hat{U}_2, a representative for each basis vector, and let U_2 be the span of these representatives. Is this an invariant subspace? Still, the answer is no. Let’s say u\in U_2 is the identified representative of \hat{u}\in\hat{U}_2. Then all we know is that Tu is a representative of T\hat{u}, not that it’s the identified representative. It could have some components spilling out into U_1.

As we proceed, picking up either a one- or two-dimensional subspace at each step, we can pick a basis of each subspace. The action of T sends each basis vector into the current subspace and possibly earlier subspaces. Writing it all out, we get a matrix that looks like

\displaystyle\begin{pmatrix}A_1&&*\\&\ddots&\\{0}&&A_m\end{pmatrix}
where each A_j is either a 1\times1 matrix or a 2\times2 matrix with no eigenvalues. The 1\times1 blocks come from the one-dimensional invariant subspaces in the construction, while the 2\times2 blocks come from the two-dimensional invariant subspaces in the construction, though they may not be invariant once we put them back into V. Above the diagonal we have no control (yet) over the entries, but below the diagonal almost all the entries are zero. The only exceptions are in the 2\times2 blocks, where we poke just barely down by one row.

We can note here that if there are n\leq m two-dimensional blocks and m-n one-dimensional blocks, then the total number of columns will be 2n+(m-n)=n+m=d. Thus we must have at least \lceil\frac{d}{2}\rceil blocks, and at most d blocks. The latter extreme corresponds to an actual upper-triangular matrix.
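
Incidentally, this almost upper-triangular (quasi-triangular) form is essentially what numerical libraries compute as the real Schur decomposition. A hedged SciPy sketch, on a random matrix:

```python
import numpy as np
from scipy.linalg import schur

# The real Schur decomposition writes A = Z T Z^T with Z orthogonal and T
# quasi-upper-triangular: 1x1 and 2x2 blocks down the diagonal, zeros below.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))

T, Z = schur(A, output='real')
print(np.allclose(Z @ T @ Z.T, A))      # True: an orthogonal change of basis
# Everything strictly below the first subdiagonal is zero; the only entries
# poking below the diagonal come from the 2x2 blocks.
print(np.allclose(np.tril(T, -2), 0))   # True
```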

April 1, 2009 Posted by | Algebra, Linear Algebra | 8 Comments