We’re going to need another way of identifying nilpotent endomorphisms. Let $A \subseteq B$ be two subspaces of endomorphisms on a finite-dimensional space $V$, and let $M$ be the collection of $x \in \mathfrak{gl}(V)$ such that $\mathrm{ad}(x)$ sends $B$ into $A$. If $x \in M$ satisfies $\mathrm{tr}(xy) = 0$ for all $y \in M$ then $x$ is nilpotent.
The first thing we do is take the Jordan-Chevalley decomposition of $x$ — $x = x_s + x_n$ — and fix a basis $\{e_i\}$ that diagonalizes $x_s$ with eigenvalues $a_i$. We define $E$ to be the $\mathbb{Q}$-subspace of $\mathbb{F}$ spanned by the eigenvalues $a_i$. If we can prove that this space is trivial, then all the eigenvalues of $x_s$ must be zero, and thus $x_s$ itself must be zero.
We proceed by showing that any $\mathbb{Q}$-linear functional $f: E \to \mathbb{Q}$ must be zero. Taking one, we define $y$ to be the endomorphism whose matrix with respect to our fixed basis is diagonal: $f(a_1), \dots, f(a_n)$. If $\{e_{ij}\}$ is the corresponding basis of $\mathfrak{gl}(V)$ we can calculate that

$$\mathrm{ad}(x_s)(e_{ij}) = (a_i - a_j)e_{ij}\qquad\mathrm{ad}(y)(e_{ij}) = \left(f(a_i) - f(a_j)\right)e_{ij}$$
Now we can find some polynomial $r(T)$ such that $r(a_i - a_j) = f(a_i) - f(a_j)$; there is no ambiguity here since if $a_i - a_j = a_k - a_l$ then the linearity of $f$ implies that

$$f(a_i) - f(a_j) = f(a_i - a_j) = f(a_k - a_l) = f(a_k) - f(a_l)$$
Further, picking $i = j$ we can see that $r(0) = 0$, so $r$ has no constant term. It should be apparent that $\mathrm{ad}(y) = r\left(\mathrm{ad}(x_s)\right)$.
Now, we know that $\mathrm{ad}(x_s)$ is the semisimple part of $\mathrm{ad}(x)$, so the Jordan-Chevalley decomposition lets us write it as a polynomial in $\mathrm{ad}(x)$ with no constant term. But then we can write $\mathrm{ad}(y) = r\left(\mathrm{ad}(x_s)\right)$ as a polynomial in $\mathrm{ad}(x)$ with no constant term as well. Since $\mathrm{ad}(x)$ maps $B$ into $A$, so does $\mathrm{ad}(y)$; thus $y \in M$, and our hypothesis tells us that

$$0 = \mathrm{tr}(xy) = \sum\limits_i a_if(a_i)$$
Hitting this with $f$ we find that the sum of the squares of the $f(a_i)$ is also zero, but since these are rational numbers they must all be zero.
Thus the only possible $\mathbb{Q}$-linear functional on $E$ is zero, meaning that $E$ is trivial, all the eigenvalues of $x_s$ are zero, and $x$ is nilpotent, as asserted.
Now that we’ve given the proof, we want to mention a few uses of the Jordan-Chevalley decomposition.
First, we let $A$ be any finite-dimensional $\mathbb{F}$-algebra — associative, Lie, whatever — and remember that $\mathfrak{gl}(A)$ contains the Lie algebra of derivations $\mathrm{Der}(A)$. I say that if $\delta \in \mathrm{Der}(A)$ then so are its semisimple part $\sigma$ and its nilpotent part $\nu$; it’s enough to show that $\sigma$ is.
Just like we decomposed $V$ in the proof of the Jordan-Chevalley decomposition, we can break down $A$ into the eigenspaces of $\delta$ — or, equivalently, of $\sigma$. But this time we will index them by the eigenvalue: $A_a$ consists of those $x \in A$ such that $(\delta - a)^kx = 0$ for sufficiently large $k$.
Now we have the identity:

$$(\delta - (a+b))^n(xy) = \sum\limits_{k=0}^n\binom{n}{k}\left((\delta - a)^k(x)\right)\left((\delta - b)^{n-k}(y)\right)$$
which is easily verified. If a sufficiently large power of $(\delta - a)$ applied to $x$ and a sufficiently large power of $(\delta - b)$ applied to $y$ are both zero, then for sufficiently large $n$ one or the other factor in each term will be zero, and so the entire sum is zero. Thus we verify that $A_aA_b \subseteq A_{a+b}$.
If we take $x \in A_a$ and $y \in A_b$ then $xy \in A_{a+b}$, and thus $\sigma(xy) = (a+b)xy$, since $\sigma$ acts on $A_{a+b}$ as multiplication by $a+b$. On the other hand,

$$\sigma(x)y + x\sigma(y) = axy + xby = (a+b)xy$$
And thus $\sigma$ satisfies the derivation property

$$\sigma(xy) = \sigma(x)y + x\sigma(y)$$
so $\sigma$ and $\nu = \delta - \sigma$ are both in $\mathrm{Der}(A)$.
For the other side we note that, just as the adjoint of a nilpotent endomorphism is nilpotent, the adjoint of a semisimple endomorphism is semisimple. Indeed, if $\{v_i\}$ is a basis of $V$ such that the matrix of $x$ is diagonal with eigenvalues $a_i$, then we let $e_{ij}$ be the standard basis elements of $\mathfrak{gl}(n,\mathbb{F})$, which is isomorphic to $\mathfrak{gl}(V)$ using the basis $\{v_i\}$. It’s a straightforward calculation to verify that

$$\mathrm{ad}(x)(e_{ij}) = (a_i - a_j)e_{ij}$$
and thus $\mathrm{ad}(x)$ is diagonal with respect to this basis.
So now if $x = x_s + x_n$ is the Jordan-Chevalley decomposition of $x$, then $\mathrm{ad}(x_s)$ is semisimple and $\mathrm{ad}(x_n)$ is nilpotent. They commute, since

$$\left[\mathrm{ad}(x_s),\mathrm{ad}(x_n)\right] = \mathrm{ad}\left([x_s,x_n]\right) = \mathrm{ad}(0) = 0$$
Since $\mathrm{ad}(x) = \mathrm{ad}(x_s) + \mathrm{ad}(x_n)$ is the decomposition of $\mathrm{ad}(x)$ into a semisimple and a nilpotent part which commute with each other, it is the Jordan-Chevalley decomposition of $\mathrm{ad}(x)$.
We set $V_i = \mathrm{Ker}\left((x - a_i1_V)^{m_i}\right)$, where the $a_i$ are the distinct eigenvalues of $x$ and the $m_i$ are their multiplicities, so that $V$ is the direct sum of these subspaces, each of which is fixed by $x$.
On the subspace $V_i$, $x$ has the characteristic polynomial $(T - a_i)^{m_i}$. What we want is a single polynomial $p(T)$ such that

$$p(T) \equiv a_i \mod (T - a_i)^{m_i}\qquad p(T) \equiv 0 \mod T$$
That is, $p(T)$ has no constant term, and for each $i$ there is some polynomial $q_i(T)$ such that

$$p(T) = a_i + q_i(T)(T - a_i)^{m_i}$$
Thus, if we evaluate $p(x)$ on the block $V_i$ we get $a_i$ times the identity, since $(x - a_i1_V)^{m_i}$ is zero on $V_i$.
To do this, we will make use of a result that usually comes up in number theory called the Chinese remainder theorem. Unfortunately, I didn’t have the foresight to cover number theory before Lie algebras, so I’ll just give the statement: any system of congruences — like the one above — where the moduli are relatively prime — as they are above, unless $0$ is an eigenvalue of $x$, in which case just leave out the last congruence since we don’t need it — has a common solution, which is unique modulo the product of the separate moduli. For example, the system
$$N \equiv 2 \mod 3\qquad N \equiv 3 \mod 5\qquad N \equiv 2 \mod 7$$

has the solution $N \equiv 23$, which is unique modulo $105 = 3\cdot5\cdot7$. This is pretty straightforward to understand for integers, but it works as stated over any principal ideal domain — like $\mathbb{F}[T]$ — and, suitably generalized, over any commutative ring.
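Just to see the theorem in action computationally, here is a minimal Python sketch using SymPy’s `crt` (the system of congruences is just the illustrative one above) which finds the common solution and confirms it is unique modulo the product of the moduli.

```python
from sympy.ntheory.modular import crt

# The illustrative system: N ≡ 2 (mod 3), N ≡ 3 (mod 5), N ≡ 2 (mod 7).
moduli = [3, 5, 7]
residues = [2, 3, 2]

N, m = crt(moduli, residues)  # returns (common solution, product of the moduli)
print(N, m)                   # 23 105

# Check each congruence, and that the moduli really multiply out to m.
assert all((N - r) % p == 0 for p, r in zip(moduli, residues))
assert m == 3 * 5 * 7
```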
So anyway, such a $p(T)$ exists, and it’s the polynomial we need to get the semisimple part of $x$. Indeed, on any block $V_i$ the matrix of $p(x)$ differs from that of $x$ by stripping off any off-diagonal elements. Then we can just set $q(T) = T - p(T)$ and find $x_s = p(x)$ and $x_n = q(x)$. Any two polynomials in $x$ must commute — indeed we can simply calculate

$$p(x)q(x) = \left(\sum\limits_ip_ix^i\right)\left(\sum\limits_jq_jx^j\right) = \sum\limits_{i,j}p_iq_jx^{i+j} = \left(\sum\limits_jq_jx^j\right)\left(\sum\limits_ip_ix^i\right) = q(x)p(x)$$
Finally, if $x$ sends the subspace $B$ into the subspace $A \subseteq B$, then so must any polynomial in $x$ with no constant term, so the last assertion of the decomposition holds.
The only thing left is the uniqueness of the decomposition. Let’s say that $x = s + n$ is a different decomposition into a semisimple and a nilpotent part which commute with each other. Then we have $x_s - s = n - x_n$, and all four of these endomorphisms commute with each other, since $s$ and $n$ commute with $x = s + n$, while $x_s$ and $x_n$ are polynomials in $x$. But the left-hand side, as a difference of commuting semisimple endomorphisms, is semisimple — diagonalizable — while the right-hand side, as a sum of commuting nilpotents, is nilpotent, which means its only possible eigenvalue is zero. Thus $x_s - s = n - x_n = 0$, and $s = x_s$ and $n = x_n$.
We recall that any linear endomorphism of a finite-dimensional vector space over an algebraically closed field can be put into Jordan normal form: we can find a basis such that its matrix is the sum of blocks that look like

$$\begin{pmatrix}\lambda&1&&&\\&\lambda&1&&\\&&\ddots&\ddots&\\&&&\lambda&1\\&&&&\lambda\end{pmatrix}$$
where $\lambda$ is some eigenvalue of the transformation. We want a slightly more abstract version of this, and it hinges on the idea that matrices in Jordan normal form have an obvious diagonal part, and a bunch of entries just above the diagonal. This off-diagonal part is all in the upper triangle, so it is nilpotent; the diagonalizable part we call “semisimple”. And what makes this particular decomposition special is that the two parts commute. Indeed, the block-diagonal form means we can carry out the multiplication block-by-block, and in each block one factor is a constant multiple of the identity, which clearly commutes with everything.
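To make that last point concrete, here is a minimal NumPy sketch (the matrix is my own toy example, already in Jordan normal form) splitting it into its diagonal and off-diagonal parts and checking that the two commute, that the off-diagonal part is nilpotent, and that they add back up to the original.

```python
import numpy as np

# A toy matrix in Jordan normal form: a 2x2 block with eigenvalue 3
# and a 1x1 block with eigenvalue 5.
J = np.array([[3., 1., 0.],
              [0., 3., 0.],
              [0., 0., 5.]])

S = np.diag(np.diag(J))  # the diagonal ("semisimple") part
N = J - S                # the part just above the diagonal (nilpotent)

assert np.allclose(S + N, J)                         # they sum to J
assert np.allclose(S @ N, N @ S)                     # the two parts commute
assert np.allclose(np.linalg.matrix_power(N, 3), 0)  # N is nilpotent
```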
More generally, we will have the Jordan-Chevalley decomposition of an endomorphism: any $x \in \mathrm{End}(V)$ can be written uniquely as the sum $x = x_s + x_n$, where $x_s$ is semisimple — diagonalizable — and $x_n$ is nilpotent, and where $x_s$ and $x_n$ commute with each other.
Further, we will find that there are polynomials $p(T)$ and $q(T)$ — each with no constant term — such that $x_s = p(x)$ and $x_n = q(x)$. And thus we will find that any endomorphism that commutes with $x$ will also commute with both $x_s$ and $x_n$.
Finally, if $A \subseteq B \subseteq V$ is any pair of subspaces such that $x(B) \subseteq A$, then the same is true of both $x_s$ and $x_n$.
We will prove these next time, but let’s see that this is actually true of the Jordan normal form. The first part we’ve covered.
For the second, set aside the assertion about $p$ and $q$; any endomorphism commuting with $x$ either multiplies each block by a constant or shuffles similar blocks, and both of these operations commute with both $x_s$ and $x_n$.
For the last part, we may as well assume that $B = V$, since otherwise we can just restrict to $B$. If $x(V) \subseteq A$ then the Jordan normal form shows us that any complementary subspace to $A$ must be spanned by vectors from blocks with eigenvalue $0$. In particular, it can only touch the last row of any such block. But none of these rows are in the range of either the diagonal or off-diagonal portions of the matrix.
A very useful structure to have on a complex vector space $V$ carrying a representation $\rho: G \to \mathrm{GL}(V)$ of a group $G$ is an “invariant form”. To start with, this is a complex inner product $\langle\cdot,\cdot\rangle$, which we recall means that it is
- linear in the second slot — $\langle v, aw + bx\rangle = a\langle v, w\rangle + b\langle v, x\rangle$
- conjugate symmetric — $\langle w, v\rangle = \overline{\langle v, w\rangle}$
- positive definite — $\langle v, v\rangle > 0$ for all $v \neq 0$
Again as usual these imply conjugate linearity in the first slot, so the form isn’t quite bilinear. Still, people are often sloppy and say “invariant bilinear form”.
Anyhow, now we add a new condition to the form. We demand that it be
- invariant under the action of $G$ — $\langle gv, gw\rangle = \langle v, w\rangle$
Here I have started to write $gv$ as shorthand for $\left[\rho(g)\right](v)$. We will only do this when the representation in question is clear from the context.
The inner product gives us a notion of length and angle. Invariance now tells us that these notions are unaffected by the action of $G$. That is, the vectors $v$ and $gv$ have the same length for all $v \in V$ and $g \in G$. Similarly, the angle between vectors $v$ and $w$ is exactly the same as the angle between $gv$ and $gw$. Another way to say this is that if the form is invariant for the representation $\rho$, then the image of $\rho$ is actually contained in the
orthogonal group [commenter Eric Finster, below, reminds me that since we've got a complex inner product we're using the group of unitary transformations with respect to the inner product $\langle\cdot,\cdot\rangle$: $\mathrm{U}(V,\langle\cdot,\cdot\rangle)$].
More important than any particular invariant form is this: if we have an invariant form on our space $V$, then any reducible representation is decomposable. That is, if $W \subseteq V$ is a submodule, we can find another submodule $U$ so that $V = W \oplus U$ as $G$-modules.
If we just consider them as vector spaces, we already know this: the orthogonal complement $W^\perp$ is exactly the subspace we need, for $V = W \oplus W^\perp$. I say that if $W$ is a $G$-invariant subspace of $V$, then $W^\perp$ is as well, and so they are both submodules. Indeed, if $v \in W^\perp$, then we check that $gv$ is as well: for any $w \in W$ we find

$$\langle gv, w\rangle = \langle g^{-1}gv, g^{-1}w\rangle = \langle v, g^{-1}w\rangle = 0$$
where the first equality follows from the $G$-invariance of our form; the second from the representation property; and the third from the fact that $W$ is an invariant subspace, so $g^{-1}w \in W$.
So in the presence of an invariant form, all finite-dimensional representations are “completely reducible”. That is, they can be decomposed as the direct sum of a number of irreducible submodules. If the representation is irreducible to begin with, we’re done. If not, it must have some submodule $W$. Then the orthogonal complement $W^\perp$ is also a submodule, and we can write $V = W \oplus W^\perp$. Then we can treat both $W$ and $W^\perp$ the same way. The process must eventually bottom out, since each of $W$ and $W^\perp$ has dimension strictly smaller than that of $V$, which was finite to begin with. Each step brings the dimension down further and further, and it must stop by the time it reaches $1$.
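Here is a small NumPy illustration of that argument; the group, representation, and subspace are my own toy choices, not anything from above. The permutation representation of $S_3$ on $\mathbb{R}^3$ preserves the standard inner product, the line spanned by $(1,1,1)$ is an invariant subspace, and the check below confirms that every group element carries its orthogonal complement back into itself.

```python
import numpy as np
from itertools import permutations

def perm_matrix(p):
    """Permutation matrix sending the i-th basis vector to the p[i]-th one."""
    M = np.zeros((3, 3))
    for i, j in enumerate(p):
        M[j, i] = 1.0
    return M

group = [perm_matrix(p) for p in permutations(range(3))]  # S_3 acting on R^3

w = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)   # spans the invariant subspace W
perp_basis = [np.array([1.0, -1.0, 0.0]),    # a basis of the complement of W
              np.array([1.0, 1.0, -2.0])]

for g in group:
    assert np.allclose(g @ w, w)              # W is carried into W
    for b in perp_basis:
        assert abs(np.dot(g @ b, w)) < 1e-12  # g.b stays orthogonal to W
```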
This tells us, for instance, that there can be no inner product on $\mathbb{C}^2$ that is invariant under the representation of the group of integers $\mathbb{Z}$ we laid out at the end of last time. Indeed, that was an example of a reducible representation that is not decomposable, but if there were an invariant form it would have to decompose.
Before we move on, we want to define some structures that blend algebraic and topological notions. These are all based on vector spaces. And, particularly, we care about infinite-dimensional vector spaces. Finite-dimensional vector spaces are actually pretty simple, topologically. For pretty much all purposes you have a topology on your base field $\mathbb{F}$, and the vector space (which is isomorphic to $\mathbb{F}^n$ for some $n$) will get the product topology.
But for infinite-dimensional spaces the product topology is often not going to be particularly useful. For example, the space of real-valued functions on a set $X$ is a product; we write $\mathbb{R}^X$ to mean the product of one copy of $\mathbb{R}$ for each point in $X$. Limits in this topology are “pointwise” limits of functions, but this isn’t always the most useful way to think about limits of functions. The sequence

$$f_n(x) = x^n\qquad x \in [0,1]$$
converges pointwise to a function that is $0$ for $0 \leq x < 1$ and $1$ at $x = 1$. But we will find it useful to be able to ignore this behavior at the one isolated point and say that $f_n \to 0$. It’s this connection with spaces of functions that brings such infinite-dimensional topological vector spaces into the realm of “functional analysis”.
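Here’s a quick numerical look at this, assuming the sequence in question is $f_n(x) = x^n$ on $[0,1]$: pointwise, the value at $x = 1$ is stuck at $1$ for every $n$, while a mean-square measure of the size of $f_n$ shrinks to zero, which is the sense in which we’d like to say $f_n \to 0$.

```python
import numpy as np

xs = np.linspace(0.0, 1.0, 1001)

for n in (1, 5, 20, 100):
    fn = xs ** n
    value_at_one = fn[-1]            # always exactly 1, for every n
    rms = np.sqrt(np.mean(fn ** 2))  # a mean-square "size"; tends to 0
    print(n, value_at_one, round(float(rms), 4))
```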
Okay, so to get a topological vector space, we take a vector space and put a (surprise!) topology on it. But not just any topology will do: remember that every point in a vector space looks pretty much like every other one. The translation $w \mapsto w + v$ has an inverse $w \mapsto w - v$, and it only makes sense that these be homeomorphisms. And to capture this, we put a uniform structure on our space. That is, we specify what the neighborhoods are of $0$, and just translate them around to all the other points.
Now, a common way to come up with such a uniform structure is to define a norm on our vector space $V$. That is, to define a function $\lVert\cdot\rVert: V \to \mathbb{R}$ satisfying the three axioms
- For all vectors $v$ and scalars $c$, we have $\lVert cv\rVert = \lvert c\rvert\lVert v\rVert$.
- For all vectors $v$ and $w$, we have $\lVert v + w\rVert \leq \lVert v\rVert + \lVert w\rVert$.
- The norm is zero if and only if the vector is the zero vector.
Notice that we need to be working over a field in which we have a notion of absolute value, so we can measure the size of scalars. We might also want to do away with the last condition and use a “seminorm”. In any event, it’s important to note that though our earlier examples of norms all came from inner products we do not need an inner product to have a norm. In fact, there exist norms that come from no inner product at all.
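To back up that last claim with a standard computation: any norm coming from an inner product must satisfy the parallelogram law $\lVert v+w\rVert^2 + \lVert v-w\rVert^2 = 2\lVert v\rVert^2 + 2\lVert w\rVert^2$, and the sketch below shows the sup norm on $\mathbb{R}^2$ failing it, so that norm comes from no inner product at all.

```python
import numpy as np

def sup_norm(v):
    return np.max(np.abs(v))

v = np.array([1.0, 0.0])
w = np.array([0.0, 1.0])

lhs = sup_norm(v + w) ** 2 + sup_norm(v - w) ** 2  # 1 + 1 = 2
rhs = 2 * sup_norm(v) ** 2 + 2 * sup_norm(w) ** 2  # 2 + 2 = 4
print(lhs, rhs)  # the parallelogram law fails for the sup norm
```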
So if we define a norm we get a “normed vector space”. This is a metric space, with a metric function defined by $d(v, w) = \lVert v - w\rVert$. This is nice because metric spaces are first-countable, and thus sequential. That is, we can define the topology of a (semi-)normed vector space by defining exactly what it means for a sequence of vectors to converge, and in particular what it means for them to converge to zero.
Finally, if we’ve got a normed vector space, it’s a natural question to ask whether or not this vector space is complete. That is, we have all the pieces in place to define Cauchy sequences in our vector space, and we would like for all of these sequences to converge under our uniform structure. If this happens — if we have a complete normed vector space — we call our structure a “Banach space”. Most of the spaces we’re concerned with in functional analysis are Banach spaces.
Again, for finite-dimensional vector spaces (at least over $\mathbb{R}$ or $\mathbb{C}$) this is all pretty easy; we can always define an inner product, and this gives us a norm. If our underlying topological field is complete, then the vector space will be as well. Even without considering a norm, convergence of sequences is just given component-by-component. But infinite-dimensional vector spaces get hairier. Since our algebraic operations only give us finite sums, we have to take some sorts of limits to even talk about most vectors in the space in the first place, and taking limits of such vectors could just complicate things further. Studying these interesting topologies and seeing how linear algebra — the study of vector spaces and linear transformations — behaves in the infinite-dimensional context is the taproot of functional analysis.
Here’s a fact we’ll find useful soon enough as we talk about reflections. Hopefully it will also help get back into thinking about linear transformations and inner product spaces. However, if the linear algebra gets a little hairy (or if you’re just joining us) you can just take this fact as given. Remember that we’re looking at a real vector space $V$ equipped with an inner product $\langle\cdot,\cdot\rangle$.
Now, let’s say $\Phi$ is some finite collection of vectors which span $V$ (it doesn’t matter if they’re linearly independent or not). Let $\sigma$ be a linear transformation which leaves $\Phi$ invariant. That is, if we pick any vector $\phi \in \Phi$ then the image $\sigma(\phi)$ will be another vector in $\Phi$. Let’s also assume that there is some $(n-1)$-dimensional subspace $P$ which $\sigma$ leaves completely untouched. That is, $\sigma(p) = p$ for every $p \in P$. Finally, say that there’s some vector $\alpha$ so that $\sigma(\alpha) = -\alpha$ (clearly $\alpha \notin P$) and also that $\Phi$ is invariant under the reflection $\sigma_\alpha$. Then I say that $\sigma = \sigma_\alpha$ and $P = P_\alpha$.
We’ll proceed by actually considering the transformation $\tau = \sigma\sigma_\alpha$, and showing that this is the identity; since $\sigma_\alpha$ is its own inverse, that will give us $\sigma = \sigma_\alpha$. First off, $\tau$ definitely fixes $\alpha$, since

$$\tau(\alpha) = \sigma\left(\sigma_\alpha(\alpha)\right) = \sigma(-\alpha) = -\sigma(\alpha) = \alpha$$
so $\tau$ acts as the identity on the line $\mathbb{R}\alpha$. In fact, I assert that $\tau$ also acts as the identity on the quotient space $V/\mathbb{R}\alpha$. Indeed, $\sigma_\alpha$ acts trivially on $P_\alpha$, and every vector in $V/\mathbb{R}\alpha$ has a unique representative in $P_\alpha$. And then $\sigma$ acts trivially on $P$, and every vector in $V/\mathbb{R}\alpha$ has a unique representative in $P$.
This does not, however, mean that $\tau$ acts trivially on any given complement of $\mathbb{R}\alpha$. All we really know at this point is that for every $v$ the difference between $\tau(v)$ and $v$ is some scalar multiple of $\alpha$. On the other hand, remember how we found upper-triangular matrices before. This time we peeled off one vector and the remaining transformation was the identity on the remaining $(n-1)$-dimensional space. This tells us that all of our eigenvalues are $1$, and the characteristic polynomial is $(T-1)^n$, where $n = \dim(V)$. We can evaluate this on the transformation to find that

$$(\tau - 1_V)^n = 0$$
Now let’s try to use the collection of vectors $\Phi$. We assumed that both $\sigma$ and $\sigma_\alpha$ send vectors in $\Phi$ back to other vectors in $\Phi$, and so the same must be true of $\tau$. But there are only finitely many vectors (say $m$ of them) in $\Phi$ to begin with, so $\tau$ must act as some sort of permutation of the vectors in $\Phi$. But every permutation in $S_m$ has an order that divides $m!$. That is, applying $\tau$ a total of $m!$ times must send every vector in $\Phi$ back to itself. But since $\Phi$ is a spanning set for $V$, this means that $\tau^{m!} = 1_V$, or that

$$\tau^{m!} - 1_V = 0$$
So we have two polynomial relations satisfied by $\tau$, and $\tau$ will clearly satisfy any linear combination of these relations. But Euclid’s algorithm shows us that we can write the greatest common divisor of these relations as a linear combination, and so $\tau$ must satisfy the greatest common divisor of $(T-1)^n$ and $T^{m!} - 1$. It’s not hard to show that this greatest common divisor is $T - 1$, which means that we must have $\tau - 1_V = 0$, or $\tau = 1_V$.
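If you want to see that gcd claim without grinding through the algebra, here is a quick SymPy check; the particular exponents are just illustrative choices standing in for $n$ and $m!$.

```python
from sympy import symbols, gcd, expand

T = symbols('T')

# Illustrative exponents: n stands in for dim(V), k for m!.
for n, k in [(3, 6), (5, 24), (4, 120)]:
    g = gcd(expand((T - 1) ** n), T ** k - 1)
    print(n, k, g)  # prints T - 1 in every case
```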
It’s sort of convoluted, but there are some neat tricks along the way, and we’ll be able to put this result to good use soon.
Before introducing my main question for the next series of posts, I’d like to talk a bit about reflections in a real vector space $V$ equipped with an inner product $\langle\cdot,\cdot\rangle$. If you want a specific example you can think of the space $\mathbb{R}^n$ consisting of $n$-tuples of real numbers $v = (v^1, \dots, v^n)$. Remember that we’re writing our indices as superscripts, so we shouldn’t think of these as powers of some number $v$, but as the components of a vector. For the inner product, you can think of the regular “dot product” $\langle v, w\rangle = \sum_i v^iw^i$.
Everybody with me? Good. Now that we’ve got our playing field down, we need to define a reflection. This will be an orthogonal transformation, which is just a fancy way of saying “preserves lengths and angles”. What makes it a reflection is that there’s some $(n-1)$-dimensional “hyperplane” $P$ that acts like a mirror. Every vector in $P$ itself is just left where it is, and a vector on the line $P^\perp$ that points perpendicularly to $P$ will be sent to its negative — “reflecting” through the “mirror” of $P$.
Any nonzero vector $\alpha$ spans a line $\mathbb{R}\alpha$, and the orthogonal complement — all the vectors perpendicular to $\alpha$ — forms an $(n-1)$-dimensional subspace $P_\alpha$, which we can use to make just such a reflection. We’ll write $\sigma_\alpha$ for the reflection determined in this way by $\alpha$. We can easily write down a formula for this reflection:

$$\sigma_\alpha(v) = v - \frac{2\langle v,\alpha\rangle}{\langle\alpha,\alpha\rangle}\alpha$$
It’s easy to check that if $v = \alpha$ then $\sigma_\alpha(\alpha) = -\alpha$, while if $v$ is perpendicular to $\alpha$ — if $\langle v,\alpha\rangle = 0$ — then $\sigma_\alpha(v) = v$, leaving the vector fixed. Thus this formula does satisfy the definition of a reflection through $P_\alpha$.
The amount by which the reflection moves $v$ in the above formula will come up a lot in the near future; enough so we’ll want to give it the notation $v \rtimes \alpha$. That is, we define:

$$v \rtimes \alpha = \frac{2\langle v,\alpha\rangle}{\langle\alpha,\alpha\rangle}$$
Notice that this is only linear in $v$, not in $\alpha$. You might also notice that this is exactly twice the length of the projection of the vector $v$ onto the vector $\alpha$, measured in units of the length of $\alpha$. This notation isn’t standard, but the more common notation conflicts with other notational choices we’ve made on this weblog, so I’ve made an executive decision to try it this way.
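Here is a minimal NumPy sketch of the reflection formula above (the specific vectors are my own examples) checking the two defining properties, $\sigma_\alpha(\alpha) = -\alpha$ and $\sigma_\alpha(v) = v$ whenever $\langle v,\alpha\rangle = 0$, along with preservation of lengths.

```python
import numpy as np

def reflect(v, alpha):
    """sigma_alpha(v) = v - (2<v, alpha> / <alpha, alpha>) alpha."""
    return v - (2 * np.dot(v, alpha) / np.dot(alpha, alpha)) * alpha

alpha = np.array([1.0, 2.0, 2.0])
perp = np.array([2.0, -1.0, 0.0])   # <perp, alpha> = 0

assert np.allclose(reflect(alpha, alpha), -alpha)  # alpha is sent to its negative
assert np.allclose(reflect(perp, alpha), perp)     # the mirror hyperplane is fixed

v = np.array([3.0, -1.0, 4.0])
assert np.isclose(np.linalg.norm(reflect(v, alpha)), np.linalg.norm(v))  # lengths preserved
```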
We’re trying to invert a function $f: X \to \mathbb{R}^n$ which is continuously differentiable on some region $X \subseteq \mathbb{R}^n$. That is, we know that if $x_0$ is a point where the Jacobian determinant $J_f(x_0)$ is nonzero, then there is a ball $N$ around $x_0$ where $f$ is one-to-one onto some neighborhood $f(N)$ around $f(x_0)$. Then if $y$ is a point in $f(N)$, we’ve got a system of equations

$$f^j(x^1,\dots,x^n) = y^j$$
that we want to solve for all the $x^i$.
We know how to handle this if $f$ is defined by a linear transformation, represented by a matrix $\left(a_i^j\right)$:

$$f^j(x) = a_i^jx^i = y^j$$
In this case, the Jacobian transformation is just the function $f$ itself, and so the Jacobian determinant is nonzero if and only if the matrix $\left(a_i^j\right)$ is invertible. And so our solution depends on finding the inverse matrix and solving

$$x^i = \left(a^{-1}\right)_j^iy^j$$
This is the approach we’d like to generalize. But to do so, we need a more specific method of finding the inverse.
This is where Cramer’s rule comes in, and it starts by analyzing the way we calculate the determinant of a matrix $A = \left(a_i^j\right)$. This formula

$$\det(A) = \sum\limits_{\pi\in S_n}\mathrm{sgn}(\pi)\,a_1^{\pi(1)}\cdots a_n^{\pi(n)}$$
involves a sum over all the permutations $\pi \in S_n$, and we want to consider the order in which we add up these terms. If we fix an index $i$, we can factor out each matrix entry in the $i$th column:

$$\det(A) = \sum\limits_{j=1}^na_i^j\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)\,a_1^{\pi(1)}\cdots\widehat{a_i^{\pi(i)}}\cdots a_n^{\pi(n)}$$
where the hat indicates that we omit the $i$th term in the product. For a given value of $j$, we can consider the restricted sum

$$\tilde{a}_j^i = \sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)\,a_1^{\pi(1)}\cdots\widehat{a_i^{\pi(i)}}\cdots a_n^{\pi(n)}$$
which is $(-1)^{i+j}$ times the determinant of the $i$-$j$ “minor” of the matrix $A$. That is, if we strike out the row and column of $A$ which contain $a_i^j$ and take the determinant of the remaining $(n-1)\times(n-1)$ matrix, we multiply this by $(-1)^{i+j}$ to get $\tilde{a}_j^i$. These are the entries in the “adjugate” matrix $\tilde{A}$.
What we’ve shown is that

$$a_i^j\tilde{a}_j^i = \det(A)$$
(no summation on $i$). It’s not hard to show, however, that if we use a different row from the adjugate matrix we find

$$a_i^j\tilde{a}_j^k = 0\qquad(k \neq i)$$
That is, the adjugate times the original matrix is the determinant of $A$ times the identity matrix. And so if $\det(A) \neq 0$ we find

$$\left(a^{-1}\right)_j^i = \frac{1}{\det(A)}\tilde{a}_j^i$$
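SymPy happens to have an `adjugate` method, so we can sanity-check this identity on a small matrix; the matrix itself is just my own example.

```python
from sympy import Matrix, eye

A = Matrix([[2, 1, 0],
            [1, 3, 1],
            [0, 1, 2]])

adj = A.adjugate()                  # the matrix of signed minors described above
assert adj * A == A.det() * eye(3)  # adjugate times original = det(A) times identity
assert A.inv() == adj / A.det()     # hence the formula for the inverse
```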
So what does this mean for our system of equations? We can write

$$x^i = \left(a^{-1}\right)_j^iy^j = \frac{1}{\det(A)}\tilde{a}_j^iy^j$$
But how does this sum differ from the one we used before (without summing on $i$) to calculate the determinant of $A$? We’ve replaced the $i$th column of $A$ by the column vector $y$, and so this is just another determinant, taken after performing this replacement!
Here’s an example. Let’s say we’ve got a $2\times2$ system written in matrix form

$$\begin{pmatrix}a_1^1&a_2^1\\a_1^2&a_2^2\end{pmatrix}\begin{pmatrix}x^1\\x^2\end{pmatrix} = \begin{pmatrix}y^1\\y^2\end{pmatrix}$$

The entry in the $i$th row and $j$th column of the adjugate matrix is calculated by striking out the $i$th column and $j$th row of our original matrix, taking the determinant of the remaining matrix, and multiplying by $(-1)^{i+j}$. We get

$$\tilde{A} = \begin{pmatrix}a_2^2&-a_2^1\\-a_1^2&a_1^1\end{pmatrix}$$

and thus we find

$$x^1 = \frac{1}{\det(A)}\left(a_2^2y^1 - a_2^1y^2\right)\qquad x^2 = \frac{1}{\det(A)}\left(-a_1^2y^1 + a_1^1y^2\right)$$

where we note that

$$\det(A) = a_1^1a_2^2 - a_2^1a_1^2$$

In other words, our solution is given by ratios of determinants:

$$x^1 = \frac{\det\begin{pmatrix}y^1&a_2^1\\y^2&a_2^2\end{pmatrix}}{\det\begin{pmatrix}a_1^1&a_2^1\\a_1^2&a_2^2\end{pmatrix}}\qquad x^2 = \frac{\det\begin{pmatrix}a_1^1&y^1\\a_1^2&y^2\end{pmatrix}}{\det\begin{pmatrix}a_1^1&a_2^1\\a_1^2&a_2^2\end{pmatrix}}$$

and similar formulae hold for larger systems of equations.
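And here is the determinant-ratio recipe carried out numerically for a small system (the coefficients are my own example data), compared against a direct solve.

```python
import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [1.0, 3.0,  2.0],
              [3.0, 0.0,  1.0]])
y = np.array([1.0, 4.0, 2.0])

detA = np.linalg.det(A)
x = np.empty(3)
for i in range(3):
    Ai = A.copy()
    Ai[:, i] = y                     # replace the i-th column of A by y
    x[i] = np.linalg.det(Ai) / detA  # Cramer's rule: a ratio of determinants

assert np.allclose(x, np.linalg.solve(A, y))
print(x)
```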
Sorry for the delay from last Friday to today, but I was chasing down a good lead.
Anyway, last week I said that I’d talk about a linear map that extends the notion of the correspondence between parallelograms in space and perpendicular vectors.
First of all, we should see why there may be such a correspondence. We’ve identified $k$-dimensional parallelepipeds in an $n$-dimensional vector space $V$ with antisymmetric tensors of degree $k$: elements of $\Lambda^k(V)$. Of course, not every such tensor will correspond to a parallelepiped (some will be linear combinations that can’t be written as a single wedge of vectors), but we’ll just keep going and let our methods apply to such more general tensors. Anyhow, we also know how to count the dimension of the space of such tensors:

$$\dim\left(\Lambda^k(V)\right) = \binom{n}{k} = \frac{n!}{k!(n-k)!}$$
This formula tells us that $\Lambda^k(V)$ and $\Lambda^{n-k}(V)$ will have the exact same dimension, and so it makes sense that there might be an isomorphism between them. And we’re going to look for one which defines the “perpendicular” $(n-k)$-dimensional parallelepiped with the same size.
So what do we mean by “perpendicular”? It’s not just in terms of the “angle” defined by the inner product. Indeed, in that sense the parallelograms $e_1\wedge e_2$ and $e_1\wedge e_3$ are perpendicular. No, we want any vector in the subspace defined by our parallelepiped to be perpendicular to any vector in the subspace defined by the new one. That is, we want the new parallelepiped to span the orthogonal complement to the subspace we start with.
Our definition will also need to take into account the orientation on $V$. Indeed, considering the parallelogram $e_1\wedge e_2$ in three-dimensional space, the perpendicular must be $ce_3$ for some nonzero constant $c$, or otherwise it won’t be perpendicular to the whole $e_1$-$e_2$ plane. And $\lvert c\rvert$ has to be $1$ in order to get the right size. But will it be $e_3$ or $-e_3$? The difference is entirely in the orientation.
Okay, so let’s pick an orientation on $V$, which gives us a particular top-degree tensor $\omega = e_1\wedge\dots\wedge e_n$ so that $\langle\omega,\omega\rangle = 1$. Now, given some $\mu \in \Lambda^k(V)$, we define the Hodge dual $*\mu$ to be the unique antisymmetric tensor of degree $n-k$ satisfying

$$\lambda\wedge *\mu = \langle\lambda,\mu\rangle\omega$$
for all $\lambda \in \Lambda^k(V)$. Notice here that if $\lambda$ and $\mu$ describe parallelepipeds, and any side of $\lambda$ is perpendicular to all the sides of $\mu$, then the projection of $\lambda$ onto the subspace spanned by $\mu$ will have zero volume, and thus $\langle\lambda,\mu\rangle = 0$. This is what we expect, for then this side of $\lambda$ must lie within the perpendicular subspace spanned by $*\mu$, and so the wedge $\lambda\wedge *\mu$ should also be zero.
As a particular example, say we have an orthonormal basis $\{e_1,\dots,e_n\}$ of $V$ so that $\omega = e_1\wedge\dots\wedge e_n$. Then given a multi-index $I = (i_1,\dots,i_k)$ the basic wedge $e_I = e_{i_1}\wedge\dots\wedge e_{i_k}$ gives us the subspace spanned by the vectors $e_{i_1},\dots,e_{i_k}$. The orthogonal complement is clearly spanned by the remaining basis vectors $e_{i_{k+1}},\dots,e_{i_n}$, and so $*e_I = \pm e_{i_{k+1}}\wedge\dots\wedge e_{i_n}$, with the sign depending on whether the list $(i_1,\dots,i_n)$ is an even or an odd permutation of $(1,\dots,n)$.
To be even more explicit, let’s work these out for the cases of dimensions three and four. First off, we have a basis $\{e_1, e_2, e_3\}$. We work out all the duals of basic wedges as follows:

$$\begin{aligned}
*1 &= e_1\wedge e_2\wedge e_3\\
*e_1 &= e_2\wedge e_3\\
*e_2 &= -e_1\wedge e_3\\
*e_3 &= e_1\wedge e_2\\
*(e_1\wedge e_2) &= e_3\\
*(e_1\wedge e_3) &= -e_2\\
*(e_2\wedge e_3) &= e_1\\
*(e_1\wedge e_2\wedge e_3) &= 1
\end{aligned}$$
This reconstructs the correspondence we had last week between basic parallelograms and perpendicular basis vectors. In the four-dimensional case, the basis $\{e_1, e_2, e_3, e_4\}$ leads to the duals

$$\begin{aligned}
*1 &= e_1\wedge e_2\wedge e_3\wedge e_4\\
*e_1 &= e_2\wedge e_3\wedge e_4 & *e_2 &= -e_1\wedge e_3\wedge e_4\\
*e_3 &= e_1\wedge e_2\wedge e_4 & *e_4 &= -e_1\wedge e_2\wedge e_3\\
*(e_1\wedge e_2) &= e_3\wedge e_4 & *(e_1\wedge e_3) &= -e_2\wedge e_4\\
*(e_1\wedge e_4) &= e_2\wedge e_3 & *(e_2\wedge e_3) &= e_1\wedge e_4\\
*(e_2\wedge e_4) &= -e_1\wedge e_3 & *(e_3\wedge e_4) &= e_1\wedge e_2\\
*(e_1\wedge e_2\wedge e_3) &= e_4 & *(e_1\wedge e_2\wedge e_4) &= -e_3\\
*(e_1\wedge e_3\wedge e_4) &= e_2 & *(e_2\wedge e_3\wedge e_4) &= -e_1\\
*(e_1\wedge e_2\wedge e_3\wedge e_4) &= 1
\end{aligned}$$
It’s not a difficult exercise to work out the relation $*(*\mu) = (-1)^{k(n-k)}\mu$ for a degree-$k$ tensor $\mu$ in an $n$-dimensional space.
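The sign rule for basic wedges is easy enough to automate; here is a short Python sketch (the helper functions are my own, not any standard library routine) that computes $*e_I$ for an orthonormal basis by taking the complementary indices and the parity of the resulting permutation. Running it with $n = 3$ reproduces the three-dimensional table above, in zero-based indexing.

```python
from itertools import combinations

def parity(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def hodge_dual(I, n):
    """Return (sign, J) with *(e_I) = sign * e_J, for an orthonormal basis of R^n."""
    J = tuple(i for i in range(n) if i not in I)
    return parity(I + J), J

n = 3
for k in range(n + 1):
    for I in combinations(range(n), k):
        sign, J = hodge_dual(I, n)
        print(I, '->', '+' if sign > 0 else '-', J)
# e.g. (0, 2) -> - (1,)  means  *(e_1 ∧ e_3) = -e_2  in one-based indexing.
```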