Inner Products on Exterior Algebras and Determinants
I want to continue yesterday’s post with some more explicit calculations to hopefully give a bit more of the feel.
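First up, let’s consider wedges of degree $k$. That is, we pick $k$ vectors $\{v_i\}_{i=1}^k$ and wedge them all together (in order) to get $v_1\wedge\dots\wedge v_k$. What is its inner product with another of the same form? We calculate
$$\begin{aligned}\langle v_1\wedge\dots\wedge v_k,w_1\wedge\dots\wedge w_k\rangle&=\left\langle\frac{1}{k!}\sum_{\pi\in S_k}\mathrm{sgn}(\pi)v_{\pi(1)}\otimes\dots\otimes v_{\pi(k)},\frac{1}{k!}\sum_{\hat{\pi}\in S_k}\mathrm{sgn}(\hat{\pi})w_{\hat{\pi}(1)}\otimes\dots\otimes w_{\hat{\pi}(k)}\right\rangle\\&=\frac{1}{k!k!}\sum_{\pi\in S_k}\sum_{\hat{\pi}\in S_k}\mathrm{sgn}(\pi)\mathrm{sgn}(\hat{\pi})\langle v_{\pi(1)},w_{\hat{\pi}(1)}\rangle\dots\langle v_{\pi(k)},w_{\hat{\pi}(k)}\rangle\\&=\frac{1}{k!k!}\sum_{\pi\in S_k}\sum_{\hat{\pi}\in S_k}\mathrm{sgn}(\hat{\pi}\pi^{-1})\langle v_1,w_{\hat{\pi}\pi^{-1}(1)}\rangle\dots\langle v_k,w_{\hat{\pi}\pi^{-1}(k)}\rangle\\&=\frac{1}{k!}\sum_{\sigma\in S_k}\mathrm{sgn}(\sigma)\langle v_1,w_{\sigma(1)}\rangle\dots\langle v_k,w_{\sigma(k)}\rangle\end{aligned}$$
where in the third line we’ve rearranged the factors at the right and used the fact that $\mathrm{sgn}(\pi^{-1})=\mathrm{sgn}(\pi)$, and in the fourth line we’ve relabelled $\sigma=\hat{\pi}\pi^{-1}$, noting that each $\sigma$ arises from exactly $k!$ pairs $(\pi,\hat{\pi})$. This looks a lot like the calculation of a determinant. In fact, it is $\frac{1}{k!}$ times the determinant of the matrix with entries $\langle v_i,w_j\rangle$.
If we use the “renormalized” inner product on $\Lambda(V)$ from the end of yesterday’s post, then we get an extra factor of $k!$, which cancels off the $\frac{1}{k!}$ and gives us exactly the determinant.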
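We can use the inner product to read off components of exterior algebra elements. If $\mu$ is an element of degree $k$ and $\{e_i\}$ is an orthonormal basis of $V$, we write
$$\mu^{i_1\dots i_k}=k!\langle e_{i_1}\wedge\dots\wedge e_{i_k},\mu\rangle$$
where the factor of $k!$ compensates for the fact that a basic wedge of degree $k$ has inner product $\frac{1}{k!}$ with itself.
As an explicit example, we may take $V$ to have dimension $3$ and consider an element of degree $2$ in $\Lambda(V)$:
$$\mu=\mu^{12}e_1\wedge e_2+\mu^{13}e_1\wedge e_3+\mu^{23}e_2\wedge e_3$$
What we’re writing in the superscript is called a “multi-index”, and sometimes we just write it as $I$, which in the summation convention runs over all increasing collections of $k$ indices. Correspondingly, we can just write $e_I$ for the multi-index $I=(i_1,\dots,i_k)$.
Alternatively, we could expand the wedges out in terms of tensors:
$$\mu=\frac{1}{2}\mu^{ij}e_i\otimes e_j$$
where we just think of the superscript as a collection of separate indices, all of which run from $1$ to the dimension of $V$, with the understanding that $\mu^{ji}=-\mu^{ij}$, and similarly for higher degrees; swapping two indices switches the sign of the component. All this index juggling gets distracting and confusing, but it’s sometimes necessary for explicit computations, and the physicists love it.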
Anyway, we can use this to get back to our original definition of the determinant of a linear transformation $T$. Pick an orthonormal basis $\{e_i\}_{i=1}^n$ for $V$ and wedge them all together to get an element $e_1\wedge\dots\wedge e_n$ of top degree in $\Lambda(V)$. Since the space of top degree is one-dimensional, any linear transformation on it just consists of multiplying by a scalar. So we can let $T$ act on this one element we’ve cooked up, and then read off the coefficient using the inner product.
The linear transformation $T$ sends $e_i$ to the vector $T(e_i)=t_i^je_j$. By functoriality, it sends $e_1\wedge\dots\wedge e_n$ to $T(e_1)\wedge\dots\wedge T(e_n)$. And now we want to calculate the coefficient.
$$n!\langle e_1\wedge\dots\wedge e_n,T(e_1)\wedge\dots\wedge T(e_n)\rangle=\det\left(\langle e_i,T(e_j)\rangle\right)=\det\left(t_j^i\right)=\det(T)$$
The determinant of $T$ is exactly the factor by which $T$ acting on the top degree subspace in $\Lambda(V)$ expands any given element.
Tensor Algebras and Inner Products
Let’s focus back in on a real, finite-dimensional vector space $V$ and give it an inner product. As a symmetric bilinear form, the inner product provides us with an isomorphism $V\to V^*$. Now we can use functoriality to see what this does for our tensor algebras. Again, I’ll be mostly interested in the exterior algebra $\Lambda(V)$, so I’ll stick to talking about that one.
The isomorphism $V\to V^*$ sends a vector $v$ to the linear functional $\langle v,\underline{\hphantom{X}}\rangle$. Functoriality then defines an isomorphism $\Lambda(V)\to\Lambda(V^*)$ that sends the wedge $v_1\wedge\dots\wedge v_k$ of degree $k$ to the wedge $\langle v_1,\underline{\hphantom{X}}\rangle\wedge\dots\wedge\langle v_k,\underline{\hphantom{X}}\rangle$, also of degree $k$. This is the antisymmetrization of the tensor product of all these linear functionals. We’ve seen that we can consider this as a linear functional on the space of degree $k$ tensors by applying the functionals to the tensorands in order and then multiplying together all the results. This defines an isomorphism $A^k(V^*)\to A^k(V)^*$, and extending by linearity we find an isomorphism $\Lambda(V^*)\to\Lambda(V)^*$.
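Let’s get a little more explicit about how this works by picking an orthonormal basis $\{e_i\}_{i=1}^n$ for $V$, and the corresponding dual basis $\{\epsilon^i\}_{i=1}^n$ for $V^*$. That is, we have $\epsilon^i=\langle e_i,\underline{\hphantom{X}}\rangle$ (which defines the isomorphism from $V$ to $V^*$ explicitly in terms of bases) and $\epsilon^i(e_j)=\delta^i_j$.
Now we can use the $e_i$ to write down an explicit basis of $\Lambda(V)$. An element of degree $k$ is the sum of wedges of $k$ vectors $v_1\wedge\dots\wedge v_k$. We can write each of these vectors out in terms of components $v_j=v_j^ie_i$, getting the wedge (really a sum of wedges)
$$\left(v_1^{i_1}e_{i_1}\right)\wedge\dots\wedge\left(v_k^{i_k}e_{i_k}\right)$$
We factor out all the scalar components to get
$$v_1^{i_1}\dots v_k^{i_k}\,e_{i_1}\wedge\dots\wedge e_{i_k}$$
If in a given term we ever have two of the indices equal to each other, then the whole wedge will be zero by antisymmetry. On the other hand, if none of them are equal we can sort them into increasing order (at the possible cost of multiplying by the sign of the needed permutation). In the end, we can write down any wedge of degree $k$ uniquely as a sum of constants times the basic wedges $e_{i_1}\wedge\dots\wedge e_{i_k}$, where $1\leq i_1<\dots<i_k\leq n$. For example, if $V$ has basis $\{e_1,e_2,e_3\}$, then $\Lambda(V)$ will have basis
$$\begin{aligned}&1\\&e_1,\quad e_2,\quad e_3\\&e_1\wedge e_2,\quad e_1\wedge e_3,\quad e_2\wedge e_3\\&e_1\wedge e_2\wedge e_3\end{aligned}$$
where the lines correspond to the different degrees.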
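The basic wedges are indexed by increasing collections of indices, so they can be enumerated mechanically. The following short Python sketch (the helper names `exterior_basis` and `label` are made up for illustration) lists them degree by degree for a three-dimensional space and counts them.

```python
from itertools import combinations

def exterior_basis(n):
    """Basic wedges e_{i1} ^ ... ^ e_{ik} with 1 <= i1 < ... < ik <= n,
    grouped by degree k and encoded as tuples of indices."""
    return {k: list(combinations(range(1, n + 1), k)) for k in range(n + 1)}

def label(multi_index):
    """Human-readable name; the empty wedge is the unit 1."""
    return " ^ ".join(f"e{i}" for i in multi_index) or "1"

basis = exterior_basis(3)
for k, wedges in basis.items():
    print(k, [label(w) for w in wedges])

# Degree k contributes "n choose k" basic wedges, so the total is 2**n.
total = sum(len(wedges) for wedges in basis.values())
print(total)
```

For dimension $3$ this gives $1+3+3+1=8$ basis elements, one line of output per degree.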
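Now it’s obvious how the isomorphism $\Lambda(V)\to\Lambda(V^*)$ acts on this basis. It just turns a wedge of basis vectors like $e_1\wedge e_2$ into a wedge of basis linear functionals like $\epsilon^1\wedge\epsilon^2$. The action on the rest of $\Lambda(V)$ just extends by linearity. When we compose this with the isomorphism between $\Lambda(V^*)$ and $\Lambda(V)^*$, we get an isomorphism $\Lambda(V)\to\Lambda(V)^*$. That is, we have an inner product on the algebra $\Lambda(V)$!
Let’s consider how this inner product behaves on our basis. Clearly to line these up we need the degrees to be equal. We also find that we get zero unless the collections of indices are the same. For example, if we try to pair $e_1\wedge e_2$ with $e_1\wedge e_3$, we find
$$\begin{aligned}\left[\epsilon^1\wedge\epsilon^2\right](e_1\wedge e_3)&=\frac{1}{4}\left[\epsilon^1\otimes\epsilon^2-\epsilon^2\otimes\epsilon^1\right](e_1\otimes e_3-e_3\otimes e_1)\\&=\frac{1}{4}\left(\epsilon^1(e_1)\epsilon^2(e_3)-\epsilon^1(e_3)\epsilon^2(e_1)-\epsilon^2(e_1)\epsilon^1(e_3)+\epsilon^2(e_3)\epsilon^1(e_1)\right)\\&=0\end{aligned}$$
In each arrangement, we’ll find two indices that don’t line up, and thus each term will be zero. On the other hand, if the collections of indices are the same, we find (for example)
$$\begin{aligned}\langle e_1\wedge e_2,e_1\wedge e_2\rangle&=\frac{1}{4}\langle e_1\otimes e_2-e_2\otimes e_1,e_1\otimes e_2-e_2\otimes e_1\rangle\\&=\frac{1}{4}\left(\langle e_1,e_1\rangle\langle e_2,e_2\rangle-\langle e_1,e_2\rangle\langle e_2,e_1\rangle-\langle e_2,e_1\rangle\langle e_1,e_2\rangle+\langle e_2,e_2\rangle\langle e_1,e_1\rangle\right)\\&=\frac{1}{4}(1-0-0+1)=\frac{1}{2}\end{aligned}$$
When we consider a basic wedge of degree $k$ (here, $k=2$) and pair it with itself, we’ll have a sum of $(k!)^2$ terms corresponding to summing over permutations of both tensors. Of these, terms that pick different permutations will have at least one pair of basis vectors that don’t line up, and make the whole term zero. The remaining $k!$ terms that pick the same permutation twice will give the product of $k$ copies of $1$, and this will always occur with a positive sign. This will exactly cancel one of the two normalizing factors from the antisymmetrizers, and thus the inner product of a basic wedge of degree $k$ with itself will always be $\frac{1}{k!}$. It’s not an orthonormal basis, but it’s close.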
Notice, in particular, how in the second example we’ve avoided explicit use of the dual basis and just defined the inner product on tensors of rank $k$ as the $k$-fold product of inner products of vectors. We’ll stick to this notation in the future for tensors.
The factor of $\frac{1}{k!}$ isn’t really terrible, but it can get annoying. Often the inner product on $\Lambda(V)$ is modified to compensate for it. We consider the different degrees to be orthogonal, as before, and we define the inner product in degree $k$ to include an extra factor of $k!$. This has the effect of making the collection of wedges of basis vectors into an orthonormal basis for $\Lambda(V)$, but it means that the inner product on wedges cannot be calculated simply by considering them as antisymmetric tensors.
Now, I’ve never really looked closely at exactly what happens, so as an experiment I’m going to try to not use this extra factor of $k!$ and see what happens. I’ll refer, as I do, to the “renormalized” inner product on $\Lambda(V)$, where appropriate. And if the work starts becoming too complicated without this factor, I’ll give in and use it, explicitly saying when I’ve given up.
Functoriality of Tensor Algebras
The three constructions we’ve just shown (the tensor, symmetric tensor, and exterior algebras) were all asserted to be the “free” constructions. This makes them functors from the category of vector spaces over $\mathbb{F}$ to appropriate categories of $\mathbb{F}$-algebras, and that means that they behave very nicely as we transform vector spaces, and we can even describe exactly how nicely with explicit algebra homomorphisms. I’ll work through this for the exterior algebra, since that’s the one I’m most interested in, but the others are very similar.
Okay, we want the exterior algebra $\Lambda(V)$ to be the “free” graded-commutative algebra on the vector space $V$. That’s a tip-off that we’re thinking $\Lambda$ should be the left adjoint of the “forgetful” functor $U$ which sends a graded-commutative algebra to its underlying vector space (Todd makes a correction to which forgetful functor we’re using below). We’ll define this adjunction by finding a collection of universal arrows, which (along with the forgetful functor $U$) is one of the many ways we listed to specify an adjunction.
So let’s run down the checklist. We’ve got the forgetful functor $U$ which we’re going to make the right-adjoint. Now for each vector space $V$ we need a graded-commutative algebra (clearly the one we’ll pick is $\Lambda(V)$) and a universal arrow $\eta_V:V\to U(\Lambda(V))$. The underlying vector space of the exterior algebra is the direct sum of all the spaces of antisymmetric tensors on $V$.
$$U(\Lambda(V))=\bigoplus_{n=0}^\infty A^n(V)$$
Yesterday we wrote this without the $U$, since we often just omit forgetful functors, but today we want to remember that we’re using it. But we know that $A^1(V)=V$, so the obvious map $\eta_V$ to use is the one that sends a vector $v$ to itself, now considered as an antisymmetric tensor with a single tensorand.
But is this a universal arrow? That is, if $A$ is another graded-commutative algebra, and $f:V\to U(A)$ is another linear map, then is there a unique homomorphism of graded-commutative algebras $\bar{f}:\Lambda(V)\to A$ so that $f=U(\bar{f})\circ\eta_V$? Well, $f$ tells us where in $A$ we have to send any antisymmetric tensor with one tensorand. Any other element $\upsilon$ in $\Lambda(V)$ is the sum of a bunch of terms, each of which is the wedge of a bunch of elements of $V$. So in order for $\bar{f}$ to be a homomorphism of graded-commutative algebras, it has to act by simply changing each element of $V$ in our expression for $\upsilon$ into the corresponding element of $A$, and then multiplying and summing these together as before. Just write out the exterior algebra element all the way down in terms of vectors, and transform each vector in the expression. This will give us the only possible such homomorphism $\bar{f}$. And this establishes that $\Lambda(V)$ is the object-function of a functor which is left-adjoint to $U$.
So how does $\Lambda$ work on morphisms? It’s right in the proof above! If we have a linear map $f:V\to W$, we need to find some homomorphism $\Lambda(f):\Lambda(V)\to\Lambda(W)$. But we can compose $f$ with the linear map $\eta_W$, which gives us $\eta_W\circ f:V\to U(\Lambda(W))$. The universality property we just proved shows that we have a unique homomorphism $\Lambda(f)=\overline{\eta_W\circ f}:\Lambda(V)\to\Lambda(W)$. And, specifically, it is defined on an element $\upsilon\in\Lambda(V)$ by writing down $\upsilon$ in terms of vectors in $V$ and applying $f$ to each vector in the expression to get a sum of wedges of elements of $W$, which will be an element of the algebra $\Lambda(W)$.
Of course, as stated above, we get similar constructions for the commutative algebra $S(V)$ and the tensor algebra $T(V)$.
Since, given a linear map $f:V\to W$, the induced homomorphisms $\Lambda(f)$, $S(f)$, and $T(f)$ preserve the respective gradings, they can be broken into one linear map for each degree. And if $f$ is invertible, so must be its image under each functor. These give exactly the tensor, symmetric, and antisymmetric representations of the group $\mathrm{GL}(V)$, if we consider how these functors act on invertible morphisms $f:V\to V$. Functoriality is certainly a useful property.
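To make the functoriality concrete, here is a hedged Python sketch for the exterior algebra in one fixed degree. In the basis of wedges indexed by increasing multi-indices, the degree-$k$ part of the induced homomorphism has as its matrix the collection of $k\times k$ minors of the original matrix (often called a compound matrix). The sketch checks that this assignment respects composition and that the top degree recovers the determinant. The sample matrices and function names are assumptions chosen for illustration, and matrices act on column vectors.

```python
from itertools import combinations, permutations

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(m):
    """Determinant by the Leibniz (sum over permutations) formula."""
    total = 0
    for perm in permutations(range(len(m))):
        term = sign(perm)
        for i in range(len(m)):
            term *= m[i][perm[i]]
        total += term
    return total

def matmul(a, b):
    """Ordinary matrix product, i.e. composition of linear maps."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def exterior_power(m, k):
    """Matrix of the degree-k part of the induced map on the exterior
    algebra: the entry for (rows I, columns J) is the minor det(m[I, J])."""
    subsets = list(combinations(range(len(m)), k))
    return [[det([[m[i][j] for j in cols] for i in rows])
             for cols in subsets] for rows in subsets]

f = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]   # sample linear map (assumed data)
g = [[0, 1, 1], [1, 0, 2], [3, 1, 0]]   # sample linear map (assumed data)

# Functoriality in degree 2: the map induced by f g is the product
# of the maps induced by f and by g.
composite_then_lift = exterior_power(matmul(f, g), 2)
lift_then_compose = matmul(exterior_power(f, 2), exterior_power(g, 2))
print(composite_then_lift == lift_then_compose)

# The top degree is one-dimensional, and the induced map is det(f).
print(exterior_power(f, 3), det(f))
```

The degree-$2$ identity checked here is an instance of the Cauchy–Binet formula.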
Exterior Algebras
Let’s continue yesterday’s discussion of algebras we can construct from a vector space. Today, we consider the “exterior algebra” on $V$, which consists of the direct sum of all the spaces of antisymmetric tensors
$$\Lambda(V)=\bigoplus_{n=0}^\infty A^n(V)$$
Yes, that’s a capital $\Lambda$, not an $A$. This is just standard notation, probably related to the symbol for its multiplication we’ll soon come to.
Again, despite the fact that each $A^n(V)$ is a subspace of the tensor space $V^{\otimes n}$, this isn’t a subalgebra of $T(V)$, because the tensor product of two antisymmetric tensors may not be antisymmetric itself. Instead, we will take the tensor product of $\mu\in A^m(V)$ and $\nu\in A^n(V)$, and then antisymmetrize it, to give $\mu\wedge\nu\in A^{m+n}(V)$. This will be bilinear, but will it be associative?
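Our proof parallels the one we ran through yesterday, writing the symmetric group $S_{l+m+n}$ as the disjoint union of cosets of the subgroup $S_{l+m}$ (the permutations acting on only the first $l+m$ places), indexed by a set of representatives $\Gamma$, and rewriting the symmetrizer in just the right way. But now we’ve got the signs of our permutations to be careful with. Still, let’s dive in with the antisymmetrizers
$$\begin{aligned}\left(\frac{1}{(l+m+n)!}\sum_{\pi\in S_{l+m+n}}\mathrm{sgn}(\pi)\pi\right)\left(\frac{1}{(l+m)!}\sum_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\hat{\pi})\hat{\pi}\right)&=\frac{1}{(l+m+n)!(l+m)!}\sum_{\gamma\in\Gamma}\sum_{\pi\in S_{l+m}}\sum_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\gamma\pi)\mathrm{sgn}(\hat{\pi})\gamma\pi\hat{\pi}\\&=\frac{1}{(l+m+n)!(l+m)!}\sum_{\gamma\in\Gamma}\mathrm{sgn}(\gamma)\gamma\sum_{\pi\in S_{l+m}}\sum_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\pi\hat{\pi})\pi\hat{\pi}\\&=\frac{1}{(l+m+n)!(l+m)!}\sum_{\gamma\in\Gamma}\mathrm{sgn}(\gamma)\gamma\,(l+m)!\sum_{\pi\in S_{l+m}}\mathrm{sgn}(\pi)\pi\\&=\frac{1}{(l+m+n)!}\sum_{\gamma\in\Gamma}\sum_{\pi\in S_{l+m}}\mathrm{sgn}(\gamma\pi)\gamma\pi\\&=\frac{1}{(l+m+n)!}\sum_{\pi\in S_{l+m+n}}\mathrm{sgn}(\pi)\pi\end{aligned}$$
where throughout we’ve used the fact that $\mathrm{sgn}$ is a representation, and so the signum of the product of two group elements is the product of their signa. We also make the crucial combination of the double sum over $S_{l+m}$ into a single sum by noting that each group element shows up exactly $(l+m)!$ times, and each time it shows up with the exact same sign, which lets us factor out $(l+m)!$ from the sum and cancel the normalizing factor.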
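Now this multiplication is not commutative. Instead, it’s graded-commutative. If $\mu\in A^m(V)$ and $\nu\in A^n(V)$ are elements of the exterior algebra, then we find
$$\mu\wedge\nu=(-1)^{mn}\nu\wedge\mu$$
That is, elements of odd degree anticommute with each other, while elements of even degree commute with everything.
Indeed, given $\mu\in A^m(V)$ and $\nu\in A^n(V)$, we can let $\tau$ be the permutation which moves the last $m$ slots to the beginning of the term and the first $n$ slots to the end, so that $\mu\otimes\nu=\tau(\nu\otimes\mu)$. We can construct $\tau$ by moving each of the last $m$ slots one-by-one past the first $n$, taking $n$ swaps for each one. That gives a total of $mn$ swaps, so $\mathrm{sgn}(\tau)=(-1)^{mn}$. Then we write
$$\begin{aligned}\mu\wedge\nu&=\frac{1}{(m+n)!}\sum_{\pi\in S_{m+n}}\mathrm{sgn}(\pi)\pi(\mu\otimes\nu)\\&=\frac{1}{(m+n)!}\sum_{\pi\in S_{m+n}}\mathrm{sgn}(\pi)\pi\tau(\nu\otimes\mu)\\&=\mathrm{sgn}(\tau)\frac{1}{(m+n)!}\sum_{\pi\in S_{m+n}}\mathrm{sgn}(\pi\tau)\pi\tau(\nu\otimes\mu)\\&=(-1)^{mn}\nu\wedge\mu\end{aligned}$$
as asserted.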
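Graded-commutativity and associativity can also be checked on explicit elements. The Python sketch below is one possible encoding, not anything from the construction above: an element of the exterior algebra is a dictionary from increasing index tuples (standing for basic wedges of basis vectors) to coefficients, and the product of two basic wedges is zero if they share an index and otherwise is the sorted wedge times the sign of the sorting permutation. All names and sample elements are illustrative assumptions.

```python
def sort_sign(indices):
    """Sort a tuple of distinct indices; return (sign of the sorting
    permutation, sorted tuple)."""
    inversions = sum(1 for a in range(len(indices))
                     for b in range(a + 1, len(indices))
                     if indices[a] > indices[b])
    return (-1 if inversions % 2 else 1), tuple(sorted(indices))

def wedge(mu, nu):
    """Wedge product, extended bilinearly from basic wedges."""
    result = {}
    for I, a in mu.items():
        for J, b in nu.items():
            if set(I) & set(J):
                continue  # a repeated basis vector kills the term
            s, K = sort_sign(I + J)
            result[K] = result.get(K, 0) + s * a * b
    return {K: c for K, c in result.items() if c != 0}

def scale(c, mu):
    """Scalar multiple of an element."""
    return {K: c * a for K, a in mu.items() if c * a != 0}

mu = {(1,): 2, (3,): -1}       # 2 e1 - e3, degree 1
nu = {(1, 2): 1, (2, 4): 3}    # e1^e2 + 3 e2^e4, degree 2
rho = {(2,): 1, (4,): 5}       # e2 + 5 e4, degree 1

print(wedge(mu, nu))
print(wedge(mu, rho))
```

With these samples, `mu` and `nu` have degrees $1$ and $2$, so they should commute, while `mu` and `rho` both have degree $1$ and should anticommute.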
The dual to the exterior algebra is the algebra of all alternating multilinear functionals on $V$, providing a counterpart to the algebra of polynomial functions on $V$. But where the variables in polynomial functions commute with each other, the basic covectors (analogous to variables reading off components of a vector) anticommute with each other in this algebra.
Tensor and Symmetric Algebras
There are a few graded algebras we can construct with our symmetric and antisymmetric tensors, and at least one of them will be useful. Remember that we also have symmetric and alternating multilinear functionals in play, so the same constructions will give rise to even more algebras.
First and easiest we have the tensor algebra on $V$. This just takes all the tensor powers of $V$ and direct sums them up
$$T(V)=\bigoplus_{n=0}^\infty V^{\otimes n}$$
This gives us a big vector space (an infinite-dimensional one, in fact) but it’s not an algebra until we define a bilinear multiplication. For this one, we’ll just define the multiplication by the tensor product itself. That is, if $\mu\in V^{\otimes m}$ and $\nu\in V^{\otimes n}$ are two tensors, their product will be $\mu\otimes\nu\in V^{\otimes(m+n)}$, which is by definition bilinear. This algebra has an obvious grading by the number of tensorands.
This is exactly the free algebra on a vector space, and it’s just like we built the free ring on an abelian group. If we perform the construction on the dual space $V^*$ we get an algebra of functions. If $V$ has dimension $d$, then this is isomorphic to the algebra $\mathbb{F}\{X^1,\dots,X^d\}$ of noncommutative polynomials in $d$ variables.
Next we consider the symmetric algebra on $V$, which consists of the direct sum of all the spaces of symmetric tensors
$$S(V)=\bigoplus_{n=0}^\infty S^n(V)$$
with a grading again given by the number of tensorands.
Now, despite the fact that each $S^n(V)$ is a subspace of the tensor space $V^{\otimes n}$, this is not a subalgebra of $T(V)$. This is because the tensor product of two symmetric tensors may well not be symmetric itself. Instead, we will take the tensor product of $\mu\in S^m(V)$ and $\nu\in S^n(V)$, and then symmetrize it, to give $\mu\nu\in S^{m+n}(V)$. This will be bilinear, and it will work with our choice of grading, but will it be associative?
If we have three symmetric tensors $\lambda\in S^l(V)$, $\mu\in S^m(V)$, and $\nu\in S^n(V)$, then we could multiply them by $(\lambda\mu)\nu$ or by $\lambda(\mu\nu)$. To get the first of these, we tensor $\lambda$ and $\mu$, symmetrize the result, then tensor with $\nu$ and symmetrize that. But since symmetrizing $\lambda\otimes\mu$ consists of adding up a number of shuffled versions of this tensor, we could tensor with $\nu$ first and then symmetrize only the first $l+m$ tensorands, before finally symmetrizing the entire thing. I assert that these two symmetrizations (the first one on only part of the whole term) are equivalent to simply symmetrizing the whole thing. Similarly, symmetrizing the last $m+n$ tensorands followed by symmetrizing the whole thing is equivalent to just symmetrizing the whole thing. And so both orders of multiplication are the same, and the operation $(\mu,\nu)\mapsto\mu\nu$ indeed defines an associative multiplication.
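To see this, remember that symmetrizing the whole term involves a sum over the symmetric group $S_{l+m+n}$, while symmetrizing over the beginning involves a sum over the subgroup $S_{l+m}\subseteq S_{l+m+n}$ consisting of those permutations acting on only the first $l+m$ places. This will be key to our proof. We consider the collection of left cosets of $S_{l+m}$ within $S_{l+m+n}$. For each one, we can pick a representative element (this is no trouble since there are only a finite number of cosets with a finite number of elements each) and collect these representatives into a set $\Gamma$. Then the whole group $S_{l+m+n}$ is the disjoint union
$$S_{l+m+n}=\biguplus_{\gamma\in\Gamma}\gamma S_{l+m}$$
This will let us rewrite the symmetrizer in such a way as to make our point. So let’s write down the product of the two group algebra elements we’re interested in
$$\begin{aligned}\left(\frac{1}{(l+m+n)!}\sum_{\pi\in S_{l+m+n}}\pi\right)\left(\frac{1}{(l+m)!}\sum_{\hat{\pi}\in S_{l+m}}\hat{\pi}\right)&=\frac{1}{(l+m+n)!(l+m)!}\sum_{\gamma\in\Gamma}\gamma\sum_{\pi\in S_{l+m}}\sum_{\hat{\pi}\in S_{l+m}}\pi\hat{\pi}\\&=\frac{1}{(l+m+n)!(l+m)!}\sum_{\gamma\in\Gamma}\gamma\,(l+m)!\sum_{\pi\in S_{l+m}}\pi\\&=\frac{1}{(l+m+n)!}\sum_{\gamma\in\Gamma}\sum_{\pi\in S_{l+m}}\gamma\pi\\&=\frac{1}{(l+m+n)!}\sum_{\pi\in S_{l+m+n}}\pi\end{aligned}$$
Essentially, because the symmetrization of the whole term subsumes symmetrization of the first $l+m$ tensorands, the smaller symmetrization can be folded in, and the resulting sum counts the whole sum exactly $(l+m)!$ times, which cancels out the normalization factor. And this proves that the multiplication is, indeed, associative.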
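The absorption of the smaller symmetrizer into the larger one is a statement in the group algebra of a symmetric group, so it can be checked directly for small cases. Below is a minimal Python sketch under assumed conventions: permutations are tuples of 0-based images, group algebra elements are dictionaries from permutations to rational coefficients, and the subgroup acts on the first `k` of `n` places. The same check is run for the signed (antisymmetrizing) version. All names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of 0-based images."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def compose(p, q):
    """(p q)(i) = p(q(i)) for permutations stored as tuples of images."""
    return tuple(p[q[i]] for i in range(len(p)))

def multiply(x, y):
    """Product of two elements of the group algebra."""
    out = {}
    for p, a in x.items():
        for q, b in y.items():
            r = compose(p, q)
            out[r] = out.get(r, 0) + a * b
    return {r: c for r, c in out.items() if c != 0}

def averaged_sum(n, k, signed):
    """(1/k!) times the sum over permutations of the first k of n places,
    weighted by the sign when signed is True."""
    out = {}
    for head in permutations(range(k)):
        perm = head + tuple(range(k, n))
        out[perm] = Fraction(sign(perm) if signed else 1, factorial(k))
    return out

n, k = 4, 3
sym_absorbed = multiply(averaged_sum(n, n, False), averaged_sum(n, k, False))
alt_absorbed = multiply(averaged_sum(n, n, True), averaged_sum(n, k, True))
print(sym_absorbed == averaged_sum(n, n, False))
print(alt_absorbed == averaged_sum(n, n, True))
```

Each element of the big group shows up `k!` times in the double sum, which is the cancellation the argument relies on.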
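This multiplication is also commutative. Indeed, given $\mu\in S^m(V)$ and $\nu\in S^n(V)$, we can let $\tau$ be the permutation which moves the last $m$ slots to the beginning of the term and the first $n$ slots to the end, so that $\mu\otimes\nu=\tau(\nu\otimes\mu)$. Then we write
$$\begin{aligned}\mu\nu&=\frac{1}{(m+n)!}\sum_{\pi\in S_{m+n}}\pi(\mu\otimes\nu)\\&=\frac{1}{(m+n)!}\sum_{\pi\in S_{m+n}}\pi\tau(\nu\otimes\mu)\\&=\frac{1}{(m+n)!}\sum_{\pi\in S_{m+n}}\pi(\nu\otimes\mu)\\&=\nu\mu\end{aligned}$$
because right-multiplication by $\tau$ just shuffles around the order of the sum.
The symmetric algebra $S(V)$ is the free commutative algebra on the vector space $V$. And so it should be no surprise that the symmetric algebra on the dual space is isomorphic to the algebra of polynomial functions on $V$, where the grading is the total degree of a monomial. If $V$ has finite dimension $d$, we have $S(V^*)\cong\mathbb{F}[X^1,\dots,X^d]$.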
Graded Objects
We’re about to talk about certain kinds of algebras that have the added structure of a “grading”. It’s not horribly important at the moment, but we might as well talk about it now so we don’t forget later.
Given a monoid $G$, a $G$-graded algebra is one that, as a vector space, we can write as a direct sum
$$A=\bigoplus_{g\in G}A_g$$
so that the product of elements contained in two grades lands in the grade given by their product in the monoid. That is, we can write the algebra multiplication by
$$A_g\otimes A_h\to A_{gh}$$
for each pair of grades $g$ and $h$. As usual, we handle elements that are the sum of two elements with different grades by linearity.
By far the most common grading is by the natural numbers under addition, in which case we often just say “graded”. For example, the algebra of polynomials is graded, where the grading is given by the total degree. That is, if $A=\mathbb{F}[X_1,\dots,X_n]$ is the algebra of polynomials in $n$ variables, then the $k$th grade consists of sums of products of $k$ of the variables at a time. This is a grading because the product of two such homogeneous polynomials is itself homogeneous, and the total degree of each term in the product is the sum of the degrees of the factors. For instance, the product of $x^2+xy$ in grade $2$ and $x+y$ in grade $1$ is $x^3+2x^2y+xy^2$ in grade $3$.
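The way total degree adds under multiplication is easy to watch in code. This Python sketch uses an assumed encoding in which a polynomial in two variables is a dictionary from exponent pairs to coefficients; the sample polynomials are $x^2+xy$ and $x+y$, chosen purely for illustration.

```python
from collections import defaultdict

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return {e: c for e, c in out.items() if c != 0}

def grades(p):
    """Set of total degrees occurring in p; one value means homogeneous."""
    return {sum(e) for e in p}

# Two variables x, y; the exponent tuple (a, b) stands for x^a y^b.
p = {(2, 0): 1, (1, 1): 1}   # x^2 + xy, grade 2
q = {(1, 0): 1, (0, 1): 1}   # x + y, grade 1
pq = poly_mul(p, q)

print(grades(p), grades(q), grades(pq))
print(pq)
```

The product of a grade-$2$ element and a grade-$1$ element lands entirely in grade $3$, as the definition of a graded algebra requires.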
Other common gradings include $\mathbb{Z}$-grading and $\mathbb{Z}_2$-grading. The latter algebras are often called “superalgebras”, related to their use in studying supersymmetry in physics. “Superalgebra” sounds a lot more big and impressive than “$\mathbb{Z}_2$-graded algebra”, and physicists like that sort of thing.
In the context of graded algebras we also have graded modules. A $G$-graded module $M$ over the $G$-graded algebra $A$ can also be written down as a direct sum
$$M=\bigoplus_{g\in G}M_g$$
But now it’s the action of $A$ on $M$ that involves the grading:
$$A_g\otimes M_h\to M_{gh}$$
We can even talk about grading in the absence of a multiplicative structure, like a graded vector space. Now we don’t even really need the grades to form a monoid. Indeed, for any index set $I$ we might have the graded vector space
$$V=\bigoplus_{i\in I}V_i$$
This doesn’t seem to be very useful, but it can serve to recognize natural direct summands in a vector space and keep track of them. For instance, we may want to consider a linear map between graded vector spaces $V=\bigoplus_{i\in I}V_i$ and $W=\bigoplus_{j\in J}W_j$ that only acts on one grade of $V$ and with an image contained in only one grade of $W$:
$$f:V_i\to W_j$$
We’ll say that such a map is graded $(i,j)$. Any linear map from $V$ to $W$ can be decomposed uniquely into such graded components
$$\hom(V,W)=\bigoplus_{(i,j)\in I\times J}\hom(V_i,W_j)$$
giving a grading on the space of linear maps.
Multilinear Functionals
Okay, time for a diversion from all this calculus. Don’t worry, there’s tons more ahead.
We’re going to need some geometric concepts tied to linear algebra, and before we get into that we need to revisit an old topic: tensor powers and the subspaces of symmetric and antisymmetric tensors. Specifically, how do all of these interact with duals? Throughout these posts we’ll be working with a vector space $V$ over a field $\mathbb{F}$, which at times will be assumed to be finite-dimensional, but will not always be.
First, we remember that elements of the dual space $V^*$ are called “linear functionals”. These are $\mathbb{F}$-linear functions from the vector space $V$ to the base field $\mathbb{F}$. Similarly, an “$n$-multilinear functional” is a function $f$ that takes $n$ vectors from $V$ and gives back a field element in $\mathbb{F}$ in a way that’s $\mathbb{F}$-linear in each variable. That is,
$$f(v_1,\dots,av_i+bw_i,\dots,v_n)=af(v_1,\dots,v_i,\dots,v_n)+bf(v_1,\dots,w_i,\dots,v_n)$$
for scalars $a$ and $b$, and for any index $i$. By the defining universal property of tensor products, this is equivalent to a linear function $f:V^{\otimes n}\to\mathbb{F}$, that is, a linear functional on $V^{\otimes n}$. So the space of $n$-multilinear functionals is the dual space $\left(V^{\otimes n}\right)^*$.
There’s a good way to come up with $n$-multilinear functionals. Just take $n$ linear functionals and sew them together. That is, if we have an $n$-tuple of functionals $(\lambda^1,\dots,\lambda^n)$ we can define an $n$-multilinear functional by the formula
$$\left[\lambda^1\otimes\dots\otimes\lambda^n\right](v_1\otimes\dots\otimes v_n)=\lambda^1(v_1)\dots\lambda^n(v_n)$$
We just feed the $i$th tensorand $v_i$ into the $i$th functional $\lambda^i$ and multiply all the resulting field elements together. Since field multiplication is multilinear, so is this operation. Then the universal property of tensor products tells us that this mapping from $n$-tuples of linear functionals to $n$-multilinear functionals is equivalent to a unique linear map from the $n$th tensor power $\left(V^*\right)^{\otimes n}\to\left(V^{\otimes n}\right)^*$. It’s also easy to show that this map has a trivial kernel.
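Sewing linear functionals together is simple enough to write down directly. In this Python sketch (the names `functional` and `sew`, and all sample coefficients, are made up for illustration) a linear functional on a three-dimensional space is given by its coefficient tuple, and the sewn-together functional multiplies the values of the individual functionals on the corresponding arguments. The last lines check linearity in the first slot on one example.

```python
def functional(coeffs):
    """Linear functional v -> sum_i coeffs[i] * v[i]."""
    return lambda v: sum(c * x for c, x in zip(coeffs, v))

def sew(*functionals):
    """Multilinear functional (v_1, ..., v_n) -> l_1(v_1) * ... * l_n(v_n)."""
    def multilinear(*vectors):
        result = 1
        for l, v in zip(functionals, vectors):
            result *= l(v)
        return result
    return multilinear

lam = functional((1, 2, 0))
mu = functional((0, 1, -1))
f = sew(lam, mu)   # a 2-multilinear functional

u, v, w = (1, 0, 2), (3, 1, 1), (0, 2, 5)
a, b = 4, -3
combo = tuple(a * x + b * y for x, y in zip(u, v))   # a*u + b*v

# Linearity in the first slot, with the second slot held fixed at w.
print(f(combo, w), a * f(u, w) + b * f(v, w))
```

A single example does not prove multilinearity, of course; it only illustrates the identity that the formula guarantees.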
This is not to say that dualization and tensor powers commute. Indeed, in general this map is a proper monomorphism. But it turns out that if $V$ is finite-dimensional, then it’s actually an isomorphism. Just count the dimensions (if $V$ has dimension $d$ then each space has dimension $d^n$) and use the rank-nullity theorem to see that they must be isomorphic. That is, every $n$-multilinear functional is a linear combination of the ones we can construct from $n$-tuples of linear functionals.
Now we can specialize this result. We define a multilinear functional to be symmetric if its value is unchanged when we swap two of its inputs. Equivalently, it commutes with the symmetrizer. That is, it must kill everything that the symmetrizer kills, and so must really define a linear functional on the subspace of symmetric tensors. That is, the space of symmetric $n$-multilinear functionals is the dual space $S^n(V)^*$. We can construct such symmetric multilinear functionals by taking $n$ linear functionals as before and symmetrizing them. This gives a monomorphism $S^n(V^*)\to S^n(V)^*$, which is an isomorphism if $V$ is finite-dimensional.
Similarly, we define a multilinear functional to be antisymmetric or “alternating” if its value changes sign when we swap two of its inputs. Then it commutes with the antisymmetrizer, must kill everything the antisymmetrizer kills, and descends to a linear functional on the subspace of antisymmetric tensors. As before, we can construct just such an antisymmetric $n$-multilinear functional by antisymmetrizing $n$ linear functionals, and get a monomorphism $A^n(V^*)\to A^n(V)^*$. And yet again, this map is an isomorphism if $V$ is finite-dimensional.
Bad Language
I know I’ve been really lazy about my blogroll, and I should get around to that sometime. But there’s a new one I have to point out right away: The Language of Bad Physics (as noted in the comments, now on WordPress at this address). Like Frank Costanza on Festivus, she’s got a lot of problems with you people (physics writers) and how mathematical terms get mangled and confounded in the physics literature. It’s one big Airing of Grievances, and you’d do well to listen up.
My only complaint is that it’s one of those horrid, ugly Blogspot pages and not a nice $\LaTeX$-enabled WordPress page. But maybe it’s early enough to get her to switch…
Smoothness
With Clairaut’s theorem comes the first common example of a smoothness assumption. It’s a good time to say just what I mean by this.
Let’s look at an open region $X\subseteq\mathbb{R}^n$. We can now define a tower of algebras of functions on this set. We start by setting out the real-valued functions which are continuous at each point of $X$, and write this as $C^0(X)$. It’s an algebra under pointwise addition and multiplication of functions.
Next we consider those functions which have all partial derivatives at every point of $X$, and whose partial derivatives are themselves continuous throughout $X$. We’ve seen that this will imply that such a function has a differential at each point of $X$. This gives us a subalgebra of $C^0(X)$ which we write as $C^1(X)$. That is, these functions have “one continuous derivative”, or are “once (continuously) differentiable”.
Continuing on, we consider those functions which have all second partial derivatives, and whose second partials are themselves continuous at each point of $X$. Clairaut’s theorem tells us that the mixed second partials are equal, since they’re both continuous, and we can define the second differential. These functions form a subalgebra of $C^1(X)$ (and thus a further subalgebra of $C^0(X)$) which we write as $C^2(X)$. These functions have “two continuous derivatives”, or are “twice (continuously) differentiable”.
From here it’s clear how to proceed, defining functions with higher and higher differentials. We get algebras $C^3(X)$, $C^4(X)$, and so on. We can even define the infinitely differentiable functions $C^\infty(X)$ to be the limit (in the categorical sense) of this process. It consists of all the functions that are in $C^k(X)$ for all natural numbers $k$. Taking any directional derivative (with a constant direction) of a function in $C^k(X)$ lands us in $C^{k-1}(X)$, although such differentiation sends $C^\infty(X)$ back into itself.
Is this the end? Not quite. Just like in one variable we have analytic functions. Once we’re in $C^\infty(X)$ and we have all higher derivatives we can use Taylor’s theorem to write out the Taylor series of our function. But this may or may not converge back to the function itself. If it does for every point in $X$ we say that the function is analytic on $X$. The collection of all analytic functions forms a subalgebra of $C^\infty(X)$ which we write as $C^\omega(X)$.
It’s interesting to observe that at each step, “most” functions fail to fall into the finer subalgebra, just like “most” points on the $x$-$y$ plane are not on the $x$-axis. An arbitrary function selected from $C^0(X)$ will actually lie within $C^1(X)$ with probability zero. An arbitrary infinitely differentiable function is analytic with probability zero. Pretty much every example we show students in calculus classes is infinitely differentiable, if not analytic, and yet such functions make up a vanishingly small portion of even once differentiable functions.
So, what does it mean to say that a function is “smooth”? It’s often said to mean that a function is in $C^\infty(X)$, that is, that it has derivatives of all orders. But in practice, this seems to actually be just a convenience. What “smooth” actually means is a subtler point.
Let’s say I’m working in some situation where I’m going to be taking first and second partial derivatives of a function in a region, and I’m going to want the mixed partials to commute by Clairaut’s theorem. If I say that $f$ is infinitely differentiable on $X$, this will certainly do the trick. But I’ve excluded a huge number of functions. All I really need is for $f$ to fall into $C^2(X)$, of which $C^\infty(X)$ is an incredibly tiny subalgebra.
In practice, then, “smooth” effectively means “has enough derivatives to do what I want with it”. It’s a way of saying that we understand that it’s possible to come up with pathological cases which break the theorem we’re stating, but as long as we have sufficiently many derivatives (where “sufficiently many” is some fixed natural number we don’t care to work out in detail) the pathological cases can be excluded. Saying that “smooth” means “infinitely differentiable” accomplishes this goal, and it’s usually easier than trying to stomach the idea that “smoothness” is a highly context-dependent term of art rather than a nice, well-defined mathematical concept.
Taylor’s Theorem
Like I said yesterday, because of extraneous terms the higher differentials don’t transform well, and so they’re not going to be useful for many of our purposes. However, there’s one thing they’re really good for: generalizing Taylor’s theorem. Specifically, the version of Taylor’s theorem that resembles the mean value theorem. And we’ll even use an approach like we did for the extension of that result to higher dimensions.
So let’s say that $f$ has continuous partial derivatives up to order $m$ in an open region $X\subseteq\mathbb{R}^n$. If $a$ and $b$ are two points in $X$ so that the whole line segment $[a,b]$ is contained in $X$, then there exists some point $\xi$ along that segment so that
$$f(b)-f(a)=\sum_{k=1}^{m-1}\frac{1}{k!}d^kf(a;b-a)+\frac{1}{m!}d^mf(\xi;b-a)$$
where $d^kf(x;t)$ denotes the $k$th differential of $f$ at the point $x$, evaluated on the displacement $t$.
Just like with the mean value theorem, we’ll define a new function $g(t)=f(a+t(b-a))$ for $t$ in the closed interval $[0,1]$. This is a composite of the function $f$ with the function $p(t)=a+t(b-a)$. This function $p$ is clearly differentiable, with constant derivative $p'(t)=b-a$. And so the chain rule tells us that
$$g'(t)=df(p(t);b-a)$$
where we’ll fudge the distinction between the derivative and the differential of $g$ because it’s a real-valued function of a single real variable.
But now we can do the same thing to take the second derivative.
$$g''(t)=d^2f(p(t);b-a)$$
and so on, with each derivative of $g$ being given by a similar formula
$$g^{(k)}(t)=d^kf(p(t);b-a)$$
up until the index $m$. Beyond that we don’t know that the higher differentials exist.
Now we take all these derivatives and stick them into the usual one-variable Taylor theorem, which tells us that
$$g(1)-g(0)=\sum_{k=1}^{m-1}\frac{1}{k!}g^{(k)}(0)+\frac{1}{m!}g^{(m)}(\tau)$$
for some $\tau$ in the interval $(0,1)$. With our formulæ for $g$ and its derivatives, this becomes
$$f(b)-f(a)=\sum_{k=1}^{m-1}\frac{1}{k!}d^kf(a;b-a)+\frac{1}{m!}d^mf(\xi;b-a)$$
where $\xi=p(\tau)$ is a point on the line segment $[a,b]$, and the theorem is proved.
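Here is a small worked check of the statement with order $m=3$, written as a Python sketch. The function $f(x,y)=x^2y$, the endpoints, and the helper names are all assumptions chosen for illustration. Because $f$ is a cubic polynomial, its third differential does not depend on the base point, so any point on the segment serves as the intermediate point and the formula can be verified exactly. The differentials are written out by hand from the partial derivatives of this particular $f$.

```python
from fractions import Fraction

def f(x, y):
    return x * x * y

def d1(x, y, t):
    """First differential: f_x t1 + f_y t2, with f_x = 2xy and f_y = x^2."""
    return 2 * x * y * t[0] + x * x * t[1]

def d2(x, y, t):
    """Second differential: f_xx t1^2 + 2 f_xy t1 t2 + f_yy t2^2,
    with f_xx = 2y, f_xy = 2x, f_yy = 0."""
    return 2 * y * t[0] ** 2 + 2 * (2 * x) * t[0] * t[1]

def d3(x, y, t):
    """Third differential: only f_xxy = 2 is nonzero, and it appears
    three times, giving 3 * 2 * t1^2 * t2 at every base point."""
    return 3 * 2 * t[0] ** 2 * t[1]

a, b = (1, 2), (3, 5)
t = (b[0] - a[0], b[1] - a[1])          # the displacement b - a
xi = (2, Fraction(7, 2))                # midpoint of the segment [a, b]

taylor = (d1(*a, t)
          + Fraction(1, 2) * d2(*a, t)
          + Fraction(1, 6) * d3(*xi, t))
print(taylor, f(*b) - f(*a))
```

For a function that is not a polynomial, the intermediate point is only known to exist and would have to be located numerically.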
