The Unapologetic Mathematician

Mathematics for the interested outsider

Inner Products on Exterior Algebras and Determinants

I want to continue yesterday’s post with some more explicit calculations to hopefully give a bit more of the feel.

First up, let’s consider wedges of degree k. That is, we pick k vectors \left\{v_i\right\}_{i=1}^k and wedge them all together (in order) to get v_1\wedge\dots\wedge v_k. What is its inner product with another of the same form? We calculate

\displaystyle\begin{aligned}\langle v_1\wedge\dots\wedge v_k,w_1\wedge\dots\wedge w_k\rangle&=\frac{1}{k!}\frac{1}{k!}\sum\limits_{\pi\in S_k}\sum\limits_{\hat{\pi}\in S_k}\mathrm{sgn}(\pi\hat{\pi})\langle v_{\pi(1)}\otimes\dots\otimes v_{\pi(k)},w_{\hat{\pi}(1)}\otimes\dots\otimes w_{\hat{\pi}(k)}\rangle\\&=\frac{1}{k!}\frac{1}{k!}\sum\limits_{\pi\in S_k}\sum\limits_{\hat{\pi}\in S_k}\mathrm{sgn}(\pi\hat{\pi})\langle v_{\pi(1)},w_{\hat{\pi}(1)}\rangle\dots\langle v_{\pi(k)},w_{\hat{\pi}(k)}\rangle\\&=\frac{1}{k!}\frac{1}{k!}\sum\limits_{\pi\in S_k}\sum\limits_{\hat{\pi}\in S_k}\mathrm{sgn}(\pi^{-1}\hat{\pi})\langle v_1,w_{\pi^{-1}(\hat{\pi}(1))}\rangle\dots\langle v_{k},w_{\pi^{-1}(\hat{\pi}(k))}\rangle\\&=\frac{1}{k!}\frac{1}{k!}\sum\limits_{\pi\in S_k}\sum\limits_{\sigma\in S_k}\mathrm{sgn}(\sigma)\langle v_1,w_{\sigma(1)}\rangle\dots\langle v_k,w_{\sigma(k)}\rangle\\&=\frac{1}{k!}\sum\limits_{\sigma\in S_k}\mathrm{sgn}(\sigma)\langle v_1,w_{\sigma(1)}\rangle\dots\langle v_k,w_{\sigma(k)}\rangle\end{aligned}

where in the third line we’ve rearranged the factors at the right and used the fact that \mathrm{sgn}(\pi)=\mathrm{sgn}(\pi^{-1}), and in the fourth line we’ve relabelled \sigma=\pi^{-1}\hat{\pi}. This looks a lot like the calculation of a determinant. In fact, it is \frac{1}{k!} times the determinant of the matrix with entries \langle v_i,w_j\rangle.

\displaystyle \langle v_1\wedge\dots\wedge v_k,w_1\wedge\dots\wedge w_k\rangle=\frac{1}{k!}\det\left(\langle v_i,w_j\rangle\right)

If we use the “renormalized” inner product on \Lambda(V) from the end of yesterday’s post, then we get an extra factor of k!, which cancels off the \frac{1}{k!} and gives us exactly the determinant.
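As a sanity check, here is a minimal Python sketch that builds the wedges as honest antisymmetric tensors and compares their inner product with the determinant formula. The helper names (`sign`, `wedge`, `inner`, `det`) and the sample vectors are my own choices for illustration, and the standard basis of \mathbb{R}^4 is assumed to be orthonormal.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..k-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def wedge(vectors):
    """Antisymmetrized tensor product (with the 1/k! factor) as {index tuple: coefficient}."""
    k, n = len(vectors), len(vectors[0])
    out = {}
    for perm in permutations(range(k)):
        for idx in product(range(n), repeat=k):
            coeff = Fraction(sign(perm), factorial(k))
            for slot in range(k):
                coeff *= vectors[perm[slot]][idx[slot]]
            out[idx] = out.get(idx, 0) + coeff
    return out

def inner(a, b):
    """Inner product of tensors in an orthonormal basis: multiply matching components."""
    return sum(c * b.get(idx, 0) for idx, c in a.items())

def det(m):
    """Leibniz formula for the determinant of a square matrix."""
    total = 0
    for perm in permutations(range(len(m))):
        term = sign(perm)
        for i, j in enumerate(perm):
            term *= m[i][j]
        total += term
    return total

# Hypothetical sample data: three vectors on each side, so k = 3.
vs = [(1, 2, 0, 1), (0, 1, 3, -1), (2, 0, 1, 1)]
ws = [(1, 0, 1, 0), (2, 1, 0, 1), (0, 1, 1, 3)]
k = len(vs)
gram = [[sum(x * y for x, y in zip(v, w)) for w in ws] for v in vs]

plain = inner(wedge(vs), wedge(ws))
assert plain == Fraction(det(gram), factorial(k))   # the 1/k! version
assert factorial(k) * plain == det(gram)            # the "renormalized" version
```

Exact rational arithmetic keeps the comparison free of rounding questions.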

We can use the inner product to read off components of exterior algebra elements. If \mu is an element of degree k we write

\displaystyle\mu^{i_1\dots i_k}=k!\langle e_{i_1}\wedge\dots\wedge e_{i_k},\mu\rangle

As an explicit example, we may take V to have dimension 3 and consider an element of degree 2 in \Lambda(V)

\displaystyle\mu=\mu^{12}e_1\wedge e_2+\mu^{13}e_1\wedge e_3+\mu^{23}e_2\wedge e_3

What we’re writing in the superscript to \mu we call a “multi-index”, and sometimes we just write it as I, which in the summation convention runs over all increasing collections of k indices. Correspondingly, we can just write e_I=e_{i_1}\wedge\dots\wedge e_{i_k} for the multi-index I=(i_1,\dots,i_k).

Alternatively, we could expand the wedges out in terms of tensors:

\displaystyle\begin{aligned}\mu&=\mu^{12}e_1\wedge e_2+\mu^{13}e_1\wedge e_3+\mu^{23}e_2\wedge e_3\\&=\mu^{12}e_1\otimes e_2-\mu^{12}e_2\otimes e_1+\mu^{13}e_1\otimes e_3-\mu^{13}e_3\otimes e_1+\mu^{23}e_2\otimes e_3-\mu^{23}e_3\otimes e_2\\&=\mu^{12}e_1\otimes e_2+\mu^{21}e_2\otimes e_1+\mu^{13}e_1\otimes e_3+\mu^{31}e_3\otimes e_1+\mu^{23}e_2\otimes e_3+\mu^{32}e_3\otimes e_2\\&=\mu^{ij}e_i\otimes e_j\end{aligned}

where we just think of the superscript as a collection of k separate indices, all of which run from 1 to the dimension of V, with the understanding that \mu^{ij}=-\mu^{ji}, and similarly for higher degrees; swapping two indices switches the sign of the component. All this index juggling gets distracting and confusing, but it’s sometimes necessary for explicit computations, and the physicists love it.

Anyway, we can use this to get back to our original definition of the determinant of a linear transformation T. Pick an orthonormal basis \left\{e_i\right\}_{i=1}^n for V and wedge them all together to get an element e_1\wedge\dots\wedge e_n of top degree in \Lambda(V). Since the space of top degree is one-dimensional, any linear transformation on it just consists of multiplying by a scalar. So we can let T act on this one element we’ve cooked up, and then read off the coefficient using the inner product.

The linear transformation T sends e_i to the vector T(e_i)=t_i^je_j. By functoriality, it sends e_1\wedge\dots\wedge e_n to T(e_1)\wedge\dots\wedge T(e_n). And now we want to calculate the coefficient.

\displaystyle\begin{aligned}n!\langle e_1\wedge\dots\wedge e_n,T(e_1)\wedge\dots\wedge T(e_n)\rangle&=\frac{n!}{n!}\det\left(\langle e_j,T(e_i)\rangle\right)\\&=\det\left(\langle e_j,t_i^ke_k\rangle\right)\\&=\det\left(t_i^k\langle e_j,e_k\rangle\right)\\&=\det\left(t_i^k\delta_k^j\right)\\&=\det\left(t_i^j\right)\end{aligned}

The determinant of T is exactly the factor by which T acting on the top degree subspace in \Lambda(V) expands any given element.
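To see this in the smallest interesting case, take n=2, so that T(e_1)=t_1^1e_1+t_1^2e_2 and T(e_2)=t_2^1e_1+t_2^2e_2. Expanding by bilinearity, and using e_i\wedge e_i=0 and e_2\wedge e_1=-e_1\wedge e_2, we find

\displaystyle\begin{aligned}T(e_1)\wedge T(e_2)&=\left(t_1^1e_1+t_1^2e_2\right)\wedge\left(t_2^1e_1+t_2^2e_2\right)\\&=t_1^1t_2^2\,e_1\wedge e_2+t_1^2t_2^1\,e_2\wedge e_1\\&=\left(t_1^1t_2^2-t_1^2t_2^1\right)e_1\wedge e_2\end{aligned}

and the coefficient is the familiar determinant of a 2\times2 matrix.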

October 30, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

Tensor Algebras and Inner Products

Let’s focus back in on a real, finite-dimensional vector space V and give it an inner product. As a symmetric bilinear form, the inner product provides us with an isomorphism V\rightarrow V^*. Now we can use functoriality to see what this does for our tensor algebras. Again, I’ll be mostly interested in the exterior algebra \Lambda(V), so I’ll stick to talking about that one.

The isomorphism sends a vector v to the linear functional \langle v,\underbar{\hphantom{x}}\rangle. Functoriality then defines an isomorphism \Lambda(V)\rightarrow\Lambda(V^*) that sends the wedge v_1\wedge\dots\wedge v_k of degree k to the wedge \langle v_1,\underbar{\hphantom{x}}\rangle\wedge\dots\wedge \langle v_k,\underbar{\hphantom{x}}\rangle, also of degree k. This is the antisymmetrization of the tensor product of all these linear functionals. We’ve seen that we can consider this as a linear functional on the space of degree k tensors by applying the functionals to the tensorands in order and then multiplying together all the results. This defines an isomorphism A^k(V^*)\rightarrow(A^kV)^*, and extending by linearity we find an isomorphism \Lambda(V^*)\rightarrow\Lambda(V)^*.

Let’s get a little more explicit about how this works by picking an orthonormal basis \left\{e_i\right\}_{i=1}^n for V, and the corresponding dual basis \left\{\epsilon^i\right\}_{i=1}^n for V^*. That is, we have \epsilon^i=\langle e_i,\underbar{\hphantom{x}}\rangle — which defines the isomorphism from V to V^* explicitly in terms of bases — and \epsilon^i(e_j)=\delta^i_j.

Now we can use the e_i to write down an explicit basis of \Lambda(V). An element of degree k is the sum of wedges of k vectors v_1\wedge\dots\wedge v_k. We can write each of these vectors out in terms of components v_j=v_j^ie_i, getting the wedge (really a sum of wedges)

\displaystyle\left(v_1^{i_1}e_{i_1}\right)\wedge\dots\wedge\left(v_k^{i_k}e_{i_k}\right)

We factor out all the scalar components to get

\displaystyle\left(v_1^{i_1}\dots v_k^{i_k}\right)e_{i_1}\wedge\dots\wedge e_{i_k}

If in a given term we ever have two of the indices i_j equal to each other, then the whole wedge will be zero by antisymmetry. On the other hand, if none of them are equal we can sort them into increasing order (at the possible cost of multiplying by the sign of the needed permutation). In the end, we can write down any wedge of degree k uniquely as a sum of constants times the basic wedges e_{i_1}\wedge\dots\wedge e_{i_k}, where i_1<\dots<i_k. For example, if V has basis \left\{e_1,e_2,e_3\right\}, then \Lambda(V) will have basis

\displaystyle\left\{\begin{matrix}&1&\\e_1&e_2&e_3\\e_2\wedge e_3&e_1\wedge e_3&e_1\wedge e_2\\&e_1\wedge e_2\wedge e_3&\end{matrix}\right\}

where the lines correspond to the different degrees.
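The same list can be generated mechanically for any dimension. Here is a small Python sketch that enumerates the increasing multi-indices degree by degree. The function name `exterior_basis` is just a label I've made up for the illustration.

```python
from itertools import combinations

def exterior_basis(n):
    """For each degree k = 0..n, list the increasing multi-indices (i_1 < ... < i_k)."""
    return [list(combinations(range(1, n + 1), k)) for k in range(n + 1)]

basis = exterior_basis(3)
for k, level in enumerate(basis):
    # Each tuple (i_1, ..., i_k) stands for the basic wedge e_{i_1} ^ ... ^ e_{i_k};
    # the empty tuple in degree 0 stands for the unit 1.
    print(k, level)

# The degree-k piece has "n choose k" basic wedges, for 2^n in total.
assert sum(len(level) for level in basis) == 2 ** 3
```

The degree counts are binomial coefficients, which is why the total comes out to 2^n.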

Now it’s obvious how the isomorphism acts on this basis. It just turns a wedge of basis vectors like e_2\wedge e_3 into a wedge of basis linear functionals like \epsilon^2\wedge\epsilon^3. The action on the rest of \Lambda(V) just extends by linearity. When we compose this with the isomorphism between \Lambda(V^*) and \Lambda(V)^*, we get an isomorphism \Lambda(V)\rightarrow\Lambda(V)^*. That is, we have an inner product on the algebra \Lambda(V)!

Let’s consider how this inner product behaves on our basis. Clearly to line these up we need the degrees to be equal. We also find that we get zero unless the collections of indices are the same. For example, if we try to pair e_1\wedge e_2 with e_1\wedge e_3, we find

\displaystyle\begin{aligned}\langle e_1\wedge e_2,e_1\wedge e_3\rangle&=\left[\epsilon^1\wedge\epsilon^2\right](e_1\wedge e_3)\\&=\frac{1}{2!}\frac{1}{2!}\left[\epsilon^1\otimes\epsilon^2-\epsilon^2\otimes\epsilon^1\right](e_1\otimes e_3-e_3\otimes e_1)\\&=\frac{1}{4}\left(\epsilon^1(e_1)\epsilon^2(e_3)-\epsilon^1(e_3)\epsilon^2(e_1)-\epsilon^2(e_1)\epsilon^1(e_3)+\epsilon^2(e_3)\epsilon^1(e_1)\right)\\&=\frac{1}{4}\left(1\cdot0-0\cdot0-0\cdot0+0\cdot1\right)\\&=0\end{aligned}

In each arrangement, we’ll find two indices that don’t line up, and thus each term will be zero. On the other hand, if the collections of indices are the same, we find (for example)

\displaystyle\begin{aligned}\langle e_1\wedge e_2,e_1\wedge e_2\rangle&=\frac{1}{2!}\frac{1}{2!}\left(\langle e_1\otimes e_2,e_1\otimes e_2\rangle-\langle e_1\otimes e_2,e_2\otimes e_1\rangle-\langle e_2\otimes e_1,e_1\otimes e_2\rangle+\langle e_2\otimes e_1,e_2\otimes e_1\rangle\right)\\&=\frac{1}{4}\left(\langle e_1,e_1\rangle\langle e_2,e_2\rangle-\langle e_1,e_2\rangle\langle e_2,e_1\rangle-\langle e_2,e_1\rangle\langle e_1,e_2\rangle+\langle e_2,e_2\rangle\langle e_1,e_1\rangle\right)\\&=\frac{1}{4}\left(1\cdot1-0\cdot0-0\cdot0+1\cdot1\right)\\&=\frac{1}{2}\end{aligned}

When we consider a basic wedge of degree k (here, k=2) and pair it with itself, we’ll have a sum of (k!)^2 terms corresponding to summing over permutations of both tensors. Of these, terms that pick different permutations will have at least one pair of basis vectors that don’t line up, and make the whole term zero. The remaining k! terms that pick the same permutation twice will give the product of k copies of {1}, and this will always occur with a positive sign. This will exactly cancel one of the two normalizing factors from the antisymmetrizers, and thus the inner product of a basic wedge of degree k with itself will always be \frac{1}{k!}. It’s not an orthonormal basis, but it’s close.
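The counting argument can be checked by brute force. The following Python sketch, with helper names of my own invention, writes each basic wedge as an antisymmetric tensor carrying the \frac{1}{k!} normalization and pairs every basic wedge against every other of the same degree, assuming an orthonormal basis of a four-dimensional space.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..k-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def basic_wedge(I):
    """e_{i_1} ^ ... ^ e_{i_k} as a tensor {index tuple: coefficient}, including 1/k!."""
    k = len(I)
    return {tuple(I[p] for p in perm): Fraction(sign(perm), factorial(k))
            for perm in permutations(range(k))}

def inner(a, b):
    """Orthonormal basis, so the inner product multiplies matching components."""
    return sum(c * b.get(idx, 0) for idx, c in a.items())

n = 4
for k in range(1, n + 1):
    multi_indices = list(combinations(range(1, n + 1), k))
    for I in multi_indices:
        for J in multi_indices:
            expected = Fraction(1, factorial(k)) if I == J else 0
            assert inner(basic_wedge(I), basic_wedge(J)) == expected
```

Every assertion passing is exactly the statement that the basic wedges are orthogonal with squared length \frac{1}{k!}.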

Notice, in particular, how in the second example we’ve avoided explicit use of the dual basis and just defined the inner product on tensors of rank k as the k-fold product of inner products of vectors. We’ll stick to this notation in the future for tensors.

The factor of \frac{1}{k!} isn’t really terrible, but it can get annoying. Often the inner product on \Lambda(V) is modified to compensate for it. We consider the different degrees to be orthogonal, as before, and we define the inner product in degree k to include an extra factor of k!. This has the effect of making the collection of wedges of basis vectors into an orthonormal basis for \Lambda(V), but it means that the inner product on wedges can not be calculated simply by considering them as antisymmetric tensors.

Now, I’ve never really looked closely at exactly what happens, so as an experiment I’m going to try to not use this extra factor of k! and see what happens. I’ll refer, as I do, to the “renormalized” inner product on \Lambda(V), where appropriate. And if the work starts becoming too complicated without this factor, I’ll give in and use it, explicitly saying when I’ve given up.

October 29, 2009 Posted by | Algebra, Linear Algebra | 4 Comments

Functoriality of Tensor Algebras

The three constructions we’ve just shown — the tensor, symmetric tensor, and exterior algebras — were all asserted to be the “free” constructions. This makes them functors from the category of vector spaces over \mathbb{F} to appropriate categories of \mathbb{F}-algebras, and that means that they behave very nicely as we transform vector spaces, and we can even describe exactly how nicely with explicit algebra homomorphisms. I’ll work through this for the exterior algebra, since that’s the one I’m most interested in, but the others are very similar.

Okay, we want the exterior algebra \Lambda(V) to be the “free” graded-commutative algebra on the vector space V. That’s a tip-off that we’re thinking \Lambda should be the left adjoint of the “forgetful” functor U which sends a graded-commutative algebra to its underlying vector space (Todd makes a correction to which forgetful functor we’re using below). We’ll define this adjunction by finding a collection of universal arrows, which (along with the forgetful functor U) is one of the many ways we listed to specify an adjunction.

So let’s run down the checklist. We’ve got the forgetful functor U which we’re going to make the right-adjoint. Now for each vector space V we need a graded-commutative algebra — clearly the one we’ll pick is \Lambda(V) — and a universal arrow \eta_V:V\rightarrow U(\Lambda(V)). The underlying vector space of the exterior algebra is the direct sum of all the spaces of antisymmetric tensors on V.

\displaystyle U(\Lambda(V))=\bigoplus\limits_{n=0}^\infty A^n(V)

Yesterday we wrote this without the U, since we often just omit forgetful functors, but today we want to remember that we’re using it. But we know that A^1(V)=V, so the obvious map \eta_V to use is the one that sends a vector v to itself, now considered as an antisymmetric tensor with a single tensorand.

But is this a universal arrow? That is, if A is another graded-commutative algebra, and \phi:V\rightarrow U(A) is another linear map, then is there a unique homomorphism of graded-commutative algebras \bar{\phi}:\Lambda(V)\rightarrow A so that \phi=U(\bar{\phi})\circ\eta_V? Well, \phi tells us where in A we have to send any antisymmetric tensor with one tensorand. Any other element \upsilon in \Lambda(V) is the sum of a bunch of terms, each of which is the wedge of a bunch of elements of V. So in order for \bar{\phi} to be a homomorphism of graded-commutative algebras, it has to act by simply changing each element of V in our expression for \upsilon into the corresponding element of A, and then wedging and summing these together as before. Just write out the exterior algebra element all the way down in terms of vectors, and transform each vector in the expression. This will give us the only possible such homomorphism \bar{\phi}. And this establishes that \Lambda(V) is the object-function of a functor which is left-adjoint to U.

So how does \Lambda work on morphisms? It’s right in the proof above! If we have a linear map f:V\rightarrow W, we need to find some homomorphism \Lambda(f):\Lambda(V)\rightarrow\Lambda(W). But we can compose f with the linear map \eta_W, which gives us \eta_W\circ f:V\rightarrow U(\Lambda(W)). The universality property we just proved shows that we have a unique homomorphism \Lambda(f)=\overline{\eta_W\circ f}:\Lambda(V)\rightarrow\Lambda(W). And, specifically, it is defined on an element \upsilon\in\Lambda(V) by writing down \upsilon in terms of vectors in V and applying f to each vector in the expression to get a sum of wedges of elements of W, which will be an element of the algebra \Lambda(W).

Of course, as stated above, we get similar constructions for the commutative algebra S(V) and the tensor algebra T(V).

Since, given a linear map f the induced homomorphisms \Lambda(f), S(f), and T(f) preserve the respective gradings, they can be broken into one linear map for each degree. And if f is invertible, so must be its image under each functor. These give exactly the tensor, symmetric, and antisymmetric representations of the group \mathrm{GL}(V), if we consider how these functors act on invertible morphisms f:V\rightarrow V. Functoriality is certainly a useful property.
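To make the degree-by-degree maps concrete: in the basis of basic wedges, the matrix of \Lambda(f) in degree k has the k\times k minors of the matrix of f as its entries, and functoriality becomes the statement that these matrices of minors multiply correctly. Here is a Python sketch of that check. The names `exterior_power` and `matmul` and the sample matrices are my own, and I'm assuming matrices act on column vectors.

```python
from itertools import combinations, permutations

def det(m):
    """Leibniz formula; the empty matrix has determinant 1."""
    total = 0
    for perm in permutations(range(len(m))):
        inversions = sum(1 for i in range(len(perm))
                         for j in range(i + 1, len(perm)) if perm[i] > perm[j])
        term = -1 if inversions % 2 else 1
        for i, j in enumerate(perm):
            term *= m[i][j]
        total += term
    return total

def exterior_power(m, k):
    """Matrix of the degree-k part of Lambda(f): all k-by-k minors of m."""
    idx = list(combinations(range(len(m)), k))
    return [[det([[m[r][c] for c in cols] for r in rows]) for cols in idx]
            for rows in idx]

def matmul(a, b):
    """Plain matrix product of lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

# Hypothetical sample maps on a three-dimensional space.
f = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]
g = [[0, 1, 1], [1, 0, 2], [3, 1, 0]]

for k in range(4):
    # Functoriality in each degree: Lambda(f g) = Lambda(f) Lambda(g).
    assert exterior_power(matmul(f, g), k) == matmul(exterior_power(f, k),
                                                     exterior_power(g, k))

# In top degree the whole map is multiplication by the determinant.
assert exterior_power(f, 3) == [[det(f)]]
```

In degree 1 this recovers f itself, and in top degree it recovers the multiplicativity of the determinant.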

October 28, 2009 Posted by | Algebra, Category theory, Linear Algebra, Representation Theory, Universal Properties | 5 Comments

Exterior Algebras

Let’s continue yesterday’s discussion of algebras we can construct from a vector space. Today, we consider the “exterior algebra” on V, which consists of the direct sum of all the spaces of antisymmetric tensors

\displaystyle \Lambda(V)=\bigoplus\limits_{n=0}^\infty A^n(V)

Yes, that’s a capital \lambda, not an A. This is just standard notation, probably related to the symbol for its multiplication we’ll soon come to.

Again, despite the fact that each A^n(V) is a subspace of the tensor space V^{\otimes n}, this isn’t a subalgebra of T(V), because the tensor product of two antisymmetric tensors may not be antisymmetric itself. Instead, we will take the tensor product of \mu\in A^m(V) and \nu\in A^n(V), and then antisymmetrize it, to give \mu\wedge\nu\in A^{m+n}(V). This will be bilinear, but will it be associative?

Our proof parallels the one we ran through yesterday, writing the symmetric group as the disjoint union of cosets indexed by a set \Gamma of representatives

\displaystyle S_{l+m+n}=\biguplus\limits_{\gamma\in\Gamma}\gamma S_{l+m}

and rewriting the antisymmetrizer in just the right way. But now we’ve got the signs of our permutations to be careful with. Still, let’s dive in with the antisymmetrizers

\displaystyle\begin{aligned}\left(\frac{1}{(l+m+n)!}\sum\limits_{\pi\in S_{l+m+n}}\mathrm{sgn}(\pi)\pi\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\hat{\pi})\hat{\pi}\right)&=\left(\frac{1}{(l+m+n)!}\sum\limits_{\gamma\in\Gamma}\sum\limits_{\pi\in\gamma S_{l+m}}\mathrm{sgn}(\pi)\pi\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\hat{\pi})\hat{\pi}\right)\\&=\left(\frac{1}{(l+m+n)!}\sum\limits_{\gamma\in\Gamma}\sum\limits_{\pi\in S_{l+m}}\mathrm{sgn}(\gamma\pi)\gamma\pi\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\hat{\pi})\hat{\pi}\right)\\&=\left(\frac{1}{(l+m+n)!}\left(\sum\limits_{\gamma\in\Gamma}\mathrm{sgn}(\gamma)\gamma\right)\left(\sum\limits_{\pi\in S_{l+m}}\mathrm{sgn}(\pi)\pi\right)\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\hat{\pi})\hat{\pi}\right)\\&=\left(\frac{1}{(l+m+n)!}\sum\limits_{\gamma\in\Gamma}\mathrm{sgn}(\gamma)\gamma\right)\left(\frac{1}{(l+m)!}\sum\limits_{\pi\in S_{l+m}}\sum\limits_{\hat{\pi}\in S_{l+m}}\mathrm{sgn}(\pi)\mathrm{sgn}(\hat{\pi})\pi\hat{\pi}\right)\\&=\frac{1}{(l+m+n)!}\left(\sum\limits_{\gamma\in\Gamma}\mathrm{sgn}(\gamma)\gamma\right)\left(\sum\limits_{\pi\in S_{l+m}}\mathrm{sgn}(\pi)\pi\right)\\&=\frac{1}{(l+m+n)!}\sum\limits_{\pi\in S_{l+m+n}}\mathrm{sgn}(\pi)\pi\end{aligned}

Where throughout we’ve used the fact that \mathrm{sgn} is a representation, and so the signum of the product of two group elements is the product of their signa. We also make the crucial combination of the double sum over S_{l+m} into a single sum by noting that each group element shows up exactly (l+m)! times, and each time it shows up with the exact same sign, which lets us factor out (l+m)! from the sum and cancel the normalizing factor.
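The upshot is that antisymmetrizing the first l+m slots and then everything is the same as antisymmetrizing everything at once. Here is a Python sketch checking this on a sample tensor with l+m=2 and n=1. The function `antisymmetrize` and the choice of tensor are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..k-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def antisymmetrize(t, slots):
    """Apply the normalized antisymmetrizer to the first `slots` positions of t."""
    out = {}
    for perm in permutations(range(slots)):
        weight = Fraction(sign(perm), factorial(slots))
        for idx, c in t.items():
            # Shuffle the first `slots` indices and leave the remaining ones alone.
            new = tuple(idx[p] for p in perm) + idx[slots:]
            out[new] = out.get(new, 0) + weight * c
    return {idx: c for idx, c in out.items() if c != 0}

# A sample rank-3 tensor on a four-dimensional space, stored as {indices: coefficient}.
t = {idx: Fraction((idx[0] + 1) * (idx[1] + 1) ** 2 * (idx[2] + 1) ** 3)
     for idx in product(range(4), repeat=3)}

partial_then_full = antisymmetrize(antisymmetrize(t, 2), 3)
full = antisymmetrize(t, 3)

assert full                       # the fully antisymmetrized tensor is not zero
assert partial_then_full == full  # the smaller antisymmetrizer gets absorbed
```

Running the same comparison with the last two slots in place of the first two would cover the other bracketing.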

Now this multiplication is not commutative. Instead, it’s graded-commutative. If \mu\in\Lambda^m(V) and \nu\in\Lambda^n(V) are elements of the exterior algebra, then we find

\displaystyle\mu\wedge\nu=(-1)^{mn}\nu\wedge\mu

That is, elements of odd degree anticommute with each other, while elements of even degree commute with everything.

Indeed, given \mu\in A^m(V) and \nu\in A^n(V), we can let \tau_{m,n} be the permutation which moves the last n slots to the beginning of the term and the first m slots to the end. We can construct \tau_{m,n} by moving each of the last n slots one-by-one past the first m, taking m swaps for each one. That gives a total of mn swaps, so \mathrm{sgn}(\tau_{m,n})=(-1)^{mn}. Then we write

\displaystyle\begin{aligned}\mu\wedge\nu&=\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\mathrm{sgn}(\pi)\pi\right)(\mu\otimes\nu)\\&=\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\mathrm{sgn}(\pi\tau_{m,n})\pi\tau_{m,n}\right)(\mu\otimes\nu)\\&=\mathrm{sgn}(\tau_{m,n})\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\mathrm{sgn}(\pi)\pi\right)\left(\tau_{m,n}(\mu\otimes\nu)\right)\\&=(-1)^{mn}\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\mathrm{sgn}(\pi)\pi\right)(\nu\otimes\mu)\\&=(-1)^{mn}\nu\wedge\mu\end{aligned}

as asserted.
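For the skeptical, here is a Python sketch that tests the sign rule on a few pairs of elements of various degrees. The wedge is computed literally as "tensor, then antisymmetrize", and all the helper names and sample vectors are my own choices.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..k-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def tensor(a, b):
    """Tensor product of tensors stored as {index tuple: coefficient}."""
    return {i + j: x * y for i, x in a.items() for j, y in b.items()}

def antisymmetrize(t):
    """Normalized antisymmetrizer over all slots."""
    out = {}
    for idx, c in t.items():
        k = len(idx)
        for perm in permutations(range(k)):
            new = tuple(idx[p] for p in perm)
            out[new] = out.get(new, 0) + Fraction(sign(perm), factorial(k)) * c
    return {idx: c for idx, c in out.items() if c != 0}

def wedge(a, b):
    return antisymmetrize(tensor(a, b))

def scale(s, t):
    return {idx: s * c for idx, c in t.items()}

def vector(*coords):
    return {(i,): Fraction(c) for i, c in enumerate(coords) if c != 0}

def degree(t):
    return len(next(iter(t)))

# Hypothetical sample vectors in a four-dimensional space.
u, v, w, x = vector(1, 2, 0, 1), vector(0, 1, 3, -1), vector(2, 0, 1, 1), vector(1, 0, 1, 0)
mu = wedge(u, v)  # degree 2

for a, b in [(u, v), (mu, w), (mu, wedge(w, x)), (wedge(mu, w), x)]:
    m, n = degree(a), degree(b)
    assert wedge(a, b) == scale((-1) ** (m * n), wedge(b, a))
```

The pairs cover odd with odd, even with odd, and even with even, so both halves of the statement get exercised.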

The exterior algebra on the dual space, \Lambda(V^*), is the algebra of all alternating multilinear functionals on V, providing a counterpart to the algebra of polynomial functions on V. But where the variables in polynomial functions commute with each other, the basic covectors (analogous to variables reading off components of a vector) anticommute with each other in this algebra.

October 27, 2009 Posted by | Algebra, Linear Algebra | 6 Comments

Tensor and Symmetric Algebras

There are a few graded algebras we can construct with our symmetric and antisymmetric tensors, and at least one of them will be useful. Remember that we also have symmetric and alternating multilinear functionals in play, so the same constructions will give rise to even more algebras.

First and easiest we have the tensor algebra on V. This just takes all the tensor powers of V and direct sums them up

\displaystyle T(V)=\bigoplus\limits_{n=0}^\infty V^{\otimes n}

This gives us a big vector space (an infinite-dimensional one, in fact) but it’s not an algebra until we define a bilinear multiplication. For this one, we’ll just define the multiplication by the tensor product itself. That is, if \mu\in V^{\otimes m} and \nu\in V^{\otimes n} are two tensors, their product will be \mu\otimes\nu\in V^{\otimes(m+n)}, which is by definition bilinear. This algebra has an obvious grading by the number of tensorands.

This is exactly the free algebra on a vector space, and it’s just like we built the free ring on an abelian group. If we perform the construction on the dual space V^* we get an algebra of functions. If V has dimension d, then this is isomorphic to the algebra T(V^*)\cong\mathbb{F}\{X^1,\dots,X^d\} of noncommutative polynomials in d variables.

Next we consider the symmetric algebra on V, which consists of the direct sum of all the spaces of symmetric tensors

\displaystyle S(V)=\bigoplus\limits_{n=0}^\infty S^n(V)

with a grading again given by the number of tensorands.

Now, despite the fact that each S^n(V) is a subspace of the tensor space V^{\otimes n}, this is not a subalgebra of T(V). This is because the tensor product of two symmetric tensors may well not be symmetric itself. Instead, we will take the tensor product of \mu\in S^m(V) and \nu\in S^n(V), and then symmetrize it, to give \mu\odot\nu\in S^{m+n}(V). This will be bilinear, and it will work with our choice of grading, but will it be associative?

If we have three symmetric tensors \lambda\in S^l(V), \mu\in S^m(V), and \nu\in S^n(V), then we could multiply them by (\lambda\odot\mu)\odot\nu or by \lambda\odot(\mu\odot\nu). To get the first of these, we tensor \lambda and \mu, symmetrize the result, then tensor with \nu and symmetrize that. But since symmetrizing \lambda\otimes\mu consists of adding up a number of shuffled versions of this tensor, we could tensor with \nu first and then symmetrize only the first l+m tensorands, before finally symmetrizing the entire thing. I assert that these two symmetrizations (the first one on only part of the whole term) are equivalent to simply symmetrizing the whole thing. Similarly, symmetrizing the last m+n tensorands followed by symmetrizing the whole thing is equivalent to just symmetrizing the whole thing. And so both orders of multiplication are the same, and the operation \odot indeed defines an associative multiplication.

To see this, remember that symmetrizing the whole term involves a sum over the symmetric group S_{l+m+n}, while symmetrizing over the beginning involves a sum over the subgroup S_{l+m}\subseteq S_{l+m+n} consisting of those permutations acting on only the first l+m places. This will be key to our proof. We consider the collection of left cosets of S_{l+m} within S_{l+m+n}. For each one, we can pick a representative element (this is no trouble since there are only a finite number of cosets with a finite number of elements each) and collect these representatives into a set \Gamma. Then the whole group S_{l+m+n} is the disjoint union

\displaystyle S_{l+m+n}=\biguplus\limits_{\gamma\in\Gamma}\gamma S_{l+m}

This will let us rewrite the symmetrizer in such a way as to make our point. So let’s write down the product of the two group algebra elements we’re interested in

\displaystyle\begin{aligned}\left(\frac{1}{(l+m+n)!}\sum\limits_{\pi\in S_{l+m+n}}\pi\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\hat{\pi}\right)&=\left(\frac{1}{(l+m+n)!}\sum\limits_{\gamma\in\Gamma}\sum\limits_{\pi\in\gamma S_{l+m}}\pi\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\hat{\pi}\right)\\&=\left(\frac{1}{(l+m+n)!}\sum\limits_{\gamma\in\Gamma}\sum\limits_{\pi\in S_{l+m}}\gamma\pi\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\hat{\pi}\right)\\&=\left(\frac{1}{(l+m+n)!}\left(\sum\limits_{\gamma\in\Gamma}\gamma\right)\left(\sum\limits_{\pi\in S_{l+m}}\pi\right)\right)\left(\frac{1}{(l+m)!}\sum\limits_{\hat{\pi}\in S_{l+m}}\hat{\pi}\right)\\&=\left(\frac{1}{(l+m+n)!}\sum\limits_{\gamma\in\Gamma}\gamma\right)\left(\frac{1}{(l+m)!}\sum\limits_{\pi\in S_{l+m}}\sum\limits_{\hat{\pi}\in S_{l+m}}\pi\hat{\pi}\right)\\&=\frac{1}{(l+m+n)!}\left(\sum\limits_{\gamma\in\Gamma}\gamma\right)\left(\sum\limits_{\pi\in S_{l+m}}\pi\right)\\&=\frac{1}{(l+m+n)!}\sum\limits_{\pi\in S_{l+m+n}}\pi\end{aligned}

Essentially, because the symmetrization of the whole term subsumes symmetrization of the first l+m tensorands, the smaller symmetrization can be folded in, and the resulting sum counts the whole sum exactly (l+m)! times, which cancels out the normalization factor. And this proves that the multiplication is, indeed, associative.

This multiplication is also commutative. Indeed, given \mu\in S^m(V) and \nu\in S^n(V), we can let \tau_{m,n} be the permutation which moves the last n slots to the beginning of the term and the first m slots to the end. Then we write

\displaystyle\begin{aligned}\mu\odot\nu&=\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\pi\right)(\mu\otimes\nu)\\&=\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\pi\tau_{m,n}\right)(\mu\otimes\nu)\\&=\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\pi\right)\left(\tau_{m,n}(\mu\otimes\nu)\right)\\&=\left(\frac{1}{(m+n)!}\sum\limits_{\pi\in S_{m+n}}\pi\right)(\nu\otimes\mu)\\&=\nu\odot\mu\end{aligned}

because right-multiplication by \tau_{m,n} just shuffles around the order of the sum.
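Both properties of \odot are easy to test by machine. Here is a Python sketch that computes the product literally as "tensor, then symmetrize" on a few sample vectors. The helper names and the vectors themselves are assumptions of mine.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def tensor(a, b):
    """Tensor product of tensors stored as {index tuple: coefficient}."""
    return {i + j: x * y for i, x in a.items() for j, y in b.items()}

def symmetrize(t):
    """Normalized symmetrizer: average over all shuffles of the slots."""
    out = {}
    for idx, c in t.items():
        k = len(idx)
        for perm in permutations(range(k)):
            new = tuple(idx[p] for p in perm)
            out[new] = out.get(new, 0) + c / factorial(k)
    return {idx: c for idx, c in out.items() if c != 0}

def sym(a, b):
    return symmetrize(tensor(a, b))

def vector(*coords):
    return {(i,): Fraction(c) for i, c in enumerate(coords) if c != 0}

# Hypothetical sample vectors in a three-dimensional space.
u, v, w = vector(1, 2, 0), vector(0, 1, 3), vector(2, 0, 1)

# Commutativity.
assert sym(u, v) == sym(v, u)

# Associativity: both bracketings agree with symmetrizing u (x) v (x) w all at once.
everything = symmetrize(tensor(tensor(u, v), w))
assert sym(sym(u, v), w) == everything
assert sym(u, sym(v, w)) == everything
```

The last two assertions are the two absorption statements from the proof, one for the first l+m slots and one for the last m+n.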

The symmetric algebra S(V) is the free commutative algebra on the vector space V. And so it should be no surprise that the symmetric algebra on the dual space is isomorphic to the algebra of polynomial functions on V, where the grading is the total degree of a monomial. If V has finite dimension d, we have S(V^*)\cong\mathbb{F}[X^1,\dots,X^d].

October 26, 2009 Posted by | Algebra, Linear Algebra | 7 Comments

Graded Objects

We’re about to talk about certain kinds of algebras that have the added structure of a “grading”. It’s not horribly important at the moment, but we might as well talk about it now so we don’t forget later.

Given a monoid G, a G-graded algebra is one that, as a vector space, we can write as a direct sum

\displaystyle A=\bigoplus\limits_{g\in G}A_g

so that the product of elements contained in two grades lands in the grade given by their product in the monoid. That is, we can write the algebra multiplication by

\displaystyle\mu:A_g\otimes A_h\rightarrow A_{gh}

for each pair of grades g and h. As usual, we handle elements that are the sum of two elements with different grades by linearity.

By far the most common grading is by the natural numbers under addition, in which case we often just say “graded”. For example, the algebra of polynomials is graded, where the grading is given by the total degree. That is, if A=R[X_1,\dots,X_k] is the algebra of polynomials in k variables, then the nth grade consists of sums of products of n of the variables at a time. This is a grading because the product of two such homogeneous polynomials is itself homogeneous, and the total degree of each term in the product is the sum of the degrees of the factors. For instance, the product of xy+yz in grade 2 and x^3+xyz+yz^2 in grade 3 is

\displaystyle (xy+yz)(x^3+xyz+yz^2)=x^4y+x^3yz+x^2y^2z+2xy^2z^2+y^2z^3

in grade 5=2+3.
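Here is the same multiplication carried out in a short Python sketch, with polynomials stored as dictionaries from exponent tuples in (x, y, z) to coefficients. The representation and the function names are my own choices.

```python
from collections import Counter

def poly_mul(p, q):
    """Multiply polynomials stored as {exponent tuple: coefficient}."""
    out = Counter()
    for ea, ca in p.items():
        for eb, cb in q.items():
            # Multiplying monomials adds exponents, so total degrees add too.
            out[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return dict(out)

def grades(p):
    """The set of total degrees that occur among the monomials of p."""
    return {sum(exponents) for exponents in p}

p = {(1, 1, 0): 1, (0, 1, 1): 1}                 # xy + yz
q = {(3, 0, 0): 1, (1, 1, 1): 1, (0, 1, 2): 1}   # x^3 + xyz + yz^2
r = poly_mul(p, q)

assert grades(p) == {2} and grades(q) == {3}
assert grades(r) == {5}   # homogeneous of degree 2 + 3
```

The coefficient 2 on xy^2z^2 shows up because that monomial arises twice, once from xy times yz^2 and once from yz times xyz.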

Other common gradings include \mathbb{Z}-grading and \mathbb{Z}_2-grading. The latter algebras are often called “superalgebras”, related to their use in studying supersymmetry in physics. “Superalgebra” sounds a lot more big and impressive than “\mathbb{Z}_2-graded algebra”, and physicists like that sort of thing.

In the context of graded algebras we also have graded modules. A G-graded module M over the G-graded algebra A can also be written down as a direct sum

\displaystyle M=\bigoplus\limits_{g\in G}M_g

But now it’s the action of A on M that involves the grading:

\displaystyle\alpha:A_g\otimes M_h\rightarrow M_{gh}

We can even talk about grading in the absence of a multiplicative structure, like a graded vector space. Now we don’t even really need the grades to form a monoid. Indeed, for any index set I we might have the graded vector space

\displaystyle V=\bigoplus\limits_{i\in I}V_i

This doesn’t seem to be very useful, but it can serve to recognize natural direct summands in a vector space and keep track of them. For instance, we may want to consider a linear map T between graded vector spaces V and W that only acts on one grade of V and with an image contained in only one grade of W:

\displaystyle\begin{aligned}T(V_i)&\subseteq W_j\\T(V_k)&=0\qquad k\neq i\end{aligned}

We’ll say that such a map is graded (i,j). Any linear map from V to W can be decomposed uniquely into such graded components

\displaystyle\hom(V,W)=\bigoplus\limits_{(i,j)\in I\times J}\hom(V_i,W_j)

giving a grading on the space of linear maps.
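In matrix terms this is nothing but cutting a matrix into blocks. Here is a Python sketch under some assumptions of my own: V has grades of dimensions 2 and 1, W has grades of dimensions 1 and 2, matrices act on column vectors, and `graded_component` is a name I've made up.

```python
def graded_component(m, source_grades, target_grades, i, j):
    """Keep only the block sending grade i of V into grade j of W; zero out the rest."""
    cols, rows = source_grades[i], target_grades[j]
    return [[m[r][c] if (r in rows and c in cols) else 0
             for c in range(len(m[0]))] for r in range(len(m))]

# Hypothetical data: columns index a basis of V, rows index a basis of W.
m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
source_grades = [range(0, 2), range(2, 3)]   # V_0 has dimension 2, V_1 has dimension 1
target_grades = [range(0, 1), range(1, 3)]   # W_0 has dimension 1, W_1 has dimension 2

parts = {(i, j): graded_component(m, source_grades, target_grades, i, j)
         for i in range(2) for j in range(2)}

# The graded components add back up to the original map.
total = [[sum(part[r][c] for part in parts.values()) for c in range(3)]
         for r in range(3)]
assert total == m
```

Each matrix entry lands in exactly one block, which is why the decomposition is unique.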

October 23, 2009 Posted by | Algebra, Linear Algebra, Ring theory | 15 Comments

Multilinear Functionals

Okay, time for a diversion from all this calculus. Don’t worry, there’s tons more ahead.

We’re going to need some geometric concepts tied to linear algebra, and before we get into that we need to revisit an old topic: tensor powers and the subspaces of symmetric and antisymmetric tensors. Specifically, how do all of these interact with duals? Throughout these posts we’ll be working with a vector space V over a field \mathbb{F}, which at times will be assumed to be finite-dimensional, but will not always be.

First, we remember that elements of the dual space V^*=\hom_\mathbb{F}(V,\mathbb{F}) are called “linear functionals”. These are \mathbb{F}-linear functions from the vector space V to the base field \mathbb{F}. Similarly, an “n-multilinear functional” is a function f that takes n vectors from V and gives back a field element in \mathbb{F} in a way that’s \mathbb{F}-linear in each variable. That is,

\displaystyle f(v_1,\dots,av_i+bw_i,\dots,v_n)=af(v_1,\dots,v_i,\dots,v_n)+bf(v_1,\dots,w_i,\dots,v_n)

for scalars a and b, and for any index i. By the defining universal property of tensor products, this is equivalent to a linear function f:V^{\otimes n}\rightarrow\mathbb{F}, which is a linear functional on V^{\otimes n}. That is, the space of n-multilinear functionals is the dual space \left(V^{\otimes n}\right)^*.

There’s a good way to come up with n-multilinear functionals. Just take n linear functionals and sew them together. That is, if we have an n-tuple of functionals (\lambda^1,\dots,\lambda^n)\in\left(V^*\right)^{\times n} we can define an n-multilinear functional by the formula

\displaystyle\left[m(\lambda^1,\dots,\lambda^n)\right](v_1\otimes\dots\otimes v_n)=\prod\limits_{i=1}^n\lambda^i(v_i)

We just feed the ith tensorand v_i into the ith functional \lambda^i and multiply all the resulting field elements together. Since field multiplication is multilinear, so is this operation. Then the universal property of tensor products tells us that this mapping from n-tuples of linear functionals to n-multilinear functionals is equivalent to a unique linear map from the nth tensor power \left(V^*\right)^{\otimes n}\rightarrow\left(V^{\otimes n}\right)^*. It’s also easy to show that this map has a trivial kernel.
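Here is a small Python sketch of this sewing-together, with a spot check of linearity in the middle slot. Linear functionals on \mathbb{F}^3 are represented by their coefficient tuples, and the names `functional`, `sew`, and `combo` are my own inventions for the illustration.

```python
def functional(coeffs):
    """The linear functional sending v to the sum of coeffs[i] * v[i]."""
    return lambda v: sum(c * x for c, x in zip(coeffs, v))

def sew(*functionals):
    """m(lambda^1, ..., lambda^n): feed the ith vector to the ith functional, then multiply."""
    def multilinear(*vectors):
        result = 1
        for lam, v in zip(functionals, vectors):
            result *= lam(v)
        return result
    return multilinear

def combo(a, v, b, w):
    """The linear combination a*v + b*w of two vectors."""
    return tuple(a * x + b * y for x, y in zip(v, w))

# Hypothetical sample functionals and vectors.
f = sew(functional((1, 0, 2)), functional((0, 3, -1)), functional((2, 1, 1)))
v1, v2, w2, v3 = (1, 2, 3), (0, 1, 4), (2, -1, 1), (1, 1, 0)
a, b = 5, -2

# Linearity in the second slot, with the other two slots held fixed.
assert f(v1, combo(a, v2, b, w2), v3) == a * f(v1, v2, v3) + b * f(v1, w2, v3)
```

The same check works in any other slot, since each factor in the product is linear in its own argument.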

This is not to say that dualization and tensor powers commute. Indeed, in general this map is a proper monomorphism. But it turns out that if V is finite-dimensional, then it’s actually an isomorphism. Just count the dimensions — if V has dimension d then each space has dimension d^n — and use the rank-nullity theorem to see that they must be isomorphic. That is, every n-multilinear functional is a linear combination of the ones we can construct from n-tuples of linear functionals.

Now we can specialize this result. We define a multilinear functional to be symmetric if its value is unchanged when we swap two of its inputs. Equivalently, it commutes with the symmetrizer. That is, it must kill everything that the symmetrizer kills, and so must really define a linear functional on the subspace of symmetric tensors. That is, the space of symmetric n-multilinear functionals is the dual space \left(S^nV\right)^*. We can construct such symmetric multilinear functionals by taking n linear functionals as before and symmetrizing them. This gives a monomorphism S^n\left(V^*\right)\rightarrow\left(S^nV\right)^*, which is an isomorphism if V is finite-dimensional.

Similarly, we define a multilinear functional to be antisymmetric or “alternating” if its value changes sign when we swap two of its inputs. Then it commutes with the antisymmetrizer, must kill everything the antisymmetrizer kills, and descends to a linear functional on the subspace of antisymmetric tensors. As before, we can construct just such an antisymmetric n-multilinear functional by antisymmetrizing n linear functionals, and get a monomorphism A^n\left(V^*\right)\rightarrow\left(A^nV\right)^*. And yet again, this map is an isomorphism if V is finite-dimensional.
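To see the antisymmetrized construction in action, here is a small sketch. It assumes the antisymmetrizer is normalized as \frac{1}{n!}\sum_\pi\mathrm{sgn}(\pi)\pi, takes V=\mathbb{Q}^3 with functionals stored as coefficient tuples, and uses hypothetical helper names and sample data. It checks that the resulting functional changes sign when two inputs are swapped and vanishes on a repeated input.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def sign(perm):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def evaluate(lam, v):
    """Apply the functional with coefficient tuple lam to the vector v."""
    return sum(c * x for c, x in zip(lam, v))

def antisymmetrized(lams):
    """(v_1..v_n) -> (1/n!) * sum_p sgn(p) * prod_i lams[p(i)](v_i)."""
    n = len(lams)
    def f(*vs):
        total = sum(
            sign(p) * prod(evaluate(lams[p[i]], vs[i]) for i in range(n))
            for p in permutations(range(n))
        )
        return Fraction(total, factorial(n))
    return f

# Hypothetical sample data in Q^3.
f = antisymmetrized([(1, 0, 2), (0, 1, 1)])
v, w = (1, 2, 3), (4, 5, 6)
print(f(v, w), f(w, v), f(v, v))
```

With this normalization, the value on (v_1,\dots,v_n) is \frac{1}{n!} times the determinant of the matrix of values \lambda^i(v_j), which is why swapping two inputs flips the sign.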

October 22, 2009 Posted by | Algebra, Linear Algebra | 8 Comments

How Many Generators?

Okay, one last post to fill out the week.

The shears alone generate the special linear group. Can we strip them down any further? And, with this in mind, how many generators does it take to build up the whole general linear group?

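It turns out that we don’t even need all the shears. We can just use neighboring shears to build all the others. Indeed:

\displaystyle\begin{pmatrix}1&-1&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&-1\\0&0&1\end{pmatrix}\begin{pmatrix}1&1&0\\0&1&0\\0&0&1\end{pmatrix}\begin{pmatrix}1&0&0\\0&1&1\\0&0&1\end{pmatrix}=\begin{pmatrix}1&0&1\\0&1&0\\0&0&1\end{pmatrix}
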
In terms of elementary row operations, first we add the third row to the second. Then we add the second to the first, effectively adding the third row to the first as well. Then we subtract the third row from the second, undoing that first step. Finally, we subtract the second row (alone now) from the first, undoing the extra addition of the second row to the first. At the end of the whole process we’ve added the third row to the first. We could modify this by adding a multiple of the third row to the second, and subtracting the same multiple later. Check to see what result that has. And we have similar results using neighboring lower shears.
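The four row operations described above can be multiplied out mechanically. This is a sketch for the 3\times3 case only, with hypothetical helper names (`shear`, `neighbor_commutator`) and 0-based indices. The parameter `c` is the multiple of the third row that gets added to the second, so it also covers the suggested modification.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def shear(n, i, j, c):
    """Identity plus c in entry (i, j); left-multiplying adds c times row j to row i."""
    H = identity(n)
    H[i][j] = c
    return H

def matmul(A, B):
    return [
        [sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]

def neighbor_commutator(c):
    """Product of four neighboring upper shears, written left to right.

    The rightmost factor acts first, matching the order of the row operations:
    add c*row3 to row2, add row2 to row1, subtract c*row3 from row2,
    subtract row2 from row1.
    """
    factors = [shear(3, 0, 1, -1), shear(3, 1, 2, -c), shear(3, 0, 1, 1), shear(3, 1, 2, c)]
    result = identity(3)
    for H in factors:
        result = matmul(result, H)
    return result

print(neighbor_commutator(1))
print(neighbor_commutator(5) == shear(3, 0, 2, 5))
```

In this sketch the product comes out to the shear that adds c times the third row to the first, so the non-neighboring shear inherits whatever multiple was used in the middle steps.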

So we can generate the special linear group \mathrm{SL}(n,\mathbb{F}) using only the 2n-2 (families of) neighboring shears. If we have a matrix M in \mathrm{GL}(n,\mathbb{F}) we can take its determinant \det(M). Then we can write M=C_{1,\det(M)}\tilde{M}. Here we’ve factored out a scaling by the determinant in the first row and we’re left with a matrix \tilde{M} in \mathrm{SL}(n,\mathbb{F}), which can then be written in terms of neighboring shears. So we need 2n-1 (families of) generators here.
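The factorization M=C_{1,\det(M)}\tilde{M} is easy to test numerically. The sketch below assumes C_{1,x} means the diagonal matrix that scales the first row by x, works over \mathbb{Q} with exact fractions, and uses a hypothetical sample matrix. The determinant is computed by the permutation expansion, which is fine for small matrices.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def sign(perm):
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Determinant by the sum over permutations (only sensible for small n)."""
    n = len(M)
    return sum(sign(p) * prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

def scaling(n, i, x):
    """Diagonal matrix scaling row i (0-based) by x."""
    return [[(x if r == i else 1) if r == c else 0 for c in range(n)] for r in range(n)]

def matmul(A, B):
    return [
        [sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]

# Hypothetical invertible sample matrix.
M = [[2, 0, 1], [1, 3, 0], [0, 1, 1]]
d = det(M)

# M_tilde = C_{1,1/d} M has determinant 1, and C_{1,d} M_tilde recovers M.
M_tilde = matmul(scaling(3, 0, Fraction(1, d)), M)
print(d, det(M_tilde), matmul(scaling(3, 0, d), M_tilde) == M)
```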

These are the best I can do, and I don’t see a way of improving. Roughly, upper shears can’t build up lower shears, any collection of neighboring shears can only affect the rows they cover in sequence and so can’t build up a new neighboring shear, and no shears can handle that one scaling. So it seems there’s no way to pare down this collection of generators. But there might be a completely different approach that leads to fewer families of generators. If someone has one, I’d be glad to see it.

September 11, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Shears Generate the Special Linear Group

We established that if we restrict to upper shears we can generate all upper-unipotent matrices. On the other hand if we use all shears and scalings we can generate any invertible matrix we want (since swaps can be built from shears and scalings). We clearly can’t build any matrix whatsoever from shears alone, since every shear has determinant {1} and so must any product of shears. But it turns out that we can use shears to generate any matrix of determinant {1} — those in the special linear group.

First of all, let’s consider the following matrix equations, which should be easy to verify


These show that we can always pull a scaling to the left past a shear. In the first two cases, the scaling and the shear commute if the row and column the scaling acts on are uninvolved in the shear. In the last two cases, we have to modify the shear in the process, but we end up with the scaling written to the left of a shear instead of to the right. We can use these toy examples to see that we can always pull a scaling from the right to the left of a shear, possibly changing the shear in the process.
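The rule for pulling a scaling to the left past a shear can be checked on small matrices. This is a sketch under assumptions: H_{i,j,a} is taken to be the identity plus a in entry (i,j), C_{k,x} scales row k by x, indices are 0-based, and the function name `pull_scaling_left` is hypothetical. It returns the modified shear parameter a' for which H_{i,j,a}C_{k,x}=C_{k,x}H_{i,j,a'}.

```python
from fractions import Fraction

def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def shear(n, i, j, a):
    H = identity(n)
    H[i][j] = a
    return H

def scaling(n, k, x):
    C = identity(n)
    C[k][k] = x
    return C

def matmul(A, B):
    return [
        [sum(A[r][m] * B[m][c] for m in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]

def pull_scaling_left(i, j, a, k, x):
    """Return a' such that H_{i,j,a} C_{k,x} == C_{k,x} H_{i,j,a'}."""
    if k == i:      # scaling acts on the row the shear adds into
        return a / x
    if k == j:      # scaling acts on the row the shear adds from
        return a * x
    return a        # scaling is uninvolved, so the two commute

a, x = Fraction(3), Fraction(5)
checks = []
for (i, j) in [(0, 1), (2, 0)]:          # one upper shear, one lower shear
    for k in range(3):
        a_new = pull_scaling_left(i, j, a, k, x)
        left = matmul(shear(3, i, j, a), scaling(3, k, x))
        right = matmul(scaling(3, k, x), shear(3, i, j, a_new))
        checks.append(left == right)
print(all(checks))
```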

What does this mean? When we take a matrix and write it out in terms of elementary matrices, we can always modify this expression so that all the scalings are to the left of all the shears. Then we have a diagonal matrix to the left of a long product of shears, since the product of a bunch of scalings is a diagonal matrix. But now the determinant of each shear is {1}, and the determinant of the diagonal matrix must be the product of the diagonal entries, which are the scaling factors. And so the product of the scaling factors is the determinant of our original matrix.

We’re specifically concerned with matrices of determinant {1}, meaning the product of all the diagonal entries must come out to be {1}. I’m going to use this fact to write the diagonal matrix as a product of scalings in a very particular way. Let’s say the diagonal entry in row n is \lambda_n. Then I’m going to start by writing down

\displaystyle C_{1,\lambda_1}C_{2,\lambda_1^{-1}}

I’ve scaled the first row by the right amount, and then scaled the second row by the inverse amount so the product of the two scaling factors is {1}. Then I write down

\displaystyle C_{2,\lambda_1\lambda_2}C_{3,\lambda_1^{-1}\lambda_2^{-1}}

The product of the two scalings of the second row ends up scaling it by \lambda_2, and we scale the third row to compensate. We continue this way, scaling each row to the right amount, and the next one by the inverse factor. Once we scale the next-to-last row we’re done, since the scaling factor for the last row must be exactly what we need to make the total product of all the scaling factors come out to {1}. That is, as long as the total scaling factor is {1}, we can write the diagonal matrix as the product of these pairs of scalings with inverse scaling factors.
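The bookkeeping for these pairs of scalings can be sketched as follows. The function names and the sample diagonal entries are hypothetical. The k-th pair scales row k by the running product \lambda_1\cdots\lambda_k and row k+1 by its inverse, and multiplying all the pairs together should give back the original diagonal.

```python
from fractions import Fraction
from math import prod

def scaling_pairs(lams):
    """For diagonal entries with product 1, list tuples (row, factor, next_row, inverse_factor)."""
    assert prod(lams) == 1
    pairs = []
    partial = Fraction(1)
    for k in range(len(lams) - 1):
        partial *= lams[k]                       # running product lambda_1 ... lambda_k
        pairs.append((k, partial, k + 1, 1 / partial))
    return pairs

def diagonal_from_pairs(n, pairs):
    """Multiply all the pairs of scalings together and return the diagonal entries."""
    diag = [Fraction(1)] * n
    for row, factor, next_row, inverse_factor in pairs:
        diag[row] *= factor
        diag[next_row] *= inverse_factor
    return diag

# Hypothetical diagonal entries whose product is 1.
lams = [Fraction(2), Fraction(3), Fraction(1, 12), Fraction(2)]
pairs = scaling_pairs(lams)
print(pairs)
print(diagonal_from_pairs(len(lams), pairs) == lams)
```

Each pair has total scaling factor 1 by construction, which is the property the next step relies on.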

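Now let’s take four shears, alternating upper and lower, since two upper shears in a row are the same as a single upper shear, and similarly for lower shears. We want it to come out to one of these pairs of scalings.

\displaystyle\begin{pmatrix}1&a\\0&1\end{pmatrix}\begin{pmatrix}1&0\\b&1\end{pmatrix}\begin{pmatrix}1&c\\0&1\end{pmatrix}\begin{pmatrix}1&0\\d&1\end{pmatrix}=\begin{pmatrix}x&0\\0&x^{-1}\end{pmatrix}

This gives us four equations to solve

\displaystyle\begin{aligned}1+ab+(a+c+abc)d&=x\\a+c+abc&=0\\b+d+bcd&=0\\1+bc&=x^{-1}\end{aligned}

These quickly simplify to

\displaystyle\begin{aligned}1+ab&=x\\a+xc&=0\\xb+d&=0\\1+bc&=x^{-1}\end{aligned}

Which can be solved to find

\displaystyle\begin{aligned}a&=\frac{x(1-x)}{d}\\b&=-\frac{d}{x}\\c&=\frac{x-1}{d}\end{aligned}

So we could pick d=-x and for any scaling factor x write

\displaystyle\begin{pmatrix}1&x-1\\0&1\end{pmatrix}\begin{pmatrix}1&0\\1&1\end{pmatrix}\begin{pmatrix}1&\frac{1-x}{x}\\0&1\end{pmatrix}\begin{pmatrix}1&0\\-x&1\end{pmatrix}=\begin{pmatrix}x&0\\0&x^{-1}\end{pmatrix}
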
And so we can write such a pair of scalings with inverse scaling factors as a product of four shears. Since in the case at hand we can write the diagonal part of our elementary matrix decomposition with such pairs of scalings, we can translate them all into shears. And at the end of the day, we can write any special linear transformation as a product of a bunch of shears.
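One concrete way to realize a pair of inverse scalings as four shears can be verified directly. The sketch below assumes the shears are ordered upper, lower, upper, lower with parameters a, b, c, d, and takes d=-x as in the text. Under that ordering assumption the remaining parameters work out to a=x-1, b=1, c=(1-x)/x. The function names are hypothetical.

```python
from fractions import Fraction

def upper(a):
    """2x2 upper shear."""
    return [[1, a], [0, 1]]

def lower(b):
    """2x2 lower shear."""
    return [[1, 0], [b, 1]]

def matmul(A, B):
    return [
        [sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
        for r in range(len(A))
    ]

def four_shears(x):
    """Product upper(a) lower(b) upper(c) lower(d) with d = -x, for nonzero x."""
    x = Fraction(x)
    a, b, c, d = x - 1, Fraction(1), (1 - x) / x, -x
    return matmul(matmul(upper(a), lower(b)), matmul(upper(c), lower(d)))

for x in [Fraction(2), Fraction(-3, 7), Fraction(1)]:
    print(four_shears(x) == [[x, 0], [0, 1 / x]])
```

Every factor has determinant 1, so the product must too, which is consistent with the diagonal entries x and x^{-1}.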

September 9, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

The Special Linear Group (and others)

We’ve got down the notion of the general linear group \mathrm{GL}(V) of a vector space V, including the particular case of the matrix group \mathrm{GL}(n,\mathbb{F}) of the space \mathbb{F}^n. We also have defined the orthogonal group \mathrm{O}(n,\mathbb{F}) of n\times n matrices over \mathbb{F} whose transpose and inverse are the same, which is related to the orthogonal group \mathrm{O}(V,B) of orthogonal transformations of the real vector space V preserving a specified bilinear form B. Lastly, we’ve defined the group \mathrm{U}(n) of unitary transformations on \mathbb{C}^n, that is, n\times n complex matrices whose conjugate transpose and inverse are the same.

For all of these matrix groups, which are all subgroups of some appropriate \mathrm{GL}(n,\mathbb{F}), we have a homomorphism to the multiplicative group of \mathbb{F} given by the determinant. We originally defined the determinant on \mathrm{GL}(n,\mathbb{F}) itself, but we can easily restrict it to any subgroup. We actually know that for unitary and orthogonal transformations the image of this homomorphism must lie in a particular subgroup of \mathbb{F}^\times. But in any case, the homomorphism must have a kernel, and this kernel turns out to be important.

In the case of the general linear group \mathrm{GL}(V), the kernel of the determinant homomorphism consists of the automorphisms of V with determinant {1}. We call this subgroup of \mathrm{GL}(V) the “special linear group” \mathrm{SL}(V), and transformations in this subgroup are sometimes called “special linear transformations”. Of course, we also have the particular special linear group \mathrm{SL}(n,\mathbb{F})\subseteq\mathrm{GL}(n,\mathbb{F}). When we take the kernel of the determinant restricted to any of the other groups, we prepend the adjective “special” and an \mathrm{S} to the notation. Thus we have the special orthogonal groups \mathrm{SO}(V,B) and \mathrm{SO}(n,\mathbb{F}) and the special unitary group \mathrm{SU}(n).

In a sense, all the interesting part of the general linear group is contained in the special linear subgroup. Outside of that, what remains is “just” a scaling. It’s a little more complicated than it seems on the surface, but not much.

September 8, 2009 Posted by | Algebra, Linear Algebra | 6 Comments