Schur’s Lemma
Now that we know that images and kernels of $G$-morphisms between $G$-modules are $G$-modules as well, we can bring in a very general result.

Remember that we call a $G$-module irreducible or “simple” if it has no nontrivial submodules. In general, an object in any category is simple if it has no nontrivial subobjects. If a morphism in a category has a kernel and an image — as we’ve seen all $G$-morphisms do — then these are subobjects of the source and target objects.

So now we have everything we need to state and prove Schur’s lemma. Working in a category where every morphism has both a kernel and an image, if $f: V \to W$ is a morphism between two simple objects, then either $f$ is an isomorphism or it’s the zero morphism from $V$ to $W$. Indeed, since $V$ is simple it has no nontrivial subobjects. The kernel of $f$ is a subobject of $V$, so it must either be $V$ itself, or the zero object. Similarly, the image of $f$ must either be $W$ itself or the zero object. If either $\ker(f) = V$ or $\operatorname{im}(f) = 0$ then $f$ is the zero morphism. On the other hand, if $\ker(f) = 0$ and $\operatorname{im}(f) = W$ we have an isomorphism.

To see how this works in the case of $G$-modules, every time I say “object” in the preceding paragraph replace it by “$G$-module”. Morphisms are $G$-morphisms, the zero morphism is the linear map sending every vector to $0$, and the zero object is the trivial vector space $\{0\}$. If it feels more comfortable, walk through the preceding proof making the required substitutions to see how it works for $G$-modules.

In terms of matrix representations, let’s say $X$ and $Y$ are two irreducible matrix representations of $G$, and let $T$ be any matrix so that $TX(g) = Y(g)T$ for all $g \in G$. Then Schur’s lemma tells us that either $T$ is invertible — it’s the matrix of an isomorphism — or it’s the zero matrix.
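If you want to see this numerically, here is a minimal Python sketch. The matrices below are my own choices for illustration, not from the text: the two-dimensional “standard” irreducible representation of $S_3$ and the one-dimensional signum representation, each given only on the generators $(1\,2)$ and $(1\,2\,3)$. We compute the space of all matrices $T$ with $TX(g) = Y(g)T$ and find it is either one-dimensional (scalar matrices, hence invertible when nonzero) or zero, as Schur’s lemma predicts.

```python
import numpy as np

# A sketch illustrating Schur's lemma numerically.  The specific matrices are my
# own choice (not from the text): X is the two-dimensional "standard" irrep of
# S_3 in the basis {e1 - e2, e2 - e3}, given on the generators (1 2) and (1 2 3);
# "sign" is the one-dimensional signum representation on the same generators.

X = {
    "(1 2)":   np.array([[-1.0, 1.0], [0.0, 1.0]]),
    "(1 2 3)": np.array([[0.0, -1.0], [1.0, -1.0]]),
}
sign = {"(1 2)": np.array([[-1.0]]), "(1 2 3)": np.array([[1.0]])}

def intertwiner_basis(X, Y, tol=1e-10):
    """Basis of {T : T X(g) = Y(g) T for every generator g}, found as a null space."""
    n = next(iter(X.values())).shape[0]      # dimension of the source rep
    m = next(iter(Y.values())).shape[0]      # dimension of the target rep
    blocks = []
    for g in X:
        # With column-major vec():  vec(T X(g)) = (X(g)^T kron I_m) vec(T)
        # and vec(Y(g) T) = (I_n kron Y(g)) vec(T).
        blocks.append(np.kron(X[g].T, np.eye(m)) - np.kron(np.eye(n), Y[g]))
    A = np.vstack(blocks)
    _, s, vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    return [vt[i].reshape((m, n), order="F") for i in range(rank, vt.shape[0])]

for name, Y in [("standard irrep with itself", X), ("standard irrep vs. sign rep", sign)]:
    basis = intertwiner_basis(X, Y)
    print(name, "-> intertwiner space has dimension", len(basis))
# Expected: dimension 1 (scalar multiples of the identity, so any nonzero
# intertwiner is invertible) and dimension 0 (only the zero morphism).
```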
Images and Kernels
A nice quick one today. Let’s take two $G$-modules $(\rho, V)$ and $(\sigma, W)$. We’ll write $\hom_G(V, W)$ for the vector space of intertwinors from $V$ to $W$. This is pretty appropriate because these are the morphisms in the category of $G$-modules. It turns out that this category has kernels and has images. Those two references are pretty technical, so we’ll talk in more down-to-earth terms.

Any intertwinor $f \in \hom_G(V, W)$ is first and foremost a linear map $f: V \to W$. And as usual the kernel of $f$ is the subspace $\ker(f) \subseteq V$ of vectors $v$ for which $f(v) = 0$. I say that this isn’t just a subspace of $V$, but it’s a submodule as well. That is, $\ker(f)$ is an invariant subspace of $V$. Indeed, we check that if $f(v) = 0$ and $g$ is any element of $G$, then

$$f\bigl(\rho(g)v\bigr) = \sigma(g)f(v) = \sigma(g)0 = 0$$

so $\rho(g)v \in \ker(f)$ as well.

Similarly, as usual the image of $f$ is the subspace $\operatorname{im}(f) \subseteq W$ of vectors $w$ for which there’s some $v \in V$ with $f(v) = w$. And again I say that this is an invariant subspace. Indeed, if $w = f(v)$ and $g$ is any element of $G$, then

$$\sigma(g)w = \sigma(g)f(v) = f\bigl(\rho(g)v\bigr)$$

is in the image of $f$ as well.

Thus these images and kernels are not just subspaces of the vector spaces $V$ and $W$, but submodules to boot. That is, they can act as images and kernels in the category of $G$-modules just like they do in the category of complex vector spaces.
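For a concrete check, here is a small sketch (my own example, not from the text): the “sum of coordinates” map from the defining representation of $S_3$ to the trivial representation is an intertwinor, and its kernel should be an invariant subspace.

```python
import numpy as np
from itertools import permutations

# A small numerical check (my own example): with the defining representation of
# S_3 by permutation matrices and the trivial representation on C, the map
# f(v) = v_1 + v_2 + v_3 is an intertwinor, and its kernel {v : sum(v) = 0}
# should be an invariant subspace.

def perm_matrix(p):
    """3x3 matrix sending basis vector j to basis vector p[j]."""
    m = np.zeros((3, 3))
    for j, pj in enumerate(p):
        m[pj, j] = 1.0
    return m

f = np.ones((1, 3))                       # the intertwinor, as a 1x3 matrix
kernel = np.array([[1.0, -1.0, 0.0],      # a basis of ker(f)
                   [0.0, 1.0, -1.0]])

for p in permutations(range(3)):
    rho = perm_matrix(p)
    assert np.allclose(f @ rho, f)                 # intertwines with the trivial rep
    for v in kernel:
        assert np.allclose(f @ (rho @ v), 0)       # rho(g) v stays inside ker(f)
print("kernel of the intertwinor is G-invariant, as claimed")
```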
Maschke’s Theorem
Maschke’s theorem is a fundamental result that will make our project of understanding the representation theory of finite groups — and of symmetric groups in particular — far simpler. It tells us that every representation of a finite group is completely reducible.
We saw last time that in the presence of an invariant form, any reducible representation is decomposable, and so any representation with an invariant form is completely reducible. Maschke’s theorem works by showing that there is always an invariant form!
Let’s start by picking any form whatsoever. We know that we can do this by picking a basis $\{e_i\}$ of $V$ and declaring it to be orthonormal. We don’t need anything fancy like Gram-Schmidt, which is used to find orthonormal bases for a given inner product. No, we just define our inner product by saying that $\langle e_i, e_j\rangle = \delta_{i,j}$ — the Kronecker delta, with value $1$ when its indices are the same and $0$ otherwise — and extend the only way we can. If we have $v = \sum_i v^i e_i$ and $w = \sum_j w^j e_j$ then we find

$$\langle v, w\rangle = \left\langle \sum_i v^i e_i, \sum_j w^j e_j\right\rangle = \sum_{i,j}\overline{v^i}w^j\langle e_i, e_j\rangle = \sum_i \overline{v^i}w^i$$

so this does uniquely define an inner product. But there’s no reason at all to believe it’s $G$-invariant.

We will use this arbitrary form to build an invariant form by a process of averaging. For any vectors $v$ and $w$, define

$$\langle v, w\rangle_G = \frac{1}{|G|}\sum_{g\in G}\langle gv, gw\rangle$$

Showing that this satisfies the definition of an inner product is a straightforward exercise. As for invariance, we want to show that for any $h \in G$ we have $\langle hv, hw\rangle_G = \langle v, w\rangle_G$. Indeed:

$$\langle hv, hw\rangle_G = \frac{1}{|G|}\sum_{g\in G}\langle ghv, ghw\rangle = \frac{1}{|G|}\sum_{g\in G}\langle gv, gw\rangle = \langle v, w\rangle_G$$

where the essential second equality follows because as $g$ ranges over $G$, the product $gh$ ranges over $G$ as well, just in a different order.

And so we conclude that if $V$ is a representation of $G$ then we can take any inner product whatsoever on $V$ and “average” it to obtain an invariant form. Then with this invariant form in hand, we know that $V$ is completely reducible.
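Here is a minimal sketch of the averaging trick in coordinates, assuming $G = S_3$ acting on $\mathbb{C}^3$ by permutation matrices (my own example). An inner product is encoded by a positive-definite Hermitian matrix $B$ with $\langle v, w\rangle = v^\dagger B w$, and averaging replaces $B$ by $\frac{1}{|G|}\sum_g \rho(g)^\dagger B \rho(g)$.

```python
import numpy as np
from itertools import permutations

# A sketch of the averaging trick for a concrete case (my own example): G = S_3
# acting on C^3 by permutation matrices.  An inner product is represented by a
# positive-definite Hermitian matrix B, with <v, w> = v^dagger B w.

def perm_matrix(p):
    m = np.zeros((3, 3))
    for j, pj in enumerate(p):
        m[pj, j] = 1.0
    return m

group = [perm_matrix(p) for p in permutations(range(3))]

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
B = A.conj().T @ A + np.eye(3)            # an arbitrary inner product; not invariant

# Average: <v, w>_G = (1/|G|) sum_g <g v, g w> corresponds to the matrix
# B_G = (1/|G|) sum_g rho(g)^dagger B rho(g).
B_G = sum(g.conj().T @ B @ g for g in group) / len(group)

# The averaged form is invariant: rho(h)^dagger B_G rho(h) = B_G for every h.
for h in group:
    assert np.allclose(h.conj().T @ B_G @ h, B_G)
print("averaged form is G-invariant")
```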
Why doesn’t this work for our counterexample representation of $\mathbb{Z}$? Because the group $\mathbb{Z}$ is infinite, and so the averaging process breaks down. This approach only works for finite groups, where the average over all $g \in G$ only involves a finite sum.
Invariant Forms
A very useful structure to have on a complex vector space $V$ carrying a representation $\rho$ of a group $G$ is an “invariant form”. To start with, this is a complex inner product $\langle\cdot,\cdot\rangle: V \times V \to \mathbb{C}$, which we recall means that it is

- linear in the second slot — $\langle u, av + bw\rangle = a\langle u, v\rangle + b\langle u, w\rangle$
- conjugate symmetric — $\langle w, v\rangle = \overline{\langle v, w\rangle}$
- positive definite — $\langle v, v\rangle > 0$ for all $v \neq 0$
Again as usual these imply conjugate linearity in the first slot, so the form isn’t quite bilinear. Still, people are often sloppy and say “invariant bilinear form”.
Anyhow, now we add a new condition to the form. We demand that it be

- invariant under the action of $G$ — $\langle gv, gw\rangle = \langle v, w\rangle$

Here I have started to write $gv$ as shorthand for $\rho(g)v$. We will only do this when the representation in question is clear from the context.

The inner product gives us a notion of length and angle. Invariance now tells us that these notions are unaffected by the action of $G$. That is, the vectors $v$ and $gv$ have the same length for all $v \in V$ and $g \in G$. Similarly, the angle between vectors $v$ and $w$ is exactly the same as the angle between $gv$ and $gw$. Another way to say this is that if the form $\langle\cdot,\cdot\rangle$ is invariant for the representation $\rho$, then the image of $\rho$ is actually contained in the orthogonal group [commenter Eric Finster, below, reminds me that since we’ve got a complex inner product we’re using the group of unitary transformations with respect to the inner product: $\rho: G \to \mathrm{U}(V, \langle\cdot,\cdot\rangle)$].

More important than any particular invariant form is this: if we have an invariant form on our space $V$, then any reducible representation is decomposable. That is, if $W \subseteq V$ is a submodule, we can find another submodule $U$ so that $V \cong W \oplus U$ as $G$-modules.

If we just consider them as vector spaces, we already know this: the orthogonal complement $W^\perp$ is exactly the subspace we need, for $V = W \oplus W^\perp$. I say that if $W$ is a $G$-invariant subspace of $V$, then $W^\perp$ is as well, and so they are both submodules. Indeed, if $v \in W^\perp$, then we check that $gv$ is as well: for any $w \in W$ we have

$$\langle w, gv\rangle = \langle g^{-1}w, g^{-1}gv\rangle = \langle g^{-1}w, v\rangle = 0$$

where the first equality follows from the $G$-invariance of our form; the second from the representation property; and the third from the fact that $W$ is an invariant subspace, so $g^{-1}w \in W$.
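Here is the same argument in coordinates for a concrete case (my own example): the standard inner product on $\mathbb{C}^3$ is invariant under the defining representation of $S_3$, since permutation matrices are unitary, and the orthogonal complement of the invariant line spanned by $(1,1,1)$ is carried back into itself.

```python
import numpy as np
from itertools import permutations

# A sketch of the orthogonal-complement argument (my own example): the standard
# inner product on C^3 is invariant under the defining representation of S_3,
# since permutation matrices are unitary.  W = span{(1,1,1)} is an invariant
# subspace, and its orthogonal complement should be invariant too.

def perm_matrix(p):
    m = np.zeros((3, 3))
    for j, pj in enumerate(p):
        m[pj, j] = 1.0
    return m

w = np.ones(3)                                  # spans the invariant subspace W
complement = np.array([[1.0, -1.0, 0.0],        # a basis of the orthogonal complement
                       [1.0, 1.0, -2.0]])

for p in permutations(range(3)):
    rho = perm_matrix(p)
    assert np.allclose(rho.T @ rho, np.eye(3))  # rho(g) is orthogonal/unitary
    for v in complement:
        # rho(g) v should still be orthogonal to w, i.e. stay in the complement
        assert abs(np.dot(w, rho @ v)) < 1e-12
print("the orthogonal complement of W is G-invariant")
```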
So in the presence of an invariant form, all finite-dimensional representations are “completely reducible”. That is, they can be decomposed as the direct sum of a number of irreducible submodules. If the representation is irreducible to begin with, we’re done. If not, it must have some submodule $W$. Then the orthogonal complement $W^\perp$ is also a submodule, and we can write $V = W \oplus W^\perp$. Then we can treat both $W$ and $W^\perp$ the same way. The process must eventually bottom out, since each of $W$ and $W^\perp$ have dimension smaller than that of $V$, which was finite to begin with. Each step brings the dimension down further and further, and it must stop by the time it reaches $1$.

This tells us, for instance, that there can be no inner product on $\mathbb{C}^2$ that is invariant under the representation of the group of integers $\mathbb{Z}$ we laid out at the end of last time. Indeed, that was an example of a reducible representation that is not decomposable, but if there were an invariant form it would have to decompose.
Decomposability
Today I’d like to cover a stronger condition than reducibility: decomposability. We say that a module $V$ is “decomposable” if we can write it as the direct sum of two nontrivial submodules $W$ and $U$: $V = W \oplus U$. The direct sum gives us inclusion morphisms from $W$ and $U$ into $V$, and so any decomposable module is reducible.

What does this look like in terms of matrices? Well, saying that $V = W \oplus U$ means that we can write any vector $v \in V$ uniquely as a sum $v = w + u$ with $w \in W$ and $u \in U$. Then if we have a basis $\{w_1, \dots, w_m\}$ of $W$ and a basis $\{u_1, \dots, u_k\}$ of $U$, then we can write $w$ and $u$ uniquely in terms of these basis vectors. Thus we can write any vector $v$ uniquely in terms of the $w_i$ and the $u_j$, and so these constitute a basis of $V$.

If we write the matrices in terms of this basis, we find that the image of any $w_i$ can be written in terms of the others because $W$ is $G$-invariant. Similarly, the $G$-invariance of $U$ tells us that the image of each $u_j$ can be written in terms of the others. The same reasoning as last time now allows us to conclude that the matrices of the $\rho(g)$ all have the form

$$\rho(g) = \begin{pmatrix}\alpha(g)&0\\0&\gamma(g)\end{pmatrix}$$

Conversely, if we can write each of the $\rho(g)$ in this form, then this gives us a decomposition of $V$ as the direct sum of two $G$-invariant subspaces, and the representation is decomposable.

Now, I said above that decomposability is stronger than reducibility. Indeed, in general there do exist modules which are reducible, but not decomposable. In categorical terms this is the statement that for some groups there are short exact sequences which do not split. To chase this down a little further, our work yesterday showed that even in the reducible case we have the equation

$$\rho(g) = \begin{pmatrix}\alpha(g)&\beta(g)\\0&\gamma(g)\end{pmatrix}$$

This $\gamma$ is the representation of $G$ on the quotient space $V/W$, which gives our short exact sequence

$$0 \to W \to V \to V/W \to 0$$

But in general this sequence may not split; we may not be able to write $V \cong W \oplus V/W$ as $G$-modules. Indeed, we’ve seen that the representation of the group of integers $\mathbb{Z}$ is indecomposable.
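The representation of $\mathbb{Z}$ being referred to is one we saw earlier; assuming it is the familiar example $\rho(n) = \begin{pmatrix}1&n\\0&1\end{pmatrix}$ acting on $\mathbb{C}^2$ (that specific formula is my assumption here), a quick computation shows why it is reducible but not decomposable: an invariant line is exactly an eigenline of the generator $\rho(1)$, and there is only one of them.

```python
import numpy as np

# Assuming the usual reducible-but-indecomposable representation of the integers,
# rho(n) = [[1, n], [0, 1]] on C^2 (my assumption, not stated in the text).
# A Z-invariant line is exactly an eigenline of the generator rho(1), so we can
# just look at its eigenvectors.

rho_1 = np.array([[1.0, 1.0],
                  [0.0, 1.0]])

eigenvalues, eigenvectors = np.linalg.eig(rho_1)
print(eigenvalues)     # [1., 1.] -- a single repeated eigenvalue
print(eigenvectors)    # both columns are (numerically) multiples of e1 = (1, 0)

# So span{e1} is the only invariant line: the representation is reducible (it
# has that invariant line) but there is no complementary invariant line, hence
# it is not decomposable.
```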
Reducibility
We say that a module is “reducible” if it contains a nontrivial submodule. Thus our examples last time show that the left regular representation is always reducible, since it always contains a copy of the trivial representation as a nontrivial submodule. Notice that we have to be careful about what we mean by each use of “trivial” here.
If the $n$-dimensional representation $(\rho, V)$ has a nontrivial $m$-dimensional submodule $W$ — with $m \neq 0$ and $m \neq n$ — then we can pick a basis $\{w_1, \dots, w_m\}$ of $W$. And then we know that we can extend this to a basis for all of $V$: $\{w_1, \dots, w_m, u_1, \dots, u_{n-m}\}$.

Now since $W$ is a $G$-invariant subspace of $V$, we find that for any basis vector $w_i$ and $g \in G$ the image $\rho(g)w_i$ is again a vector in $W$, and can be written out in terms of the $w_j$ basis vectors. In particular, we find $\rho(g)w_i = \alpha(g)_{1i}w_1 + \dots + \alpha(g)_{mi}w_m$, and all the coefficients of $u_1$ through $u_{n-m}$ are zero. That is, the matrix of $\rho(g)$ has the following form:

$$\rho(g) = \begin{pmatrix}\alpha(g)&\beta(g)\\0&\gamma(g)\end{pmatrix}$$

where $\alpha(g)$ is an $m\times m$ matrix, $\beta(g)$ is an $m\times(n-m)$ matrix, and $\gamma(g)$ is an $(n-m)\times(n-m)$ matrix. And, in fact, this same form holds for all $g \in G$. Indeed, we can use the rule for block-multiplying matrices to find:

$$\rho(g)\rho(h) = \begin{pmatrix}\alpha(g)&\beta(g)\\0&\gamma(g)\end{pmatrix}\begin{pmatrix}\alpha(h)&\beta(h)\\0&\gamma(h)\end{pmatrix} = \begin{pmatrix}\alpha(g)\alpha(h)&\alpha(g)\beta(h)+\beta(g)\gamma(h)\\0&\gamma(g)\gamma(h)\end{pmatrix}$$

and we see that $\alpha$ actually provides us with the matrix for the representation we get when restricting $\rho$ to the submodule $W$. This shows us that the converse is also true: if we can find a basis for $V$ so that the matrix $\rho(g)$ has the above form for every $g \in G$, then the subspace spanned by the first $m$ basis vectors is $G$-invariant, and so it gives us a subrepresentation.
As an example, consider the defining representation of $S_3$, which is a permutation representation arising from the action of $S_3$ on the set $\{1,2,3\}$. This representation comes with the standard basis $\{\mathbf{1},\mathbf{2},\mathbf{3}\}$, and it’s easy to see that every permutation leaves the vector $\mathbf{1}+\mathbf{2}+\mathbf{3}$ — along with the subspace $W$ that it spans — fixed. Thus $W$ carries a copy of the trivial representation as a submodule of $V$. We can take the given vector as a basis and throw in two others to get a new basis for $V$: $\{\mathbf{1}+\mathbf{2}+\mathbf{3}, \mathbf{2}, \mathbf{3}\}$.

Now we can take a permutation — say $(1\,2)$ — and calculate its action in terms of the new basis:

$$\begin{aligned}(1\,2)(\mathbf{1}+\mathbf{2}+\mathbf{3}) &= \mathbf{1}+\mathbf{2}+\mathbf{3}\\ (1\,2)\,\mathbf{2} &= \mathbf{1} = (\mathbf{1}+\mathbf{2}+\mathbf{3}) - \mathbf{2} - \mathbf{3}\\ (1\,2)\,\mathbf{3} &= \mathbf{3}\end{aligned}$$

The others all work similarly. Then we can write these out as matrices:

$$\rho(e) = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}\qquad \rho\bigl((1\,2)\bigr) = \begin{pmatrix}1&1&0\\0&-1&0\\0&-1&1\end{pmatrix}\qquad \rho\bigl((1\,3)\bigr) = \begin{pmatrix}1&0&1\\0&1&-1\\0&0&-1\end{pmatrix}$$

$$\rho\bigl((2\,3)\bigr) = \begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix}\qquad \rho\bigl((1\,2\,3)\bigr) = \begin{pmatrix}1&0&1\\0&0&-1\\0&1&-1\end{pmatrix}\qquad \rho\bigl((1\,3\,2)\bigr) = \begin{pmatrix}1&1&0\\0&-1&1\\0&-1&0\end{pmatrix}$$

Notice that these all have the required form:

$$\rho(g) = \begin{pmatrix}1&\beta(g)\\0&\gamma(g)\end{pmatrix}$$

with a $1\times 1$ block in the upper left corresponding to the trivial submodule $W$.
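If you’d like to check that bookkeeping by machine, here is a small sketch that conjugates the six permutation matrices by the change of basis to $\{\mathbf{1}+\mathbf{2}+\mathbf{3}, \mathbf{2}, \mathbf{3}\}$ and confirms the block-triangular shape.

```python
import numpy as np
from itertools import permutations

# A quick check of the example above: conjugate the defining permutation
# matrices of S_3 by the change of basis to {1+2+3, 2, 3} and confirm the
# block-triangular form, i.e. that the entries below the 1x1 block vanish.

def perm_matrix(p):
    m = np.zeros((3, 3))
    for j, pj in enumerate(p):
        m[pj, j] = 1.0
    return m

# Columns of P are the new basis vectors written in the standard basis.
P = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])
P_inv = np.linalg.inv(P)

for p in permutations(range(3)):
    new_matrix = P_inv @ perm_matrix(p) @ P
    assert np.allclose(new_matrix[1:, 0], 0)   # zeros below the upper-left block
    assert np.isclose(new_matrix[0, 0], 1)     # the trivial subrepresentation
print("every permutation acts block-triangularly in the new basis")
```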
Representations that are not reducible — those modules that have no nontrivial submodules — are called “irreducible representations”, or sometimes “irreps” for short. They’re also called “simple” modules, using the general term from category theory for an object with no nontrivial subobjects.
Submodules
Fancy words: a submodule is a subobject in the category of group representations. What this means is that if $(\rho, V)$ and $(\sigma, W)$ are $G$-modules, and if we have an injective morphism of $G$-modules $\iota: W \to V$, then we say that $W$ is a “submodule” of $V$. And, just to be clear, a $G$-morphism is injective if and only if it’s injective as a linear map; its kernel is zero. We call $\iota$ the “inclusion map” of the submodule.

In practice, we often identify a $G$-submodule with the image of its inclusion map. We know from general principles that since $\iota$ is injective, then $W$ is isomorphic to its image, so this isn’t really a big difference. What we can tell, though, is that the action of $G$ sends the image back into itself.

That is, let’s say that $v = \iota(w)$ is the image of some vector $w \in W$. I say that for any group element $g$, acting by $\rho(g)$ on $v$ gives us some other vector that’s also in the image of $\iota$. Indeed, we check that

$$\rho(g)v = \rho(g)\iota(w) = \iota\bigl(\sigma(g)w\bigr)$$

which is again in the image of $\iota$, as asserted. We say that the image of $\iota$ is “$G$-invariant”.

The flip side of this is that any time we find such a $G$-invariant subspace of $V$, it gives us a submodule. That is, if $(\rho, V)$ is a $G$-module, and $W \subseteq V$ is a $G$-invariant subspace, then we can define a new representation on $W$ by restriction: $\sigma(g) = \rho(g)\vert_W$. The inclusion map that takes any vector $w \in W$ and considers it as a vector in $V$ clearly intertwines the original action $\rho$ and the restricted action $\sigma$, and its kernel is trivial. Thus $W$ constitutes a $G$-submodule.

As an example, let $G$ be any finite group, and let $\mathbb{C}[G]$ be its group algebra, which carries the left regular representation. Now, consider the subspace $V \subseteq \mathbb{C}[G]$ spanned by the vector

$$v = \sum_{g\in G}\mathbf{g}$$

That is, $V$ consists of all vectors for which all the coefficients of the $\mathbf{g}$ are equal. I say that this subspace $V$ is $G$-invariant. Indeed, we calculate

$$h\,v = h\sum_{g\in G}\mathbf{g} = \sum_{g\in G}\mathbf{hg}$$

But this last sum runs through all the elements of $G$, just in a different order. That is, $hv = v$, and so $V$ carries the one-dimensional trivial representation of $G$. That is, we’ve found a copy of the trivial representation of $G$ as a submodule of the left regular representation.

As another example, let $G = S_n$ be one of the symmetric groups. Again, let $\mathbb{C}[S_n]$ carry the left regular representation, but now let $V$ be the one-dimensional space spanned by

$$v = \sum_{\pi\in S_n}\operatorname{sgn}(\pi)\,\boldsymbol{\pi}$$

It’s a straightforward exercise to show that $V$ is a one-dimensional submodule carrying a copy of the signum representation.
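Both examples are easy to verify numerically; here is a sketch (my own construction) that builds the left regular representation of $S_3$ and checks how the two spanning vectors transform.

```python
import numpy as np
from itertools import permutations

# A numerical check of the two examples (my own construction): build the left
# regular representation of S_3 on C[S_3], then verify that the sum of all
# group elements is fixed, while the sign-weighted sum transforms by sgn(h).

elements = list(permutations(range(3)))          # the six elements of S_3
index = {g: i for i, g in enumerate(elements)}

def compose(g, h):
    """The product g h, acting as 'apply h first, then g'."""
    return tuple(g[h[k]] for k in range(3))

def sign(p):
    s = 1
    for i in range(3):
        for j in range(i + 1, 3):
            if p[i] > p[j]:
                s = -s
    return s

def left_regular(h):
    """Matrix of left multiplication by h on the group algebra C[S_3]."""
    m = np.zeros((6, 6))
    for g in elements:
        m[index[compose(h, g)], index[g]] = 1.0
    return m

v_trivial = np.ones(6)                                      # sum of all group elements
v_sign = np.array([sign(g) for g in elements], dtype=float) # sign-weighted sum

for h in elements:
    L = left_regular(h)
    assert np.allclose(L @ v_trivial, v_trivial)            # trivial representation
    assert np.allclose(L @ v_sign, sign(h) * v_sign)        # signum representation
print("both one-dimensional submodules behave as claimed")
```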
Every $G$-module $V$ contains two obvious submodules: the zero subspace $\{0\}$ and the entire space $V$ itself are both clearly $G$-invariant. We call these submodules “trivial”, and all others “nontrivial”.
Morphisms Between Representations
Since every representation of $G$ is a $\mathbb{C}[G]$-module, we have an obvious notion of a morphism between them. But let’s be explicit about it.

A $G$-morphism from a $G$-module $(\rho, V)$ to another $G$-module $(\sigma, W)$ is a linear map $f: V \to W$ between the vector spaces $V$ and $W$ that commutes with the actions of $G$. That is, for every $g \in G$ we have $f\circ\rho(g) = \sigma(g)\circ f$. Even more explicitly, if $v$ is any vector in $V$ and $g$ is any element of $G$, then

$$f\bigl(\rho(g)v\bigr) = \sigma(g)f(v)$$

We can also express this with a commutative diagram:

$$\begin{array}{ccc}
V & \xrightarrow{\;f\;} & W\\
{\scriptstyle\rho(g)}\big\downarrow & & \big\downarrow{\scriptstyle\sigma(g)}\\
V & \xrightarrow{\;f\;} & W
\end{array}$$

For each group element $g$ our representations give us vertical arrows $\rho(g): V \to V$ and $\sigma(g): W \to W$. The linear map $f$ provides horizontal arrows $f: V \to W$. To say that the diagram “commutes” means that if we compose the arrows along the top and right to get a linear map from $V$ to $W$, and if we compose the arrows along the left and bottom to get another, we’ll find that we actually get the same function. In other words, if we start with a vector $v$ in the upper-left and move it by the arrows around either side of the square to get to a vector in $W$ in the lower-right, we’ll get the same result on each side. We get one of these diagrams — one of these equations — for each $g \in G$, and they must all commute for $f$ to be a $G$-morphism.

Another common word that comes up in these contexts is “intertwine”, as in saying that the map $f$ “intertwines” the representations $\rho$ and $\sigma$, or that it is an “intertwinor” for the representations. This language goes back towards the viewpoint that takes the representing functions $\rho$ and $\sigma$ to be fundamental, while “$G$-morphism” tends to be more associated with the viewpoint emphasizing the representing spaces $V$ and $W$.

If, as will usually be the case for the time being, we have a presentation of our group by generators and relations, then we’ll only need to check that $f$ intertwines the actions of the generators. Indeed, if $f$ intertwines the actions of $g$ and $h$, then it intertwines the actions of $gh$. We can see this in terms of diagrams by stacking the diagram for $g$ on top of the diagram for $h$. In terms of equations, we check that

$$f\bigl(\rho(gh)v\bigr) = f\bigl(\rho(g)\rho(h)v\bigr) = \sigma(g)f\bigl(\rho(h)v\bigr) = \sigma(g)\sigma(h)f(v) = \sigma(gh)f(v)$$

So if we’re given a set of generators and we can write every group element as a finite product of these generators, then as soon as we check that the intertwining equation holds for the generators we know it will hold for all group elements.
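As a tiny illustration, here is a sketch of that generator check (the function name and the example are mine): the inclusion of the trivial representation into the defining representation of $S_3$, verified only on the generators $(1\,2)$ and $(1\,2\,3)$.

```python
import numpy as np

# A small sketch of the generator check (function name and example are my own):
# verify f rho(g) = sigma(g) f only for a generating set.

def intertwines_on_generators(f, rho_gens, sigma_gens, tol=1e-12):
    """True if f rho(g) = sigma(g) f for each pair of generator matrices."""
    return all(np.allclose(f @ r, s @ f, atol=tol)
               for r, s in zip(rho_gens, sigma_gens))

# Example: the inclusion of the trivial representation into the defining
# representation of S_3, f(x) = x * (1, 1, 1), checked on the generators.
rho_gens = [np.array([[1.0]]), np.array([[1.0]])]                    # trivial rep
sigma_gens = [np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 1.]]),  # (1 2)
              np.array([[0., 0., 1.], [1., 0., 0.], [0., 1., 0.]])]  # (1 2 3)
f = np.ones((3, 1))

print(intertwines_on_generators(f, rho_gens, sigma_gens))   # True
```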
There are also deep connections between $G$-morphisms and natural transformations, in the categorical viewpoint. Those who are really interested in that can dig into the archives a bit.
Coset Representations
Next up is a family of interesting representations that are also applicable to any group $G$. The main ingredient is a subgroup $H \subseteq G$ — a subset of the elements of $G$ so that the inverse of any element in $H$ is also in $H$, and the product of any two elements of $H$ is also in $H$.

Our next step is to use $H$ to break $G$ up into cosets. We consider $g_1$ and $g_2$ to be equivalent if $g_1^{-1}g_2 \in H$. It’s easy to check that this is actually an equivalence relation (reflexive, symmetric, and transitive), and so it breaks $G$ up into equivalence classes. We write the coset of all elements that are equivalent to $g$ as $gH$, and we write the collection of all cosets of $H$ as $G/H$.

We should note that we don’t need to worry about $H$ being a normal subgroup of $G$, since we only care about the set of cosets. We aren’t trying to make this set into a group — the quotient group — here.

Okay, now multiplication on the left by $G$ shuffles around the cosets. That is, we have a group action of $G$ on the quotient set $G/H$, and this gives us a permutation representation of $G$!

Let’s work out an example to see this a bit more explicitly. For our group, take the symmetric group $S_3$, and for our subgroup let $H = \{e, (2\,3)\}$. Indeed, $H$ is closed under both inversion and multiplication. And we can break $S_3$ up into cosets:

$$S_3/H = \{H,\ (1\,2)H,\ (1\,3)H\}$$

where we have picked a “transversal” — one representative of each coset so that we can write them down more simply. It doesn’t matter whether we write $(1\,2)H$ or $(1\,2\,3)H$, since both are really the same set. Now we can write down the multiplication table for the group action. It takes $g \in S_3$ and a coset $kH$, and tells us which coset $gkH$ falls in:

$$\begin{array}{c|ccc}
 & H & (1\,2)H & (1\,3)H\\\hline
e & H & (1\,2)H & (1\,3)H\\
(1\,2) & (1\,2)H & H & (1\,3)H\\
(1\,3) & (1\,3)H & (1\,2)H & H\\
(2\,3) & H & (1\,3)H & (1\,2)H\\
(1\,2\,3) & (1\,2)H & (1\,3)H & H\\
(1\,3\,2) & (1\,3)H & H & (1\,2)H
\end{array}$$

This is our group action. Since there are three elements in the set $S_3/H$, the permutation representation we get will be three-dimensional. We can write down all the matrices just by working them out from this multiplication table:

$$\rho(e) = \begin{pmatrix}1&0&0\\0&1&0\\0&0&1\end{pmatrix}\qquad \rho\bigl((1\,2)\bigr) = \begin{pmatrix}0&1&0\\1&0&0\\0&0&1\end{pmatrix}\qquad \rho\bigl((1\,3)\bigr) = \begin{pmatrix}0&0&1\\0&1&0\\1&0&0\end{pmatrix}$$

$$\rho\bigl((2\,3)\bigr) = \begin{pmatrix}1&0&0\\0&0&1\\0&1&0\end{pmatrix}\qquad \rho\bigl((1\,2\,3)\bigr) = \begin{pmatrix}0&0&1\\1&0&0\\0&1&0\end{pmatrix}\qquad \rho\bigl((1\,3\,2)\bigr) = \begin{pmatrix}0&1&0\\0&0&1\\1&0&0\end{pmatrix}$$

It turns out that these matrices are the same as we saw when writing down the defining representation of $S_3$. There’s a reason for this, which we will examine later.
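Here is a sketch that carries out the whole construction by machine, assuming the subgroup $H = \{e, (2\,3)\}$ used in the example above: it forms the cosets, builds the permutation matrices of the left-multiplication action, and compares them with the defining representation.

```python
import numpy as np
from itertools import permutations

# A sketch of the coset representation built directly from the definitions.
# The choice H = {e, (2 3)} matches the example above; the rest is generic and
# would work for any subgroup of a small permutation group.

elements = list(permutations(range(3)))                  # S_3 in one-line notation
def compose(g, h):                                       # "apply h, then g"
    return tuple(g[h[k]] for k in range(3))

H = [(0, 1, 2), (0, 2, 1)]                               # {e, (2 3)}

# Build the left cosets gH, as frozensets so that equal cosets compare equal.
cosets = []
for g in elements:
    c = frozenset(compose(g, h) for h in H)
    if c not in cosets:
        cosets.append(c)

def coset_matrix(g):
    """Permutation matrix of left multiplication by g on the set of cosets."""
    m = np.zeros((len(cosets), len(cosets)))
    for j, c in enumerate(cosets):
        image = frozenset(compose(g, k) for k in c)
        m[cosets.index(image), j] = 1.0
    return m

# Compare with the defining representation (permutation matrices on {1,2,3}).
def perm_matrix(p):
    m = np.zeros((3, 3))
    for j, pj in enumerate(p):
        m[pj, j] = 1.0
    return m

for g in elements:
    assert np.allclose(coset_matrix(g), perm_matrix(g))
print("the coset representation for H = {e, (2 3)} matches the defining representation")
```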
As special cases, if $H = \{e\}$, then there is one coset for each element of $G$, and the coset representation is the same as the left regular representation. At the other extreme, if $H = G$, then there is only one coset and we get the trivial representation.
The (Left) Regular Representation
Now it comes time to introduce what’s probably the most important representation of any group, the “left regular representation”. This arises because any group $G$ acts on itself by left-multiplication. That is, we have a function $G \times G \to G$ — given by $(g, h) \mapsto gh$. Indeed, this is an action because first $eh = h$ for every $h \in G$; and second $(g_1g_2)h = g_1(g_2h)$, so acting by the product $g_1g_2$ is the same as acting first by $g_2$ and then by $g_1$ as well.

So, as with any group action on a finite set, we get a finite-dimensional permutation representation. The representing space has a standard basis corresponding to the elements of $G$. That is, to every element $g \in G$ we have a basis vector $\mathbf{g}$. But we can recognize this as the standard basis of the group algebra $\mathbb{C}[G]$. That is, the group algebra itself carries a representation.

Of course, this shouldn’t really surprise us. After all, representations of $G$ are equivalent to modules for the group algebra; and the very fact that $\mathbb{C}[G]$ is an algebra means that it comes with a bilinear function $\mathbb{C}[G] \times \mathbb{C}[G] \to \mathbb{C}[G]$, which makes it into a module over itself.

We should note that since this is the left regular representation, there is also such a thing as the right regular representation, which arises from the action of $G$ on itself by multiplication on the right. But by itself right-multiplication doesn’t really give an action, because it reverses the order of multiplication. Indeed, for a group action as we’ve defined it, first acting by $g_1$ and then acting by $g_2$ is the same as acting by the product $g_2g_1$. But if we first multiply on the right by $g_1$ and then by $g_2$ we get $hg_1g_2$, which is the same as acting by $g_1g_2$. The order has been reversed.

To compensate for this, we define the right regular representation by the function $(g, h) \mapsto hg^{-1}$. Then first acting by $g_1$ and then by $g_2$ gives $(hg_1^{-1})g_2^{-1} = h(g_2g_1)^{-1}$, which is the action of the product $g_2g_1$; and the identity still acts trivially, since $he^{-1} = h$ as well.
As an exercise, let’s work out the matrices of the left regular representation for the cyclic group $\mathbb{Z}_4$ with respect to its standard basis. We have four elements in this group: $e$, $g$, $g^2$, and $g^3$. Thus the regular representation will be four-dimensional, and we will index the rows and columns of our matrices by the exponents $0$, $1$, $2$, and $3$. Then in the matrix $\rho(g^i)$ the entry in the $j$th row and $k$th column is the coefficient of $\mathbf{g^j}$ in $g^i\mathbf{g^k}$. The multiplication rule tells us that $g^ig^k = g^{i+k}$, where the exponent is defined up to a multiple of four, and so the matrix entry is $1$ if $j \equiv i+k \pmod 4$, and $0$ otherwise. That is:

$$\rho(e) = \begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix}\qquad \rho(g) = \begin{pmatrix}0&0&0&1\\1&0&0&0\\0&1&0&0\\0&0&1&0\end{pmatrix}$$

$$\rho(g^2) = \begin{pmatrix}0&0&1&0\\0&0&0&1\\1&0&0&0\\0&1&0&0\end{pmatrix}\qquad \rho(g^3) = \begin{pmatrix}0&1&0&0\\0&0&1&0\\0&0&0&1\\1&0&0&0\end{pmatrix}$$
You can check for yourself that these matrices indeed give a representation of the cyclic group.
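Here is a short sketch that does that check, building the four matrices directly from the rule that the $(j,k)$ entry of $\rho(g^i)$ is $1$ exactly when $j \equiv i+k \pmod 4$.

```python
import numpy as np

# A sketch that builds the left regular representation of the cyclic group Z_4
# from the rule "entry (j, k) of rho(g^i) is 1 exactly when j = i + k (mod 4)",
# and then checks the homomorphism property.

def regular_rep(i, n=4):
    """Matrix of left multiplication by g^i in the regular representation of Z_n."""
    m = np.zeros((n, n))
    for k in range(n):
        m[(i + k) % n, k] = 1.0
    return m

matrices = [regular_rep(i) for i in range(4)]

# rho(g^i) rho(g^k) should equal rho(g^(i+k mod 4)).
for i in range(4):
    for k in range(4):
        assert np.allclose(matrices[i] @ matrices[k], matrices[(i + k) % 4])
print("the four matrices form a representation of Z_4")
```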