From the module perspective, we’re led back to the concept of a group action. This is like a -module, but “discrete”. Let’s just write down the axioms for easy reference: we’ve got a set and a function such that
- preserves the identity: .
- preserves the group operation: .
Notice how this looks almost exactly like the axioms for a -module, except since is just a set we don’t have any sense of “linearity”.
Now, from a group action on a finite set we can get a finite-dimensional representation. We let — the vector space defined to have as a basis. That is, vectors in are of the form
for some complex coefficients . We get a -module by extending to a bilinear function . We already know how it behaves on the basis of the form , and the extension to a bilinear function is uniquely defined. We call the “permutation representation” associated to , and the elements for we call the “standard basis”.
As an example, the group is defined from the very beginning by the fact that it acts on the set by shuffling the numbers around. And so we get a representation from this action, which we call the “defining representation”. By definition, it has dimension , since it has a basis given by . To be even more explicit, let me write out the defining matrix representation for . Technically, going from an abstract representation to a matrix representation requires not just a basis, but an ordered basis, but the order should be pretty clear in this case. And so, with no further ado:
To see how this works, note that the permutation sends to . Similarly, we find that . That is:
We also see that composition of permutations turns into matrix multiplication. For example, . In terms of the matrices we calculate:
You can check for yourself all the other cases that you care to.
Notice that in general the matrices are index by two elements of , and the matrix element — the one in the th row and th column — is . That is, it’s if — if the action of sends to — and otherwise. This guarantees that every entry will be either or , and that each row and each column will have exactly one . Such a matrix we call a “permutation matrix”, and we see that the matrices that occur in permutation representations are permutation matrices.
With the group algebra in hand, we now define a “-module” to be a module for the group algebra of . That is, it’s a (finite-dimensional) vector space and a bilinear map . This map must satisfy and .
This is really the same thing as a representation, since we may as well pick a basis for and write . Then for any we can write
That is, is a linear map from to itself, with its matrix entries given by . We define this matrix to be , which must be a representation because of the conditions on above.
Conversely, if we have a matrix representation , we can define a module map for as
where we apply the matrix to the column vector . This must satisfy the above conditions, since they reflect the fact that is a representation.
In fact, to define , all we really need to do is to define it for the basis elements . Then linearity will take care of the rest of the group algebra. That is, we can just as well say that a -module is a vector space and a function satisfying the following three conditions:
- is linear in : .
- preserves the identity: .
- preserves the group operation: .
The difference between the representation viewpoint and the -module viewpoint is that representations emphasize the group elements and their actions, while -modules emphasize the representing space . This viewpoint will be extremely helpful when we want to consider a representation as a thing in and of itself. It’s easier to do this when we think of it as a vector space equipped with the extra structure of a -action.
A useful construction for our purposes is the group algebra . We’ve said a lot about this before, and showed a number of things about it, but most of those we can ignore for now. All that’s really important is that is an algebra whose representations are intimately connected with those of .
When we say it’s an algebra, we just mean that it’s a vector space with a distributive multiplication defined on it. And in our case of a finite group it’s easy to define both. For every group element we have a basis vector . That is, we get every vector in the algebra by picking one complex coefficient for each element , and adding them all up:
Multiplication is exactly what we might expect: the product of two basis vectors and is the basis vector , and we extend everything else by linearity!
We often rearrange this sum to collect all the terms with a given basis vector together:
And we can go from representations of a group to representations of its group algebra. Indeed, if is a representation, then we can define
Indeed, each is a matrix, and we can multiply matrices by complex numbers and add them together, so the right hand side is perfectly well-defined as a matrix in the matrix algebra . It’s a simple matter to verify that preserves addition of vectors, scalar multiples of vectors, and products of vectors.
Conversely, if is a representation, then we can restrict it to our basis vectors and get a map . The image of each basis vector must be invertible, for we have
As promised, we want to see some examples of matrix representations for those who might not have seen much of them before. These are homomorphisms from a group to the group — we will often simply write — of invertible complex matrices. The positive integer is called the “dimension” or “degree” of the representation. The first few of our examples will be one-dimensional, largely because that makes the examples simpler. Indeed, a matrix is simply a complex number, and multiplication of such matrices is just complex multiplication!
First off, every group has the “trivial representation”, which sends each group element to the matrix . It should be clear that the identity of is sent to the identity matrix — indeed, every group element is sent to the identity matrix — and that the image of the product of two group elements is the product of their images. Writing for the trivial representation we find:
Let’s diverge from the symmetric group for a moment and consider the cyclic group . This consists of powers of a single generator with the relation . That is, we have . The definition of a representation tells us that we must send to , but we may have some latitude in choosing where to send the generator . Since it has to go somewhere, let’s set . And now we know what happens to everything else in the group!
in particular, we have , so these must go to the same value. That is,
and we must have . Thus we have such a representation for each of the th roots of unity, and these are all possible one-dimensional representations of .
This illustrates a common technique for finding representations, at least when we have a nice finite presentation of our group: choose an image for each of the generators, and use the relations to give us equations which these images must satisfy. The fact that this works out is deeply wrapped up in the universal properties of free groups, but if that sounds scary you don’t have to worry about it!
We should include at least one example of a higher-degree representation. A nice one is the following representation of the group of integers. Remember that the usual operation on integers is addition, not “multiplication” like we usually say for groups. With that in mind, we set
It’s straightforward to check that the additive identity is sent to the identity matrix, and that the group operation is preserved:
For some more review, let’s recall the idea of conjugation in a group . We say that two elements and are “conjugate” if for some .
This is an equivalence relation — reflexive, symmetric, and transitive. Any element is conjugate to itself by the group identity; if , then ; if and , then . Thus the set underlying the group can be partitioned into “conjugacy classes”: two group elements are in the same conjugacy class if and only if they are conjugate. The conjugacy class containing a group is commonly written . The different conjugacy classes are pairwise disjoint, and their union is all of .
First of all, when we write a permutation in cycle notation it’s easy to see what conjugation does to it. Indeed, if is a -cycle and if is any other permutation, then we can write down the conjugate:
And the same goes for any other permutation written in cycle notation: the conjugate by is given by applying to each symbol in the cycle notation. In particular, any two conjugate permutations have the same cycle type. In fact, the converse is also true: given any two permutations with the same cycle type, we can find a permutation that sends the one to the other.
For example, consider the permutations and . Stack them on top of each other:
turn this into two-line notation
and we’ve got a permutation which sends to :
which is equivalent to . That is, two permutations are conjugate if and only if they have the same cycle type. This is really big. Given a cycle type , we will write the corresponding conjugacy class as .
We also know from some general facts about group actions that the number of elements in the conjugacy class is equal to the number of cosets of the “centralizer” . We recall that the centralizer of is the collection of group elements that commute with . That is:
and we have the equation
In the case of — and armed with our formula for conjugating permutations in cycle notation — we can use this to count the size of each . We know that , so all we need is to find out how many permutations leave a permutation (with cycle type ) the same when we conjugate it to get . We will write this number as , and similarly we will write .
So, how can we change the cycle notation of and get something equivalent back again? Well, for any -cycle in we can rotate it around in different ways. For instance, is the same as is the same as . We can also shuffle around any cycles that have the same length: is the same as .
Thus if has cycle type , then we can shuffle the -cycles ways; we can shuffle the -cycles ways; and so on until we can shuffle the -cycles ways. Each -cycle can be rotated into different position for a total of choices; each -cycle can be rotated into different positions for a total of choices; and so on until each -cycle can be rotated into different positions for a total of choices. Therefore we have a total of
different ways of writing the same permutation . And each of these ways corresponds to a permutation in . We have thus calculated
and we conclude
As a special case, how many transpositions are there in the group ? Recall that a transposition is a permutation of the form , which has cycle type . Our formula tells us that there are
permutations in this conjugacy class, as we could expect.
One concept I’d like to introduce is that of the “cycle type” of a permutation. This simply counts the number of cycles of various length in a permutation. For example — using the sample permutations from last time — has two cycles of length two, while has one cycle of length three and one “fixed point” — a cycle of length one. No cycle in a permutation in can be longer than , so we only need to count up to . We collect the information in a cycle type into a tuple of the form . The cycle type of is , and that of is .
It should be clear that the sum of the cycle lengths is . In a formula:
That is, the cycle type breaks up or “partitions” into chunks whose total size adds up to . In general, a partition of is a sequence of numbers in nonincreasing order, whose sum is . Thus the cycle type of gives us the partition , while the cycle type of gives us the partition .
One more example, from the beginning: the two-line notation
describing a permutation in has the cycle notation . Its cycle type is , which corresponds to the partition .
Before we push on into our new topic, let’s look back at some of the background that we’ve already covered.
We’re talking about symmetric groups, which are, of course, groups. We have various ways of writing down an element of , including the two-line notation and the cycle notation that are covered in our earlier description of the symmetric groups. As an example, the two-line notation
and the cycle notation both describe the permutation that sends to , back to , and similarly swaps and . Similarly, the two-line notation
the composition of
and the cycle notation or (equivalently) describe the permutation that cycles the elements , , and (in that order) and leaves untouched.
We’re specifically concerned with complex representations of these groups. That is, we want to pick some complex vector space , and for each permutation we want to come up with some linear transformation for which the composition of linear transformations and the composition of permutations are “the same” in the sense that given two permutations and , the transportation corresponding to the composite is equal to the composite of the corresponding transformations .
We’re primarily interested in finite-dimensional representations. That is, ones for which is a finite-dimensional complex vector space. In this case, we know that we can always just assume that — the space of -tuples of complex numbers — and that linear transformations are described by matrices. Composition of transformations is reflected in matrix multiplication. That is, for every permutation we want to come up with an matrix so that the matrix corresponding to the composition of two permutations is the product of the matrices corresponding to the two permutations. I’ll be giving some more explicit examples soon.
Okay, I’m done with measure theory (for now, at least), and not a moment too soon. It’s been good for me to work through all this stuff again, and I hope it’s provided a good resource, but my traffic has really taken a hit, at least as measured by daily page views. I think maybe everybody else hates analysis too?
So let’s go in a completely different direction! I want to talk about the representation theory of permutation groups. Now at least on the surface you might not think there’s a lot to say, but it’s a surprisingly detailed subject. And since every finite group can be embedded in a permutation group — its action on itself by left multiplication permutes its own elements — and many natural symmetries come in the form of permutations, it’s a very useful subject as well.
But as a niche it doesn’t get taught very much. Even the most direct application I know (the representation theory of classical groups) usually avoids getting into the details of this topic. And so it’s likely that many if not most readers haven’t really seen much of it. Along the way we’ll learn a certain amount about the representation theory of more general finite groups, so if you don’t know much about it yet, don’t worry! I’ll try to link back to more general background information where appropriate — and please ask me to fill in points that I seem to skim over — but the coverage should be pretty self-contained. The great thing about this is that you don’t have to have read anything else I’ve written, and you don’t have to be a particularly expert mathematician to follow along! And of course neither do your friends and acquaintances, so this is the perfect chance to pass the word along. Don’t be shy about telling other people to join in!
There’s also going to be a fair amount of focus on the combinatorics involved in the representation theory, and there are some really amazingly beautiful patterns. But we’ll also see some very explicit descriptions of how to actually count these things. If I feel up to it, I may try to actually implement some of the algorithms we see and talk about that over at my new programming weblog The Unapologetic Programmer. Oh yeah: if you haven’t heard yet, I’ve got a new weblog about programming, so all of you into software design should be heading over there and adding it to your RSS reader. And since my work there is a lot more casual, you should tell your other programmer friends about it too!
So enough shameless self-promotion; let’s get down to business.
I really wish I could just say in post titles.
Anyway, I want to investigate the continuous dual of for . That is, we’re excluding the case where either (but not its Hölder conjugate ) is infinite. And I say that when is -finite, the space of bounded linear functionals on is isomorphic to .
First, I’m going to define a linear map . Given a function , let be the linear functional defined for any by
It’s clear from the linearity of multiplication and of the integral itself, that this is a linear functional on . Hölder’s inequality itself shows us that not only does the integral on the right exist, but
That is, is a bounded linear functional, and the operator norm is at most the norm of . The extremal case of Hölder’s inequality shows that there is some for which this is an equality, and thus we conclude that . That is, is an isometry of normed vector spaces. Such a mapping has to be an injection, because if then , which implies that .
Now I say that is also a surjection. That is, any bounded linear functional is of the form for some . Indeed, if then we can just pick as a preimage. Thus we may assume that is a nonzero bounded linear functional on , and . We first deal with the case of a totally finite measure space.
In this case, we define a set function on measurable sets by . It’s straightforward to see that is additive. To prove countable additivity, suppose that is the countable disjoint union of a sequence . If we write for the union of through , we find that
Since is continuous, we conclude that , and thus that is a (signed) measure. It should also be clear that implies , and so . The Radon-Nikodym theorem now tells us that there exists an integrable function so that
Linearity tells us that
for simple functions , and also for every , since each such function is the uniform limit of simple functions. We want to show that .
If , then we must show that is essentially bounded. In this case, we find
for every measurable , from which we conclude that a.e., or else we could find some set on which this inequality was violated. Thus .
For other , we can find a measurable with so that . Setting and defining , we find that on , , and so
We thus find
Applying the monotone convergence theorem as we find that .
Thus in either case we’ve found a so that .
In the -finite case, we can write as the countable disjoint union of sets with . We let be the union of the first of these sets. We note that for every measurable set , so is a linear functional on of norm at most . The finite case above shows us that there are functions on so that
We can define if , and let be the sum of all these . We see that
for every , and since we find that . Then Fatou’s lemma shows us that . Thus the -finite case is true as well.
One case in particular is especially worthy of note: since is Hölder-coonjugate to itself, we find that is isomorphic to its own continuous dual space in the same way that a finite-dimensional inner-product space is isomorphic to its own dual space.
In the context of normed vector spaces we have a topology on our spaces and so it makes sense to ask that maps between them be continuous. In the finite-dimensional case, all linear functions are continuous, so this hasn’t really come up before in our study of linear algebra. But for functional analysis, it becomes much more important.
Now, really we only need to require continuity at one point — the origin, to be specific — because if it’s continuous there then it’ll be continuous everywhere. Indeed, continuity at means that for any there is a so that implies . In particular, if , then this means implies . Clearly if this holds, then the general version also holds.
But it turns out that there’s another equivalent condition. We say that a linear transformation is “bounded” if there is some such that for all . That is, the factor by which stretches the length of a vector is bounded. By linearity, we only really need to check this on the unit sphere , but it’s often just as easy to test it everywhere.
Anyway, I say that a linear transformation is continuous if and only if it’s bounded. Indeed, if is bounded, then we find
so as we let approach — as approaches — the difference between and approaches zero as well. And so is continuous.
Conversely, if is continuous, then it is bounded. Since it’s continuous, we let and find a so that for all vectors with . Thus for all nonzero we find
Thus we can use and conclude that is bounded.
The least such that works in the condition for to be bounded is called the “operator norm” of , which we write as . It’s straightforward to verify that , and that if and only if is the zero operator. It remains to verify the triangle identity.
Let’s say that we have bounded linear transformations and with operator norms and , respectively. We will show that works as a bound for , and thus conclude that . Indeed, we check that
and our assertion follows. In particular, when our base field is itself a normed linear space (like or itself) we can conclude that the “continuous dual space” consisting of bounded linear functionals is a normed linear space using the operator norm on .