# The Unapologetic Mathematician

## Group Actions and Representations

From the module perspective, we’re led back to the concept of a group action. This is like a $G$-module, but “discrete”. Let’s just write down the axioms for easy reference: we’ve got a set $S$ and a function $A:G\times S\to S$ such that

• $A$ preserves the identity: $A(e,s)=s$.
• $A$ preserves the group operation: $A(g,A(h,s))=A(gh,s)$.

Notice how this looks almost exactly like the axioms for a $G$-module, except since $S$ is just a set we don’t have any sense of “linearity”.

Now, from a group action on a finite set $S$ we can get a finite-dimensional representation. We let $V=\mathbb{C}S$ — the vector space defined to have $S$ as a basis. That is, vectors in $\mathbb{C}S$ are of the form

$\displaystyle\sum\limits_{s\in S}c_s\mathbf{s}$

for some complex coefficients $c_s$. We get a $G$-module by extending $A$ to a bilinear function $\mathbb{C}[G]\times\mathbb{C}S\to\mathbb{C}S$. We already know how it behaves on pairs of basis vectors $(\mathbf{g},\mathbf{s})$, and the extension to a bilinear function is then uniquely determined. We call $\mathbb{C}S$ the “permutation representation” associated to $S$, and the elements $\mathbf{s}$ for $s\in S$ we call the “standard basis”.

As an example, the group $S_n$ is defined from the very beginning by the fact that it acts on the set $\{1,2,\dots,n\}$ by shuffling the numbers around. And so we get a representation from this action, which we call the “defining representation”. By definition, it has dimension $n$, since it has a basis given by $\{\mathbf{1},\dots,\mathbf{n}\}$. To be even more explicit, let me write out the defining matrix representation for $S_3$. Technically, going from an abstract representation to a matrix representation requires not just a basis, but an ordered basis, but the order should be pretty clear in this case. And so, with no further ado:

\displaystyle\begin{aligned}\rho\left(e\right)&=\begin{pmatrix}1&0&0\\{0}&1&0\\{0}&0&1\end{pmatrix}\\\rho\left((1\,2)\right)&=\begin{pmatrix}{0}&1&0\\1&0&0\\{0}&0&1\end{pmatrix}\\\rho\left((1\,3)\right)&=\begin{pmatrix}{0}&0&1\\{0}&1&0\\1&0&0\end{pmatrix}\\\rho\left((2\,3)\right)&=\begin{pmatrix}1&0&0\\{0}&0&1\\{0}&1&0\end{pmatrix}\\\rho\left((1\,2\,3)\right)&=\begin{pmatrix}{0}&0&1\\1&0&0\\{0}&1&0\end{pmatrix}\\\rho\left((1\,3\,2)\right)&=\begin{pmatrix}{0}&1&0\\{0}&0&1\\1&0&0\end{pmatrix}\end{aligned}

To see how this works, note that the permutation $(1\,2\,3)$ sends $3$ to $1$. Similarly, we find that $\left[\rho\left((1\,2\,3)\right)\right](\mathbf{3})=\mathbf{1}$. That is:

$\displaystyle\begin{pmatrix}{0}&0&1\\1&0&0\\{0}&1&0\end{pmatrix}\begin{pmatrix}{0}\\{0}\\1\end{pmatrix}=\begin{pmatrix}1\\{0}\\{0}\end{pmatrix}$

We also see that composition of permutations turns into matrix multiplication. For example, $(1\,2\,3)(2\,3)=(1\,2)$. In terms of the matrices we calculate:

$\displaystyle\begin{pmatrix}{0}&0&1\\1&0&0\\{0}&1&0\end{pmatrix}\begin{pmatrix}1&0&0\\{0}&0&1\\{0}&1&0\end{pmatrix}=\begin{pmatrix}{0}&1&0\\1&0&0\\{0}&0&1\end{pmatrix}$

You can check for yourself all the other cases that you care to.

Notice that in general the matrices are indexed by two elements of $S$: the matrix element $\rho(g)_s^t$ — the one in the $s$th row and $t$th column — is $\delta_{A(g,t),s}$. That is, it’s $1$ if $A(g,t)=s$ — if the action of $g$ sends $t$ to $s$ — and $0$ otherwise. This guarantees that every entry will be either $0$ or $1$, and that each row and each column will have exactly one $1$. Such a matrix we call a “permutation matrix”, and we see that the matrices that occur in permutation representations are permutation matrices.
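To make this concrete, here is a quick Python sketch (my own illustration, not from the post; all helper names are ad hoc) that builds permutation matrices directly from the rule $\rho(g)_s^t=\delta_{A(g,t),s}$ and checks the multiplication example above:

```python
# Build the defining matrix representation of S_3 from its action on {1,2,3}.
# A permutation is stored as a dict g with g[t] = A(g, t).
from itertools import permutations

def perm_matrix(g, n):
    # rho(g)[s][t] = 1 exactly when the action of g sends t to s
    return [[1 if g[t + 1] == s + 1 else 0 for t in range(n)] for s in range(n)]

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def compose(g, h):
    # (gh)(t) = g(h(t)), matching A(g, A(h, s)) = A(gh, s)
    return {t: g[h[t]] for t in h}

n = 3
c123 = {1: 2, 2: 3, 3: 1}   # the 3-cycle (1 2 3)
t23  = {1: 1, 2: 3, 3: 2}   # the transposition (2 3)

# (1 2 3)(2 3) = (1 2): composition of permutations becomes matrix multiplication
lhs = mat_mul(perm_matrix(c123, n), perm_matrix(t23, n))
rhs = perm_matrix(compose(c123, t23), n)
assert lhs == rhs

# every row and every column of a permutation matrix has exactly one 1
for p in permutations(range(1, n + 1)):
    m = perm_matrix({i + 1: p[i] for i in range(n)}, n)
    assert all(sum(row) == 1 for row in m)
    assert all(sum(col) == 1 for col in zip(*m))
```

The resulting matrices agree, column by column, with the table for $S_3$ written out above.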

September 16, 2010

## Modules

With the group algebra in hand, we now define a “$G$-module” to be a module for the group algebra of $G$. That is, it’s a (finite-dimensional) vector space $V$ and a bilinear map $A:\mathbb{C}[G]\times V\to V$. This map must satisfy $A(\mathbf{e},v)=v$ and $A(\mathbf{g},A(\mathbf{h},v))=A(\mathbf{gh},v)$.

This is really the same thing as a representation, since we may as well pick a basis $\{e_i\}$ for $V$ and write $V=\mathbb{C}^d$. Then for any $g\in G$ we can write

$\displaystyle A(\mathbf{g},e_i)=\sum\limits_{j=1}^dm_i^je_j$

That is, $A(\mathbf{g},\underbar{\hphantom{X}})$ is a linear map from $V$ to itself, with its matrix entries given by $m_i^j$. We define this matrix to be $\rho(g)$, which must be a representation because of the conditions on $A$ above.

Conversely, if we have a matrix representation $\rho:G\to GL_d$, we can define a module map for $\mathbb{C}^d$ as

$\displaystyle A(\mathbf{g},v)=\rho(g)v$

where we apply the matrix $\rho(g)$ to the column vector $v$. This must satisfy the above conditions, since they reflect the fact that $\rho$ is a representation.

In fact, to define $A$, all we really need to do is to define it for the basis elements $\mathbf{g}\in\mathbb{C}[G]$. Then linearity will take care of the rest of the group algebra. That is, we can just as well say that a $G$-module is a vector space $V$ and a function $A:G\times V\to V$ satisfying the following three conditions:

• $A$ is linear in $V$: $A(g,cv+dw)=cA(g,v)+dA(g,w)$.
• $A$ preserves the identity: $A(e,v)=v$.
• $A$ preserves the group operation: $A(g,A(h,v))=A(gh,v)$.
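As a sanity check, here is a minimal Python sketch (my own, with ad hoc names) verifying these three conditions for the module $\mathbb{C}^2$ coming from the defining representation of $S_2$:

```python
# module map A(g, v) = rho(g) v for the defining representation of S_2
def rho(g):
    # g is 'e' or 's' (the swap); rho(g) is the corresponding permutation matrix
    return [[1, 0], [0, 1]] if g == 'e' else [[0, 1], [1, 0]]

def A(g, v):
    m = rho(g)
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

mult = {('e', 'e'): 'e', ('e', 's'): 's', ('s', 'e'): 's', ('s', 's'): 'e'}
v, w = [1.0, 2.0], [3.0, -1.0]

# A is linear in V: A(g, 2v + 3w) = 2 A(g, v) + 3 A(g, w)
for g in ('e', 's'):
    cvdw = [2 * a + 3 * b for a, b in zip(v, w)]
    assert A(g, cvdw) == [2 * a + 3 * b for a, b in zip(A(g, v), A(g, w))]

# A preserves the identity and the group operation
assert A('e', v) == v
for g in ('e', 's'):
    for h in ('e', 's'):
        assert A(g, A(h, v)) == A(mult[g, h], v)
```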

The difference between the representation viewpoint and the $G$-module viewpoint is that representations emphasize the group elements and their actions, while $G$-modules emphasize the representing space $V$. This viewpoint will be extremely helpful when we want to consider a representation as a thing in and of itself. It’s easier to do this when we think of it as a vector space equipped with the extra structure of a $G$-action.

September 15, 2010

## The Group Algebra

A useful construction for our purposes is the group algebra $\mathbb{C}[G]$. We’ve said a lot about this before, and showed a number of things about it, but most of those we can ignore for now. All that’s really important is that $\mathbb{C}[G]$ is an algebra whose representations are intimately connected with those of $G$.

When we say it’s an algebra, we just mean that it’s a vector space with a distributive multiplication defined on it. And in our case of a finite group $G$ it’s easy to define both. For every group element $g\in G$ we have a basis vector $\mathbf{g}\in\mathbb{C}[G]$. That is, we get every vector in the algebra by picking one complex coefficient $c_g$ for each element $g\in G$, and adding them all up:

$\displaystyle\sum\limits_{g\in G}c_g\mathbf{g}$

Multiplication is exactly what we might expect: the product of two basis vectors $\mathbf{g}$ and $\mathbf{h}$ is the basis vector $\mathbf{gh}$, and we extend everything else by linearity!

$\displaystyle\left(\sum\limits_{g\in G}c_g\mathbf{g}\right)\left(\sum\limits_{h\in G}d_h\mathbf{h}\right)=\sum\limits_{g,h\in G}c_gd_h\mathbf{gh}$

We often rearrange this sum to collect all the terms with a given basis vector together:

$\displaystyle\sum\limits_{g\in G}\sum\limits_{g_1g_2=g}c_{g_1}d_{g_2}\mathbf{g}=\sum\limits_{g\in G}\left(\sum\limits_{h\in G}c_{gh^{-1}}d_h\right)\mathbf{g}$

Neat, huh?

And we can go from representations of a group to representations of its group algebra. Indeed, if $\rho:G\to GL_d$ is a representation, then we can define

$\displaystyle\rho\left(\sum\limits_{g\in G}c_g\mathbf{g}\right)=\sum\limits_{g\in G}c_g\rho(g)$

Indeed, each $\rho(g)$ is a matrix, and we can multiply matrices by complex numbers and add them together, so the right hand side is perfectly well-defined as a $d\times d$ matrix in the matrix algebra $M_d$. It’s a simple matter to verify that $\rho:\mathbb{C}[G]\to M_d$ preserves addition of vectors, scalar multiples of vectors, and products of vectors.
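Here is a small Python sketch of this extension (my own illustration, using the one-dimensional sign representation of $S_2$, where a $1\times1$ matrix is just a number), checking that the linearly-extended $\rho$ still preserves products:

```python
from itertools import product

# An element of the group algebra C[S_2] is a dict {group element: coefficient}.
# S_2 = {e, s} with s*s = e; sgn(e) = 1, sgn(s) = -1.
mult = {('e', 'e'): 'e', ('e', 's'): 's', ('s', 'e'): 's', ('s', 's'): 'e'}
sgn = {'e': 1.0, 's': -1.0}

def algebra_mul(x, y):
    # the convolution product: collect each c_g d_h on the basis vector gh
    out = {'e': 0.0, 's': 0.0}
    for (g, cg), (h, dh) in product(x.items(), y.items()):
        out[mult[g, h]] += cg * dh
    return out

def rho(x):
    # extend the sign representation linearly: rho(sum c_g g) = sum c_g sgn(g)
    return sum(c * sgn[g] for g, c in x.items())

x = {'e': 2.0, 's': 3.0}
y = {'e': -1.0, 's': 4.0}
# the extension preserves products of vectors: rho(xy) = rho(x) rho(y)
assert abs(rho(algebra_mul(x, y)) - rho(x) * rho(y)) < 1e-12
```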

Conversely, if $\rho:\mathbb{C}[G]\to M_d$ is a representation, then we can restrict it to our basis vectors and get a map $\rho:G\to GL_d$. The image of each basis vector must be invertible, for we have

$\displaystyle\rho(\mathbf{g})\rho(\mathbf{g^{-1}})=\rho(\mathbf{gg^{-1}})=\rho(\mathbf{e})=1$

September 14, 2010

## Some Sample Representations

As promised, we want to see some examples of matrix representations for those who might not have seen much of them before. These are homomorphisms from a group $G$ to the group $GL_d(\mathbb{C})$ — we will often simply write $GL_d$ — of invertible $d\times d$ complex matrices. The positive integer $d$ is called the “dimension” or “degree” of the representation. The first few of our examples will be one-dimensional, largely because that makes the examples simpler. Indeed, a $1\times1$ matrix is simply a complex number, and multiplication of such matrices is just complex multiplication!

First off, every group has the “trivial representation”, which sends each group element $g\in G$ to the matrix $\begin{pmatrix}1\end{pmatrix}$. It should be clear that the identity of $G$ is sent to the identity matrix — indeed, every group element is sent to the identity matrix — and that the image of the product of two group elements is the product of their images. Writing $\rho_1$ for the trivial representation we find:

$\displaystyle\rho_1(g)\rho_1(h)=\begin{pmatrix}1\end{pmatrix}\begin{pmatrix}1\end{pmatrix}=\begin{pmatrix}1\end{pmatrix}=\rho_1(gh)$

Another one we’ve seen is the signum representation $\mathrm{sgn}$ of a permutation group, which sends even permutations to $\begin{pmatrix}1\end{pmatrix}$ and odd permutations to $\begin{pmatrix}-1\end{pmatrix}$.

Let’s diverge from the symmetric group for a moment and consider the cyclic group $\mathbb{Z}_n$. This consists of powers of a single generator $g$ with the relation $g^n=e$. That is, we have $G=\left\{e,g,g^2,g^3,\dots,g^{n-1}\right\}$. The definition of a representation tells us that we must send $e$ to $\begin{pmatrix}1\end{pmatrix}$, but we may have some latitude in choosing where to send the generator $g$. Since it has to go somewhere, let’s set $\rho_c(g)=\begin{pmatrix}c\end{pmatrix}$. And now we know what happens to everything else in the group!

$\displaystyle\rho_c(g^k)=\rho_c(g)^k=\begin{pmatrix}c\end{pmatrix}^k=\begin{pmatrix}c^k\end{pmatrix}$

in particular, we have $g^n=e$, so these must go to the same value. That is,

$\displaystyle\begin{pmatrix}c^n\end{pmatrix}=\rho_c(g^n)=\rho_c(e)=\begin{pmatrix}1\end{pmatrix}$

and we must have $c^n=1$. Thus we have such a representation for each of the $n$th roots of unity, and these are all possible one-dimensional representations of $\mathbb{Z}_n$.
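A quick numerical check (a sketch of my own, not from the post): each $n$th root of unity really does give a one-dimensional representation of $\mathbb{Z}_n$.

```python
import cmath

# one-dimensional representations of Z_n: rho_c(g^k) = c^k for an n-th root of unity c
n = 6
roots = [cmath.exp(2j * cmath.pi * m / n) for m in range(n)]

for c in roots:
    # c^n = 1, so rho_c(g^n) = rho_c(e), as required by the relation g^n = e
    assert abs(c ** n - 1) < 1e-9
    # homomorphism property: rho_c(g^j) rho_c(g^k) = rho_c(g^(j+k mod n))
    for j in range(n):
        for k in range(n):
            assert abs(c ** j * c ** k - c ** ((j + k) % n)) < 1e-9
```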

This illustrates a common technique for finding representations, at least when we have a nice finite presentation of our group: choose an image for each of the generators, and use the relations to give us equations which these images must satisfy. The fact that this works out is deeply wrapped up in the universal properties of free groups, but if that sounds scary you don’t have to worry about it!

We should include at least one example of a higher-degree representation. A nice one is the following representation of the group $\mathbb{Z}$ of integers. Remember that the usual operation on integers is addition, not “multiplication” like we usually say for groups. With that in mind, we set

$\displaystyle\rho(n)=\begin{pmatrix}1&n\\{0}&1\end{pmatrix}$

It’s straightforward to check that the additive identity $0$ is sent to the identity matrix, and that the group operation is preserved:

$\displaystyle\rho(m)\rho(n)=\begin{pmatrix}1&m\\{0}&1\end{pmatrix}\begin{pmatrix}1&n\\{0}&1\end{pmatrix}=\begin{pmatrix}1&m+n\\{0}&1\end{pmatrix}=\rho(m+n)$
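This one is easy to verify mechanically as well; here is a throwaway Python check (names are my own):

```python
def rho(n):
    # the representation of the integers: rho(n) = [[1, n], [0, 1]]
    return [[1, n], [0, 1]]

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

# the additive identity goes to the identity matrix
assert rho(0) == [[1, 0], [0, 1]]
# addition of integers becomes matrix multiplication
for m in range(-5, 6):
    for n in range(-5, 6):
        assert mat_mul(rho(m), rho(n)) == rho(m + n)
```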

September 13, 2010

## Conjugates

For some more review, let’s recall the idea of conjugation in a group $G$. We say that two elements $g$ and $h$ are “conjugate” if $g=khk^{-1}$ for some $k\in G$.

This is an equivalence relation — reflexive, symmetric, and transitive. Any element $g$ is conjugate to itself by the group identity; if $g=khk^{-1}$, then $h=k^{-1}g\left(k^{-1}\right)^{-1}$; if $f=jgj^{-1}$ and $g=khk^{-1}$, then $f=(jk)h(jk)^{-1}$. Thus the set underlying the group $G$ can be partitioned into “conjugacy classes”: two group elements are in the same conjugacy class if and only if they are conjugate. The conjugacy class containing a group element $g$ is commonly written $K_g$. The different conjugacy classes are pairwise disjoint, and their union is all of $G$.

Now, these days we’re concerned with the symmetric group, and it turns out that some nice things happen in this case. We’ve actually already seen a lot of these!

First of all, when we write a permutation in cycle notation it’s easy to see what conjugation does to it. Indeed, if $(a_1\,a_2\,\dots\,a_k)$ is a $k$-cycle and if $g$ is any other permutation, then we can write down the conjugate:

$\displaystyle g(a_1\,\dots\,a_k)g^{-1}=(g(a_1)\,\dots\,g(a_k))$

And the same goes for any other permutation written in cycle notation: the conjugate by $g$ is given by applying $g$ to each symbol in the cycle notation. In particular, any two conjugate permutations have the same cycle type. In fact, the converse is also true: given any two permutations with the same cycle type, we can find a permutation conjugating the one to the other.

For example, consider the permutations $(1\,5\,2)(3\,4)$ and $(2\,3\,1)(4\,5)$. Stack them on top of each other:

\displaystyle\begin{aligned}(1\,5\,2)&(3\,4)\\(2\,3\,1)&(4\,5)\end{aligned}

turn this into two-line notation

$\displaystyle\left\lvert\begin{matrix}1&5&2&3&4\\2&3&1&4&5\end{matrix}\right\rvert$

and we’ve got a permutation $(1\,2)(3\,4\,5)$ which sends $(1\,5\,2)(3\,4)$ to $(2\,3\,1)(4\,5)$:

$\displaystyle(1\,2)(3\,4\,5)\cdot(1\,5\,2)(3\,4)\cdot(2\,1)(5\,4\,3)=(1\,2\,3)(4\,5)$

which is equivalent to $(2\,3\,1)(4\,5)$. That is, two permutations are conjugate if and only if they have the same cycle type. This is really big. Given a cycle type $\lambda$, we will write the corresponding conjugacy class as $K_\lambda$.
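Here is a small Python sketch (my own, with ad hoc helper names) that verifies the relabeling description of conjugation on exactly this example:

```python
def compose(g, h):
    # (gh)(x) = g(h(x))
    return {x: g[h[x]] for x in h}

def inverse(g):
    return {v: k for k, v in g.items()}

def from_cycles(cycles, n):
    # build a permutation dict from its cycle notation
    g = {i: i for i in range(1, n + 1)}
    for cyc in cycles:
        for a, b in zip(cyc, cyc[1:] + cyc[:1]):
            g[a] = b
    return g

n = 5
sigma = from_cycles([(1, 5, 2), (3, 4)], n)   # (1 5 2)(3 4)
g = from_cycles([(1, 2), (3, 4, 5)], n)       # (1 2)(3 4 5)

# conjugation relabels the symbols: g sigma g^{-1} sends g(x) to g(sigma(x))
conj = compose(g, compose(sigma, inverse(g)))
relabeled = {g[x]: g[sigma[x]] for x in sigma}
assert conj == relabeled
# and the result is (2 3 1)(4 5), i.e. (1 2 3)(4 5)
assert conj == from_cycles([(1, 2, 3), (4, 5)], n)
```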

We also know from some general facts about group actions that the number of elements in the conjugacy class $K_g$ is equal to the number of cosets of the “centralizer” $Z_g$. We recall that the centralizer of $g$ is the collection of group elements that commute with $g$. That is:

$\displaystyle Z_g=\left\{h\in G\vert hgh^{-1}=g\right\}$

and we have the equation

$\displaystyle\lvert K_g\rvert=\frac{\lvert G\rvert}{\lvert Z_g\rvert}$

In the case of $S_n$ — and armed with our formula for conjugating permutations in cycle notation — we can use this to count the size of each $K_\lambda$. We know that $\lvert S_n\rvert=n!$, so all we need is to find out how many permutations $h$ leave a permutation $g$ (with cycle type $\lambda$) the same when we conjugate it to get $hgh^{-1}$. We will write this number as $z_\lambda=\lvert Z_g\rvert$, and similarly we will write $k_\lambda=\lvert K_\lambda\rvert=\lvert K_g\rvert$.

So, how can we change the cycle notation of $g$ and get something equivalent back again? Well, for any $k$-cycle in $g$ we can rotate it around in $k$ different ways. For instance, $(1\,2\,3)$ is the same as $(2\,3\,1)$ is the same as $(3\,1\,2)$. We can also shuffle around any cycles that have the same length: $(1\,2)(3\,4)$ is the same as $(3\,4)(1\,2)$.

Thus if $g$ has cycle type $\lambda=(1^{m_1},2^{m_2},\dots,n^{m_n})$, then we can shuffle the $1$-cycles $m_1!$ ways; we can shuffle the $2$-cycles $m_2!$ ways; and so on until we can shuffle the $n$-cycles $m_n!$ ways. Each $1$-cycle can be rotated into $1$ different position for a total of $1^{m_1}$ choices; each $2$-cycle can be rotated into $2$ different positions for a total of $2^{m_2}$ choices; and so on until each $n$-cycle can be rotated into $n$ different positions for a total of $n^{m_n}$ choices. Therefore we have a total of

$1^{m_1}m_1!2^{m_2}m_2!\dots n^{m_n}m_n!$

different ways of writing the same permutation $g$. And each of these ways corresponds to a permutation in $Z_g$. We have thus calculated

$\lvert Z_g\rvert=1^{m_1}m_1!2^{m_2}m_2!\dots n^{m_n}m_n!$

and we conclude

$\displaystyle k_\lambda=\lvert K_g\rvert=\frac{\lvert S_n\rvert}{\lvert Z_g\rvert}=\frac{n!}{z_\lambda}=\frac{n!}{1^{m_1}m_1!2^{m_2}m_2!\dots n^{m_n}m_n!}$

As a special case, how many transpositions are there in the group $S_n$? Recall that a transposition is a permutation of the form $(a\,b)$, which has cycle type $(1^{n-2},2^1,3^0,\dots,n^0)$. Our formula tells us that there are

$\displaystyle\frac{n!}{1^{n-2}(n-2)!2^11!3^00!\dots n^00!}=\frac{n!}{(n-2)!2}=\binom{n}{2}$

permutations in this conjugacy class, as we could expect.
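We can brute-force this formula for small $n$. The following Python sketch (my own illustration) tallies the conjugacy classes of $S_4$ and compares each class size against $n!/z_\lambda$:

```python
from itertools import permutations
from math import factorial
from collections import Counter

def cycle_type(p):
    # p is a tuple with p[i] the image of i+1; returns multiplicities (m_1, ..., m_n)
    n = len(p)
    seen, m = set(), [0] * n
    for start in range(1, n + 1):
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = p[x - 1]
                length += 1
            m[length - 1] += 1
    return tuple(m)

def z(m):
    # z_lambda = product over k of k^{m_k} m_k!
    out = 1
    for k, mk in enumerate(m, start=1):
        out *= k ** mk * factorial(mk)
    return out

n = 4
counts = Counter(cycle_type(p) for p in permutations(range(1, n + 1)))
for m, k_lambda in counts.items():
    assert k_lambda == factorial(n) // z(m)
```

In particular the class of transpositions in $S_4$, cycle type $(1^2,2^1,3^0,4^0)$, comes out to $6=\binom{4}{2}$ elements, just as the formula predicts.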

September 10, 2010

## Cycle Type

One concept I’d like to introduce is that of the “cycle type” of a permutation. This simply counts the number of cycles of various lengths in a permutation. For example, using the sample permutations from last time, $\alpha=(1\,2)(3\,4)$ has two cycles of length two, while $\beta=(1\,4\,3)(2)$ has one cycle of length three and one “fixed point” — a cycle of length one. No cycle in a permutation in $S_n$ can be longer than $n$, so we only need to count up to $n$. We collect the information in a cycle type into a tuple of the form $(1^{m_1},2^{m_2},\dots,n^{m_n})$. The cycle type of $\alpha$ is $(1^0,2^2,3^0,4^0)$, and that of $\beta$ is $(1^1,2^0,3^1,4^0)$.

It should be clear that the sum of the cycle lengths is $n$. In a formula:

$\displaystyle\sum\limits_{k=1}^nkm_k=n$

That is, the cycle type breaks up or “partitions” $n$ into chunks whose total size adds up to $n$. In general, a partition $\lambda$ of $n$ is a sequence of numbers $(\lambda_1,\dots,\lambda_l)$ in nonincreasing order, whose sum is $n$. Thus the cycle type of $\alpha$ gives us the partition $(2,2)$, while the cycle type of $\beta$ gives us the partition $(3,1)$.

One more example, from the beginning: the two-line notation

$\displaystyle\left\lvert\begin{matrix}1&2&3&4&5\\2&3&1&4&5\end{matrix}\right\rvert$

describing a permutation in $S_5$ has the cycle notation $(1\,2\,3)(4)(5)$. Its cycle type is $(1^2,2^0,3^1,4^0,5^0)$, which corresponds to the partition $(3,1,1)$.
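Extracting the partition from a permutation is easy to mechanize. A small Python sketch (helper names are my own):

```python
def cycle_partition(p):
    # p maps each symbol to its image; returns the cycle lengths in nonincreasing order
    seen, parts = set(), []
    for start in p:
        if start not in seen:
            length, x = 0, start
            while x not in seen:
                seen.add(x)
                x = p[x]
                length += 1
            parts.append(length)
    return tuple(sorted(parts, reverse=True))

# the permutation (1 2 3)(4)(5) from the two-line notation above
p = {1: 2, 2: 3, 3: 1, 4: 4, 5: 5}
assert cycle_partition(p) == (3, 1, 1)
# the cycle lengths always sum to n
assert sum(cycle_partition(p)) == len(p)
```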

September 9, 2010

## Some Review

Before we push on into our new topic, let’s look back at some of the background that we’ve already covered.

We’re talking about symmetric groups, which are, of course, groups. We have various ways of writing down an element of $S_n$, including the two-line notation and the cycle notation that are covered in our earlier description of the symmetric groups. As an example, the two-line notation

$\displaystyle\left\lvert\begin{matrix}1&2&3&4\\2&1&4&3\end{matrix}\right\rvert$

and the cycle notation $(1\,2)(3\,4)$ both describe the permutation $\alpha\in S_4$ that sends $1$ to $2$, $2$ back to $1$, and similarly swaps $3$ and $4$. Similarly, the two-line notation

$\displaystyle\left\lvert\begin{matrix}1&2&3&4\\4&2&1&3\end{matrix}\right\rvert$

and the cycle notation $(1\,4\,3)(2)$ or (equivalently) $(1\,4\,3)$ describe the permutation $\beta$ that cycles the elements $1$, $4$, and $3$ (in that order) and leaves $2$ untouched.

We’re specifically concerned with complex representations of these groups. That is, we want to pick some complex vector space $V$, and for each permutation $\sigma\in S_n$ we want to come up with some linear transformation $\rho(\sigma):V\to V$ for which the composition of linear transformations and the composition of permutations are “the same” in the sense that given two permutations $\sigma$ and $\tau$, the transformation $\rho(\sigma\tau)$ corresponding to the composite is equal to the composite of the corresponding transformations $\rho(\sigma)\rho(\tau)$.

We’re primarily interested in finite-dimensional representations. That is, ones for which $V$ is a finite-dimensional complex vector space. In this case, we know that we can always just assume that $V=\mathbb{C}^k$ — the space of $k$-tuples of complex numbers — and that linear transformations are described by matrices. Composition of transformations is reflected in matrix multiplication. That is, for every permutation $\sigma\in S_n$ we want to come up with a $k\times k$ matrix $\rho(\sigma)$ so that the matrix $\rho(\sigma\tau)$ corresponding to the composition of two permutations is the product $\rho(\sigma)\rho(\tau)$ of the matrices corresponding to the two permutations. I’ll be giving some more explicit examples soon.

September 8, 2010

## New Topic: The Representation Theory of the Symmetric Group

Okay, I’m done with measure theory (for now, at least), and not a moment too soon. It’s been good for me to work through all this stuff again, and I hope it’s provided a good resource, but my traffic has really taken a hit, at least as measured by daily page views. I think maybe everybody else hates analysis too?

So let’s go in a completely different direction! I want to talk about the representation theory of permutation groups. Now at least on the surface you might not think there’s a lot to say, but it’s a surprisingly detailed subject. And since every finite group can be embedded in a permutation group — its action on itself by left multiplication permutes its own elements — and many natural symmetries come in the form of permutations, it’s a very useful subject as well.

There’s also going to be a fair amount of focus on the combinatorics involved in the representation theory, and there are some really amazingly beautiful patterns. But we’ll also see some very explicit descriptions of how to actually count these things. If I feel up to it, I may try to actually implement some of the algorithms we see and talk about that over at my new programming weblog The Unapologetic Programmer. Oh yeah: if you haven’t heard yet, I’ve got a new weblog about programming, so all of you into software design should be heading over there and adding it to your RSS reader. And since my work there is a lot more casual, you should tell your other programmer friends about it too!

So enough shameless self-promotion; let’s get down to business.

September 7, 2010

## Some Continuous Duals

I really wish I could just say $L^p$ in post titles.

Anyway, I want to investigate the continuous dual of $L^p(\mu)$ for $1\leq p<\infty$. That is, we’re excluding the case where $p$ is infinite (though its Hölder conjugate $q$ may be). And I say that when $(X,\mathcal{S},\mu)$ is $\sigma$-finite, the space $L^p(\mu)'$ of bounded linear functionals on $L^p(\mu)$ is isomorphic to $L^q(\mu)$.

First, I’m going to define a linear map $\kappa_p:L^q\to\left(L^p\right)'$. Given a function $g\in L^q$, let $\kappa_p(g)$ be the linear functional defined for any $f\in L^p$ by

$\displaystyle\left[\kappa_p(g)\right](f)=\int fg\,d\mu$

It’s clear from the linearity of multiplication and of the integral itself that this is a linear functional on $L^p$. Hölder’s inequality shows us that not only does the integral on the right exist, but

$\displaystyle\lvert\left[\kappa_p(g)\right](f)\rvert\leq\lVert fg\rVert_1\leq\lVert f\rVert_p\lVert g\rVert_q$

That is, $\kappa_p(g)$ is a bounded linear functional, and the operator norm $\lVert\kappa_p(g)\rVert_\text{op}$ is at most the $L^q$ norm of $g$. The extremal case of Hölder’s inequality shows that there is some $f$ for which this is an equality, and thus we conclude that $\lVert\kappa_p(g)\rVert_\text{op}=\lVert g\rVert_q$. That is, $\kappa_p:L^q\to\left(L^p\right)'$ is an isometry of normed vector spaces. Such a mapping has to be an injection, because if $\kappa_p(g)=0$ then $0=\lVert\kappa_p(g)\rVert_\text{op}=\lVert g\rVert_q$, which implies that $g=0$.
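Here is a finite sanity check of this isometry (my own sketch, using counting measure on a five-point space, so that $L^p$ is just $\mathbb{R}^5$ with the $p$-norm): Hölder’s inequality bounds the functional $\kappa_p(g)$, and the extremal $f$ attains the bound.

```python
import math
import random

# counting measure on a 5-point space; real-valued functions are lists
p = 3.0
q = p / (p - 1)  # the Holder conjugate: 1/p + 1/q = 1
g = [2.0, -1.0, 0.5, 3.0, -0.25]

def norm(f, r):
    return sum(abs(x) ** r for x in f) ** (1 / r)

def pair(f, g):
    # the functional kappa_p(g): f -> "integral" of f g
    return sum(x * y for x, y in zip(f, g))

# Holder's inequality: |kappa_p(g)(f)| <= ||f||_p ||g||_q for random test functions
random.seed(0)
for _ in range(100):
    f = [random.uniform(-1, 1) for _ in g]
    assert abs(pair(f, g)) <= norm(f, p) * norm(g, q) + 1e-9

# the extremal f = sign(g) |g|^{q-1} attains equality, so the operator norm is ||g||_q
f_star = [math.copysign(abs(x) ** (q - 1), x) for x in g]
assert abs(pair(f_star, g) - norm(f_star, p) * norm(g, q)) < 1e-9
```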

Now I say that $\kappa_p$ is also a surjection. That is, any bounded linear functional $\Lambda:L^p\to\mathbb{R}$ is of the form $\kappa_p(g)$ for some $g\in L^q$. Indeed, if $\Lambda=0$ then we can just pick $g=0$ as a preimage. Thus we may assume that $\Lambda$ is a nonzero bounded linear functional on $L^p$, and $\lVert\Lambda\rVert_\text{op}>0$. We first deal with the case of a totally finite measure space.

In this case, we define a set function on measurable sets by $\lambda(E)=\Lambda(\chi_E)$. It’s straightforward to see that $\lambda$ is additive. To prove countable additivity, suppose that $E$ is the countable disjoint union of a sequence $\{E_n\}$. If we write $A_k$ for the union of $E_1$ through $E_k$, we find that

$\displaystyle\lVert\chi_E-\chi_{A_k}\rVert_p=\left(\mu(E\setminus A_k)\right)^\frac{1}{p}\to0$

Since $\Lambda$ is continuous, we conclude that $\lambda(A_k)\to\lambda(E)$, and thus that $\lambda$ is a (signed) measure. It should also be clear that $\mu(E)=0$ implies $\lambda(E)=0$, and so $\lambda\ll\mu$. The Radon-Nikodym theorem now tells us that there exists an integrable function $g$ so that

$\displaystyle\Lambda(\chi_E)=\lambda(E)=\int\limits_Eg\,d\mu=\int\chi_Eg\,d\mu$

Linearity tells us that

$\displaystyle\Lambda(f)=\int fg\,d\mu$

for simple functions $f$, and also for every $f\in L^\infty(\mu)$, since each such function is the uniform limit of simple functions. We want to show that $g\in L^q$.

If $p=1$, then we must show that $g$ is essentially bounded. In this case, we find

$\displaystyle\left\lvert\int\limits_Eg\,d\mu\right\rvert\leq\lVert\Lambda\rVert_\text{op}\,\lVert\chi_E\rVert_1=\lVert\Lambda\rVert_\text{op}\mu(E)$

for every measurable $E$, from which we conclude that $\lvert g(x)\rvert\leq\lVert\Lambda\rVert_\text{op}$ a.e., or else we could find some set on which this inequality was violated. Thus $\lVert g\rVert_\infty\leq\lVert\Lambda\rVert_\text{op}$.

For other $p$, we can find a measurable $\alpha$ with $\lvert\alpha\rvert=1$ so that $\alpha g=\lvert g\rvert$. Setting $E_n=\{x\in X\vert n\geq\lvert g(x)\rvert\}$ and defining $f=\chi_{E_n}\lvert g\rvert^{q-1}\alpha$, we find that $\lvert f\rvert^p=\lvert g\rvert^q$ on $E_n$, $f\in L^\infty$, and so

$\displaystyle\int\limits_{E_n}\lvert g\rvert^q\,d\mu=\int\limits_Xfg\,d\mu=\Lambda(f)\leq\lVert\Lambda\rVert_\text{op}\lVert f\rVert_p=\lVert\Lambda\rVert_\text{op}\left(\int\limits_{E_n}\lvert g\rvert^q\,d\mu\right)^\frac{1}{p}$

We thus find

$\displaystyle\left(\int\limits_{E_n}\lvert g\rvert^q\,d\mu\right)^\frac{1}{q}=\left(\int\limits_{E_n}\lvert g\rvert^q\,d\mu\right)^{1-\frac{1}{p}}\leq\lVert\Lambda\rVert_\text{op}$

and thus

$\displaystyle\int\limits_X\chi_{E_n}\lvert g\rvert^q\,d\mu\leq\lVert\Lambda\rVert_\text{op}^q$

Applying the monotone convergence theorem as $n\to\infty$ we find that $\lVert g\rVert_q\leq\lVert\Lambda\rVert_\text{op}$.

Thus in either case we’ve found a $g\in L^q$ so that $\Lambda=\kappa_p(g)$.

In the $\sigma$-finite case, we can write $X$ as the countable disjoint union of sets $X_i$ with $\mu(X_i)<\infty$. We let $Y_k$ be the union of the first $k$ of these sets. We note that $\lVert\chi_E f\rVert_p\leq\lVert f\rVert_p$ for every measurable set $E$, so $f\mapsto\Lambda(\chi_Ef)$ is a linear functional on $L^p$ of norm at most $\lVert\Lambda\rVert_\text{op}$. The finite case above shows us that there are functions $g_i$ on $X_i$ so that

$\displaystyle\Lambda(\chi_{X_i}f)=\int\limits_{X_i}fg_i\,d\mu$.

We can define $g_i(x)=0$ if $x\notin X_i$, and let $g$ be the sum of all these $g_i$. We see that

$\displaystyle\Lambda(\chi_{Y_k}f)=\int\limits_{Y_k}f(g_1+\dots+g_k)\,d\mu$

for every $f\in L^p$, and since $\mu(Y_k)<\infty$ we find that $\lVert g_1+\dots+g_k\rVert_q\leq\lVert\Lambda\rVert_\text{op}$. Then Fatou’s lemma shows us that $\lVert g\rVert_q\leq\lVert\Lambda\rVert_\text{op}$. Thus the $\sigma$-finite case is true as well.

One case in particular is especially worthy of note: since $2$ is Hölder-conjugate to itself, we find that $L^2$ is isomorphic to its own continuous dual space in the same way that a finite-dimensional inner-product space is isomorphic to its own dual space.

September 3, 2010

## Bounded Linear Transformations

In the context of normed vector spaces we have a topology on our spaces and so it makes sense to ask that maps $f:V\to W$ between them be continuous. In the finite-dimensional case, all linear functions are continuous, so this hasn’t really come up before in our study of linear algebra. But for functional analysis, it becomes much more important.

Now, really we only need to require continuity at one point — the origin, to be specific — because if it’s continuous there then it’ll be continuous everywhere. Indeed, continuity at $v'$ means that for any $\epsilon>0$ there is a $\delta>0$ so that $\lVert v-v'\rVert_V<\delta$ implies $\lVert f(v)-f(v')\rVert_W=\lVert f(v-v')\rVert_W<\epsilon$. In particular, if $v'=0$, then this means $\lVert v\rVert_V<\delta$ implies $\lVert f(v)\rVert_W<\epsilon$. Clearly if this holds, then the general version also holds.

But it turns out that there’s another equivalent condition. We say that a linear transformation $f:V\to W$ is “bounded” if there is some $M>0$ such that $\lVert f(v)\rVert_W\leq M\lVert v\rVert_V$ for all $v\in V$. That is, the factor by which $f$ stretches the length of a vector is bounded. By linearity, we only really need to check this on the unit sphere $\lVert v\rVert_V=1$, but it’s often just as easy to test it everywhere.

Anyway, I say that a linear transformation is continuous if and only if it’s bounded. Indeed, if $f:V\to W$ is bounded, then we find

$\displaystyle M\lVert h \rVert_V\geq\lVert f(h)\rVert_W=\lVert f(v+h)-f(v)\rVert_W$

so as we let $h$ approach $0$ — as $v+h$ approaches $v$ — the difference between $f(v+h)$ and $f(v)$ approaches zero as well. And so $f$ is continuous.

Conversely, if $f$ is continuous, then it is bounded. Since it’s continuous, we let $\epsilon=1$ and find a $\delta$ so that $\lVert f(h)\rVert_W<1$ for all vectors $h$ with $\lVert h\rVert_V<\delta$. Thus for all nonzero $v\in V$ we find

\displaystyle\begin{aligned}\lVert f(v)\rVert_W&=\left\lVert\frac{\lVert v\rVert_V}{\delta}f\left(\delta\frac{v}{\lVert v\rVert_V}\right)\right\rVert_W\\&=\frac{\lVert v\rVert_V}{\delta}\left\lVert f\left(\delta\frac{v}{\lVert v\rVert_V}\right)\right\rVert_W\\&\leq\frac{\lVert v\rVert_V}{\delta}\,1\\&=\frac{1}{\delta}\lVert v\rVert_V\end{aligned}

Thus we can use $M=\frac{1}{\delta}$ and conclude that $f$ is bounded.

The least such $M$ that works in the condition for $f$ to be bounded is called the “operator norm” of $f$, which we write as $\lVert f\rVert_\text{op}$. It’s straightforward to verify that $\lVert cf\rVert_\text{op}=\lvert c\rvert\lVert f\rVert_\text{op}$, and that $\lVert f\rVert_\text{op}=0$ if and only if $f$ is the zero operator. It remains to verify the triangle inequality.

Let’s say that we have bounded linear transformations $f:V\to W$ and $g:V\to W$ with operator norms $M=\lVert f\rVert_\text{op}$ and $N=\lVert g\rVert_\text{op}$, respectively. We will show that $M+N$ works as a bound for $f+g$, and thus conclude that $\lVert f+g\rVert_\text{op}\leq\lVert f\rVert_\text{op}+\lVert g\rVert_\text{op}$. Indeed, we check that

\displaystyle\begin{aligned}\lVert[f+g](v)\rVert_W&=\lVert f(v)+g(v)\rVert_W\\&\leq\lVert f(v)\rVert_W+\lVert g(v)\rVert_W\\&\leq M\lVert v\rVert_V+N\lVert v\rVert_V\\&=(M+N)\lVert v\rVert_V\end{aligned}

and our assertion follows. In particular, when our base field is itself a normed linear space (like $\mathbb{C}$ or $\mathbb{R}$ itself) we can conclude that the “continuous dual space” $V'$ consisting of bounded linear functionals $\Lambda:V\to\mathbb{F}$ is a normed linear space using the operator norm on $V'$.
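In finite dimensions we can estimate operator norms numerically. Here is a rough Python sketch (my own, sampling the unit circle rather than computing singular values exactly) illustrating the triangle inequality for the operator norm of $2\times2$ real matrices:

```python
import math

# operator norm of a 2x2 real matrix, estimated by sampling the unit circle
def apply(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def op_norm(m, samples=1000):
    best = 0.0
    for i in range(samples):
        t = 2 * math.pi * i / samples
        v = [math.cos(t), math.sin(t)]  # by linearity, unit vectors suffice
        w = apply(m, v)
        best = max(best, math.hypot(w[0], w[1]))
    return best

f = [[1.0, 2.0], [0.0, 1.0]]
g = [[0.5, -1.0], [3.0, 0.0]]
fg = [[f[i][j] + g[i][j] for j in range(2)] for i in range(2)]

# the triangle inequality for the operator norm
assert op_norm(fg) <= op_norm(f) + op_norm(g) + 1e-9
```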

September 2, 2010