# The Unapologetic Mathematician

## Lie Algebras Revisited

Well it’s been quite a while, but I think I can carve out the time to move forwards again. I was all set to start with Lie algebras today, only to find that I already defined them over a year ago. So let’s pick up with a recap: to build a Lie algebra we take a module — usually a vector space over a field $\mathbb{F}$ — called $L$ and give it a bilinear operation which we write as $[x,y]$. We often require such operations to be associative, but this time we impose the following two conditions:

\displaystyle\begin{aligned}{}[x,x]&=0\\ [x,[y,z]]+[y,[z,x]]+[z,[x,y]]&=0\end{aligned}

Now, as long as we’re not working in a field where $1+1=0$ — and usually we’re not — we can use bilinearity to rewrite the first condition:

\displaystyle\begin{aligned}0&=[x+y,x+y]\\&=[x,x]+[x,y]+[y,x]+[y,y]\\&=0+[x,y]+[y,x]+0\\&=[x,y]+[y,x]\end{aligned}

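so $[y,x]=-[x,y]$. This antisymmetry always holds, but we can only go the other way if the characteristic of $\mathbb{F}$ is not $2$, as stated above: setting $y=x$ in the antisymmetry condition gives $2[x,x]=0$, and we need to divide by $2$ to conclude that $[x,x]=0$.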

The second condition is called the “Jacobi identity”, and antisymmetry allows us to rewrite it as:

$\displaystyle[x,[y,z]]=[[x,y],z]+[y,[x,z]]$

That is, bilinearity says that we have a linear mapping $x\mapsto[x,\underline{\hphantom{X}}]$ that sends an element $x\in L$ to a linear endomorphism in $\mathrm{End}(L)$. And the Jacobi identity says that this actually lands in the subspace $\mathrm{Der}(L)$ of “derivations” — those which satisfy something like the Leibniz rule for derivatives. To see what I mean, compare to the product rule:

$\displaystyle\frac{d}{dt}\left(fg\right)=\frac{df}{dt}g+f\frac{dg}{dt}$

where $f$ takes the place of $y$, $g$ takes the place of $z$, and $\frac{d}{dt}$ takes the place of $x$. And the operations are changed around. But you should see the similarity.
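To make the derivation form of the Jacobi identity concrete, here is a minimal sketch using a Lie algebra that is not discussed above but is easy to compute with: $\mathbb{R}^3$ with the cross product as its bracket. The helper names `bracket` and `add` and the sample vectors are just illustrative choices.

```python
def bracket(x, y):
    """Cross product of two 3-vectors, playing the role of [x, y]."""
    return (
        x[1] * y[2] - x[2] * y[1],
        x[2] * y[0] - x[0] * y[2],
        x[0] * y[1] - x[1] * y[0],
    )

def add(x, y):
    """Componentwise sum of two 3-vectors."""
    return tuple(a + b for a, b in zip(x, y))

# Arbitrary integer vectors, so every comparison below is exact.
x, y, z = (1, 2, 3), (-4, 0, 5), (2, -1, 7)

# Left side: x acts on the "product" [y, z].
lhs = bracket(x, bracket(y, z))
# Right side: the Leibniz-style expansion [[x, y], z] + [y, [x, z]].
rhs = add(bracket(bracket(x, y), z), bracket(y, bracket(x, z)))

print(lhs == rhs)                  # the derivation property
print(bracket(x, x) == (0, 0, 0))  # the first condition, [x, x] = 0
```

Checking one triple of vectors proves nothing, of course, but it shows how $x$ acts on the bracket $[y,z]$ the way a derivative acts on a product.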

Lie algebras obviously form a category whose morphisms are called Lie algebra homomorphisms. Just as we might expect, such a homomorphism is a linear map $\phi:L\to L'$ that preserves the bracket:

$\displaystyle\phi\left([x,y]\right)=\left[\phi(x),\phi(y)\right]$

We can obviously define subalgebras and quotient algebras. Subalgebras are a bit more obvious than quotient algebras, though, being just subspaces that are closed under the bracket. Quotient algebras are more commonly called “homomorphic images” in the literature, and we’ll talk more about them later.

We will take as a general assumption that our Lie algebras are finite-dimensional, though infinite-dimensional ones absolutely exist and are very interesting.

And I’ll finish the recap by reminding you that we can get Lie algebras from associative algebras; any associative algebra $(A,\cdot)$ can be given a bracket defined by

$\displaystyle [x,y]=x\cdot y-y\cdot x$

The above link shows that this satisfies the Jacobi identity, or you can take it as an exercise.
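If you would rather see the exercise checked on an example first, here is a small sketch with $2\times2$ integer matrices as the associative algebra. The helpers `matmul`, `combine`, and `bracket`, and the three sample matrices, are assumptions made for illustration.

```python
def matmul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[p + sign * q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def bracket(a, b):
    """The commutator [a, b] = a.b - b.a."""
    return combine(matmul(a, b), matmul(b, a), sign=-1)

x = [[1, 2], [3, 4]]
y = [[0, 1], [-1, 5]]
z = [[2, 0], [7, -3]]
zero = [[0, 0], [0, 0]]

# [x, [y, z]] + [y, [z, x]] + [z, [x, y]]
jacobi = combine(
    combine(bracket(x, bracket(y, z)), bracket(y, bracket(z, x))),
    bracket(z, bracket(x, y)),
)

print(bracket(x, x) == zero)  # [x, x] = 0
print(jacobi == zero)         # the Jacobi identity
```

The matrices do not commute, so the bracket is not identically zero, yet both conditions come out exactly because the arithmetic is all in integers.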

August 6, 2012

## The Higgs Mechanism part 4: Symmetry Breaking

This is part four of a four-part discussion of the idea behind how the Higgs field does its thing. Read Part 1, Part 2, and Part 3 first.

At last we’re ready to explain the Higgs mechanism. We start where we left off last time: a complex scalar field $\phi$ with a gauged phase symmetry that brings in a (massless) gauge field $A_\mu$. The difference is that now we add a new self-interaction term to the Lagrangian:

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+(D_\mu\phi)^*D_\mu\phi-\left[-m^2\phi^*\phi+\lambda(\phi^*\phi)^2\right]$

where $\lambda$ is a constant that determines the strength of the self-interaction. We recall the gauged symmetry transformations:

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha(x)}\phi(x)\\A_\mu'(x)&=A_\mu(x)+\frac{1}{e}\partial_\mu\alpha(x)\end{aligned}

If we write down an expression for the energy of a field configuration we get a bunch of derivative terms — basically like kinetic energy — that all occur with positive signs and then the potential energy term that comes in the brackets above:

$\displaystyle V(\phi^*\phi)=-m^2\phi^*\phi+\lambda(\phi^*\phi)^2$

Now, the “ground state” of the system should be one that minimizes the total energy, but the usual choice of setting all the fields equal to zero doesn’t do that here. The potential has a “bump” in the center, like the punt in the bottom of a wine bottle, or like a sombrero.

So instead of using that as our ground state, we’ll choose one. It doesn’t matter which, but it will be convenient to pick:

\displaystyle\begin{aligned}A_\mu^{(v)}&=0\\\phi^{(v)}&=\frac{1}{\sqrt{2}}\phi_0\end{aligned}

where $\phi_0=\frac{m}{\sqrt{\lambda}}$ is chosen to minimize the potential. We can still use the same field $A_\mu$ as before, but now we will write

$\displaystyle\phi(x)=\frac{1}{\sqrt{2}}\left(\phi_0+\chi(x)+i\theta(x)\right)$

Since the ground state $\phi_0$ is a point along the real axis in the complex plane, vibrations in the field $\chi$ measure movement that changes the length of $\phi$, while vibrations in $\theta$ measure movement that changes the phase.
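As a sanity check on the choice $\phi_0=\frac{m}{\sqrt{\lambda}}$, here is a rough numerical sketch. It treats the potential as a function of $\rho=\phi^*\phi$ and scans a grid for the minimum; the values of `m` and `lam` and the grid itself are arbitrary illustrative choices.

```python
import math

def potential(rho, m, lam):
    """V as a function of rho = phi^* phi."""
    return -m**2 * rho + lam * rho**2

m, lam = 1.5, 0.7            # arbitrary illustrative values
phi0 = m / math.sqrt(lam)
rho0 = phi0**2 / 2           # phi^* phi at the ground state phi0 / sqrt(2)

# Scan rho from 0 to 4 in steps of 1e-4 and find the smallest V.
grid = [i * 1e-4 for i in range(40001)]
rho_min = min(grid, key=lambda r: potential(r, m, lam))

print(rho0, rho_min)                                     # agree to grid resolution
print(potential(0.0, m, lam) > potential(rho0, m, lam))  # zero field is not the minimum
```

The second line printed is the whole point of the "bump": setting the field to zero gives a higher energy than sitting at $\phi_0/\sqrt{2}$.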

We want to consider the case where these vibrations are small — the field $\phi$ basically sticks near its ground state — because when they get big enough we have enough energy flying around in the system that we may as well just work in the more symmetric case anyway. So we are justified in only working out our new Lagrangian in terms up to quadratic order in the fields. This will also make our calculations a lot simpler. Indeed, to quadratic order (and ignoring an irrelevant additive constant) we have

$\displaystyle V(\phi^*\phi)=m^2\chi^2$

so vibrations of the $\theta$ field don’t show up at all in quadratic interactions.
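Here is a quick numerical sketch of that quadratic approximation, assuming the same arbitrary values of `m` and `lam` as one might pick for any illustration. It nudges the field a small amount `eps` in the $\chi$ direction and then in the $\theta$ direction, and compares the change in $V$ against $m^2\epsilon^2$.

```python
import math

def potential(chi, theta, m, lam):
    """V evaluated at phi = (phi0 + chi + i*theta) / sqrt(2)."""
    phi0 = m / math.sqrt(lam)
    rho = ((phi0 + chi)**2 + theta**2) / 2   # phi^* phi
    return -m**2 * rho + lam * rho**2

m, lam = 1.5, 0.7     # arbitrary illustrative values
eps = 1e-3            # size of the small vibration
v0 = potential(0.0, 0.0, m, lam)   # the irrelevant additive constant

# Change in V from a small step in each direction, in units of m^2 eps^2.
ratio_chi = (potential(eps, 0.0, m, lam) - v0) / (m**2 * eps**2)
ratio_theta = (potential(0.0, eps, m, lam) - v0) / (m**2 * eps**2)

print(ratio_chi)     # close to 1: chi has a mass term
print(ratio_theta)   # close to 0: theta has none at quadratic order
```

The $\theta$ direction runs along the bottom of the trough, which is why moving that way costs nothing at this order.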

We should also write out our covariant derivative up to linear terms:

$\displaystyle D_\mu\phi=\frac{1}{\sqrt{2}}\left(\partial_\mu\chi+i\partial_\mu\theta-ie\phi_0A_\mu\right)$

so that the quadratic Lagrangian is

\displaystyle\begin{aligned}L^{(2)}&=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+\frac{1}{2}\lvert\partial_\mu\chi+i\partial_\mu\theta-ie\phi_0A_\mu\rvert^2-m^2\chi^2\\&=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+\left[\frac{1}{2}\partial_\mu\chi\partial_\mu\chi-m^2\chi^2\right]+\frac{e^2\phi_0^2}{2}\left(A_\mu-\frac{1}{e\phi_0}\partial_\mu\theta\right)^2\end{aligned}

Now, the term in parentheses on the right looks like the mass term of a vector field $B_\mu=A_\mu-\frac{1}{e\phi_0}\partial_\mu\theta$ with mass $e\phi_0$. But what is the kinetic term of this field?

\displaystyle\begin{aligned}B_{\mu\nu}&=\partial_\mu B_\nu-\partial_\nu B_\mu\\&=\partial_\mu\left(A_\nu-\frac{1}{e\phi_0}\partial_\nu\theta\right)-\partial_\nu\left(A_\mu-\frac{1}{e\phi_0}\partial_\mu\theta\right)\\&=\partial_\mu A_\nu-\partial_\nu A_\mu-\frac{1}{e\phi_0}\left(\partial_\mu\partial_\nu\theta-\partial_\nu\partial_\mu\theta\right)\\&=F_{\mu\nu}-0=F_{\mu\nu}\end{aligned}

And so we can write down the final form of our quadratic Lagrangian:

$\displaystyle L^{(2)}=\left[-\frac{1}{4}B_{\mu\nu}B_{\mu\nu}+\frac{e^2\phi_0^2}{2}B_\mu B_\mu\right]+\left[\frac{1}{2}\partial_\mu\chi\partial_\mu\chi-m^2\chi^2\right]$

In order to deal with the fact that our normal vacuum was not a minimum for the energy, we picked a new ground state that did minimize energy. But the new ground state doesn’t have the same symmetry the old one did — we have broken the symmetry — and when we write down the Lagrangian in terms of excitations around the new ground state, we find it convenient to change variables. The previously massless gauge field “eats” part of the scalar field and gains a mass, leaving behind the Higgs field.

This is essentially what’s going on in the Standard Model. The biggest difference is that instead of the initial symmetry being a simple phase, which just amounts to rotations around a circle, we have a (slightly) more complicated symmetry to deal with. For those that are familiar with some classical groups, we start with an action of $SU(2)\times U(1)$ on a column vector $\phi$ made of two complex scalar fields with a potential of the form:

$\displaystyle V(\phi)=\lambda\left(\phi^\dagger\phi-\frac{v^2}{2}\right)^2$

which is invariant under the obvious action of $SU(2)$ and a phase action of $U(1)$. Since the group $SU(2)$ is three-dimensional there are three gauge fields to introduce for its symmetry and one more for the $U(1)$ symmetry.

When we pick a ground state that breaks the symmetry it doesn’t completely break; a one-dimensional subgroup $U(1)\subseteq SU(2)\times U(1)$ still leaves the new ground state invariant — though it’s important to notice that this is not just the $U(1)$ factor, but rather a mixture of this factor and a $U(1)$ subgroup of $SU(2)$. Thus only three of these gauge fields gain mass; they become the $W^\pm$ and $Z^0$ bosons that carry the weak force. The other gauge field remains massless, and becomes $\gamma$ — the photon.

At high enough energies — when the fields bounce around enough that the bump doesn’t really affect them — the symmetry comes back and we see that the electromagnetic and weak interactions are really two different aspects of the same, unified phenomenon, just like electricity and magnetism are really two different aspects of electromagnetism.

July 19, 2012

## The Higgs Mechanism part 3: Gauge Symmetries

This is part three of a four-part discussion of the idea behind how the Higgs field does its thing. Read Part 1 and Part 2 first.

Now we’re starting to get to the really meaty stuff. We talked about the phase symmetry of the complex scalar field:

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha}\phi(x)\\\phi'^*(x)&=e^{-i\alpha}\phi^*(x)\end{aligned}

which basically wants to express the idea that the physics of this field only really depends on the length of the complex field values $\phi(x)$ and not on their phases. But another big principle of physics is locality — what happens here doesn’t instantly affect what happens elsewhere — so why should the phase change be global?

To answer this, we “gauge” the symmetry and make it local. The origin of the term is fascinating, but takes us too far afield. The upshot is that we now have the symmetry transformation:

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha(x)}\phi(x)\\\phi'^*(x)&=e^{-i\alpha(x)}\phi^*(x)\end{aligned}

where $\alpha$ is no longer a constant, but a function of the spacetime point $x$.

And here’s the big problem: since $\alpha$ varies from point to point, it now affects our derivative terms! Before we had

\displaystyle\begin{aligned}\partial_\mu\phi'(x)&=\partial_\mu\left(e^{i\alpha}\phi(x)\right)\\&=e^{i\alpha}\partial_\mu\phi(x)\end{aligned}

and similarly for $\phi^*$. We say that the derivatives are “covariant” under the transformation; they transform in the same way as the underlying fields. And this is what lets us say that

$\displaystyle\partial_\mu\phi'^*\partial_\mu\phi'=\partial_\mu\phi^*\partial_\mu\phi$

and makes the whole Lagrangian symmetric.

On the other hand, what do we see now?

\displaystyle\begin{aligned}\partial_\mu\phi'(x)&=\partial_\mu\left(e^{i\alpha(x)}\phi(x)\right)\\&=e^{i\alpha(x)}\partial_\mu\phi(x)+i\partial_\mu\alpha(x)e^{i\alpha(x)}\phi(x)\\&=e^{i\alpha(x)}\left[\partial_\mu\phi(x)+i\partial_\mu\alpha(x)\phi(x)\right]\end{aligned}

We pick up this extra term when we differentiate, and it ruins the symmetry.

The way out is to add another field that can “soak up” this extra term. Since the derivative is a vector, we introduce a vector field $A_\mu$ and say that it transforms as

$\displaystyle A_\mu'(x)=A_\mu(x)+\frac{1}{e}\partial_\mu\alpha(x)$

Next, we introduce a new derivative operator: $D_\mu=\partial_\mu-ieA_\mu$. That is:

$\displaystyle D_\mu\phi(x)=\partial_\mu\phi(x)-ieA_\mu(x)\phi(x)$

And we calculate

\displaystyle\begin{aligned}D_\mu\phi'(x)&=\partial_\mu\left(e^{i\alpha(x)}\phi(x)\right)-ieA_\mu'(x)e^{i\alpha(x)}\phi(x)\\&=e^{i\alpha(x)}\partial_\mu\phi(x)+i\partial_\mu\alpha(x)e^{i\alpha(x)}\phi(x)-ieA_\mu(x)e^{i\alpha(x)}\phi(x)-i\partial_\mu\alpha(x)e^{i\alpha(x)}\phi(x)\\&=e^{i\alpha(x)}\left[\partial_\mu\phi(x)-ieA_\mu(x)\phi(x)\right]\\&=e^{i\alpha(x)}D_\mu\phi(x)\end{aligned}

So the derivative $D_\mu\phi(x)$ does vary the same way as the underlying field $\phi(x)$ does! We call $D_\mu$ the “covariant derivative”. If we use it in our Lagrangian, we do recover our symmetry, though now we’ve got a new field $A_\mu$ to contend with. Just as with the electromagnetic potential, we use the derivative $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ to write

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+(D_\mu\phi)^*D_\mu\phi-m^2\phi^*\phi$

which is now symmetric under the gauged symmetry transformations.
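To watch the covariant derivative do its job, here is a rough one-dimensional sketch. The particular functions `phi`, `alpha`, and `A`, the value of the coupling `E`, and the use of central finite differences in place of exact derivatives are all assumptions made for illustration.

```python
import cmath
import math

E = 0.3   # the coupling constant e (arbitrary illustrative value)

def phi(x):
    """A sample complex field."""
    return cmath.exp(2j * x) * (1.0 + 0.5 * x)

def alpha(x):
    """A sample position-dependent phase."""
    return math.sin(x)

def A(x):
    """A sample gauge field."""
    return x**2

def d(f, x, h=1e-5):
    """Central finite-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2 * h)

def phi_new(x):
    """The transformed field e^{i alpha(x)} phi(x)."""
    return cmath.exp(1j * alpha(x)) * phi(x)

def A_new(x):
    """The transformed gauge field A + (1/e) d(alpha)."""
    return A(x) + d(alpha, x) / E

def D(field, gauge, x):
    """Covariant derivative: d(field) - i e gauge * field."""
    return d(field, x) - 1j * E * gauge(x) * field(x)

x0 = 0.8
phase = cmath.exp(1j * alpha(x0))

# The plain derivative picks up the extra term, so this is far from zero.
plain_mismatch = abs(d(phi_new, x0) - phase * d(phi, x0))
# The covariant derivative transforms like the field, so this is tiny.
cov_mismatch = abs(D(phi_new, A_new, x0) - phase * D(phi, A, x0))

print(plain_mismatch, cov_mismatch)
```

The second number is not exactly zero only because of the finite-difference approximation; the first is the size of the $i\partial_\mu\alpha\,\phi$ term that the gauge field soaks up.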

It may not be apparent, but this Lagrangian does contain interaction terms. We can expand out the second term to find:

\displaystyle\begin{aligned}(D_\mu\phi)^*D_\mu\phi&=\left(\partial_\mu\phi^*+ieA_\mu\phi^*\right)\left(\partial_\mu\phi-ieA_\mu\phi\right)\\&=\partial_\mu\phi^*\partial_\mu\phi-ieA_\mu\partial_\mu\phi^*\phi+ieA_\mu\phi^*\partial_\mu\phi+e^2A_\mu A_\mu\phi^*\phi\end{aligned}

Our rules of thumb tell us that if we vary the Lagrangian with respect to $A_\mu$ we get the field equation

$\displaystyle\partial_\mu F_{\mu\nu}=ej_\nu$

which — if we expand out $F_{\mu\nu}$ as if it’s the Faraday field into “electric” and “magnetic” fields — gives us Gauss’ and Ampère’s laws in the presence of a charge-current density $j_\mu$.

The charge-current, in particular, we can write as

$\displaystyle j_\mu=-i\left(\phi^*\partial_\mu\phi-\partial_\mu\phi^*\phi\right)-2eA_\mu\phi^*\phi$

or, in a gauge-invariant manner, as

$\displaystyle j_\mu=-i\left[\phi^*D_\mu\phi-(D_\mu\phi)^*\phi\right]$

which is just the conserved current from last time with the regular derivatives replaced by covariant ones. Similarly, varying with respect to the field $\phi$ we find the “covariant” Klein-Gordon equation:

$\displaystyle D_\mu D_\mu\phi+m^2\phi=0$

and, when this holds, we can show that $\partial_\mu j_\mu=0$.

So we’ve found that if we take the global symmetry of the complex scalar field and “gauge” it, something like electromagnetism naturally pops out, and the particle of the complex scalar field interacts with it like charged particles interact with the real electromagnetic field.

July 18, 2012

## The Higgs Mechanism part 2: Examples of Lagrangian Field Equations

This is part two of a four-part discussion of the idea behind how the Higgs field does its thing. Read Part 1 first.

Okay, now that we’re sold on the Lagrangian formalism you can rest easy: I’m not going to go through the gory details of any more variational calculus. I do want to clear a couple notational things out of the way, though. They might not all matter for the purposes of our discussion, but better safe than sorry.

First off, I’m going to use a coordinate system where the speed of light is 1. That is, if my unit of time is seconds, my unit of distance is light-seconds. Mostly this helps keep annoying constants out of the way of the equations; physicists do this basically all the time. The other thing is that I’m going to work in four-dimensional spacetime, meaning we’ve got four coordinates: $x_0$, $x_1$, $x_2$, and $x_3$. We calculate dot products by writing $v\cdot w=v_1w_1+v_2w_2+v_3w_3-v_0w_0$. Yes, that minus sign is weird, but that’s just how spacetime works.

Also instead of writing spacetime vectors, I’m going to write down their components, indexed by a subscript that’s meant to run from 0 to 3. Usually this will be a Greek letter from the middle of the alphabet like $\mu$ or $\nu$. Similarly, instead of writing $\nabla$ for the vector composed of the four spacetime derivatives of a field I’ll just write down the derivatives, and I’ll write $\partial_\mu f$ instead of $\frac{\partial f}{\partial x_\mu}$.

Along with writing down components instead of vectors I won’t be writing dot products explicitly. Instead I’ll use the common convention that when the same index appears twice we’re supposed to sum over it, remembering that the zero component gets a minus sign. That is, $v_\mu w_\mu$ is the dot product from above. Similarly, we can multiply a matrix with entries $A_{\mu\nu}$ by a vector $v_\nu$ to get $w_\mu=A_{\mu\nu}v_\nu$; notice how the summed index $\nu$ gets “eaten up” in the process.
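For anyone who finds code easier to read than index gymnastics, here is a tiny sketch of that summation convention. The names `SIGNS` and `contract` and the sample vectors are just illustrative.

```python
# Index 0 is the time component and indices 1 to 3 are space,
# so the zero component enters every implied sum with a minus sign.
SIGNS = (-1, 1, 1, 1)

def contract(v, w):
    """The implied sum v_mu w_mu over mu = 0, 1, 2, 3."""
    return sum(s * a * b for s, a, b in zip(SIGNS, v, w))

v = (2, 1, 0, 3)
w = (1, 4, 5, 6)

print(contract(v, w))   # -2*1 + 1*4 + 0*5 + 3*6 = 20
```

Writing $v_\mu w_\mu$ is nothing more than shorthand for a call like `contract(v, w)`.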

Okay, now even without going through the details there’s a fair bit we can infer from general rules of thumb. Any term in the Lagrangian that contains a derivative of the field we’re varying is almost always going to be the squared-length of that derivative, and the resulting term in the variational equations will be the negative of a second derivative of the field. For any term that involves the plain field we basically take its derivative as if the field were a variable. Any term that doesn’t involve the field at all just goes away. And since we prefer positive second-derivative terms to negative ones, we usually flip the sign of the resulting equation; since the other side is zero this doesn’t matter.

So if, for instance, we have the following Lagrangian of a complex scalar field $\phi$:

$\displaystyle L=\partial_\mu\phi^*\partial_\mu\phi-m^2\phi^*\phi$

we get two equations by varying the field $\phi$ and its complex conjugate $\phi^*$ separately:

\displaystyle\begin{aligned}\partial_\mu\partial_\mu\phi^*+m^2\phi^*&=0\\\partial_\mu\partial_\mu\phi+m^2\phi&=0\end{aligned}

It may not seem to make sense to vary the field and its complex conjugate separately, but the two equations we get at the end are basically the same anyway, so we’ll let this slide for now. Anyway, what we get is a second derivative of $\phi$ set equal to $-m^2$ times $\phi$ itself, which we call the “Klein-Gordon wave equation” for $\phi$. Since the term $m^2\phi^*\phi$ gives rise to the term $m^2\phi$ in the field equations, we call this the “mass term”.

In the case of electromagnetism in a vacuum we just have the electromagnetic fields and no charge or current distribution. We use the Faraday field $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ to write down the Lagrangian

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}$

which gives rise to the field equations

$\displaystyle\partial_\mu F_{\mu\nu}=0$

or, equivalently in terms of the potential field $A$:

\displaystyle\begin{aligned}\partial_\mu\partial_\mu A_\nu&=0\\\partial_\nu A_\nu&=0\end{aligned}

The second equation just expresses a choice we can make to always consider divergence-free potentials without affecting the predictions of electromagnetism; the first equation looks like the Klein-Gordon equation again, except there’s no mass term. Indeed, we know that photons — the particles associated to the electromagnetic field — have no rest mass!

Turning back to the complex scalar field, we notice that there’s a certain symmetry to this Lagrangian. Specifically, if we replace $\phi(x)$ and $\phi^*(x)$ by

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha}\phi(x)\\\phi'^*(x)&=e^{-i\alpha}\phi^*(x)\end{aligned}

for any constant $\alpha$, we get the same result. This is important, and it turns out to be a clue that leads us — I won’t go into the details — to consider the quantity

$\displaystyle j_\mu=-i(\phi^*\partial_\mu\phi-\phi\partial_\mu\phi^*)$

This is interesting because we can calculate

\displaystyle\begin{aligned}\partial_\mu j_\mu&=-i\partial_\mu(\phi^*\partial_\mu\phi-\phi\partial_\mu\phi^*)\\&=-i(\partial_\mu\phi^*\partial_\mu\phi+\phi^*\partial_\mu\partial_\mu\phi-\partial_\mu\phi\partial_\mu\phi^*-\phi\partial_\mu\partial_\mu\phi^*)\\&=-i(\phi^*\partial_\mu\partial_\mu\phi-\phi\partial_\mu\partial_\mu\phi^*)\\&=-i(-m^2\phi^*\phi+m^2\phi\phi^*)\\&=0\end{aligned}

where we’ve used the results of the Klein-Gordon equations. Since $\partial_\mu j_\mu=0$, this is a suitable vector field to use as a charge-current distribution; the equation just says that charge is conserved! That is, we can write down a Lagrangian involving both electromagnetism (that is, our “massless vector field” $A_\mu$) and our scalar field:

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}-ej_\mu A_\mu$

where $e$ is a “coupling constant” that tells us how important the “interaction term” involving both $j_\mu$ and $A_\mu$ is. If it’s zero, then the fields don’t actually interact at all, but if it’s large then they affect each other very strongly.

July 17, 2012

## The Higgs Mechanism part 1: Lagrangians

This is part one of a four-part discussion of the idea behind how the Higgs field does its thing.

Wow, about six months’ hiatus as other parts of my life have taken precedence. But I drag myself slightly out of retirement to try to fill a big gap in the physics blogosphere: how the Higgs mechanism works.

There’s a lot of news about this nowadays, since the Large Hadron Collider has announced evidence of a “Higgs-like” particle. As a quick explanation of that, I use an analogy I made up on Twitter: “If Mirror-Spock exists, he has a goatee. We have found a man with a goatee. We do not yet know if he is Mirror-Spock.”

So, what is the Higgs boson? Well, it’s the particle expression of the Higgs field. That doesn’t explain anything, so we go one step further. What is the Higgs field? It’s the (conjectured) thing that gives some other particles (some of their) mass, in certain situations where normally we wouldn’t expect there to be any mass. And then there’s hand-waving about something like the ether that particles have to push through or shag carpet that they have to rub against that slows them down and hey, mass. Which doesn’t really explain anything, but sort of sounds like it might and so people nod sagely and then either forget about it all or spin their misconceptions into a new wave of Dancing Wu-Li Masters.

I think we can do better, at least for the science geeks out there who are actually interested and not allergic to a little math.

A couple warnings and comments before we begin. First off: I’m not going to go through this in my usual depth because I want to cram it into just three posts, albeit longer ones than usual, even though what I will say touches on all sorts of insanely cool mathematics that disappointingly few people see put together like this. Second: Ironically, that seems to include a lot of the physicists, who are generally more concerned with making predictions than with understanding how the underlying theory connects to everything else. It’s totally fine, honestly, that they’re interested in different aspects than I am. But I’m going to make a relatively superficial pass over describing the theory as physicists talk about it rather than go into those underlying structures. Lastly: I’m not going to describe the actual Higgs particle or field as they exist in the Standard Model; that would require quantum field theory and all sorts of messy stuff like that, when it turns out that the basic idea already shows up in classical field theory, which is a lot easier to explain. Even within classical field theory I’m going to restrict myself to a simpler example of the sort of thing that happens. Because reasons.

That all said, let’s dive in with Lagrangian mechanics. This is a subject that you probably never heard about unless you were a physics major or maybe a math major. Basically, Newtonian mechanics works off of the three laws that were probably drilled into your head by the end of high school science classes:

Newton’s Laws of Motion

1. An object at rest tends to stay at rest; an object in motion tends to stay in that motion.
2. Force applied to an object is proportional to the acceleration that object experiences. The constant of proportionality is the object’s mass.
3. Every action comes paired with an equal and opposite reaction.

It’s the second one that gets the most use since we can write it down in a formula: $F=ma$. And for most forces we’re interested in the force is a conservative vector field, meaning that it’s the (negative) gradient (fancy word for “derivative” that comes up in more than one dimension) of a potential energy function: $F=-\nabla U$. What this means is that things like to move in the direction that potential energy decreases, and they “feel a force” pushing them in that direction. Upshot for Newton: $ma=-\nabla U$.

Lagrangian mechanics comes at this same formula with a different explanation: objects like to move along paths that (locally) minimize some quantity called “action”. This principle unifies the usual topics of high school Newtonian physics with things like optics where we say that light likes to move along the shortest path between two points. Indeed, the “action” for light rays is just the distance they travel! This also explains things like “the angle of incidence equals the angle of reflection”; if you look at all paths between two points that bounce off of a mirror, the one that satisfies this property has the shortest length, making it a local minimum for the action.
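The mirror claim is easy to test numerically. In this sketch the mirror is the $x$-axis, the light travels from $(0,a)$ to $(d,b)$, and we search for the bounce point that gives the shortest path. The setup, the function names, and the use of a ternary search (which works because the length is a convex function of the bounce point) are my own illustrative choices.

```python
import math

def path_length(x, a, b, d):
    """Length of the path from (0, a) to the mirror point (x, 0) to (d, b)."""
    return math.hypot(x, a) + math.hypot(d - x, b)

def best_bounce(a, b, d, tol=1e-9):
    """Ternary search for the bounce point in [0, d] with the shortest path."""
    lo, hi = 0.0, d
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if path_length(m1, a, b, d) < path_length(m2, a, b, d):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

a, b, d = 1.0, 2.0, 3.0   # arbitrary illustrative geometry
x = best_bounce(a, b, d)

# Both angles are measured from the line perpendicular to the mirror.
angle_in = math.atan2(x, a)
angle_out = math.atan2(d - x, b)

print(x, angle_in, angle_out)   # the two angles agree
```

Nothing in the search knows about angles; it only minimizes length, and the equal angles fall out.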

Let’s set this up for a body moving around in some potential field to show you how it works. The action of a suggested path $q(t)$ (meaning the body is at the point $q(t)$ at time $t$) over a time interval $t_1\leq t\leq t_2$ is:

$\displaystyle S[q]=\int\limits_{t_1}^{t_2}\frac{1}{2}mv(t)^2-U(q(t))\,dt$

where $v(t)=\dot{q}(t)$ is the velocity vector of the particle, $v(t)^2$ is the square of its length, and $U(x)$ is a potential function depending only on the position of the particle. Don’t worry: there’s a big scary integral here, but we aren’t going to actually do any integration.

The function on the inside of the integral is called the Lagrangian function, and we calculate the action $S$ of the path $q$ by integrating the Lagrangian over the time interval we’re concerned with. We write this as $S[q]$ with square brackets to emphasize that this is a “functional” that takes a function $q$ and gives a number back. Of course, to mathematicians there’s really nothing inherently special about functions taking functions as arguments, but for beginners it helps keep things straight.

Now, what happens if we “wiggle” the path a bit? What if we calculate the action of $q'=q+\delta q$, where $\delta q$ is some “small” function called the “variation” of $q$? We calculate:

$\displaystyle S[q']=\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}'(t))^2-U(q'(t))\,dt$

Taking the derivative $\dot{q}'$ is linear, so we see that $\dot{q}'=\dot{q}+\delta\dot{q}$; “the variation of the derivative is the derivative of the variation”. Plugging this in:

\displaystyle\begin{aligned}S[q']&=\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}(t)+\delta\dot{q}(t))^2-U(q(t)+\delta q(t))\,dt\\&=\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}(t)^2+2\dot{q}(t)\cdot\delta\dot{q}(t)+\delta\dot{q}(t)^2)-U(q(t)+\delta q(t))\,dt\\&\approx\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}(t)^2+2\dot{q}(t)\cdot\delta\dot{q}(t))-\left[U(q(t))+\nabla U(q(t))\cdot\delta q(t)\right]\,dt\end{aligned}

where we’ve thrown away terms involving second and higher powers of $\delta q$; the variation is small, so the square (and cube, and …) is negligible. So what’s the difference between this and $S[q]$? What’s the variation of the action?

$\displaystyle\delta S=S[q']-S[q]=\int\limits_{t_1}^{t_2}m\dot{q}(t)\cdot\delta\dot{q}(t)-\nabla U(q(t))\cdot\delta q(t)\,dt$

where again we throw away negligible terms. Now we can handle the first term here using integration by parts:

\displaystyle\begin{aligned}\delta S=S[q']-S[q]&=\int\limits_{t_1}^{t_2}-m\ddot{q}(t)\cdot\delta q(t)-\nabla U(q(t))\cdot\delta q(t)\,dt\\&=\int\limits_{t_1}^{t_2}-\left[m\ddot{q}(t)+\nabla U(q(t))\right]\cdot\delta q(t)\,dt\end{aligned}

“Wait a minute!” those of you paying attention will cry out, “what about the boundary terms!?” Indeed, when we use integration by parts we should pick up $m\dot{q}(t_2)\cdot\delta q(t_2)-m\dot{q}(t_1)\cdot\delta q(t_1)$, but we will assume that we know where the body is at the beginning and the end of our time interval, and we’re just trying to figure out how it gets from one point to the other. That is, $\delta q$ is zero at both endpoints.

So, now we apply our Lagrangian principle: bodies like to move along action-minimizing paths. We know how action changes if we “wiggle” the path by a little variation $\delta q$, and this should remind us about how to find local minima: they happen when no matter how we change the input, the “first derivative” of the output is zero. Here the first derivative is the variation in the action, throwing away the negligible terms. So, what condition will make $\delta S$ zero no matter what function we put in for $\delta q$? Well, the other factor in the integrand will have to vanish:

$\displaystyle m\ddot{q}(t)+\nabla U(q(t))=0$

But this is just Newton’s second law from above, coming back again!
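We can also watch this happen numerically. The sketch below chops the time interval into steps, approximates the action by a sum, and compares the Newtonian path of a ball thrown straight up in uniform gravity against "wiggled" versions of it. The potential $U(q)=mgq$, the numbers, and the shape of the wiggle are all assumptions chosen for illustration.

```python
import math

M, G = 2.0, 9.8      # mass and gravitational acceleration (illustrative values)
T, N = 1.0, 1000     # total time and number of time steps
DT = T / N

def potential(q):
    """Uniform gravity: U(q) = m g q."""
    return M * G * q

def action(path):
    """Sum approximating the integral of (1/2) m v^2 - U(q) over time."""
    total = 0.0
    for q_now, q_next in zip(path, path[1:]):
        v = (q_next - q_now) / DT
        q_mid = (q_now + q_next) / 2
        total += (0.5 * M * v**2 - potential(q_mid)) * DT
    return total

times = [i * DT for i in range(N + 1)]

# Newton's solution: thrown up from q = 0 so as to land back at q = 0 at time T.
v0 = G * T / 2
newton = [v0 * t - 0.5 * G * t**2 for t in times]

def wiggled(eps):
    """Newton's path plus a variation eps * sin(pi t / T), zero at both ends."""
    return [q + eps * math.sin(math.pi * t / T) for q, t in zip(newton, times)]

s0 = action(newton)
for eps in (-0.1, 0.1):
    print(eps, action(wiggled(eps)) - s0)   # positive for both signs of eps
```

Wiggling in either direction raises the action by the same amount, so the first-order change vanishes and Newton's path sits at a minimum, just as the variational argument says.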

Everything we know from Newtonian mechanics can be written down in Lagrangian mechanics by coming up with a suitable action functional, which usually takes the form of an integral of an appropriate Lagrangian function. But lots more things can be described using the Lagrangian formalism, including field theories like electromagnetism.

In the presence of a charge distribution $\rho$ and a current distribution $j$, we take the potentials $\phi$ and $A$ as fundamental and start with the action (suppressing the space and time arguments so we can write $\rho$ instead of $\rho(x,t)$):

$\displaystyle S[\phi,A]=\int_{t_1}^{t_2}\int_{\mathbb{R}^3}-\rho\phi+j\cdot A+\frac{\epsilon_0}{2}E^2-\frac{1}{2\mu_0}B^2\,dV\,dt$

When we vary with respect to $\phi$ and insist that the variation of $S$ be zero we get Gauss’ law:

$\displaystyle\nabla\cdot E=\frac{\rho}{\epsilon_0}$

Varying the components of $A$ we get Ampère’s law with Maxwell’s correction:

$\displaystyle\nabla\times B=\mu_0j+\epsilon_0\mu_0\frac{\partial E}{\partial t}$

The other two of Maxwell’s equations come automatically from taking the potentials as fundamental and coming up with the electric and magnetic fields from them.

July 16, 2012

## A Continued Rant on Electromagnetism Texts and the Pedagogy of Science

A comment just came in on my short rant about electromagnetism texts. Dripping with condescension, it states:

> Here’s the fundamental reason for your discomfort: as a mathematician, you don’t realize that scalar and vector potentials have *no physical significance* (or for that matter, do you understand the distinction between objects of physical significance and things that are merely convenient mathematical devices?).
>
> It really doesn’t matter how scalar and vector potentials are defined, found, or justified, so long as they make it convenient for you to work with electric and magnetic fields, which *are* physical (after all, if potentials were physical, gauge freedom would make no sense).
>
> On rare occasions (e.g. Aharonov-Bohm effect), there’s the illusion that (vector) potential has actual physical significance, but when you realize it’s only the *differences* in the potential, it ought to become obvious that, once again, potentials are just mathematically convenient devices to do what you can do with fields alone.
>
> P.S. We physicists are very happy with merely achieving self-consistency, thankyouverymuch. Experiments will provide the remaining justification.

The thing is, none of that changes the fact that you’re flat-out lying to students when you say that the vanishing divergence of the magnetic field, on its own, implies the existence of a vector potential.

I think the commenter is confusing my complaint with a different, more common one: the fact that potentials are not uniquely defined as functions. But I actually don’t have a problem with that, since the same is true of any antiderivative. After all, what is an antiderivative but a potential function in a one-dimensional space? In fact, the concepts of torsors and gauge symmetries are intimately connected with this indefiniteness.

No, my complaint is that physicists are sloppy in their teaching, which they sweep under the carpet of agreement with certain experiments. It’s trivial to cook up magnetic fields in non-simply-connected spaces which satisfy Maxwell’s equations and yet have no globally-defined potential at all. It’s not just that a potential is only defined up to an additive constant; it’s that when you go around certain loops the value of the potential must have changed, and so at no point can the function take any “self-consistent” value.

In being so sloppy, physicists commit the sin of making unstated assumptions, and of doing so in front of kids who are too naïve to know better. A professor may know that this is only true in spaces without holes, but his students probably don’t, and they won’t until they rely on the assumption in a case where it doesn’t hold. That’s really all I’m saying: state your assumptions; unstated assumptions are anathema to science.

As for the physical significance of potentials, I won’t even bother delving into the fact that explaining Aharonov-Bohm with fields alone entails chucking locality right out the window. Rest assured that once you move on from classical electromagnetism to quantum electrodynamics and other quantum field theories, the potential is clearly physically significant.

March 8, 2012

## Minkowski Space

Before we push ahead with the Faraday field in hand, we need to properly define the Hodge star in our four-dimensional space, and we need a pseudo-Riemannian metric to do this. Before we were just using the standard $\mathbb{R}^3$, but now that we’re lumping in time we need to choose a four-dimensional metric.

And just to screw with you, it will have a different signature. If we have vectors $v_1=(x_1,y_1,z_1,t_1)$ and $v_2=(x_2,y_2,z_2,t_2)$ — with time here measured in the same units as space by using the speed of light as a conversion factor — then we calculate the metric as:

$\displaystyle g(v_1,v_2)=x_1x_2+y_1y_2+z_1z_2-t_1t_2$

In particular, if we stick the vector $v=(x,y,z,t)$ into the metric twice, like we do to calculate a squared-length when working with an inner product, we find:

$\displaystyle g(v,v)=x^2+y^2+z^2-t^2$

This looks like the Pythagorean theorem in two or three dimensions, but when we get to the time dimension we subtract $t^2$ instead of adding it! Four-dimensional real space equipped with a metric of this form is called “Minkowski space”. More specifically, it’s called 4-dimensional Minkowski space, or “(3+1)-dimensional” Minkowski space — three spatial dimensions and one temporal dimension. Higher-dimensional versions with $n-1$ “spatial” dimensions (with plusses in the metric) and one “temporal” dimension (with a minus) are also called Minkowski space. And, perversely enough, some physicists write it all backwards with one plus and $n-1$ minuses; this version is useful if you think of displacements in time as more fundamental — and thus more useful to call “positive” — than displacements in space.
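To make the sign flip concrete, here is a minimal Python sketch of the metric exactly as written above, with time already measured in units of distance. The helper `classify` and the labels it returns ("spacelike", "timelike", "null") are not from this post; they are the standard names for the three possible signs of $g(v,v)$ under this sign convention, added here purely for illustration.

```python
def minkowski(v1, v2):
    """g(v1, v2) with signature (+, +, +, -); each vector is (x, y, z, t)."""
    x1, y1, z1, t1 = v1
    x2, y2, z2, t2 = v2
    return x1 * x2 + y1 * y2 + z1 * z2 - t1 * t2


def classify(v):
    """Illustrative helper (not from the post): name the sign of g(v, v)."""
    s = minkowski(v, v)
    if s > 0:
        return "spacelike"
    if s < 0:
        return "timelike"
    return "null"


print(classify((1, 0, 0, 0)))  # a purely spatial displacement
print(classify((0, 0, 0, 1)))  # a purely temporal displacement
print(classify((1, 0, 0, 1)))  # one unit of space per unit of time
```

Unlike an inner product, this "squared length" can be negative, and it can vanish for a nonzero vector.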

What implications does this have on the coordinate expression of the Hodge star? It’s pretty much the same, except for the determinant part. You can think about it yourself, but the upshot is that we pick up an extra factor of $-1$ when the basic form going into the star involves $dt$.

So the rule is that for a basic form $\alpha$, the dual form $*\alpha$ consists of those component $1$-forms not involved in $\alpha$, ordered such that $\alpha\wedge(*\alpha)=\pm dx\wedge dy\wedge dz\wedge dt$, with a negative sign if and only if $dt$ is involved in $\alpha$. Let’s write it all out for easy reference:

\displaystyle\begin{aligned}*1&=dx\wedge dy\wedge dz\wedge dt\\ *dx&=dy\wedge dz\wedge dt\\ *dy&=dz\wedge dx\wedge dt\\ *dz&=dx\wedge dy\wedge dt\\ *dt&=dx\wedge dy\wedge dz\\ *(dx\wedge dy)&=dz\wedge dt\\ *(dz\wedge dx)&=dy\wedge dt\\ *(dy\wedge dz)&=dx\wedge dt\\ *(dx\wedge dt)&=-dy\wedge dz\\ *(dy\wedge dt)&=-dz\wedge dx\\ *(dz\wedge dt)&=-dx\wedge dy\\ *(dx\wedge dy\wedge dz)&=dt\\ *(dx\wedge dy\wedge dt)&=dz\\ *(dz\wedge dx\wedge dt)&=dy\\ *(dy\wedge dz\wedge dt)&=dx\\ *(dx\wedge dy\wedge dz\wedge dt)&=-1\end{aligned}

Note that the square of the Hodge star has the opposite sign from the Riemannian case; when $k$ is odd the double Hodge dual of a $k$-form is the original form back again, but when $k$ is even the double dual is the negative of the original form.
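If you would rather not take the table on faith, the rule is easy to mechanize. The following is a sketch, not a general Hodge star: it only handles basic forms, encodes $dx,dy,dz,dt$ as the indices 0 to 3 (my own convention), and implements the rule exactly as stated above. The names `perm_sign`, `normalize`, `star`, `TABLE`, `check_table`, and `double_star_signs` are all made up for this example.

```python
from itertools import combinations

T = 3  # indices 0, 1, 2, 3 stand for dx, dy, dz, dt


def perm_sign(seq):
    """Sign of the permutation that sorts a sequence of distinct integers."""
    n = len(seq)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1


def normalize(sign, idx):
    """Rewrite sign * (wedge of idx) with the indices in increasing order."""
    return sign * perm_sign(idx), tuple(sorted(idx))


def star(sign, idx):
    """Hodge star of sign * (wedge of idx), following the rule in the text."""
    rest = tuple(i for i in range(4) if i not in idx)
    s = perm_sign(tuple(idx) + rest)  # alpha ^ rest = s * dx^dy^dz^dt
    if T in idx:
        s = -s                        # extra factor of -1 when dt is in alpha
    return sign * s, rest


# The reference table above, as {alpha: (sign, factors of *alpha)}.
TABLE = {
    (): (1, (0, 1, 2, 3)),
    (0,): (1, (1, 2, 3)), (1,): (1, (2, 0, 3)),
    (2,): (1, (0, 1, 3)), (3,): (1, (0, 1, 2)),
    (0, 1): (1, (2, 3)), (2, 0): (1, (1, 3)), (1, 2): (1, (0, 3)),
    (0, 3): (-1, (1, 2)), (1, 3): (-1, (2, 0)), (2, 3): (-1, (0, 1)),
    (0, 1, 2): (1, (3,)), (0, 1, 3): (1, (2,)),
    (2, 0, 3): (1, (1,)), (1, 2, 3): (1, (0,)),
    (0, 1, 2, 3): (-1, ()),
}


def check_table():
    """True if every table entry agrees with the computed star."""
    return all(star(1, a) == normalize(*b) for a, b in TABLE.items())


def double_star_signs():
    """Map each degree k to the set of signs s with **alpha = s * alpha."""
    signs = {}
    for k in range(5):
        for idx in combinations(range(4), k):
            s, back = star(*star(1, idx))
            assert back == idx  # the double dual is the same basic form
            signs.setdefault(k, set()).add(s)
    return signs


print(check_table())
print(double_star_signs())
```

The second printout collects, for each degree $k$, the sign picked up by the double dual, which is a direct check of the odd/even pattern just described.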

March 7, 2012

Now that we’ve seen that we can use the speed of light as a conversion factor to put time and space measurements on an equal footing, let’s actually do it to Maxwell’s equations. We start by moving the time derivatives over on the same side as all the space derivatives:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c\rho\\d\beta&=0\\d\epsilon+\frac{\partial\beta}{\partial t}&=0\\{}*d*\beta-\frac{\partial\epsilon}{\partial t}&=\mu_0c\iota\end{aligned}

The exterior derivatives here written as $d$ comprise the derivatives in all the spatial directions. If we pick coordinates $x$, $y$, and $z$, then we can write the third equation as three component equations that each look something like

$\displaystyle\frac{\partial\epsilon_x}{\partial y}dy\wedge dx+\frac{\partial\epsilon_y}{\partial x}dx\wedge dy+\frac{\partial\beta_z}{\partial t}dx\wedge dy=\left(\frac{\partial\epsilon_y}{\partial x}-\frac{\partial\epsilon_x}{\partial y}+\frac{\partial\beta_z}{\partial t}\right)dx\wedge dy=0$

This doesn’t look right at all! We’ve got a partial derivative with respect to $t$ floating around, but I see no corresponding $dt$. So if we’re going to move to a four-dimensional spacetime and still use exterior derivatives, we can pick up $dt$ terms from the time derivative of $\beta$. But for the others to cancel off, they already need to have a $dt$ around in the first place. That is, we don’t actually have an electric $1$-form:

$\displaystyle\epsilon=\epsilon_xdx+\epsilon_ydy+\epsilon_zdz$

In truth we have an electric $2$-form:

$\displaystyle\epsilon=\epsilon_xdx\wedge dt+\epsilon_ydy\wedge dt+\epsilon_zdz\wedge dt$

Now, what does this mean for the exterior derivative $d\epsilon$?

\displaystyle\begin{aligned}d\epsilon=&\frac{\partial\epsilon_x}{\partial y}dy\wedge dx\wedge dt+\frac{\partial\epsilon_x}{\partial z}dz\wedge dx\wedge dt\\&+\frac{\partial\epsilon_y}{\partial x}dx\wedge dy\wedge dt+\frac{\partial\epsilon_y}{\partial z}dz\wedge dy\wedge dt\\&+\frac{\partial\epsilon_z}{\partial x}dx\wedge dz\wedge dt+\frac{\partial\epsilon_z}{\partial y}dy\wedge dz\wedge dt\\=&\left(\frac{\partial\epsilon_y}{\partial x}-\frac{\partial\epsilon_x}{\partial y}\right)dx\wedge dy\wedge dt\\&+\left(\frac{\partial\epsilon_x}{\partial z}-\frac{\partial\epsilon_z}{\partial x}\right)dz\wedge dx\wedge dt\\&+\left(\frac{\partial\epsilon_z}{\partial y}-\frac{\partial\epsilon_y}{\partial z}\right)dy\wedge dz\wedge dt\end{aligned}

Nothing has really changed, except now there’s an extra factor of $dt$ at the end of everything.

What happens to the exterior derivative of $\beta$ now that we’re using $t$ as another coordinate? Well, in components we write:

$\displaystyle\beta=\beta_xdy\wedge dz+\beta_ydz\wedge dx+\beta_zdx\wedge dy$

and thus we calculate:

\displaystyle\begin{aligned}d\beta=&\frac{\partial\beta_x}{\partial x}dx\wedge dy\wedge dz+\frac{\partial\beta_x}{\partial t}dt\wedge dy\wedge dz\\&+\frac{\partial\beta_y}{\partial y}dy\wedge dz\wedge dx+\frac{\partial\beta_y}{\partial t}dt\wedge dz\wedge dx\\&+\frac{\partial\beta_z}{\partial z}dz\wedge dx\wedge dy+\frac{\partial\beta_z}{\partial t}dt\wedge dx\wedge dy\\=&\left(\frac{\partial\beta_x}{\partial x}+\frac{\partial\beta_y}{\partial y}+\frac{\partial\beta_z}{\partial z}\right)dx\wedge dy\wedge dz\\&+\frac{\partial\beta_z}{\partial t}dx\wedge dy\wedge dt+\frac{\partial\beta_y}{\partial t}dz\wedge dx\wedge dt+\frac{\partial\beta_x}{\partial t}dy\wedge dz\wedge dt\end{aligned}

Now the first part of this is just the old, three-dimensional exterior derivative of $\beta$, corresponding to the divergence. The second of Maxwell’s equations says that it’s zero. And the other part of this is the time derivative of $\beta$, but with an extra factor of $dt$.

So let’s take the $2$-form $\epsilon$ and the $2$-form $\beta$ and put them together:

\displaystyle\begin{aligned}d(\epsilon+\beta)=&d\epsilon+d\beta\\=&\left(\frac{\partial\beta_x}{\partial x}+\frac{\partial\beta_y}{\partial y}+\frac{\partial\beta_z}{\partial z}\right)dx\wedge dy\wedge dz\\&+\left(\frac{\partial\epsilon_y}{\partial x}-\frac{\partial\epsilon_x}{\partial y}+\frac{\partial\beta_z}{\partial t}\right)dx\wedge dy\wedge dt\\&+\left(\frac{\partial\epsilon_x}{\partial z}-\frac{\partial\epsilon_z}{\partial x}+\frac{\partial\beta_y}{\partial t}\right)dz\wedge dx\wedge dt\\&+\left(\frac{\partial\epsilon_z}{\partial y}-\frac{\partial\epsilon_y}{\partial z}+\frac{\partial\beta_x}{\partial t}\right)dy\wedge dz\wedge dt\end{aligned}

The first term vanishes because of the second of Maxwell’s equations, and the rest all vanish because they’re the components of the third of Maxwell’s equations. That is, the second and third of Maxwell’s equations are both subsumed in this one four-dimensional equation.

When we rewrite the electric and magnetic fields as $2$-forms like this, their sum is called the “Faraday field” $F$. The second and third of Maxwell’s equations are equivalent to the single assertion that $dF=0$.
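As a sanity check, we can feed a concrete field into the four component expressions for $d(\epsilon+\beta)$ and watch them vanish. The sketch below assumes a particular example that is not worked out in this post: a plane wave moving in the $x$ direction, with $\epsilon_y=\beta_z=\cos(x-t)$ and all other components zero, in units where $c=1$. The derivatives are estimated by central differences, so the results are only zero up to rounding; the step size `H` and the sample point are arbitrary choices.

```python
import math

H = 1e-5            # finite-difference step, an arbitrary choice
X, Y, Z, T = 0, 1, 2, 3


def eps(p):
    """Assumed electric components (eps_x, eps_y, eps_z) at p = (x, y, z, t)."""
    x, y, z, t = p
    return (0.0, math.cos(x - t), 0.0)


def beta(p):
    """Assumed magnetic components (beta_x, beta_y, beta_z) at p = (x, y, z, t)."""
    x, y, z, t = p
    return (0.0, 0.0, math.cos(x - t))


def partial(field, comp, axis, p):
    """Central-difference estimate of d(field[comp]) / d(coordinate axis) at p."""
    hi, lo = list(p), list(p)
    hi[axis] += H
    lo[axis] -= H
    return (field(hi)[comp] - field(lo)[comp]) / (2 * H)


def dF(p):
    """Coefficients of dF on dx^dy^dz, dx^dy^dt, dz^dx^dt, dy^dz^dt."""
    return (
        partial(beta, X, X, p) + partial(beta, Y, Y, p) + partial(beta, Z, Z, p),
        partial(eps, Y, X, p) - partial(eps, X, Y, p) + partial(beta, Z, T, p),
        partial(eps, X, Z, p) - partial(eps, Z, X, p) + partial(beta, Y, T, p),
        partial(eps, Z, Y, p) - partial(eps, Y, Z, p) + partial(beta, X, T, p),
    )


print(dF((0.3, -1.2, 2.0, 0.7)))
```

The only nontrivial cancellation here is in the $dx\wedge dy\wedge dt$ coefficient, where the $x$-derivative of $\epsilon_y$ is offset by the $t$-derivative of $\beta_z$.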

March 6, 2012

## The Meaning of the Speed of Light

Let’s pick up where we left off last time converting Maxwell’s equations into differential forms:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d\beta&=0\\d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d*\beta&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}

Now let’s notice that while the electric field has units of force per unit charge, the magnetic field has units of force per unit charge per unit velocity. Further, from our polarized plane-wave solutions to Maxwell’s equations, we see that for these waves the magnitude of the electric field is $c$ — a velocity — times the magnitude of the magnetic field. So let’s try collecting together factors of $c\beta$:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d(c\beta)&=0\\d\epsilon&=-\frac{1}{c}\frac{\partial(c\beta)}{\partial t}\\{}*d*(c\beta)&=\mu_0c\iota+\frac{1}{c}\frac{\partial\epsilon}{\partial t}\end{aligned}

Now each of the time derivatives comes along with a factor of $\frac{1}{c}$. We can absorb this by introducing a new variable $\tau=ct$, which is measured in units of distance rather than time. Then we can write:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d(c\beta)&=0\\d\epsilon&=-\frac{\partial(c\beta)}{\partial\tau}\\{}*d*(c\beta)&=\mu_0c\iota+\frac{\partial\epsilon}{\partial\tau}\end{aligned}

The easy thing here is to just write $t$ instead of $\tau$, but this hides a deep insight: the speed of light $c$ is acting like a conversion factor from units of time to units of distance. That is, we don’t just say that light moves at a speed of $c=299\,792\,458\frac{\mathrm{m}}{\mathrm{s}}$, we say that one second of time is 299,792,458 meters of distance. This is an incredible identity that allows us to treat time and space on an equal footing, and it is borne out in many more or less direct experiments. I don’t want to get into all the consequences of this fact — the name for them as a collection is “special relativity” — but I do want to use it.
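Treating $c$ as a conversion factor is no different in practice from converting feet to meters. A tiny sketch, using the exact SI value of the speed of light; the function names are mine:

```python
C = 299_792_458  # metres per second, exact by the SI definition of the metre


def seconds_to_metres(t_seconds):
    """tau = c * t: a duration re-expressed as a length."""
    return C * t_seconds


def metres_to_seconds(d_metres):
    """The inverse conversion: a length re-expressed as a duration."""
    return d_metres / C


print(seconds_to_metres(1))     # one second of time, in metres
print(seconds_to_metres(1e-9))  # one nanosecond: roughly 0.3 m
```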

This lets us go back and write $\beta$ instead of $c\beta$: the electric and magnetic fields in a propagating electromagnetic plane-wave are “really” the same size, and the factor of $c$ is just an artifact of using a coordinate system that treats time and distance separately. We can also just write $t$ instead of $\tau=ct$ for the same reason. Finally, we can collect $c\rho$ together to put it on the exact same footing as $\iota$.

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c\rho\\d\beta&=0\\d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d*\beta&=\mu_0c\iota+\frac{\partial\epsilon}{\partial t}\end{aligned}

The meanings of these terms are getting further and further from familiarity. The $1$-form $\epsilon$ is still made of the same components as the electric field; the $2$-form $\beta$ is $c$ times the Hodge star of the $1$-form whose components are those of the magnetic field; the function $\rho$ is $c$ times the charge density; and the $1$-form $\iota$ is made of the components of the current density.

February 24, 2012

## Maxwell’s Equations in Differential Forms

To this point, we’ve mostly followed a standard approach to classical electromagnetism, and nothing I’ve said should be all that new to a former physics major, although at some points we’ve infused more mathematical rigor than is typical. But now I want to go in a different direction.

Starting again with Maxwell’s equations, we see all these divergences and curls which, though familiar to many, are really heavy-duty equipment. In particular, they rely on the Riemannian structure on $\mathbb{R}^3$. We want to strip this away to find something that works without this assumption, and as a first step we’ll flip things over into differential forms.

So let’s say that the magnetic field $B$ corresponds to a $1$-form $\beta$, while the electric field $E$ corresponds to a $1$-form $\epsilon$. To avoid confusion between $\epsilon$ and the electric constant $\epsilon_0$, let’s also replace some of our constants with the speed of light — $\epsilon_0\mu_0=\frac{1}{c^2}$. At the same time, we’ll replace $J$ with a $1$-form $\iota$. Now Maxwell’s equations look like:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\{}*d*\beta&=0\\{}*d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d\beta&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}

Now I want to juggle around some of these Hodge stars:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d(*\beta)&=0\\d\epsilon&=-\frac{\partial(*\beta)}{\partial t}\\{}*d*(*\beta)&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}
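The juggling relies on one fact about the Hodge star on $\mathbb{R}^3$ with its standard metric: applying it twice gives back the form we started with, in every degree. That is what lets us hit both sides of an equation with $*$ and cancel a pair of stars. Here is a short sketch that checks this on the basic forms, with $dx,dy,dz$ encoded as the indices 0 to 2 (my own convention) and function names invented for the example:

```python
from itertools import combinations


def perm_sign(seq):
    """Sign of the permutation that sorts a sequence of distinct integers."""
    n = len(seq)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if seq[i] > seq[j])
    return -1 if inversions % 2 else 1


def star3(sign, idx):
    """Hodge star on R^3, standard metric: alpha ^ (*alpha) = dx^dy^dz."""
    rest = tuple(i for i in range(3) if i not in idx)
    return sign * perm_sign(tuple(idx) + rest), rest


def double_star_is_identity():
    """True if **alpha = alpha for every basic form of every degree."""
    for k in range(4):
        for idx in combinations(range(3), k):
            if star3(*star3(1, idx)) != (1, idx):
                return False
    return True


print(star3(1, (1,)))  # *dy, returned as a sign and the sorted factors dx, dz
print(double_star_is_identity())
```

Note that `star3` returns the complementary factors in increasing order, so $*dy=dz\wedge dx$ shows up as $-dx\wedge dz$.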

Notice that we’re never just using the $1$-form $\beta$, but rather the $2$-form $*\beta$. Let’s actually go back and use $\beta$ to represent a $2$-form, so that $B$ corresponds to the $1$-form $*\beta$:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d\beta&=0\\d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d*\beta&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}

In the static case — where time derivatives are zero — we see how symmetric this new formulation is:

\displaystyle\begin{aligned}d\epsilon&=0\\d\beta&=0\\{}*d*\epsilon&=\mu_0c^2\rho\\{}*d*\beta&=\mu_0\iota\end{aligned}

For both the $1$-form $\epsilon$ and the $2$-form $\beta$, the exterior derivative vanishes, and the operator $*d*$ connects the fields to sources of physical charge and current.

February 22, 2012