# The Unapologetic Mathematician

## The Higgs Mechanism part 4: Symmetry Breaking

This is part four of a four-part discussion of the idea behind how the Higgs field does its thing. Read Part 1, Part 2, and Part 3 first.

At last we’re ready to explain the Higgs mechanism. We start where we left off last time: a complex scalar field $\phi$ with a gauged phase symmetry that brings in a (massless) gauge field $A_\mu$. The difference is that now we add a new self-interaction term to the Lagrangian:

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+(D_\mu\phi)^*D_\mu\phi-\left[-m^2\phi^*\phi+\lambda(\phi^*\phi)^2\right]$

where $\lambda$ is a constant that determines the strength of the self-interaction. We recall the gauged symmetry transformations:

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha(x)}\phi(x)\\A_\mu'(x)&=A_\mu(x)+\frac{1}{e}\partial_\mu\alpha(x)\end{aligned}

If we write down an expression for the energy of a field configuration we get a bunch of derivative terms — basically like kinetic energy — that all occur with positive signs and then the potential energy term that comes in the brackets above:

$\displaystyle V(\phi^*\phi)=-m^2\phi^*\phi+\lambda(\phi^*\phi)^2$

Now, the “ground state” of the system should be one that minimizes the total energy, but the usual choice of setting all the fields equal to zero doesn’t do that here. The potential has a “bump” in the center, like the punt in the bottom of a wine bottle, or like a sombrero.

So instead of using that as our ground state, we’ll choose one of the field values that does minimize the potential. It doesn’t matter which, but it will be convenient to pick:

\displaystyle\begin{aligned}A_\mu^{(v)}&=0\\\phi^{(v)}&=\frac{1}{\sqrt{2}}\phi_0\end{aligned}

where $\phi_0=\frac{m}{\sqrt{\lambda}}$ is chosen to minimize the potential. We can still use the same field $A_\mu$ as before, but now we will write

$\displaystyle\phi(x)=\frac{1}{\sqrt{2}}\left(\phi_0+\chi(x)+i\theta(x)\right)$

Since the ground state $\phi_0$ is a point along the real axis in the complex plane, vibrations in the field $\chi$ measure movement that changes the length of $\phi$, while vibrations in $\theta$ measure movement that changes the phase.
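If you'd like to convince yourself that this $\phi_0$ really sits at the bottom of the potential, here is a minimal numeric sketch. The values of `m` and `lam` are made up for illustration, and the brute-force grid scan is just the laziest thing that works, not anything the physics requires.

```python
import math

def potential(rho, m=1.0, lam=0.5):
    """The potential V as a function of rho = phi^* phi."""
    return -m**2 * rho + lam * rho**2

def grid_minimum(m=1.0, lam=0.5, rho_max=5.0, steps=50000):
    """Scan rho in [0, rho_max] and return the rho with the lowest potential."""
    best_rho, best_v = 0.0, potential(0.0, m, lam)
    for i in range(1, steps + 1):
        rho = rho_max * i / steps
        v = potential(rho, m, lam)
        if v < best_v:
            best_rho, best_v = rho, v
    return best_rho, best_v

m, lam = 1.0, 0.5            # illustrative values, not taken from the post
phi0 = m / math.sqrt(lam)    # the choice of phi_0 made above
rho_min, v_min = grid_minimum(m, lam)

print(rho_min, phi0**2 / 2)              # the two should agree closely
print(v_min < potential(0.0, m, lam))    # the zero field is not the minimum
```

The scan works with $\phi^*\phi$ directly, so it finds the whole circle of minima at once; picking the point on the real axis is the extra choice we made.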

We want to consider the case where these vibrations are small — the field $\phi$ basically sticks near its ground state — because when they get big enough we have enough energy flying around in the system that we may as well just work in the more symmetric case anyway. So we are justified in only working out our new Lagrangian in terms up to quadratic order in the fields. This will also make our calculations a lot simpler. Indeed, to quadratic order (and ignoring an irrelevant additive constant) we have

$\displaystyle V(\phi^*\phi)=m^2\chi^2$

so vibrations of the $\theta$ field don’t show up at all in quadratic interactions.
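Here is a small numeric sketch of that claim, again with made-up values for `m` and `lam`. It nudges the field a little in the $\chi$ direction and then in the $\theta$ direction and compares the cost in potential energy with $m^2\chi^2$.

```python
import math

def potential(chi, theta, m=1.0, lam=0.5):
    """V(phi^* phi) for phi = (phi0 + chi + i*theta) / sqrt(2)."""
    phi0 = m / math.sqrt(lam)
    rho = ((phi0 + chi) ** 2 + theta ** 2) / 2.0   # this is phi^* phi
    return -m**2 * rho + lam * rho**2

m, lam = 1.0, 0.5                    # illustrative values only
v0 = potential(0.0, 0.0, m, lam)     # the irrelevant additive constant
eps = 1e-3                           # a "small" vibration

delta_chi = potential(eps, 0.0, m, lam) - v0     # cost of changing the length
delta_theta = potential(0.0, eps, m, lam) - v0   # cost of changing the phase

print(delta_chi / eps**2)     # close to m**2
print(delta_theta / eps**2)   # close to zero
```

The small leftover in each ratio comes from the cubic and quartic terms we agreed to throw away.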

We should also write out our covariant derivative up to linear terms:

$\displaystyle D_\mu\phi=\frac{1}{\sqrt{2}}\left(\partial_\mu\chi+i\partial_\mu\theta-ie\phi_0A_\mu\right)$

so that the quadratic Lagrangian is

\displaystyle\begin{aligned}L^{(2)}&=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+\frac{1}{2}\lvert\partial_\mu\chi+i\partial_\mu\theta-ie\phi_0A_\mu\rvert^2-m^2\chi^2\\&=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+\left[\frac{1}{2}\partial_\mu\chi\partial_\mu\chi-m^2\chi^2\right]+\frac{e^2\phi_0^2}{2}\left(A_\mu-\frac{1}{e\phi_0}\partial_\mu\theta\right)^2\end{aligned}

Now, the last term on the right looks like the mass term of a vector field $B_\mu=A_\mu-\frac{1}{e\phi_0}\partial_\mu\theta$ with mass $e\phi_0$. But what is the kinetic term of this field?

\displaystyle\begin{aligned}B_{\mu\nu}&=\partial_\mu B_\nu-\partial_\nu B_\mu\\&=\partial_\mu\left(A_\nu-\frac{1}{e\phi_0}\partial_\nu\theta\right)-\partial_\nu\left(A_\mu-\frac{1}{e\phi_0}\partial_\mu\theta\right)\\&=\partial_\mu A_\nu-\partial_\nu A_\mu-\frac{1}{e\phi_0}\left(\partial_\mu\partial_\nu\theta-\partial_\nu\partial_\mu\theta\right)\\&=F_{\mu\nu}-0=F_{\mu\nu}\end{aligned}

And so we can write down the final form of our quadratic Lagrangian:

$\displaystyle L^{(2)}=\left[-\frac{1}{4}B_{\mu\nu}B_{\mu\nu}+\frac{e^2\phi_0^2}{2}B_\mu B_\mu\right]+\left[\frac{1}{2}\partial_\mu\chi\partial_\mu\chi-m^2\chi^2\right]$
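We can read the two masses straight off this Lagrangian. The sketch below assumes the usual normalizations, which are my assumption here rather than something spelled out above: a real scalar field of mass $M$ comes with the terms $\frac{1}{2}\partial_\mu\chi\partial_\mu\chi-\frac{1}{2}M^2\chi^2$, and a vector field of mass $M_B$ comes with the mass term $\frac{1}{2}M_B^2B_\mu B_\mu$. The function name and sample numbers are made up for illustration.

```python
import math

def masses(m, lam, e):
    """Masses read off from the quadratic Lagrangian.

    Assumes a real scalar of mass M appears as (1/2)(d chi)^2 - (1/2) M^2 chi^2
    and a vector of mass M_B appears with the mass term (1/2) M_B^2 B B.
    """
    phi0 = m / math.sqrt(lam)
    vector_mass = e * phi0              # from (e^2 phi0^2 / 2) B B
    scalar_mass = math.sqrt(2.0) * m    # from m^2 chi^2 = (1/2)(2 m^2) chi^2
    return vector_mass, scalar_mass

print(masses(m=1.0, lam=0.25, e=0.5))
print(masses(m=1.0, lam=0.25, e=0.0))   # no coupling: the vector stays massless
```

Notice that the vector mass depends on the coupling $e$ while the scalar mass doesn't.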

In order to deal with the fact that our normal vacuum was not a minimum for the energy, we picked a new ground state that did minimize energy. But the new ground state doesn’t have the same symmetry the old one did — we have broken the symmetry — and when we write down the Lagrangian in terms of excitations around the new ground state, we find it convenient to change variables. The previously massless gauge field “eats” part of the scalar field and gains a mass, leaving behind the Higgs field.

This is essentially what’s going on in the Standard Model. The biggest difference is that instead of the initial symmetry being a simple phase, which just amounts to rotations around a circle, we have a (slightly) more complicated symmetry to deal with. For those that are familiar with some classical groups, we start with an action of $SU(2)\times U(1)$ on a column vector $\phi$ made of two complex scalar fields with a potential of the form:

$\displaystyle V(\phi)=\lambda\left(\phi^\dagger\phi-\frac{v^2}{2}\right)^2$

which is invariant under the obvious action of $SU(2)$ and a phase action of $U(1)$. Since the group $SU(2)$ is three-dimensional there are three gauge fields to introduce for its symmetry and one more for the $U(1)$ symmetry.

When we pick a ground state that breaks the symmetry it doesn’t completely break; a one-dimensional subgroup $U(1)\subseteq SU(2)\times U(1)$ still leaves the new ground state invariant — though it’s important to notice that this is not just the $U(1)$ factor, but rather a mixture of this factor and a $U(1)$ subgroup of $SU(2)$. Thus only three of these gauge fields gain mass; they become the $W^\pm$ and $Z^0$ bosons that carry the weak force. The other gauge field remains massless, and becomes $\gamma$ — the photon.

At high enough energies — when the fields bounce around enough that the bump doesn’t really affect them — then the symmetry comes back and we see that the electromagnetic and weak interactions are really two different aspects of the same, unified phenomenon, just like electricity and magnetism are really two different aspects of electromagnetism.

July 19, 2012

## The Higgs Mechanism part 3: Gauge Symmetries

This is part three of a four-part discussion of the idea behind how the Higgs field does its thing. Read Part 1 and Part 2 first.

Now we’re starting to get to the really meaty stuff. We talked about the phase symmetry of the complex scalar field:

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha}\phi(x)\\\phi'^*(x)&=e^{-i\alpha}\phi^*(x)\end{aligned}

which basically wants to express the idea that the physics of this field only really depends on the length of the complex field values $\phi(x)$ and not on their phases. But another big principle of physics is locality — what happens here doesn’t instantly affect what happens elsewhere — so why should the phase change be global?

To answer this, we “gauge” the symmetry and make it local. The origin of the term is fascinating, but takes us too far afield. The upshot is that we now have the symmetry transformation:

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha(x)}\phi(x)\\\phi'^*(x)&=e^{-i\alpha(x)}\phi^*(x)\end{aligned}

where $\alpha$ is no longer a constant, but a function of the spacetime point $x$.

And here’s the big problem: since $\alpha$ varies from point to point, it now affects our derivative terms! Before we had

\displaystyle\begin{aligned}\partial_\mu\phi'(x)&=\partial_\mu\left(e^{i\alpha}\phi(x)\right)\\&=e^{i\alpha}\partial_\mu\phi(x)\end{aligned}

and similarly for $\phi^*$. We say that the derivatives are “covariant” under the transformation; they transform in the same way as the underlying fields. And this is what lets us say that

$\displaystyle\partial_\mu\phi'^*\partial_\mu\phi'=\partial_\mu\phi^*\partial_\mu\phi$

and makes the whole Lagrangian symmetric.

On the other hand, what do we see now?

\displaystyle\begin{aligned}\partial_\mu\phi'(x)&=\partial_\mu\left(e^{i\alpha(x)}\phi(x)\right)\\&=e^{i\alpha(x)}\partial_\mu\phi(x)+i\partial_\mu\alpha(x)e^{i\alpha(x)}\phi(x)\\&=e^{i\alpha(x)}\left[\partial_\mu\phi(x)+i\partial_\mu\alpha(x)\phi(x)\right]\end{aligned}

We pick up this extra term when we differentiate, and it ruins the symmetry.
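To see the damage concretely, here is a minimal one-dimensional sketch. The particular field `phi` and phase function `alpha` are arbitrary choices of mine, and the derivative is a plain central finite difference.

```python
import cmath

def phi(x):
    """A sample complex field in one dimension (arbitrary choice)."""
    return (1.0 + 0.5j * x) * cmath.exp(-x * x)

def alpha(x):
    """A sample position-dependent phase angle (arbitrary choice)."""
    return 0.7 * x * x

def phi_prime(x):
    """The locally rotated field phi'(x) = exp(i alpha(x)) phi(x)."""
    return cmath.exp(1j * alpha(x)) * phi(x)

def ddx(f, x, h=1e-5):
    """Central finite-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 0.8
lhs = ddx(phi_prime, x)
# the product-rule expression: exp(i alpha) [ d phi + i (d alpha) phi ]
rhs = cmath.exp(1j * alpha(x)) * (ddx(phi, x) + 1j * ddx(alpha, x) * phi(x))

print(abs(lhs - rhs))                         # tiny: the product rule holds
print(abs(lhs) ** 2, abs(ddx(phi, x)) ** 2)   # different: the derivative term changed
```

The second line of output is the problem: the squared length of the derivative is no longer the same before and after the transformation.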

The way out is to add another field that can “soak up” this extra term. Since the derivative is a vector, we introduce a vector field $A_\mu$ and say that it transforms as

$\displaystyle A_\mu'(x)=A_\mu(x)+\frac{1}{e}\partial_\mu\alpha(x)$

Next, we introduce a new derivative operator: $D_\mu=\partial_\mu-ieA_\mu$. That is:

$\displaystyle D_\mu\phi(x)=\partial_\mu\phi(x)-ieA_\mu(x)\phi(x)$

And we calculate

\displaystyle\begin{aligned}D_\mu\phi'(x)&=\partial_\mu\left(e^{i\alpha(x)}\phi(x)\right)-ieA_\mu'(x)e^{i\alpha(x)}\phi(x)\\&=e^{i\alpha(x)}\partial_\mu\phi(x)+i\partial_\mu\alpha(x)e^{i\alpha(x)}\phi(x)-ieA_\mu(x)e^{i\alpha(x)}\phi(x)-i\partial_\mu\alpha(x)e^{i\alpha(x)}\phi(x)\\&=e^{i\alpha(x)}\left[\partial_\mu\phi(x)-ieA_\mu(x)\phi(x)\right]\\&=e^{i\alpha(x)}D_\mu\phi(x)\end{aligned}

So the derivative $D_\mu\phi(x)$ does vary the same way as the underlying field $\phi(x)$ does! We call $D_\mu$ the “covariant derivative”. If we use it in our Lagrangian, we do recover our symmetry, though now we’ve got a new field $A_\mu$ to contend with. Just like the electromagnetic potential we use the derivative $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ to write

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}+(D_\mu\phi)^*D_\mu\phi-m^2\phi^*\phi$

which is now symmetric under the gauged symmetry transformations.
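The same kind of one-dimensional sketch shows the covariant derivative doing its job. As before, the sample functions `phi`, `alpha`, and `gauge_a` and the value of the coupling are arbitrary choices for illustration, and derivatives are central finite differences.

```python
import cmath

E_CHARGE = 0.3   # the coupling constant e (illustrative value)

def phi(x):
    return (1.0 + 0.5j * x) * cmath.exp(-x * x)

def alpha(x):
    return 0.7 * x * x

def gauge_a(x):
    """One component of a sample gauge field A."""
    return 0.2 * x + 0.1

def ddx(f, x, h=1e-5):
    """Central finite-difference approximation to the derivative of f at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def phi_prime(x):
    return cmath.exp(1j * alpha(x)) * phi(x)

def gauge_a_prime(x):
    """The transformed gauge field A' = A + (1/e) d alpha."""
    return gauge_a(x) + ddx(alpha, x) / E_CHARGE

def covariant(field, gauge, x):
    """D phi = d phi - i e A phi, evaluated at the point x."""
    return ddx(field, x) - 1j * E_CHARGE * gauge(x) * field(x)

x = 0.8
before = covariant(phi, gauge_a, x)
after = covariant(phi_prime, gauge_a_prime, x)

print(abs(after - cmath.exp(1j * alpha(x)) * before))   # tiny: D phi just rotates
print(abs(after) ** 2 - abs(before) ** 2)               # tiny: the kinetic term is unchanged
```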

It may not be apparent, but this Lagrangian does contain interaction terms. We can expand out the second term to find:

\displaystyle\begin{aligned}(D_\mu\phi)^*D_\mu\phi&=\left(\partial_\mu\phi^*+ieA_\mu\phi^*\right)\left(\partial_\mu\phi-ieA_\mu\phi\right)\\&=\partial_\mu\phi^*\partial_\mu\phi-ieA_\mu\partial_\mu\phi^*\phi+ieA_\mu\phi^*\partial_\mu\phi+e^2A_\mu A_\mu\phi^*\phi\end{aligned}

Our rules of thumb tell us that if we vary the Lagrangian with respect to $A_\mu$ we get the field equation

$\displaystyle\partial_\mu F_{\mu\nu}=ej_\nu$

which, if we expand out $F_{\mu\nu}$ as if it’s the Faraday field into “electric” and “magnetic” fields, gives us Gauss’ and Ampère’s law in the presence of a charge-current density $j_\mu$.

The charge-current, in particular, we can write as

$\displaystyle j_\mu=-i\left(\phi^*\partial_\mu\phi-\partial_\mu\phi^*\phi\right)-2eA_\mu\phi^*\phi$

or, in a gauge-invariant manner, as

$\displaystyle j_\mu=-i\left[\phi^*D_\mu\phi-(D_\mu\phi)^*\phi\right]$

which is just the conserved current from last time with the regular derivatives replaced by covariant ones. Similarly, varying with respect to the field $\phi$ we find the “covariant” Klein-Gordon equation:

$\displaystyle D_\mu D_\mu\phi+m^2\phi=0$

and, when this holds, we can show that $\partial_\mu j_\mu=0$.
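As a quick sanity check that the two ways of writing the current above really agree, here is a sketch that plugs in numbers at a single point. The sample values for the field, one component of its derivative, the gauge field, and the coupling are all made up.

```python
def current_expanded(phi, dphi, a, e):
    """j = -i (phi^* d phi - d phi^* phi) - 2 e A phi^* phi."""
    plain = -1j * (phi.conjugate() * dphi - dphi.conjugate() * phi)
    return plain - 2.0 * e * a * abs(phi) ** 2

def current_covariant(phi, dphi, a, e):
    """j = -i [phi^* D phi - (D phi)^* phi] with D phi = d phi - i e A phi."""
    d_phi = dphi - 1j * e * a * phi
    return -1j * (phi.conjugate() * d_phi - d_phi.conjugate() * phi)

phi, dphi = 0.4 + 1.1j, -0.3 + 0.8j   # field value and one derivative component
a, e = 0.7, 0.5                       # gauge field component and coupling

j1 = current_expanded(phi, dphi, a, e)
j2 = current_covariant(phi, dphi, a, e)
print(j1, j2)   # equal, and with vanishing imaginary part
```

The current comes out real, as a charge-current density should, even though the fields going into it are complex.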

So we’ve found that if we take the global symmetry of the complex scalar field and “gauge” it, something like electromagnetism naturally pops out, and the particle of the complex scalar field interacts with it like charged particles interact with the real electromagnetic field.

July 18, 2012

## The Higgs Mechanism part 2: Examples of Lagrangian Field Equations

This is part two of a four-part discussion of the idea behind how the Higgs field does its thing. Read Part 1 first.

Okay, now that we’re sold on the Lagrangian formalism you can rest easy: I’m not going to go through the gory details of any more variational calculus. I do want to clear a couple notational things out of the way, though. They might not all matter for the purposes of our discussion, but better safe than sorry.

First off, I’m going to use a coordinate system where the speed of light is 1. That is, if my unit of time is seconds, my unit of distance is light-seconds. Mostly this helps keep annoying constants out of the way of the equations; physicists do this basically all the time. The other thing is that I’m going to work in four-dimensional spacetime, meaning we’ve got four coordinates: $x_0$, $x_1$, $x_2$, and $x_3$. We calculate dot products by writing $v\cdot w=v_1w_1+v_2w_2+v_3w_3-v_0w_0$. Yes, that minus sign is weird, but that’s just how spacetime works.

Also instead of writing spacetime vectors, I’m going to write down their components, indexed by a subscript that’s meant to run from 0 to 3. Usually this will be a Greek letter from the middle of the alphabet like $\mu$ or $\nu$. Similarly, instead of writing $\nabla$ for the vector composed of the four spacetime derivatives of a field I’ll just write down the derivatives, and I’ll write $\partial_\mu f$ instead of $\frac{\partial f}{\partial x_\mu}$.

Along with writing down components instead of vectors I won’t be writing dot products explicitly. Instead I’ll use the common convention that when the same index appears twice we’re supposed to sum over it, remembering that the zero component gets a minus sign. That is, $v_\mu w_\mu$ is the dot product from above. Similarly, we can multiply a matrix with entries $A_{\mu\nu}$ by a vector $v_\nu$ to get $w_\mu=A_{\mu\nu}v_\nu$; notice how the summed index $\nu$ gets “eaten up” in the process.
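If the index gymnastics feel abstract, here is the summation convention for $v_\mu w_\mu$ spelled out as a minimal sketch. The function name `dot` and the sample vectors are mine; components are stored in the order $x_0$, $x_1$, $x_2$, $x_3$.

```python
def dot(v, w):
    """Sum v_mu w_mu over mu = 0..3, with a minus sign on the zero component."""
    return -v[0] * w[0] + sum(v[i] * w[i] for i in (1, 2, 3))

v = [1.0, 2.0, 3.0, 4.0]
w = [2.0, 0.0, 1.0, 0.0]

print(dot(v, v))   # -1 + 4 + 9 + 16
print(dot(v, w))   # -2 + 0 + 3 + 0
```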

Okay, now even without going through the details there’s a fair bit we can infer from general rules of thumb. Any term in the Lagrangian that contains a derivative of the field we’re varying is almost always going to be the squared-length of that derivative, and the resulting term in the variational equations will be the negative of a second derivative of the field. For any term that involves the plain field we basically take its derivative as if the field were a variable. Any term that doesn’t involve the field at all just goes away. And since we prefer positive second-derivative terms to negative ones, we usually flip the sign of the resulting equation; since the other side is zero this doesn’t matter.

So if, for instance, we have the following Lagrangian of a complex scalar field $\phi$:

$\displaystyle L=\partial_\mu\phi^*\partial_\mu\phi-m^2\phi^*\phi$

we get two equations by varying the field $\phi$ and its complex conjugate $\phi^*$ separately:

\displaystyle\begin{aligned}\partial_\mu\partial_\mu\phi^*+m^2\phi^*&=0\\\partial_\mu\partial_\mu\phi+m^2\phi&=0\end{aligned}

It may not seem to make sense to vary the field and its complex conjugate separately, but the two equations we get at the end are basically the same anyway, so we’ll let this slide for now. Anyway, what we get is a second derivative of $\phi$ set equal to $m^2$ times $\phi$ itself, which we call the “Klein-Gordon wave equation” for $\phi$. Since the term $m^2\phi^*\phi$ gives rise to the term $m^2\phi$ in the field equations, we call this the “mass term”.

In the case of electromagnetism in a vacuum we just have the electromagnetic fields and no charge or current distribution. We use the Faraday field $F_{\mu\nu}=\partial_\mu A_\nu-\partial_\nu A_\mu$ to write down the Lagrangian

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}$

which gives rise to the field equations

$\displaystyle\partial_\mu F_{\mu\nu}=0$

or, equivalently in terms of the potential field $A$:

\displaystyle\begin{aligned}\partial_\mu\partial_\mu A_\nu&=0\\\partial_\nu A_\nu&=0\end{aligned}

The second equation just expresses a choice we can make to always consider divergence-free potentials without affecting the predictions of electromagnetism; the first equation looks like the Klein-Gordon equation again, except there’s no mass term. Indeed, we know that photons — the particles associated to the electromagnetic field — have no rest mass!

Turning back to the complex scalar field, we notice that there’s a certain symmetry to this Lagrangian. Specifically, if we replace $\phi(x)$ and $\phi^*(x)$ by

\displaystyle\begin{aligned}\phi'(x)&=e^{i\alpha}\phi(x)\\\phi'^*(x)&=e^{-i\alpha}\phi^*(x)\end{aligned}

for any constant $\alpha$, we get the same result. This is important, and it turns out to be a clue that leads us — I won’t go into the details — to consider the quantity

$\displaystyle j_\mu=-i(\phi^*\partial_\mu\phi-\phi\partial_\mu\phi^*)$

This is interesting because we can calculate

\displaystyle\begin{aligned}\partial_\mu j_\mu&=-i\partial_\mu(\phi^*\partial_\mu\phi-\phi\partial_\mu\phi^*)\\&=-i(\partial_\mu\phi^*\partial_\mu\phi+\phi^*\partial_\mu\partial_\mu\phi-\partial_\mu\phi\partial_\mu\phi^*-\phi\partial_\mu\partial_\mu\phi^*)\\&=-i(\phi^*\partial_\mu\partial_\mu\phi-\phi\partial_\mu\partial_\mu\phi^*)\\&=-i(-m^2\phi^*\phi+m^2\phi\phi^*)\\&=0\end{aligned}

where we’ve used the results of the Klein-Gordon equations. Since $\partial_\mu j_\mu=0$, this is a suitable vector field to use as a charge-current distribution; the equation just says that charge is conserved! That is, we can write down a Lagrangian involving both electromagnetism (that is, our “massless vector field” $A_\mu$) and our scalar field:

$\displaystyle L=-\frac{1}{4}F_{\mu\nu}F_{\mu\nu}-ej_\mu A_\mu$

where $e$ is a “coupling constant” that tells us how important the “interaction term” involving both $j_\mu$ and $A_\mu$ is. If it’s zero, then the fields don’t actually interact at all, but if it’s large then they affect each other very strongly.

July 17, 2012

## The Higgs Mechanism part 1: Lagrangians

This is part one of a four-part discussion of the idea behind how the Higgs field does its thing.

Wow, about six months’ hiatus as other parts of my life have taken precedence. But I drag myself slightly out of retirement to try to fill a big gap in the physics blogosphere: how the Higgs mechanism works.

There’s a lot of news about this nowadays, since the Large Hadron Collider has announced evidence of a “Higgs-like” particle. As a quick explanation of that, I use an analogy I made up on Twitter: “If Mirror-Spock exists, he has a goatee. We have found a man with a goatee. We do not yet know if he is Mirror-Spock.”

So, what is the Higgs boson? Well, it’s the particle expression of the Higgs field. That doesn’t explain anything, so we go one step further. What is the Higgs field? It’s the (conjectured) thing that gives some other particles (some of their) mass, in certain situations where normally we wouldn’t expect there to be any mass. And then there’s hand-waving about something like the ether that particles have to push through or shag carpet that they have to rub against that slows them down and hey, mass. Which doesn’t really explain anything, but sort of sounds like it might and so people nod sagely and then either forget about it all or spin their misconceptions into a new wave of Dancing Wu-Li Masters.

I think we can do better, at least for the science geeks out there who are actually interested and not allergic to a little math.

A couple warnings and comments before we begin. First off: I’m not going to go through this in my usual depth because I want to cram it into just three posts, albeit longer ones than usual, even though what I will say touches on all sorts of insanely cool mathematics that disappointingly few people see put together like this. Second: Ironically, that seems to include a lot of the physicists, who are generally more concerned with making predictions than with understanding how the underlying theory connects to everything else and it’s totally fine, honestly, that they’re interested in different aspects than I am. But I’m going to make a relatively superficial pass over describing the theory as physicists talk about it rather than go into those underlying structures. Lastly: I’m not going to describe the actual Higgs particle or field as they exist in the Standard Model; that would require quantum field theory and all sorts of messy stuff like that, when it turns out that the basic idea already shows up in classical field theory, which is a lot easier to explain. Even within classical field theory I’m going to restrict myself to a simpler example of the sort of thing that happens. Because reasons.

That all said, let’s dive in with Lagrangian mechanics. This is a subject that you probably never heard about unless you were a physics major or maybe a math major. Basically, Newtonian mechanics works off of the three laws that were probably drilled into your head by the end of high school science classes:

Newton’s Laws of Motion

1. An object at rest tends to stay at rest; an object in motion tends to stay in that motion.
2. Force applied to an object is proportional to the acceleration that object experiences. The constant of proportionality is the object’s mass.
3. Every action comes paired with an equal and opposite reaction.

It’s the second one that gets the most use since we can write it down in a formula: $F=ma$. And for most forces we’re interested in the force is a conservative vector field, meaning that it’s the (negative) gradient (fancy word for “derivative” that comes up in more than one dimension) of a potential energy function: $F=-\nabla U$. What this means is that things like to move in the direction that potential energy decreases, and they “feel a force” pushing them in that direction. Upshot for Newton: $ma=-\nabla U$.

Lagrangian mechanics comes at this same formula with a different explanation: objects like to move along paths that (locally) minimize some quantity called “action”. This principle unifies the usual topics of high school Newtonian physics with things like optics where we say that light likes to move along the shortest path between two points. Indeed, the “action” for light rays is just the distance they travel! This also explains things like “the angle of incidence equals the angle of reflection”; if you look at all paths between two points that bounce off of a mirror, the one that satisfies this property has the shortest length, making it a local minimum for the action.
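The mirror claim is easy to test numerically. In this sketch the mirror is the line $y=0$, the two points sit at heights `a` and `b` above it a horizontal distance `d` apart, and we search for the bounce point that makes the path shortest. The setup, the names, and the use of a ternary search are all my choices for illustration.

```python
import math

def path_length(x, a, b, d):
    """Length of the path (0, a) -> (x, 0) -> (d, b) that bounces off the mirror y = 0."""
    return math.hypot(x, a) + math.hypot(d - x, b)

def best_bounce(a, b, d, iterations=200):
    """Ternary search for the bounce point of minimal length (the length is convex in x)."""
    lo, hi = 0.0, d
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if path_length(m1, a, b, d) < path_length(m2, a, b, d):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0

a, b, d = 1.0, 2.0, 3.0
x = best_bounce(a, b, d)
angle_in = math.atan2(x, a)         # angle of incidence, measured from the normal
angle_out = math.atan2(d - x, b)    # angle of reflection, measured from the normal

print(x, angle_in, angle_out)       # the two angles agree at the shortest path
```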

Let’s set this up for a body moving around in some potential field to show you how it works. The action of a suggested path $q(t)$ (the body is at the point $q(t)$ at time $t$) over a time interval $t_1\leq t\leq t_2$ is:

$\displaystyle S[q]=\int\limits_{t_1}^{t_2}\frac{1}{2}mv(t)^2-U(q(t))\,dt$

where $v(t)=\dot{q}(t)$ is the velocity vector of the particle, $v(t)^2$ is the square of its length, and $U(x)$ is a potential function depending only on the position of the particle. Don’t worry: there’s a big scary integral here, but we aren’t going to actually do any integration.

The function on the inside of the integral is called the Lagrangian function, and we calculate the action $S$ of the path $q$ by integrating the Lagrangian over the time interval we’re concerned with. We write this as $S[q]$ with square brackets to emphasize that this is a “functional” that takes a function $q$ and gives a number back. Of course, as mathematicians there’s really nothing inherently special about functions taking functions as arguments, but for beginners it helps keep things straight.

Now, what happens if we “wiggle” the path a bit? What if we calculate the action of $q'=q+\delta q$, where $\delta q$ is some “small” function called the “variation” of $q$? We calculate:

$\displaystyle S[q']=\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}'(t))^2-U(q'(t))\,dt$

Taking the derivative $\dot{q}'$ is linear, so we see that $\dot{q}'=\dot{q}+\delta\dot{q}$; “the variation of the derivative is the derivative of the variation”. Plugging this in:

\displaystyle\begin{aligned}S[q']&=\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}(t)+\delta\dot{q}(t))^2-U(q(t)+\delta q(t))\,dt\\&=\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}(t)^2+2\dot{q}(t)\cdot\delta\dot{q}(t)+\delta\dot{q}(t)^2)-U(q(t)+\delta q(t))\,dt\\&\approx\int\limits_{t_1}^{t_2}\frac{1}{2}m(\dot{q}(t)^2+2\dot{q}(t)\cdot\delta\dot{q}(t))-\left[U(q(t))+\nabla U(q(t))\cdot\delta q(t)\right]\,dt\end{aligned}

where we’ve thrown away terms involving second and higher powers of $\delta q$; the variation is small, so the square (and cube, and …) is negligible. So what’s the difference between this and $S[q]$? What’s the variation of the action?

$\displaystyle\delta S=S[q']-S[q]=\int\limits_{t_1}^{t_2}m\dot{q}(t)\cdot\delta\dot{q}(t)-\nabla U(q(t))\cdot\delta q(t)\,dt$

where again we throw away negligible terms. Now we can handle the first term here using integration by parts:

\displaystyle\begin{aligned}\delta S=S[q']-S[q]&=\int\limits_{t_1}^{t_2}-m\ddot{q}(t)\cdot\delta q(t)-\nabla U(q(t))\cdot\delta q(t)\,dt\\&=\int\limits_{t_1}^{t_2}-\left[m\ddot{q}(t)+\nabla U(q(t))\right]\cdot\delta q(t)\,dt\end{aligned}

“Wait a minute!” those of you paying attention will cry out, “what about the boundary terms!?” Indeed, when we use integration by parts we should pick up $m\dot{q}(t_2)\cdot\delta q(t_2)-m\dot{q}(t_1)\cdot\delta q(t_1)$, but we will assume that we know where the body is at the beginning and the end of our time interval, and we’re just trying to figure out how it gets from one point to the other. That is, $\delta q$ is zero at both endpoints.

So, now we apply our Lagrangian principle: bodies like to move along action-minimizing paths. We know how action changes if we “wiggle” the path by a little variation $\delta q$, and this should remind us about how to find local minima: they happen when no matter how we change the input, the “first derivative” of the output is zero. Here the first derivative is the variation in the action, throwing away the negligible terms. So, what condition will make $\delta S$ zero no matter what function we put in for $\delta q$? Well, the other term in the integrand will have to vanish:

$\displaystyle m\ddot{q}(t)+\nabla U(q(t))=0$

But this is just Newton’s second law from above, coming back again!
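We promised not to do any integrals by hand, but nothing stops us from letting a computer add things up. This sketch takes a body in uniform gravity, so $U(q)=mgq$ in one dimension, chops the time interval into small steps, and compares the action of the Newtonian path with wiggled paths that share its endpoints. The numbers, the discretization, and the sine-shaped wiggle are all assumptions made for the example.

```python
import math

def action(path, mass, g, total_time):
    """Discretized action: sum of (kinetic - potential) * dt over each small step."""
    n = len(path) - 1
    dt = total_time / n
    s = 0.0
    for i in range(n):
        velocity = (path[i + 1] - path[i]) / dt
        height = 0.5 * (path[i + 1] + path[i])
        s += (0.5 * mass * velocity ** 2 - mass * g * height) * dt
    return s

def newton_path(n, g, total_time):
    """The solution of m q'' = -m g with q(0) = q(T) = 0, sampled at n + 1 times."""
    times = [total_time * i / n for i in range(n + 1)]
    return [0.5 * g * t * (total_time - t) for t in times]

def wiggled(path, eps):
    """Add a variation eps * sin(pi t / T), which vanishes at both endpoints."""
    n = len(path) - 1
    return [q + eps * math.sin(math.pi * i / n) for i, q in enumerate(path)]

mass, g, total_time, n = 1.0, 9.8, 1.0, 1000
true_path = newton_path(n, g, total_time)

s_true = action(true_path, mass, g, total_time)
s_up = action(wiggled(true_path, 0.1), mass, g, total_time)
s_down = action(wiggled(true_path, -0.1), mass, g, total_time)

print(s_true, s_up, s_down)   # the Newtonian path has the smallest action of the three
```

Wiggling up or down by the same amount raises the action by the same amount, which is what a vanishing first-order variation looks like in practice.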

Everything we know from Newtonian mechanics can be written down in Lagrangian mechanics by coming up with a suitable action functional, which usually takes the form of an integral of an appropriate Lagrangian function. But lots more things can be described using the Lagrangian formalism, including field theories like electromagnetism.

In the presence of a charge distribution $\rho$ and a current distribution $j$, we take the potentials $\phi$ and $A$ as fundamental and start with the action (suppressing the space and time arguments so we can write $\rho$ instead of $\rho(x,t)$):

$\displaystyle S[\phi,A]=\int_{t_1}^{t_2}\int_{\mathbb{R}^3}-\rho\phi+j\cdot A+\frac{\epsilon_0}{2}E^2-\frac{1}{2\mu_0}B^2\,dV\,dt$

When we vary with respect to $\phi$ and insist that the variation of $S$ be zero we get Gauss’ law:

$\displaystyle\nabla\cdot E=\frac{\rho}{\epsilon_0}$

Varying the components of $A$ we get Ampère’s law with Maxwell’s correction:

$\displaystyle\nabla\times B=\mu_0j+\epsilon_0\mu_0\frac{\partial E}{\partial t}$

The other two of Maxwell’s equations come automatically from taking the potentials as fundamental and coming up with the electric and magnetic fields from them.

July 16, 2012

## A Continued Rant on Electromagnetism Texts and the Pedagogy of Science

A comment just came in on my short rant about electromagnetism texts. Dripping with condescension, it states:

Here’s the fundamental reason for your discomfort: as a mathematician, you don’t realize that scalar and vector potentials have *no physical significance* (or for that matter, do you understand the distinction between objects of physical significance and things that are merely convenient mathematical devices?).

It really doesn’t matter how scalar and vector potentials are defined, found, or justified, so long as they make it convenient for you to work with electric and magnetic fields, which *are* physical (after all, if potentials were physical, gauge freedom would make no sense).

On rare occasions (e.g. Aharonov-Bohm effect), there’s the illusion that (vector) potential has actual physical significance, but when you realize it’s only the *differences* in the potential, it ought to become obvious that, once again, potentials are just mathematically convenient devices to do what you can do with fields alone.

P.S. We physicists are very happy with merely achieving self-consistency, thankyouverymuch. Experiments will provide the remaining justification.

The thing is, none of that changes the fact that you’re flat-out lying to students when you say that the vanishing divergence of the magnetic field, on its own, implies the existence of a vector potential.

I think the commenter is confusing my complaint with a different, more common one: the fact that potentials are not uniquely defined as functions. But I actually don’t have a problem with that, since the same is true of any antiderivative. After all, what is an antiderivative but a potential function in a one-dimensional space? In fact, the concepts of torsors and gauge symmetries are intimately connected with this indefiniteness.

No, my complaint is that physicists are sloppy in their teaching, which they sweep under the carpet of agreement with certain experiments. It’s trivial to cook up magnetic fields in non-simply-connected spaces which satisfy Maxwell’s equations and yet have no globally-defined potential at all. It’s not just that a potential is only defined up to an additive constant; it’s that when you go around certain loops the value of the potential must have changed, and so at no point can the function take any “self-consistent” value.

In being so sloppy, physicists commit the sin of making unstated assumptions, and of doing so in front of kids who are too naïve to know better. A professor may know that this is only true in spaces without holes, but his students probably don’t, and they won’t until they rely on the assumption in a case where it doesn’t hold. That’s really all I’m saying: state your assumptions; unstated assumptions are anathema to science.

As for the physical significance of potentials, I won’t even bother delving into the fact that explaining Aharonov-Bohm with fields alone entails chucking locality right out the window. Rest assured that once you move on from classical electromagnetism to quantum electrodynamics and other quantum field theories, the potential is clearly physically significant.

March 8, 2012

## Minkowski Space

Before we push ahead with the Faraday field in hand, we need to properly define the Hodge star in our four-dimensional space, and we need a pseudo-Riemannian metric to do this. Before we were just using the standard $\mathbb{R}^3$, but now that we’re lumping in time we need to choose a four-dimensional metric.

And just to screw with you, it will have a different signature. If we have vectors $v_1=(x_1,y_1,z_1,t_1)$ and $v_2=(x_2,y_2,z_2,t_2)$ — with time here measured in the same units as space by using the speed of light as a conversion factor — then we calculate the metric as:

$\displaystyle g(v_1,v_2)=x_1x_2+y_1y_2+z_1z_2-t_1t_2$

In particular, if we stick the vector $v=(x,y,z,t)$ into the metric twice, like we do to calculate a squared-length when working with an inner product, we find:

$\displaystyle g(v,v)=x^2+y^2+z^2-t^2$

This looks like the Pythagorean theorem in two or three dimensions, but when we get to the time dimension we subtract $t^2$ instead of adding it! Four-dimensional real space equipped with a metric of this form is called “Minkowski space”. More specifically, it’s called 4-dimensional Minkowski space, or “(3+1)-dimensional” Minkowski space: three spatial dimensions and one temporal dimension. Higher-dimensional versions with $n-1$ “spatial” dimensions (with plusses in the metric) and one “temporal” dimension (with minuses) are also called Minkowski space. And, perversely enough, some physicists write it all backwards with one plus and $n-1$ minuses; this version is useful if you think of displacements in time as more fundamental (and thus more useful to call “positive”) than displacements in space.

What implications does this have on the coordinate expression of the Hodge star? It’s pretty much the same, except for the determinant part. You can think about it yourself, but the upshot is that we pick up an extra factor of $-1$ when the basic form going into the star involves $dt$.

So the rule is that for a basic form $\alpha$, the dual form $*\alpha$ consists of those component $1$-forms not involved in $\alpha$, ordered such that $\alpha\wedge(*\alpha)=\pm dx\wedge dy\wedge dz\wedge dt$, with a negative sign if and only if $dt$ is involved in $\alpha$. Let’s write it all out for easy reference:

\displaystyle\begin{aligned}*1&=dx\wedge dy\wedge dz\wedge dt\\ *dx&=dy\wedge dz\wedge dt\\ *dy&=dz\wedge dx\wedge dt\\ *dz&=dx\wedge dy\wedge dt\\ *dt&=dx\wedge dy\wedge dz\\ *(dx\wedge dy)&=dz\wedge dt\\ *(dz\wedge dx)&=dy\wedge dt\\ *(dy\wedge dz)&=dx\wedge dt\\ *(dx\wedge dt)&=-dy\wedge dz\\ *(dy\wedge dt)&=-dz\wedge dx\\ *(dz\wedge dt)&=-dx\wedge dy\\ *(dx\wedge dy\wedge dz)&=dt\\ *(dx\wedge dy\wedge dt)&=dz\\ *(dz\wedge dx\wedge dt)&=dy\\ *(dy\wedge dz\wedge dt)&=dx\\ *(dx\wedge dy\wedge dz\wedge dt)&=-1\end{aligned}

Note that the square of the Hodge star has the opposite sign from the Riemannian case; when $k$ is odd the double Hodge dual of a $k$-form is the original form back again, but when $k$ is even the double dual is the negative of the original form.
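The sign bookkeeping is easy to get wrong by hand, so here is a small Python sketch that applies the rule mechanically. It assumes basic forms are encoded as increasing tuples of indices with $0,1,2,3$ standing for $x,y,z,t$; that encoding and the function names are mine, not anything from earlier posts.

```python
from itertools import combinations


def perm_sign(p):
    """Sign of a permutation of distinct integers, by counting inversions."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return -1 if inversions % 2 else 1


def star(idx):
    """Hodge dual of the basic form with increasing index tuple idx.

    Returns (sign, complement): the dual is sign times the basic form
    on the complementary indices, also in increasing order.
    """
    comp = tuple(i for i in range(4) if i not in idx)
    eta = -1 if 3 in idx else 1  # the extra -1 when dt is involved
    return eta * perm_sign(idx + comp), comp


def double_star_sign(idx):
    """Sign picked up by applying the star twice to a basic form."""
    s1, comp = star(idx)
    s2, back = star(comp)
    assert back == idx
    return s1 * s2


print(star((0, 3)))  # (-1, (1, 2)), that is, *(dx ^ dt) = -dy ^ dz
for k in range(5):
    print(k, sorted({double_star_sign(idx) for idx in combinations(range(4), k)}))
```

The loop reports a single sign for each degree $k$: $+1$ for odd $k$ and $-1$ for even $k$, matching the note above.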

March 7, 2012

Now that we’ve seen that we can use the speed of light as a conversion factor to put time and space measurements on an equal footing, let’s actually do it to Maxwell’s equations. We start by moving the time derivatives over to the same side as all the space derivatives:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c\rho\\d\beta&=0\\d\epsilon+\frac{\partial\beta}{\partial t}&=0\\{}*d*\beta-\frac{\partial\epsilon}{\partial t}&=\mu_0c\iota\end{aligned}

The exterior derivatives here written as $d$ comprise the derivatives in all the spatial directions. If we pick coordinates $x$, $y$, and $z$, then we can write the third equation as three component equations that each look something like

$\displaystyle\frac{\partial\epsilon_x}{\partial y}dy\wedge dx+\frac{\partial\epsilon_y}{\partial x}dx\wedge dy+\frac{\partial\beta_z}{\partial t}dx\wedge dy=\left(\frac{\partial\epsilon_y}{\partial x}-\frac{\partial\epsilon_x}{\partial y}+\frac{\partial\beta_z}{\partial t}\right)dx\wedge dy=0$

This doesn’t look right at all! We’ve got a partial derivative with respect to $t$ floating around, but I see no corresponding $dt$. So if we’re going to move to a four-dimensional spacetime and still use exterior derivatives, we can pick up $dt$ terms from the time derivative of $\beta$. But for the others to cancel off, they already need to have a $dt$ around in the first place. That is, we don’t actually have an electric $1$-form:

$\displaystyle\epsilon=\epsilon_xdx+\epsilon_ydy+\epsilon_zdz$

In truth we have an electric $2$-form:

$\displaystyle\epsilon=\epsilon_xdx\wedge dt+\epsilon_ydy\wedge dt+\epsilon_zdz\wedge dt$

Now, what does this mean for the exterior derivative $d\epsilon$?

\displaystyle\begin{aligned}d\epsilon=&\frac{\partial\epsilon_x}{\partial y}dy\wedge dx\wedge dt+\frac{\partial\epsilon_x}{\partial z}dz\wedge dx\wedge dt\\&+\frac{\partial\epsilon_y}{\partial x}dx\wedge dy\wedge dt+\frac{\partial\epsilon_y}{\partial z}dz\wedge dy\wedge dt\\&+\frac{\partial\epsilon_z}{\partial x}dx\wedge dz\wedge dt+\frac{\partial\epsilon_z}{\partial y}dy\wedge dz\wedge dt\\=&\left(\frac{\partial\epsilon_y}{\partial x}-\frac{\partial\epsilon_x}{\partial y}\right)dx\wedge dy\wedge dt\\&+\left(\frac{\partial\epsilon_x}{\partial z}-\frac{\partial\epsilon_z}{\partial x}\right)dz\wedge dx\wedge dt\\&+\left(\frac{\partial\epsilon_z}{\partial y}-\frac{\partial\epsilon_y}{\partial z}\right)dy\wedge dz\wedge dt\end{aligned}

Nothing has really changed, except now there’s an extra factor of $dt$ at the end of everything.

What happens to the exterior derivative of $\beta$ now that we’re using $t$ as another coordinate? Well, in components we write:

$\displaystyle\beta=\beta_xdy\wedge dz+\beta_ydz\wedge dx+\beta_zdx\wedge dy$

and thus we calculate:

\displaystyle\begin{aligned}d\beta=&\frac{\partial\beta_x}{\partial x}dx\wedge dy\wedge dz+\frac{\partial\beta_x}{\partial t}dt\wedge dy\wedge dz\\&+\frac{\partial\beta_y}{\partial y}dy\wedge dz\wedge dx+\frac{\partial\beta_y}{\partial t}dt\wedge dz\wedge dx\\&+\frac{\partial\beta_z}{\partial z}dz\wedge dx\wedge dy+\frac{\partial\beta_z}{\partial t}dt\wedge dx\wedge dy\\=&\left(\frac{\partial\beta_x}{\partial x}+\frac{\partial\beta_y}{\partial y}+\frac{\partial\beta_z}{\partial z}\right)dx\wedge dy\wedge dz\\&+\frac{\partial\beta_z}{\partial t}dx\wedge dy\wedge dt+\frac{\partial\beta_y}{\partial t}dz\wedge dx\wedge dt+\frac{\partial\beta_x}{\partial t}dy\wedge dz\wedge dt\end{aligned}

Now the first part of this is just the old, three-dimensional exterior derivative of $\beta$, corresponding to the divergence. The second of Maxwell’s equations says that it’s zero. And the other part of this is the time derivative of $\beta$, but with an extra factor of $dt$.

So let’s take the $2$-form $\epsilon$ and the $2$-form $\beta$ and put them together:

\displaystyle\begin{aligned}d(\epsilon+\beta)=&d\epsilon+d\beta\\=&\left(\frac{\partial\beta_x}{\partial x}+\frac{\partial\beta_y}{\partial y}+\frac{\partial\beta_z}{\partial z}\right)dx\wedge dy\wedge dz\\&+\left(\frac{\partial\epsilon_y}{\partial x}-\frac{\partial\epsilon_x}{\partial y}+\frac{\partial\beta_z}{\partial t}\right)dx\wedge dy\wedge dt\\&+\left(\frac{\partial\epsilon_x}{\partial z}-\frac{\partial\epsilon_z}{\partial x}+\frac{\partial\beta_y}{\partial t}\right)dz\wedge dx\wedge dt\\&+\left(\frac{\partial\epsilon_z}{\partial y}-\frac{\partial\epsilon_y}{\partial z}+\frac{\partial\beta_x}{\partial t}\right)dy\wedge dz\wedge dt\end{aligned}

The first term vanishes because of the second of Maxwell’s equations, and the rest all vanish because they’re the components of the third of Maxwell’s equations. That is, the second and third of Maxwell’s equations are both subsumed in this one four-dimensional equation.

When we rewrite the electric and magnetic fields as $2$-forms like this, their sum is called the “Faraday field” $F$. The second and third of Maxwell’s equations are equivalent to the single assertion that $dF=0$.
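As a quick sanity check, here is a sketch with a plane wave moving in the $x$ direction. The profile function $f$ is my own placeholder; any differentiable function of one variable will do. Take

$\displaystyle F=f(x-t)\,dy\wedge dt+f(x-t)\,dx\wedge dy$

so that $\epsilon_y=\beta_z=f(x-t)$ and all the other components vanish. Then

\displaystyle\begin{aligned}dF&=\frac{\partial\epsilon_y}{\partial x}dx\wedge dy\wedge dt+\frac{\partial\beta_z}{\partial t}dt\wedge dx\wedge dy\\&=\left(f'(x-t)-f'(x-t)\right)dx\wedge dy\wedge dt\\&=0\end{aligned}

The electric and magnetic parts each fail to be closed on their own, but their contributions cancel exactly in the sum.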

March 6, 2012

## The Meaning of the Speed of Light

Let’s pick up where we left off last time converting Maxwell’s equations into differential forms:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d\beta&=0\\d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d*\beta&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}

Now let’s notice that while the electric field has units of force per unit charge, the magnetic field has units of force per unit charge per unit velocity. Further, from our polarized plane-wave solutions to Maxwell’s equations, we see that for these waves the magnitude of the electric field is $c$ — a velocity — times the magnitude of the magnetic field. So let’s try collecting together factors of $c\beta$:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d(c\beta)&=0\\d\epsilon&=-\frac{1}{c}\frac{\partial(c\beta)}{\partial t}\\{}*d*(c\beta)&=\mu_0c\iota+\frac{1}{c}\frac{\partial\epsilon}{\partial t}\end{aligned}

Now each of the time derivatives comes along with a factor of $\frac{1}{c}$. We can absorb this by introducing a new variable $\tau=ct$, which is measured in units of distance rather than time. Then we can write:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d(c\beta)&=0\\d\epsilon&=-\frac{\partial(c\beta)}{\partial\tau}\\{}*d*(c\beta)&=\mu_0c\iota+\frac{\partial\epsilon}{\partial\tau}\end{aligned}

The easy thing here is to just write $t$ instead of $\tau$, but this hides a deep insight: the speed of light $c$ is acting like a conversion factor from units of time to units of distance. That is, we don’t just say that light moves at a speed of $c=299\,792\,458\frac{\mathrm{m}}{\mathrm{s}}$, we say that one second of time is 299,792,458 meters of distance. This is an incredible identity that allows us to treat time and space on an equal footing, and it is borne out in many more or less direct experiments. I don’t want to get into all the consequences of this fact (the name for them as a collection is “special relativity”), but I do want to use it.
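To make the conversion concrete, here is a tiny Python sketch. The function names are hypothetical; the only real input is the SI value of $c$, which is exact by definition.

```python
C = 299_792_458  # meters per second; exact by the SI definition of the meter


def time_to_distance(seconds):
    """Convert a duration t to the equivalent length tau = c * t, in meters."""
    return C * seconds


def distance_to_time(meters):
    """Inverse conversion: the duration equivalent to a given length, in seconds."""
    return meters / C


print(time_to_distance(1))     # 299792458
print(time_to_distance(1e-9))  # roughly 0.3: a nanosecond is about 30 centimeters
print(distance_to_time(1))     # roughly 3.3e-09: a meter is about 3.3 nanoseconds
```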

This lets us go back and write $\beta$ instead of $c\beta$: the electric and magnetic fields in a propagating electromagnetic plane-wave are “really” the same size, and the factor of $c$ is just an artifact of using a coordinate system that treats time and distance separately. We can also just write $t$ instead of $ct$ for the same reason. Finally, we can collect $c\rho$ together to put it on the exact same footing as $\iota$.

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c\rho\\d\beta&=0\\d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d*\beta&=\mu_0c\iota+\frac{\partial\epsilon}{\partial t}\end{aligned}

The meanings of these terms are getting further and further from familiarity. The $1$-form $\epsilon$ is still made of the same components as the electric field; the $2$-form $\beta$ is $c$ times the Hodge star of the $1$-form whose components are those of the magnetic field; the function $\rho$ is $c$ times the charge density; and the $1$-form $\iota$ has the same components as the current density.

February 24, 2012

## Maxwell’s Equations in Differential Forms

To this point, we’ve mostly followed a standard approach to classical electromagnetism, and nothing I’ve said should be all that new to a former physics major, although at some points we’ve infused more mathematical rigor than is typical. But now I want to go in a different direction.

Starting again with Maxwell’s equations, we see all these divergences and curls which, though familiar to many, are really heavy-duty equipment. In particular, they rely on the Riemannian structure on $\mathbb{R}^3$. We want to strip this away to find something that works without this assumption, and as a first step we’ll flip things over into differential forms.

So let’s say that the magnetic field $B$ corresponds to a $1$-form $\beta$, while the electric field $E$ corresponds to a $1$-form $\epsilon$. To avoid confusion between $\epsilon$ and the electric constant $\epsilon_0$, let’s also replace some of our constants with the speed of light — $\epsilon_0\mu_0=\frac{1}{c^2}$. At the same time, we’ll replace $J$ with a $1$-form $\iota$. Now Maxwell’s equations look like:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\{}*d*\beta&=0\\{}*d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d\beta&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}

Now I want to juggle around some of these Hodge stars:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d(*\beta)&=0\\d\epsilon&=-\frac{\partial(*\beta)}{\partial t}\\{}*d*(*\beta)&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}
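To spell out the juggling: on $\mathbb{R}^3$ with its usual metric the Hodge star squares to the identity on forms of every degree, so we can insert or remove $**$ freely. For the second, third, and fourth equations this gives

\displaystyle\begin{aligned}d(*\beta)&=**d(*\beta)=*(*d*\beta)=*0=0\\d\epsilon&=**d\epsilon=*(*d\epsilon)=*\left(-\frac{\partial\beta}{\partial t}\right)=-\frac{\partial(*\beta)}{\partial t}\\{}*d\beta&=*d(**\beta)=*d*(*\beta)\end{aligned}

The one assumption I’m making in the middle line is that the star commutes with the time derivative, which holds because the metric on $\mathbb{R}^3$ doesn’t depend on $t$.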

Notice that we’re never just using the $1$-form $\beta$, but rather the $2$-form $*\beta$. Let’s actually go back and use $\beta$ to represent a $2$-form, so that $B$ corresponds to the $1$-form $*\beta$:

\displaystyle\begin{aligned}*d*\epsilon&=\mu_0c^2\rho\\d\beta&=0\\d\epsilon&=-\frac{\partial\beta}{\partial t}\\{}*d*\beta&=\mu_0\iota+\frac{1}{c^2}\frac{\partial\epsilon}{\partial t}\end{aligned}

In the static case — where time derivatives are zero — we see how symmetric this new formulation is:

\displaystyle\begin{aligned}d\epsilon&=0\\d\beta&=0\\{}*d*\epsilon&=\mu_0c^2\rho\\{}*d*\beta&=\mu_0\iota\end{aligned}

For both the $1$-form $\epsilon$ and the $2$-form $\beta$, the exterior derivative vanishes, and the operator $*d*$ connects the fields to sources of physical charge and current.

February 22, 2012

## A Short Rant about Electromagnetism Texts

I’d like to step aside from the main line to make one complaint. In refreshing my background in classical electromagnetism for this series I’ve run into something that bugs the hell out of me as a mathematician. I remember it from my own first course, but I’m shocked to see that it survives into every upper-level treatment I’ve seen.

It’s about the existence of potentials, and the argument usually goes like this: as Faraday’s law tells us, for a static electric field we have $\nabla\times E=0$; therefore $E=\nabla\phi$ for some potential function $\phi$ because the curl of a gradient is zero.

What?

Let’s break this down to simple formal logic that any physics undergrad can follow. Let $P$ be the statement that there exists a $\phi$ such that $E=\nabla\phi$. Let $Q$ be the statement that $\nabla\times E=0$. The curl of a gradient being zero is the implication $P\implies Q$. So here’s the logic:

\displaystyle\begin{aligned}&Q\\&P\implies Q\\&\therefore P\end{aligned}

and that doesn’t make sense at all. It’s a textbook case of “affirming the consequent”.

Saying that $E$ has a potential function is a nice, convenient way of satisfying the condition that its curl should vanish, but this argument gives no rationale for believing it’s the only option.
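If you’d rather have a machine confirm that the inference is invalid, a brute-force truth table does it. This is only a sketch; `implies` is a helper I’m defining here for material implication.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# Truth assignments where both premises (Q, and P => Q) hold but P fails.
counterexamples = [
    (p, q)
    for p, q in product([False, True], repeat=2)
    if q and implies(p, q) and not p
]
print(counterexamples)  # [(False, True)]
```

The single row it finds, $P$ false and $Q$ true, is exactly the situation of a curl-free field with no potential.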

If we flip over to the language of differential forms, we know that the curl operator on a vector field corresponds to the operator $\alpha\mapsto*d\alpha$ on $1$-forms, while the gradient operator corresponds to $f\mapsto df$. We indeed know that $*ddf=0$ automatically — the curl of a gradient vanishes — but knowing that $d\alpha=0$ is not enough to conclude that $\alpha=df$ for some $f$. In fact, this question is exactly what de Rham cohomology is all about!

So what’s missing? Full formality demands that we justify that the first de Rham cohomology of our space vanishes. Now, I’m not suggesting that we make physics undergrads learn about homology (it might not be a terrible idea, though), but we can satisfy this in the context of a course just by admitting (a) that we are being a little sloppy here, and (b) that the justification is that (for our purposes) the electric field $E$ is defined in some simply-connected region of space which has no “holes” one could wrap a path around. In fact, if the students have had a decent course in multivariable calculus they’ve probably seen the explicit construction of a potential function for a vector field whose curl vanishes, subject to the restriction that we’re working over a simply-connected space.
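To see that the restriction has teeth, here is a numerical sketch of the standard counterexample, which I’m bringing in from outside this series: the planar field $\left(\frac{-y}{x^2+y^2},\frac{x}{x^2+y^2}\right)$ on the plane with the origin removed. Its curl vanishes wherever it is defined, yet its integral around a loop encircling the hole is not zero, so it can’t be a gradient. The function names, step sizes, and sample point are arbitrary choices of mine.

```python
import math


def field(x, y):
    """Vector field (-y, x) / (x^2 + y^2), defined away from the origin."""
    r2 = x * x + y * y
    return -y / r2, x / r2


def curl_z(x, y, h=1e-5):
    """Central-difference estimate of dFy/dx - dFx/dy at (x, y)."""
    dfy_dx = (field(x + h, y)[1] - field(x - h, y)[1]) / (2 * h)
    dfx_dy = (field(x, y + h)[0] - field(x, y - h)[0]) / (2 * h)
    return dfy_dx - dfx_dy


def circulation(radius=1.0, steps=10_000):
    """Midpoint-rule line integral of the field around a circle about the origin."""
    total = 0.0
    dtheta = 2 * math.pi / steps
    for i in range(steps):
        theta = (i + 0.5) * dtheta
        x, y = radius * math.cos(theta), radius * math.sin(theta)
        fx, fy = field(x, y)
        # (dx, dy) along the circle is (-y, x) * dtheta.
        total += (fx * -y + fy * x) * dtheta
    return total


print(curl_z(0.7, -1.3))        # close to 0
print(circulation() / math.pi)  # close to 2
```

If a potential existed, the integral around any closed loop would have to vanish; instead it comes out to $2\pi$, which is the nontrivial first de Rham cohomology of the punctured plane showing up in a calculation.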

The problem arises again in justifying the existence of a vector potential: as Gauss’ law for magnetism tells us, for a magnetic field we have $\nabla\cdot B=0$; therefore $B=\nabla\times A$ for some vector potential $A$ because the divergence of a curl is zero.

Again we see the same problem of affirming the consequent. And again the real problem hinges on the unspoken assumption that the second de Rham cohomology of our space vanishes. Yes, this is true for contractible spaces, but we must make mention of the fact that our space is contractible! In fact, I did exactly that when I needed to get ahold of the magnetic potential once.

Again: we don’t need to stop simplifying and sweeping some of these messier details of our arguments under the rug when dealing with undergraduate students, but we do need to be honest that those details were there to be swept in the first place. The alternative most texts and notes choose now is to include statements which are blatantly false, and to rely on our authority to make students accept them unquestioningly.

February 18, 2012