The Unapologetic Mathematician

Mathematics for the interested outsider

Plane Waves

We’ve derived a “wave equation” from Maxwell’s equations, but it’s not clear what it means, or even why this is called a wave equation. Let’s consider the abstracted form, which both electric and magnetic fields satisfy:

\displaystyle\frac{\partial^2F}{\partial t^2}-c^2\nabla^2F=0

where \nabla^2 is the “Laplacian” operator, defined on scalar functions by taking the gradient followed by the divergence, and extended linearly to vector fields. If we have a Cartesian coordinate system — and remember we’re working in good, old \mathbb{R}^3 so it’s possible to pick just such coordinates, albeit not canonically — we can write

\displaystyle\frac{\partial^2F_x}{\partial t^2}-c^2\nabla^2F_x=0

where F_x is the x-component of F, and a similar equation holds for the y and z components as well. Writing f for any one of these scalar components, we can also write out the Laplacian in terms of coordinate derivatives:

\displaystyle\frac{\partial^2f}{\partial t^2}-c^2\left(\frac{\partial^2f}{\partial x^2}+\frac{\partial^2f}{\partial y^2}+\frac{\partial^2f}{\partial z^2}\right)=0

Let’s simplify further to just consider functions that depend on x and t, and which are constant in the y and z directions:

\displaystyle\frac{\partial^2f}{\partial t^2}-c^2\frac{\partial^2f}{\partial x^2}=\left[\frac{\partial^2}{\partial t^2}-c^2\frac{\partial^2}{\partial x^2}\right]f=0

We can take this big operator and “factor” it:

\displaystyle\left[\left(\frac{\partial}{\partial t}+c\frac{\partial}{\partial x}\right)\left(\frac{\partial}{\partial t}-c\frac{\partial}{\partial x}\right)\right]f=0

Any function which either “factor” sends to zero will be a solution of the whole equation. We find solutions like

\displaystyle\begin{aligned}\left[\frac{\partial}{\partial t}+c\frac{\partial}{\partial x}\right]A(x-ct)&=A'(x-ct)\frac{\partial(x-ct)}{\partial t}+cA'(x-ct)\frac{\partial(x-ct)}{\partial x}\\&=A'(x-ct)(-c+c)=0\\\left[\frac{\partial}{\partial t}-c\frac{\partial}{\partial x}\right]B(x+ct)&=B'(x+ct)\frac{\partial(x+ct)}{\partial t}-cB'(x+ct)\frac{\partial(x+ct)}{\partial x}\\&=B'(x+ct)(c-c)=0\end{aligned}

where A and B are pretty much any function that’s at least mildly well-behaved.

We call solutions of the first form “right-moving”, for if we view t as time and watch as it increases, the “shape” of A(x-ct) stays the same; it just moves in the increasing x direction. That is, at time t_0+\Delta t we see the same thing at x that we saw at x-c\Delta t (that is, c\Delta t units to the left) at time t_0. Similarly, we call solutions of the second form “left-moving”. In each family, solutions propagate at a rate of c, which was the constant from our original equation. Any solution of this simplified, one-dimensional wave equation will be the sum of a right-moving and a left-moving term.
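To make the right-moving picture concrete, here is a minimal numerical sketch. The wave speed, the Gaussian profile, and the sample point are all arbitrary choices made for illustration; the check uses central finite differences, so the residual is only approximately zero.

```python
import math

C = 2.0  # wave speed; an arbitrary value chosen for this illustration

def A(s):
    # Any reasonably smooth profile should do; the Gaussian bump is an assumption.
    return math.exp(-s * s)

def f(x, t):
    # Right-moving solution f(x, t) = A(x - ct)
    return A(x - C * t)

def wave_residual(x, t, h=1e-3):
    # Central second differences approximating f_tt - c^2 f_xx
    f_tt = (f(x, t + h) - 2 * f(x, t) + f(x, t - h)) / h**2
    f_xx = (f(x + h, t) - 2 * f(x, t) + f(x - h, t)) / h**2
    return f_tt - C**2 * f_xx

# What we see at x at time t0 + dt is what sat at x - c*dt at time t0.
x, t0, dt = 0.7, 0.3, 0.25
shift_error = abs(f(x, t0 + dt) - f(x - C * dt, t0))
residual = abs(wave_residual(x, t0))
print(shift_error, residual)
```

Both printed numbers should be small: the first up to rounding, the second up to the truncation error of the finite differences.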

More generally, for the three-dimensional version we have “plane-wave” solutions propagating in any given direction we want. We could do a big, messy calculation, but note that if k is any unit vector, we can pick a Cartesian coordinate system where k is the unit vector in the x direction, in which case we’re back to the right-moving solutions from above. And of course there’s no reason we can’t let A be a vector-valued function. Such a solution looks like

\displaystyle A(r,t)=\hat{A}(k\cdot r-ct)

The bigger t is, the further in the k direction the position vector r must extend to compensate; the shape \hat{A} stays the same, but moves in the direction of k with a velocity of c.

It will be helpful to work out some of the basic derivatives of such solutions. Time is easy:

\displaystyle\begin{aligned}\frac{\partial}{\partial t}A(r,t)&=\frac{\partial}{\partial t}\hat{A}(k\cdot r-ct)\\&=\hat{A}'(k\cdot r-ct)\frac{\partial}{\partial t}(k\cdot r-ct)\\&=-c\hat{A}'(k\cdot r-ct)\end{aligned}

Spatial derivatives are a little trickier. We pick a Cartesian coordinate system to write:

\displaystyle\begin{aligned}\frac{\partial}{\partial x}A(r,t)&=\frac{\partial}{\partial x}\hat{A}(k\cdot r-ct)\\&=\hat{A}'(k\cdot r-ct)\frac{\partial}{\partial x}(k\cdot r-ct)\\&=k_x\hat{A}'(k\cdot r-ct)\\\frac{\partial}{\partial y}A(r,t)&=k_y\hat{A}'(k\cdot r-ct)\\\frac{\partial}{\partial z}A(r,t)&=k_z\hat{A}'(k\cdot r-ct)\end{aligned}

We don’t really want to depend on coordinates, but luckily it’s easy enough to assemble these into coordinate-free expressions:

\displaystyle\begin{aligned}\nabla\cdot A(r,t)&=k\cdot\hat{A}'(k\cdot r-ct)\\\nabla\times A(r,t)&=k\times\hat{A}'(k\cdot r-ct)\end{aligned}

Having these formulas worked out in advance will make our lives much easier.
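The divergence and curl formulas can be sanity-checked numerically. The following is a sketch under assumptions: the unit vector k, the speed c, the profile \hat{A}, and the sample point are all made up for the example, and derivatives are approximated by central differences.

```python
import math

C = 1.5                      # propagation speed (illustrative value)
K = (1 / 3, 2 / 3, 2 / 3)    # a unit vector: (1/9 + 4/9 + 4/9) = 1

def A_hat(s):
    # Hypothetical vector-valued profile of one variable
    return (math.sin(s), math.cos(2 * s), s * s)

def A_hat_prime(s):
    # Its derivative, computed by hand
    return (math.cos(s), -2 * math.sin(2 * s), 2 * s)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def A(r, t):
    # Plane wave A(r, t) = A_hat(k . r - ct)
    return A_hat(dot(K, r) - C * t)

def partial(i, j, r, t, h=1e-5):
    # Central-difference approximation of d A_i / d x_j
    rp, rm = list(r), list(r)
    rp[j] += h
    rm[j] -= h
    return (A(rp, t)[i] - A(rm, t)[i]) / (2 * h)

def divergence(r, t):
    return sum(partial(i, i, r, t) for i in range(3))

def curl(r, t):
    return (partial(2, 1, r, t) - partial(1, 2, r, t),
            partial(0, 2, r, t) - partial(2, 0, r, t),
            partial(1, 0, r, t) - partial(0, 1, r, t))

r, t = (0.2, -0.5, 1.1), 0.4
d = A_hat_prime(dot(K, r) - C * t)
div_error = abs(divergence(r, t) - dot(K, d))
curl_error = max(abs(u - v) for u, v in zip(curl(r, t), cross(K, d)))
print(div_error, curl_error)
```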

February 8, 2012 Posted by | Analysis, Differential Equations | 3 Comments

Smooth Dependence on Initial Conditions

Now that we’ve got the existence and uniqueness of our solutions down, we have one more of our promised results: the smooth dependence of solutions on initial conditions. That is, if we use our existence and uniqueness theorems to construct a unique “flow” function \psi:I\times U\to\mathbb{R}^n satisfying

\displaystyle\begin{aligned}\frac{\partial}{\partial t}\psi(t,u)&=F(\psi(t,u))\\\psi(0,u)&=u\end{aligned}

by setting \psi(t,u)=v_u(t) — where v_u is the unique solution with initial condition v_u(0)=u — then \psi is continuously differentiable.

Now, we already know that \psi is continuously differentiable in the time direction by definition. What we need to show is that the directional derivatives involving directions in U exist and are continuous. To that end, let a\in U be a base point and h be a small enough displacement that a+h\in U as well. Similarly, let t_0 be a fixed point in time and let \Delta t be a small change in time. Then we can estimate

\displaystyle\begin{aligned}\lVert\psi(t_0+\Delta t,a+h)-\psi(t_0,a)\rVert=&\lVert v_{a+h}(t_0+\Delta t)-v_a(t_0)\rVert\\\leq&\lVert v_{a+h}(t_0+\Delta t)-v_a(t_0+\Delta t)\rVert\\&+\lVert v_a(t_0+\Delta t)-v_a(t_0)\rVert\end{aligned}

But now our result from last time tells us that these solutions can diverge no faster than exponentially. Since both solutions start at time 0, separated by h, we conclude that

\displaystyle\lVert v_{a+h}(t_0+\Delta t)-v_a(t_0+\Delta t)\rVert\leq\lVert h\rVert e^{K\lvert t_0+\Delta t\rvert}

and so as \lVert h\rVert\to0 this term must go to zero as well. Meanwhile, the second term also goes to zero as \Delta t\to0 by the differentiability of v_a. We can now see that the directional derivative at (t_0,a) in the direction of (\Delta t,h) exists.

But are these directional derivatives continuous? This turns out to be a lot messier, but essentially doable by similar methods and a generalization of Gronwall’s inequality. For the sake of getting back to differential equations I’m going to just assert that not only do all directional derivatives exist, but they’re continuous, and thus the flow is C^1.

May 16, 2011 Posted by | Analysis, Differential Equations | Leave a comment

Control on the Divergence of Solutions

Now we can establish some control on how nearby solutions to the differential equation

\displaystyle v'(t)=F(v(t))

diverge. That is, as time goes by, how can the solutions move apart from each other?

Let x and y be two solutions satisfying initial conditions x(t_0)=x_0 and y(t_0)=y_0, respectively. The existence and uniqueness theorems we’ve just proven show that x and y are uniquely determined by this choice in some interval, and we’ll pick a t_1 so they’re both defined on the closed interval [t_0,t_1]. Now for every t in this interval we have

\displaystyle\lVert y(t)-x(t)\rVert\leq\lVert y_0-x_0\rVert e^{K(t-t_0)}

where K is a Lipschitz constant for F in the region we’re concerned with. That is, the separation between the solutions x(t) and y(t) can increase no faster than exponentially.

So, let’s define d(t)=\lVert y(t)-x(t)\rVert to be this distance. Converting to integral equations, it’s clear that

\displaystyle y(t)-x(t)=y_0-x_0+\int\limits_{t_0}^t\left(F(y(s))-F(x(s))\right)\,ds

and thus

\displaystyle\begin{aligned}d(t)&\leq\lVert y(t_0)-x(t_0)\rVert+\int\limits_{t_0}^t\left\lVert F(y(s))-F(x(s))\right\rVert\,ds\\&\leq\lVert y(t_0)-x(t_0)\rVert+\int\limits_{t_0}^tK\lVert y(s)-x(s)\rVert\,ds\\&=d(t_0)+\int\limits_{t_0}^tKd(s)\,ds\end{aligned}

Now Gronwall’s inequality tells us that d(t)\leq d(t_0)e^{K(t-t_0)}, which is exactly the inequality we asserted above.
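As a rough numerical illustration of this bound, we can integrate two nearby solutions and compare their separation with the exponential envelope. Everything specific here is an assumption for the example: the scalar field F(v)=\sin(v) (which has Lipschitz constant K=1 since its derivative is bounded by 1), the starting points, and the use of a classical Runge–Kutta integrator with t_0=0.

```python
import math

def F(v):
    # Illustrative scalar vector field; |F'(v)| = |cos v| <= 1
    return math.sin(v)

K_LIP = 1.0  # Lipschitz constant for F under the assumption above

def rk4_step(v, h):
    # One classical fourth-order Runge-Kutta step for v' = F(v)
    k1 = F(v)
    k2 = F(v + h * k1 / 2)
    k3 = F(v + h * k2 / 2)
    k4 = F(v + h * k3)
    return v + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def worst_ratio(x0, y0, t1=3.0, steps=3000):
    # Largest observed value of d(t) / (d(0) e^{K t}) on the time grid
    h = t1 / steps
    x, y = x0, y0
    d0 = abs(y0 - x0)
    worst = 0.0
    for n in range(1, steps + 1):
        x, y = rk4_step(x, h), rk4_step(y, h)
        bound = d0 * math.exp(K_LIP * n * h)
        worst = max(worst, abs(y - x) / bound)
    return worst

ratio = worst_ratio(0.5, 0.6)
print(ratio)
```

If the inequality holds along the computed trajectories, the printed ratio stays at or below 1.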

May 13, 2011 Posted by | Analysis, Differential Equations | 1 Comment

Gronwall’s Inequality

We’re going to need another analytic lemma, this one called “Gronwall’s inequality”. If v:[0,\alpha]\to\mathbb{R} is a continuous, nonnegative function, and if C and K are nonnegative constants such that

\displaystyle v(t)\leq C+\int\limits_0^tKv(s)\,ds

for all t\in[0,\alpha] then for all t in this interval we have

\displaystyle v(t)\leq Ce^{Kt}

That is, we can conclude that v grows no faster than an exponential function. Exponential growth may seem fast, but at least it doesn’t blow up to an infinite singularity in finite time, no matter what Kurzweil seems to think.

Anyway, first let’s deal with strictly positive C. If we define

\displaystyle V(t)=C+\int\limits_0^tKv(s)\,ds>0

then by assumption we have v(t)\leq V(t). Differentiating, we find V'(t)=Kv(t), and thus

\displaystyle\frac{d}{dt}\left(\log(V(t))\right)=\frac{V'(t)}{V(t)}=\frac{Kv(t)}{V(t)}\leq K

Integrating, we find

\displaystyle\log(V(t))\leq\log(V(0))+Kt=\log(C)+Kt

Finally we can exponentiate to find

\displaystyle v(t)\leq V(t)\leq Ce^{Kt}

proving Gronwall’s inequality.

If C=0 in our hypothesis, then the hypothesis also holds with any \bar{C}>0 in place of C, and so we see that v(t)\leq\bar{C}e^{Kt} for every positive \bar{C}. Letting \bar{C} go to zero, this means that v(t) must be zero, as required by Gronwall’s inequality in this case.
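Here is a small numerical sketch of the lemma on one hand-picked example. The function below is an assumption chosen for illustration: it solves v'=K\cos^2(t)v with v(0)=C, so it satisfies the integral hypothesis with room to spare, and we check both the hypothesis (by the trapezoid rule) and the conclusion on a grid. The constants are arbitrary, and the name C0 stands in for the C of the text.

```python
import math

C0, K = 2.0, 1.5  # illustrative nonnegative constants

def v(t):
    # Closed form of the solution to v' = K cos^2(t) v, v(0) = C0
    return C0 * math.exp(K * (t / 2 + math.sin(2 * t) / 4))

def integral_of_Kv(t, n=2000):
    # Trapezoid-rule approximation of the integral of K v(s) from 0 to t
    if t == 0:
        return 0.0
    h = t / n
    total = 0.5 * (v(0.0) + v(t)) + sum(v(i * h) for i in range(1, n))
    return K * h * total

ts = [0.1 * i for i in range(31)]  # sample points in [0, 3]
tol = 1e-9                         # slack for floating-point rounding
hypothesis_ok = all(v(t) <= C0 + integral_of_Kv(t) + tol for t in ts)
conclusion_ok = all(v(t) <= C0 * math.exp(K * t) + tol for t in ts)
print(hypothesis_ok, conclusion_ok)
```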

May 11, 2011 Posted by | Analysis, Differential Equations | 3 Comments

Another Existence Proof

I’d like to go back and give a different proof that the Picard iteration converges — one which is closer to the spirit of Newton’s method. In that case, we proved that Newton’s method converged by showing that the derivative of the iterating function was less than one at the desired solution, making it an attracting fixed point.

In this case, however, we don’t have a derivative because our iteration runs over functions rather than numbers. We will replace it with a similar construction called the “functional derivative”, which is a fundamental part of the “calculus of variations”. I’m not really going to go too deep into this field right now, and I’m not going to prove the analogous result that a small functional derivative means an attracting fixed point, but it’s a nice exercise and introduction anyway.

So, we start with the Picard iteration again:

\displaystyle P[v](t)=a+\int\limits_0^tF(v(s))\,ds

We consider what happens when we add an adjustment to v:

\displaystyle\begin{aligned}P[v+h](t)&=a+\int\limits_0^tF(v(s)+h(s))\,ds\\&\approx a+\int\limits_0^tF(v(s))+dF(v(s))h(s)\,ds\\&=a+\int\limits_0^tF(v(s))\,ds+\int\limits_0^tdF(v(s))h(s)\,ds\\&=P[v](t)+\int\limits_0^tdF(v(s))h(s)\,ds\end{aligned}

We call the small change the “variation” of v, and we write \delta v=h. Similarly, we call the difference between P[v+\delta v] and P[v] the variation of P and write \delta P. It turns out that controlling the size of the variation \delta v gives us some control over the size of the variation \delta P. To wit, if \lVert\delta v\rVert_\infty\leq d then we find

\displaystyle\begin{aligned}\left\lVert\int\limits_0^tdF(v(s))\delta v(s)\,ds\right\rVert&\leq\int\limits_0^t\lVert dF(v(s))\delta v(s)\rVert\,ds\\&\leq\int\limits_0^t\lVert dF(v(s))\rVert_\text{op}\lVert\delta v(s)\rVert_\infty\,ds\\&\leq d\int\limits_0^t\lVert dF(v(s))\rVert_\text{op}\,ds\end{aligned}

Now our proof that F is locally Lipschitz involved showing that there’s a neighborhood of a where we can bound \lVert dF(x)\rVert_\text{op} by K. Again we can pick a small enough c so that \lvert s\rvert\leq c implies that v(s) stays within this neighborhood, and also such that cK<1. And then we conclude that \lVert\delta P\rVert_\infty\leq cKd<d, which we can also write as

\displaystyle\frac{\delta P}{\delta v}<1

Now, admittedly this argument is a bit handwavy as it stands. Still, it does go to show the basic idea of the technique, and it’s a nice little introduction to the calculus of variations.

May 10, 2011 Posted by | Analysis, Calculus of Variations, Differential Equations | 1 Comment

Uniqueness of Solutions to Differential Equations

The convergence of the Picard iteration shows the existence part of our existence and uniqueness theorem. Now we prove the uniqueness part.

Let’s say that u(t) and v(t) are both solutions of the differential equation — u'(t)=F(u(t)) and v'(t)=F(v(t)) — and that they both satisfy the initial condition — u(0)=v(0)=a — on the same interval J=[-c,c] from the existence proof above. We will show that u(t)=v(t) for all t\in J by measuring the L^\infty norm of their difference:

\displaystyle Q=\lVert u-v\rVert_\infty=\max\limits_{t\in J}\lvert u(t)-v(t)\rvert

Since J is a closed interval, this maximum must be attained at a point t_1\in J. We can calculate

\displaystyle\begin{aligned}Q&=\lvert u(t_1)-v(t_1)\rvert\\&=\left\lvert\int\limits_0^{t_1}u'(s)-v'(s)\,ds\right\rvert\\&\leq\int\limits_0^{t_1}\lvert F(u(s))-F(v(s))\rvert\,ds\\&\leq\int\limits_0^{t_1}K\lvert u(s)-v(s)\rvert\,ds\\&\leq cKQ\end{aligned}

but by assumption we know that cK<1, which makes this inequality impossible unless Q=0. Thus the distance between u and v is 0, and the two functions must be equal on this interval, proving uniqueness.

May 9, 2011 Posted by | Analysis, Differential Equations | Leave a comment

The Picard Iteration Converges

Now that we’ve defined the Picard iteration, we have a sequence of functions v_i:J\to B_\rho from a closed neighborhood of 0\in\mathbb{R} to a closed neighborhood of a\in\mathbb{R}^n. Recall that we defined M to be an upper bound of \lVert F\rVert on B_\rho, K to be a Lipschitz constant for F on B_\rho, c less than both \frac{\rho}{M} and \frac{1}{K}, and J=[-c,c].

Specifically, we’ll show that the sequence converges in the supremum norm on J. That is, we’ll show that there is some v:J\to B_\rho so that the maximum of the difference \lVert v_i(t)-v(t)\rVert for t\in J decreases to zero as i increases. And we’ll do this by showing that the individual functions v_i and v_j get closer and closer in the supremum norm. Then they’ll form a Cauchy sequence, which we know must converge because the metric space defined by the supremum norm is complete, as are all the L^p spaces.

Anyway, let L=\lVert v_1-v_0\rVert_\infty be exactly the supremum norm of the difference between the first two functions in the sequence. I say that \lVert v_{i+1}-v_i\rVert_\infty\leq(cK)^iL. Indeed, we calculate inductively

\displaystyle\begin{aligned}\lVert v_{i+1}(t)-v_i(t)\rVert&\leq\int\limits_0^t\lVert F(v_i(s))-F(v_{i-1}(s))\rVert\,ds\\&\leq K\int\limits_0^t\lVert v_i(s)-v_{i-1}(s)\rVert\,ds\\&\leq K\int\limits_0^t(cK)^{i-1}L\,ds\\&\leq(cK)(cK)^{i-1}L\\&=(cK)^iL\end{aligned}

Now we can bound the distance between any two functions in the sequence. If i<j are two indices we calculate:

\displaystyle\begin{aligned}\lVert v_j-v_i\rVert_\infty&=\left\lVert\sum\limits_{k=i}^{j-1}v_{k+1}-v_k\right\rVert_\infty\\&\leq\sum\limits_{k=i}^{j-1}\lVert v_{k+1}-v_k\rVert_\infty\\&\leq\sum\limits_{k=i}^{j-1}(cK)^kL\end{aligned}

But this is a chunk of a geometric series; since cK<1, the series must converge, and so we can make this sum as small as we please by choosing i and j large enough.

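This then tells us that our sequence of functions is L^\infty-Cauchy, and thus L^\infty-convergent, which implies uniform convergence. The uniformity is important because it means that we can exchange integration with the limiting process. Since F is Lipschitz on B_\rho, the functions F(v_k) converge uniformly as well. That is,

\displaystyle\lim\limits_{k\to\infty}\int\limits_0^tF(v_k(s))\,ds=\int\limits_0^t\lim\limits_{k\to\infty}F(v_k(s))\,ds

And so we can start with our definition: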

\displaystyle v_{k+1}(t)=a+\int\limits_0^tF(v_k(s))\,ds

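and take the limit of both sides

\displaystyle\begin{aligned}v(t)=\lim\limits_{k\to\infty}v_{k+1}(t)&=a+\lim\limits_{k\to\infty}\int\limits_0^tF(v_k(s))\,ds\\&=a+\int\limits_0^t\lim\limits_{k\to\infty}F(v_k(s))\,ds\\&=a+\int\limits_0^tF(v(s))\,ds\end{aligned}

where we have used the continuity of F. This shows that the limiting function v does indeed satisfy the integral equation, and thus the original initial value problem.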
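The geometric decay of successive differences can be watched numerically. The sketch below makes several assumptions for the sake of a concrete example: the scalar field F(v)=\cos(v) with K=1, initial value a=0, interval half-width c=1/2, and a trapezoid-rule discretization of the integral on a uniform grid over J. It is a rough experiment, not a proof.

```python
import math

def F(v):
    # Illustrative field; Lipschitz constant 1 and |F| <= 1 everywhere
    return math.cos(v)

K_LIP, a, c = 1.0, 0.0, 0.5   # chosen so that c*K < 1
n = 201                       # odd, so the middle grid point is t = 0
ts = [-c + 2 * c * j / (n - 1) for j in range(n)]
mid = n // 2

def picard(values):
    # One Picard step on the grid: a + integral from 0 to t of F(v_i(s)) ds,
    # with the integral approximated by the cumulative trapezoid rule
    g = [F(v) for v in values]
    new = [0.0] * n
    new[mid] = a
    for j in range(mid + 1, n):          # integrate to the right of 0
        new[j] = new[j - 1] + 0.5 * (g[j] + g[j - 1]) * (ts[j] - ts[j - 1])
    for j in range(mid - 1, -1, -1):     # integrate to the left of 0
        new[j] = new[j + 1] - 0.5 * (g[j] + g[j + 1]) * (ts[j + 1] - ts[j])
    return new

v = [a] * n   # v_0(t) = a
diffs = []    # diffs[i] approximates the sup norm of v_{i+1} - v_i
for _ in range(8):
    nxt = picard(v)
    diffs.append(max(abs(p - q) for p, q in zip(nxt, v)))
    v = nxt

geometric_ok = all(d <= (c * K_LIP) ** i * diffs[0] + 1e-12
                   for i, d in enumerate(diffs))
print(diffs, geometric_ok)
```

In this example the differences shrink much faster than the guaranteed rate (cK)^i, which is only an upper bound.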

May 6, 2011 Posted by | Analysis, Differential Equations | 1 Comment

The Picard Iteration

Now we can start actually closing in on a solution to our initial value problem. Recall the setup:

\displaystyle\begin{aligned}v'(t)&=F(v(t))\\v(0)&=a\end{aligned}

The first thing we’ll do is translate this into an integral equation. Integrating both sides of the first equation and using the second equation we find

\displaystyle v(t)=a+\int\limits_0^tF(v(s))\,ds

Conversely, if v satisfies this equation then clearly it satisfies the two conditions in our initial value problem.

Now the nice thing about this formulation is that it expresses v as the fixed point of a certain operation. To find it, we will use an iterative method. We start with v_0(t)=a and define the “Picard iteration”

\displaystyle v_{i+1}(t)=a+\int\limits_0^tF(v_i(s))\,ds

This is sort of like Newton’s method, where we express the point we’re looking for as the fixed point of a function, and then find the fixed point by iterating that very function.

The one catch is, how are we sure that this is well-defined? What could go wrong? Well, how do we know that v_i(s) is in the domain of F? We have to make some choices to make sure this works out.

First, let B_\rho be the closed ball of radius \rho centered on a. We pick \rho so that F satisfies a Lipschitz condition on B_\rho with some constant K, which we know we can do because F is locally Lipschitz. Since this is a closed ball and F is continuous, we can find an upper bound M\geq\lVert F(x)\rVert for x\in B_\rho. Finally, we can find a c<\min\left(\frac{\rho}{M},\frac{1}{K}\right), and define the interval J=[-c,c]. I assert that v_i:J\to B_\rho is well-defined for every i.

First of all, v_0(t)=a\in B_\rho for all t\in J, so that’s good. We now assume that v_i is well-defined and prove that v_{i+1} is as well. It’s clearly well-defined as a function, since v_i(t)\in B_\rho by assumption, and B_\rho is contained within the domain of F. The integral makes sense since the integrand is continuous, and then we can add a. But is v_{i+1}(t)\in B_\rho?

So we calculate

\displaystyle\begin{aligned}\left\lVert\int\limits_0^tF(v_i(s))\,ds\right\rVert&\leq\int\limits_0^t\lVert F(v_i(s))\rVert\,ds\\&\leq\int\limits_0^tM\,ds\\&\leq Mc\\&\leq\rho\end{aligned}

which shows that the difference between v_{i+1}(t) and a has length smaller than \rho for any t\in J. Thus v_{i+1}:J\to B_\rho, as asserted, and the Picard iteration is well-defined.
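As a concrete instance of the iteration, take the simplest nontrivial case F(v)=v in one dimension with a=1; this choice is an assumption made purely for illustration. Each iterate is then a polynomial in t, so we can carry out the integration exactly on coefficient lists.

```python
from fractions import Fraction

def picard_step(coeffs, a=Fraction(1)):
    # coeffs[k] is the coefficient of t^k in v_i(t).
    # For F(v) = v, the integral from 0 to t sends t^k to t^(k+1)/(k+1);
    # the constant term of the result is the initial value a.
    integrated = [a] + [coef / (k + 1) for k, coef in enumerate(coeffs)]
    return integrated

v = [Fraction(1)]  # v_0(t) = a = 1
for _ in range(4):
    v = picard_step(v)

print(v)  # coefficients of 1 + t + t^2/2 + t^3/6 + t^4/24
```

In this special case the iterates are the Taylor polynomials of e^t, which is the solution of v'=v with v(0)=1.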

May 5, 2011 Posted by | Analysis, Differential Equations | 4 Comments

Continuously Differentiable Functions are Locally Lipschitz

It turns out that our existence proof will actually hinge on our function satisfying a Lipschitz condition. So let’s show that we will have this property anyway.

More specifically, we are given a C^1 function f:U\to\mathbb{R}^n defined on an open region U\subseteq\mathbb{R}^n. We want to show that around any point p\in U we have some neighborhood N where f satisfies a Lipschitz condition. That is: there is a constant K such that for all x and y in the neighborhood N we have the inequality

\displaystyle\lVert f(y)-f(x)\rVert\leq K\lVert y-x\rVert

We don’t have to use the same K for each neighborhood, but every point should have a neighborhood with some K.

Infinitesimally, this is obvious. The differential df(p):\mathbb{R}^n\to\mathbb{R}^n is a linear transformation. Since it goes between finite-dimensional vector spaces it’s bounded, which means we have an inequality

\displaystyle\lVert df(p)v\rVert\leq\lVert df(p)\rVert_\text{op}\lVert v\rVert

where \lVert df(p)\rVert_\text{op} is the operator norm of df(p). What this lemma says is that if the function is C^1 we can make this work out over finite distances, not just for infinitesimal displacements.

So, given our point p let B_\epsilon be the closed ball of radius \epsilon around p, and choose \epsilon so small that B_\epsilon is contained within U. Since the function df, which takes points of U to the space of linear operators, is continuous by our assumption, the function q\mapsto\lVert df(q)\rVert_\text{op} is continuous as well. The extreme value theorem tells us that since B_\epsilon is compact this continuous function must attain a maximum on it, which we call K.

The ball is also “convex”, meaning that given points x and y in the ball the whole segment x+t(y-x) for 0\leq t\leq1 is contained within the ball. We define a function g(t)=f(x+t(y-x)) and use the chain rule to calculate

\displaystyle g'(t)=df(x+t(y-x))\frac{d}{dt}(x+t(y-x))=df(x+t(y-x))(y-x)

Then we calculate

\displaystyle f(y)-f(x)=g(1)-g(0)=\int\limits_0^1g'(t)\,dt=\int\limits_0^1df(x+t(y-x))(y-x)\,dt

And from this we conclude

\displaystyle\begin{aligned}\lVert f(y)-f(x)\rVert&=\left\lVert\int\limits_0^1df(x+t(y-x))(y-x)\,dt\right\rVert\\&\leq\int\limits_0^1\lVert df(x+t(y-x))(y-x)\rVert\,dt\\&\leq\int\limits_0^1\lVert df(x+t(y-x))\rVert_\text{op}\lVert(y-x)\rVert\,dt\\&\leq\int\limits_0^1K\lVert(y-x)\rVert\,dt\\&=K\lVert(y-x)\rVert\end{aligned}

That is, the separation between the outputs is expressible as an integral, the integrand of which is bounded by our infinitesimal result above. Integrating up we get the bound we seek.
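A quick numerical sketch of the lemma in the one-dimensional case: the function f(x)=x^3, the base point p=1, and the radius \epsilon=1/2 are all assumptions picked for illustration. We take K to be the largest value of \lvert f'\rvert over a grid on the closed ball and compare it with every difference quotient between grid points.

```python
def f(x):
    return x ** 3

def df(x):
    # Derivative of f, computed by hand
    return 3 * x * x

p, eps = 1.0, 0.5
n = 101
# Uniform grid on the closed ball [p - eps, p + eps]
ball = [p - eps + 2 * eps * j / (n - 1) for j in range(n)]

# Candidate Lipschitz constant: the maximum of |f'| over the grid
K = max(abs(df(x)) for x in ball)

# Largest difference quotient between distinct grid points
worst = max(abs(f(y) - f(x)) / abs(y - x)
            for x in ball for y in ball if x != y)

print(K, worst)
```

The constant depends on the ball: enlarging \epsilon here would force a larger K, which is why the condition is only local.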

May 4, 2011 Posted by | Analysis, Differential Equations | 3 Comments

The Existence and Uniqueness Theorem of Ordinary Differential Equations (statement)

I have to take a little detour for now to prove an important result: the existence and uniqueness theorem of ordinary differential equations. This is one of those hard analytic nubs that differential geometry takes as a building block, but it still needs to be proven once before we can get back away from this analysis.

Anyway, we consider a continuously differentiable function F:U\to\mathbb{R}^n defined on an open region U\subseteq\mathbb{R}^n, and the initial value problem:

\displaystyle\begin{aligned}v'(t)&=F(v(t))\\v(0)&=a\end{aligned}

for some fixed initial value a\in U. I say that there is a unique solution to this problem, in the sense that there is some interval (-c,c) and a unique function v:(-c,c)\to\mathbb{R}^n satisfying both conditions.

In fact, more is true: the solution varies continuously with the starting point. That is, there is an interval I around 0\in\mathbb{R}, some neighborhood W of a and a continuously differentiable function \psi:I\times W\to U called the “flow” of the system defined by the differential equation v'=F(v), which satisfies the two conditions:

\displaystyle\begin{aligned}\frac{\partial}{\partial t}\psi(t,u)&=F(\psi(t,u))\\\psi(0,u)&=u\end{aligned}

Then for any w\in W we can get a curve v_w:I\to U defined by v_w(t)=\psi(t,w). The two conditions on the flow then tell us that v_w is a solution of the initial value problem with initial value w.
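To see what a flow looks like in a case where it can be written down, consider the one-dimensional logistic field F(v)=v(1-v); this example and its closed-form flow are assumptions brought in for illustration, not part of the theorem. The sketch checks the two flow conditions numerically, using a central difference for the time derivative.

```python
import math

def F(v):
    # Logistic vector field on the real line (illustrative choice)
    return v * (1.0 - v)

def psi(t, u):
    # Closed-form flow of v' = v(1 - v) for starting points 0 < u < 1
    e = math.exp(t)
    return u * e / (1.0 - u + u * e)

def flow_residual(t, u, h=1e-5):
    # | d/dt psi(t, u) - F(psi(t, u)) | with a central difference in t
    dpsi_dt = (psi(t + h, u) - psi(t - h, u)) / (2 * h)
    return abs(dpsi_dt - F(psi(t, u)))

starts = [0.1, 0.2, 0.5, 0.9]
times = [-0.5, 0.0, 0.7]

initial_error = max(abs(psi(0.0, u) - u) for u in starts)
worst_residual = max(flow_residual(t, u) for t in times for u in starts)
print(initial_error, worst_residual)
```

Fixing a starting point w and varying t traces out the curve v_w described above.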

This will take us a short while, but then we can put it behind us and get back to differential geometry. Incidentally, the approach I will use generally follows that of Hirsch and Smale.

May 4, 2011 Posted by | Analysis, Differential Equations | 4 Comments