## Gronwall’s Inequality

We’re going to need another analytic lemma, this one called “Gronwall’s inequality”. If $u:[0,\alpha]\to\mathbb{R}$ is a continuous, nonnegative function, and if $C$ and $K$ are nonnegative constants such that

$$u(t)\leq C+\int_0^tKu(s)\,ds$$

for all $t\in[0,\alpha]$, then for all $t$ in this interval we have

$$u(t)\leq Ce^{Kt}$$

That is, we can conclude that $u$ grows no faster than an exponential function. Exponential growth may seem fast, but at least it doesn’t blow up to an infinite singularity in finite time, no matter what Kurzweil seems to think.

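Anyway, first let’s deal with strictly positive $C$. If we define

$$U(t)=C+\int_0^tKu(s)\,ds$$

then $U(t)\geq C>0$, and by assumption we have $u(t)\leq U(t)$. Differentiating, we find $U'(t)=Ku(t)$, and thus

$$\frac{U'(t)}{U(t)}=\frac{Ku(t)}{U(t)}\leq K$$

Integrating from $0$ to $t$, we find

$$\log(U(t))-\log(U(0))\leq Kt$$

Finally, since $U(0)=C$, we can exponentiate to find

$$u(t)\leq U(t)\leq U(0)e^{Kt}=Ce^{Kt}$$

proving Gronwall’s inequality.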

If $C=0$ in our hypothesis, the hypothesis is also true with any $C_1>0$ in its place, and so we see that $u(t)\leq C_1e^{Kt}$ for every positive $C_1$. Letting $C_1\to0$ shows that $u(t)$ must be zero, as required by Gronwall’s inequality in this case.
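As a quick sanity check of the statement (not a proof), here is a minimal Python sketch that samples a candidate $u$ on $[0,\alpha]$, approximates the integral with the trapezoid rule, and tests the hypothesis and the conclusion separately. The helper names `cumulative_integral` and `check_gronwall`, the grid size and the tolerance are illustrative choices, not part of the argument above; since the integral is only approximated, a borderline function could in principle be misjudged.

```python
import math

def cumulative_integral(values, ts):
    """Trapezoid-rule approximations of integral_0^{ts[i]} of the sampled function."""
    out = [0.0]
    for i in range(1, len(ts)):
        out.append(out[-1] + 0.5 * (values[i] + values[i - 1]) * (ts[i] - ts[i - 1]))
    return out

def check_gronwall(u, C, K, alpha, n=2000, tol=1e-9):
    """Sample u on [0, alpha] and return (hypothesis_holds, conclusion_holds).

    hypothesis: u(t) <= C + integral_0^t K u(s) ds
    conclusion: u(t) <= C exp(K t)
    """
    ts = [alpha * i / n for i in range(n + 1)]
    us = [u(t) for t in ts]
    integrals = cumulative_integral(us, ts)
    hypothesis = all(ui <= C + K * Ii + tol for ui, Ii in zip(us, integrals))
    conclusion = all(ui <= C * math.exp(K * t) + tol for ui, t in zip(us, ts))
    return hypothesis, conclusion

# Hypothetical test functions, chosen for illustration.
print(check_gronwall(lambda t: 2.0 * (1 + 0.5 * t), C=2.0, K=0.5, alpha=3.0))  # (True, True)
print(check_gronwall(lambda t: math.exp(t), C=1.0, K=1.0, alpha=1.0))           # (True, True)
print(check_gronwall(lambda t: math.exp(2 * t), C=1.0, K=1.0, alpha=1.0))       # (False, False)
```

The linear function $u(t)=C(1+Kt)$ satisfies both conditions, the extremal case $u(t)=Ce^{Kt}$ satisfies both with equality up to rounding, and $u(t)=e^{2t}$ with $C=K=1$ fails the hypothesis, consistent with it also failing the conclusion.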

## Another Existence Proof

I’d like to go back and give a different proof that the Picard iteration converges, one which is closer to the spirit of Newton’s method. In that case, we proved that Newton’s method converged by showing that the derivative of the iterating function was less than one in absolute value at the desired solution, making it an attracting fixed point.

In this case, however, we don’t have a derivative because our iteration runs over functions rather than numbers. We will replace it with a similar construction called the “functional derivative”, which is a fundamental part of the “calculus of variations”. I’m not really going to go too deep into this field right now, and I’m not going to prove the analogous result that a small functional derivative means an attracting fixed point, but it’s a nice exercise and introduction anyway.

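So, we start with the Picard iteration again:

$$P[v](t)=a+\int_0^tf(v(s))\,ds$$

We consider what happens when we add an adjustment $h$ to $v$:

$$P[v+h](t)=a+\int_0^tf(v(s)+h(s))\,ds$$

We call the small change $h$ the “variation” of $v$, and we write $\delta v=h$. Similarly, we call the difference between $P[v+\delta v]$ and $P[v]$ the variation of $P$ and write $\delta P$. It turns out that controlling the size of the variation $\delta v$ gives us some control over the size of the variation $\delta P$. To wit, if $\lVert\delta v\rVert_\infty\leq d$ then for $\lvert t\rvert\leq c$ we find

$$\lvert\delta P(t)\rvert=\left\lvert\int_0^t\left(f(v(s)+\delta v(s))-f(v(s))\right)\,ds\right\rvert\leq\left\lvert\int_0^t\lvert f(v(s)+\delta v(s))-f(v(s))\rvert\,ds\right\rvert$$

Now our proof that $f$ is locally Lipschitz involved showing that there’s a neighborhood of $a$ where we can bound $\lvert f(x)-f(y)\rvert$ by $K\lvert x-y\rvert$. Again we can pick a small enough $c$ so that $\lvert t\rvert\leq c$ implies that both $v(t)$ and $v(t)+\delta v(t)$ stay within this neighborhood, and also such that $cK<1$. Then the integrand above is at most $Kd$, and so we conclude that $\lVert\delta P\rVert_\infty\leq cKd<d$, which we can also write as

$$\frac{\lVert\delta P\rVert_\infty}{\lVert\delta v\rVert_\infty}<1$$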

Now, admittedly this argument is a bit handwavy as it stands. Still, it does go to show the basic idea of the technique, and it’s a nice little introduction to the calculus of variations.
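To see the ratio bound in a concrete case, here is a minimal Python sketch in one dimension. It assumes the hypothetical choices $f=\sin$ (which is $1$-Lipschitz on all of $\mathbb{R}$, so $K=1$), $a=0.2$, $c=0.5$, an arbitrary trial curve $v$ and an arbitrary variation $\delta v$, and it discretizes $P$ with the trapezoid rule on a grid over $[-c,c]$; the function names are illustrative. The computed ratio $\lVert\delta P\rVert_\infty/\lVert\delta v\rVert_\infty$ should come out no larger than $cK=0.5$.

```python
import math

def picard_operator(f, a, v, ts, i0):
    """Approximate P[v](t) = a + integral_0^t f(v(s)) ds on the grid ts.

    v holds samples of v at the grid points, ts[i0] must be 0, and the
    integral is computed with the trapezoid rule.
    """
    g = [f(x) for x in v]
    F = [0.0]  # F[i] approximates the integral from ts[0] to ts[i]
    for i in range(1, len(ts)):
        F.append(F[-1] + 0.5 * (g[i] + g[i - 1]) * (ts[i] - ts[i - 1]))
    return [a + Fi - F[i0] for Fi in F]  # shift so the integral starts at t = 0

def variation_ratio(f, a, v, dv, ts, i0):
    """Return ||delta P||_inf / ||delta v||_inf for the sampled curve v and variation dv."""
    P = picard_operator(f, a, v, ts, i0)
    P_varied = picard_operator(f, a, [x + h for x, h in zip(v, dv)], ts, i0)
    dP = max(abs(p - q) for p, q in zip(P_varied, P))
    return dP / max(abs(h) for h in dv)

c, K, n, a = 0.5, 1.0, 2000, 0.2
ts = [-c + 2 * c * i / n for i in range(n + 1)]  # grid on J = [-c, c]
i0 = n // 2                                        # ts[i0] == 0.0 since n is even
v = [a + 0.3 * t for t in ts]                      # arbitrary trial curve with v(0) = a
dv = [0.1 * math.cos(3 * t) for t in ts]           # arbitrary variation
ratio = variation_ratio(math.sin, a, v, dv, ts, i0)
print(ratio, "<=", c * K)
```

Because each trapezoid increment is bounded by the step length times $K\lVert\delta v\rVert_\infty$, the discrete ratio obeys the same bound $cK$ as the continuous one, up to rounding.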

## Uniqueness of Solutions to Differential Equations

The convergence of the Picard iteration shows the existence part of our existence and uniqueness theorem. Now we prove the uniqueness part.

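Let’s say that $u(t)$ and $v(t)$ are both solutions of the differential equation ($u'(t)=f(u(t))$ and $v'(t)=f(v(t))$) and that they both satisfy the initial condition $u(0)=v(0)=a$ on the same interval $J=[-c,c]$ from the existence proof above. We will show that $u(t)=v(t)$ for all $t\in J$ by measuring the supremum norm of their difference:

$$Q=\lVert u-v\rVert_\infty=\max_{t\in J}\lvert u(t)-v(t)\rvert$$

Since $J$ is a closed interval, this maximum must be attained at a point $t_1\in J$. Both functions stay within $B_\rho(a)$, where $K$ is a Lipschitz constant for $f$: if one of them, say $v$, first reached the boundary of the ball at a time $t^*$, we would have $\rho=\lvert v(t^*)-a\rvert\leq\lvert t^*\rvert M\leq cM<\rho$. We can calculate

$$\begin{aligned}Q=\lvert u(t_1)-v(t_1)\rvert&=\left\lvert\int_0^{t_1}\left(u'(s)-v'(s)\right)\,ds\right\rvert\\&\leq\left\lvert\int_0^{t_1}\lvert f(u(s))-f(v(s))\rvert\,ds\right\rvert\\&\leq\left\lvert\int_0^{t_1}K\lvert u(s)-v(s)\rvert\,ds\right\rvert\\&\leq\lvert t_1\rvert KQ\leq cKQ\end{aligned}$$

but by assumption we know that $cK<1$, which makes this inequality impossible unless $Q=0$. Thus the distance between $u$ and $v$ is $0$, and the two functions must be equal on this interval, proving uniqueness.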

## The Picard Iteration Converges

Now that we’ve defined the Picard iteration, we have a sequence of functions $u_k:J\to B_\rho(a)$ from a closed neighborhood of $0\in\mathbb{R}$ to a closed neighborhood of $a\in\mathbb{R}^n$. Recall that we defined $M$ to be an upper bound of $\lvert f\rvert$ on $B_\rho(a)$, $K$ to be a Lipschitz constant for $f$ on $B_\rho(a)$, $c$ less than both $\frac{\rho}{M}$ and $\frac{1}{K}$, and $J=[-c,c]$.

Specifically, we’ll show that the sequence $u_k$ converges in the supremum norm on $J$. That is, we’ll show that there is some $u:J\to\mathbb{R}^n$ so that the maximum of the difference $\lvert u_k(t)-u(t)\rvert$ for $t\in J$ decreases to zero as $k$ increases. And we’ll do this by showing that the individual functions $u_k$ and $u_{k+1}$ get closer and closer in the supremum norm. Then they’ll form a Cauchy sequence, which we know must converge because the metric space defined by the supremum norm is complete, as are all the $L^p$ spaces.

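Anyway, let $L=\lVert u_1-u_0\rVert_\infty$ be exactly the supremum norm of the difference between the first two functions in the sequence. I say that $\lVert u_{k+1}-u_k\rVert_\infty\leq(cK)^kL$. The case $k=0$ is the definition of $L$, and for $k\geq1$ we calculate inductively

$$\begin{aligned}\lvert u_{k+1}(t)-u_k(t)\rvert&=\left\lvert\int_0^t\left(f(u_k(s))-f(u_{k-1}(s))\right)\,ds\right\rvert\\&\leq\left\lvert\int_0^tK\lvert u_k(s)-u_{k-1}(s)\rvert\,ds\right\rvert\\&\leq cK\lVert u_k-u_{k-1}\rVert_\infty\\&\leq cK(cK)^{k-1}L=(cK)^kL\end{aligned}$$

Now we can bound the distance between any two functions in the sequence. If $i<j$ are two indices we calculate:

$$\lVert u_j-u_i\rVert_\infty\leq\sum_{k=i}^{j-1}\lVert u_{k+1}-u_k\rVert_\infty\leq\sum_{k=i}^{j-1}(cK)^kL$$

But this is a chunk of a geometric series; since $cK<1$, the series must converge, and so we can make this sum as small as we please by choosing $i$ and $j$ large enough.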
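The same bound tells us how many Picard steps suffice for a given accuracy: summing the whole tail gives $\sum_{k\geq m}(cK)^kL=\frac{(cK)^mL}{1-cK}$, which bounds $\lVert u_j-u_m\rVert_\infty$ for every $j>m$. Here is a small Python sketch of that arithmetic; the function names and the sample values $L=1$, $cK=\frac12$, $\varepsilon=10^{-3}$ are illustrative assumptions.

```python
def tail_bound(L, q, m):
    """Sum over k >= m of L * q**k, i.e. L * q**m / (1 - q), for 0 <= q < 1 (q plays the role of cK)."""
    return L * q ** m / (1 - q)

def iterations_needed(L, q, eps):
    """Smallest m with tail_bound(L, q, m) < eps, so that ||u_j - u_m|| < eps for all j > m."""
    if not 0 <= q < 1:
        raise ValueError("the argument needs 0 <= cK < 1")
    if eps <= 0:
        raise ValueError("eps must be positive")
    m = 0
    while tail_bound(L, q, m) >= eps:
        m += 1
    return m

m = iterations_needed(L=1.0, q=0.5, eps=1e-3)
print(m)  # 11, since 0.5**11 / 0.5 = 0.0009765625 < 0.001 <= 0.5**10 / 0.5
# Sanity check: a long partial sum of the tail stays below eps.
print(sum(1.0 * 0.5 ** k for k in range(m, m + 50)) < 1e-3)  # True
```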

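This then tells us that our sequence of functions is $L^\infty$-Cauchy, and thus $L^\infty$-convergent, which implies uniform convergence. The uniformity is important because it means that we can exchange integration with the limiting process. That is,

$$\lim_{k\to\infty}\int_0^tf(u_k(s))\,ds=\int_0^t\lim_{k\to\infty}f(u_k(s))\,ds$$

And so we can start with our definition:

$$u_{k+1}(t)=a+\int_0^tf(u_k(s))\,ds$$

and take the limit of both sides

$$u(t)=\lim_{k\to\infty}u_{k+1}(t)=a+\lim_{k\to\infty}\int_0^tf(u_k(s))\,ds=a+\int_0^t\lim_{k\to\infty}f(u_k(s))\,ds=a+\int_0^tf(u(s))\,ds$$

where we have used the continuity of $f$. This shows that the limiting function $u$ does indeed satisfy the integral equation, and thus the original initial value problem.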

## The Picard Iteration

Now we can start actually closing in on a solution to our initial value problem. Recall the setup:

$$\begin{aligned}u'(t)&=f(u(t))\\u(0)&=a\end{aligned}$$

The first thing we’ll do is translate this into an integral equation. Integrating both sides of the first equation and using the second equation we find

$$u(t)=a+\int_0^tf(u(s))\,ds$$

Conversely, if a continuous $u$ satisfies this equation then it satisfies the two conditions in our initial value problem: setting $t=0$ gives $u(0)=a$, and since the integrand is continuous, the fundamental theorem of calculus gives $u'(t)=f(u(t))$.

Now the nice thing about this formulation is that it expresses $u$ as the fixed point of a certain operation. To find it, we will use an iterative method. We start with $u_0(t)=a$ and define the “Picard iteration”

$$u_{k+1}(t)=a+\int_0^tf(u_k(s))\,ds$$

This is sort of like Newton’s method, where we express the point we’re looking for as the fixed point of a function, and then find the fixed point by iterating that very function.
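As a worked example (our choice, not part of the setup above), take $n=1$, $f(x)=x$ and $a=1$, so the initial value problem is $u'=u$, $u(0)=1$. Each step of the Picard iteration integrates the previous iterate term by term, and the iterates turn out to be the partial sums of the Taylor series of $e^t$:

$$\begin{aligned}u_0(t)&=1\\u_1(t)&=1+\int_0^t1\,ds=1+t\\u_2(t)&=1+\int_0^t(1+s)\,ds=1+t+\frac{t^2}{2}\\&\;\;\vdots\\u_k(t)&=\sum_{j=0}^k\frac{t^j}{j!}\end{aligned}$$

so in this case the iteration visibly converges to the solution $u(t)=e^t$.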

The one catch is, how are we sure that this is well-defined? What could go wrong? Well, how do we know that $u_k(t)$ is in the domain of $f$? We have to make some choices to make sure this works out.

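First, let $B_\rho(a)$ be the closed ball of radius $\rho$ centered on $a$. We pick $\rho$ small enough that $B_\rho(a)\subseteq U$ and that $f$ satisfies a Lipschitz condition with some constant $K$ on $B_\rho(a)$, which we know we can do because $f$ is locally Lipschitz. Since this is a closed ball and $f$ is continuous, we can find an upper bound $M>0$ with $\lvert f(x)\rvert\leq M$ for all $x\in B_\rho(a)$. Finally, we can find a $c$ with $0<c<\min\left(\frac{\rho}{M},\frac{1}{K}\right)$, and the interval $J=[-c,c]$. I assert that $u_k:J\to B_\rho(a)$ is well-defined for every $k$.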

First of all, $u_0(t)=a\in B_\rho(a)$ for all $t\in J$, so that’s good. We now assume that $u_k:J\to B_\rho(a)$ is well-defined and prove that $u_{k+1}$ is as well. It’s clearly well-defined as a function, since $u_k(s)\in B_\rho(a)$ by assumption, and $B_\rho(a)$ is contained within the domain of $f$. The integral makes sense since the integrand is continuous, and then we can add $a$. But is $u_{k+1}(t)\in B_\rho(a)$?

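So we calculate

$$\lvert u_{k+1}(t)-a\rvert=\left\lvert\int_0^tf(u_k(s))\,ds\right\rvert\leq\left\lvert\int_0^t\lvert f(u_k(s))\rvert\,ds\right\rvert\leq\lvert t\rvert M\leq cM<\rho$$

which shows that the difference between $u_{k+1}(t)$ and $a$ has length smaller than $\rho$ for any $t\in J$. Thus $u_{k+1}:J\to B_\rho(a)$, as asserted, and the Picard iteration is well-defined.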
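Here is a minimal numerical sketch of the iteration and of the well-definedness check, in one dimension. It assumes the hypothetical example $f(x)=x$, $a=1$ and $\rho=1$, so that on $B_\rho(a)=[0,2]$ we may take $M=2$ and $K=1$; any $c<\min\left(\frac12,1\right)$ then works, and we use $c=0.4$. Integrals are approximated with the trapezoid rule on a grid over $J$, and `picard_iterates` is an illustrative name. Every sampled iterate should satisfy $\lvert u_k(t)-a\rvert\leq cM=0.8<\rho$, and the iterates should approach the exact solution $e^t$.

```python
import math

def picard_iterates(f, a, c, num_iters, n=2000):
    """Sample u_0, ..., u_num_iters of the Picard iteration on J = [-c, c].

    u_{k+1}(t) = a + integral_0^t f(u_k(s)) ds, with the integral approximated
    by the trapezoid rule; n must be even so that t = 0 is (up to rounding) a grid point.
    """
    ts = [-c + 2 * c * i / n for i in range(n + 1)]
    i0 = n // 2
    iterates = [[a] * (n + 1)]  # u_0(t) = a
    for _ in range(num_iters):
        g = [f(x) for x in iterates[-1]]
        F = [0.0]  # F[i] approximates the integral from ts[0] to ts[i]
        for i in range(1, n + 1):
            F.append(F[-1] + 0.5 * (g[i] + g[i - 1]) * (ts[i] - ts[i - 1]))
        iterates.append([a + Fi - F[i0] for Fi in F])
    return ts, iterates

a, rho, M, c = 1.0, 1.0, 2.0, 0.4
ts, us = picard_iterates(lambda x: x, a, c, num_iters=20)
# Every iterate stays in the closed ball B_rho(a): |u_k(t) - a| <= c M < rho.
print(max(abs(x - a) for u in us for x in u) <= c * M < rho)  # True
# Distance from the last iterate to the exact solution e^t on J (a small number).
print(max(abs(x - math.exp(t)) for t, x in zip(ts, us[-1])))
```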

## Continuously Differentiable Functions are Locally Lipschitz

It turns out that our existence proof will actually hinge on our function satisfying a Lipschitz condition. So let’s show that a continuously differentiable function automatically has this property, at least locally.

More specifically, we are given a continuously differentiable function $f:U\to\mathbb{R}^n$ defined on an open region $U\subseteq\mathbb{R}^n$. We want to show that around any point $p\in U$ we have some neighborhood $N$ where $f$ satisfies a Lipschitz condition. That is: there is a constant $K$ such that for all $x$ and $y$ in the neighborhood $N$ we have the inequality

$$\lvert f(y)-f(x)\rvert\leq K\lvert y-x\rvert$$

We don’t have to use the same $K$ for each neighborhood, but every point should have a neighborhood with some $K$.

Infinitesimally, this is obvious. The differential $df(x):\mathbb{R}^n\to\mathbb{R}^n$ is a linear transformation. Since it goes between finite-dimensional vector spaces it’s bounded, which means we have an inequality

$$\lvert df(x)v\rvert\leq\lVert df(x)\rVert_\text{op}\lvert v\rvert$$

where $\lVert df(x)\rVert_\text{op}$ is the operator norm of $df(x)$. What this lemma says is that if the function is $C^1$ we can make this work out over finite distances, not just for infinitesimal displacements.

So, given our point $p$, let $B_\epsilon(p)$ be the closed ball of radius $\epsilon$ around $p$, and choose $\epsilon$ so small that $B_\epsilon(p)$ is contained within $U$. Since the function $df$, which takes points to the space of linear operators, is continuous by our assumption, the function $x\mapsto\lVert df(x)\rVert_\text{op}$ is continuous as well. The extreme value theorem tells us that since $B_\epsilon(p)$ is compact this continuous function must attain a maximum, which we call $K$.

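The ball is also “convex”, meaning that given points $x$ and $y$ in the ball the whole segment $x+t(y-x)$ for $0\leq t\leq1$ is contained within the ball. We define a function $g(t)=f(x+t(y-x))$ and use the chain rule to calculate

$$g'(t)=df(x+t(y-x))(y-x)$$

Then we calculate

$$f(y)-f(x)=g(1)-g(0)=\int_0^1g'(t)\,dt=\int_0^1df(x+t(y-x))(y-x)\,dt$$

And from this we conclude

$$\lvert f(y)-f(x)\rvert\leq\int_0^1\lvert df(x+t(y-x))(y-x)\rvert\,dt\leq\int_0^1K\lvert y-x\rvert\,dt=K\lvert y-x\rvert$$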

That is, the separation between the outputs is expressible as an integral, the integrand of which is bounded by our infinitesimal result above. Integrating up we get the bound we seek.
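Here is a small numerical illustration of this argument, assuming a hypothetical map $f(s_1,s_2)=(s_2\sin s_1,\ s_1^2-s_2)$ on $\mathbb{R}^2$ and the closed unit ball around $p=(0,0)$. Rather than computing operator norms, the sketch uses the Frobenius norm, which is always at least the operator norm and so still gives a valid (if slightly larger) constant: on the unit ball $(s_2\cos s_1)^2+\sin^2s_1+4s_1^2+1\leq s_1^2+s_2^2+4s_1^2+1\leq6$, so $K=\sqrt6$ works. The code checks the integral formula for $f(y)-f(x)$ with the midpoint rule and then checks the Lipschitz inequality on random pairs; all names are illustrative.

```python
import math
import random

def f(p):
    """Hypothetical example map f(s1, s2) = (s2 sin s1, s1^2 - s2)."""
    s1, s2 = p
    return (s2 * math.sin(s1), s1 * s1 - s2)

def df(p):
    """Jacobian matrix of f at p, as a pair of rows."""
    s1, s2 = p
    return ((s2 * math.cos(s1), math.sin(s1)),
            (2.0 * s1, -1.0))

def matvec(A, v):
    return (A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1])

def norm(v):
    return math.hypot(v[0], v[1])

# Frobenius bound on the closed unit ball (an upper bound for the operator norm).
K = math.sqrt(6.0)

def segment_integral(x, y, steps=2000):
    """Midpoint-rule approximation of integral_0^1 df(x + t(y - x))(y - x) dt."""
    d = (y[0] - x[0], y[1] - x[1])
    total0 = total1 = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        g = matvec(df((x[0] + t * d[0], x[1] + t * d[1])), d)
        total0 += g[0] / steps
        total1 += g[1] / steps
    return (total0, total1)

def random_point_in_unit_ball(rng):
    """Rejection sampling from the square [-1, 1]^2."""
    while True:
        p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        if norm(p) <= 1.0:
            return p

rng = random.Random(0)
worst_identity_error = 0.0
lipschitz_ok = True
for _ in range(50):
    x, y = random_point_in_unit_ball(rng), random_point_in_unit_ball(rng)
    fx, fy = f(x), f(y)
    diff = (fy[0] - fx[0], fy[1] - fx[1])
    approx = segment_integral(x, y)
    worst_identity_error = max(worst_identity_error,
                               norm((diff[0] - approx[0], diff[1] - approx[1])))
    lipschitz_ok = lipschitz_ok and norm(diff) <= K * norm((y[0] - x[0], y[1] - x[1])) + 1e-12
print(worst_identity_error, lipschitz_ok)
```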

## The Existence and Uniqueness Theorem of Ordinary Differential Equations (statement)

I have to take a little detour for now to prove an important result: the existence and uniqueness theorem of ordinary differential equations. This is one of those hard analytic nubs that differential geometry takes as a building block, but it still needs to be proven once before we can get back away from this analysis.

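Anyway, we consider a continuously differentiable function $f:U\to\mathbb{R}^n$ defined on an open region $U\subseteq\mathbb{R}^n$, and the initial value problem:

$$\begin{aligned}u'(t)&=f(u(t))\\u(0)&=a\end{aligned}$$

for some fixed initial value $a\in U$. I say that there is a unique solution to this problem, in the sense that there is some interval $(-\epsilon,\epsilon)$ and a unique function $u:(-\epsilon,\epsilon)\to U$ satisfying both conditions.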

In fact, more is true: the solution varies continuously with the starting point. That is, there is an interval $J$ around $0\in\mathbb{R}$, some neighborhood $W$ of $a$ and a continuously differentiable function $\Phi:J\times W\to U$ called the “flow” of the system defined by the differential equation $u'=f(u)$, which satisfies the two conditions:

$$\begin{aligned}\frac{\partial}{\partial t}\Phi(t,w)&=f(\Phi(t,w))\\\Phi(0,w)&=w\end{aligned}$$

Then for any $w\in W$ we can get a curve $u_w:J\to U$ defined by $u_w(t)=\Phi(t,w)$. The two conditions on the flow then tell us that $u_w$ is a solution of the initial value problem with initial value $w$.
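For a concrete flow, consider the hypothetical scalar example $f(u)=u^2$, whose flow can be written down in closed form: $\Phi(t,w)=\frac{w}{1-tw}$, defined as long as $tw<1$. Taking $J=[-\frac12,\frac12]$ and $W=(-1,1)$ keeps $\lvert tw\rvert\leq\frac12$. The example also suggests why the theorem only promises some interval around $0$: the solution starting at $w=1$ blows up at $t=1$. The Python sketch below checks the two flow conditions numerically, using a central difference for $\frac{\partial}{\partial t}\Phi$; the step size, tolerance and sample points are illustrative choices.

```python
def f(u):
    """Right-hand side of the hypothetical example u' = u**2."""
    return u * u

def flow(t, w):
    """Closed-form flow Phi(t, w) = w / (1 - t*w) of u' = u**2, valid while t*w < 1."""
    if t * w >= 1:
        raise ValueError("(t, w) is outside the domain of this flow")
    return w / (1 - t * w)

def check_flow(ts, ws, h=1e-5, tol=1e-6):
    """Check Phi(0, w) = w and dPhi/dt(t, w) = f(Phi(t, w)) at the sample points."""
    for w in ws:
        if flow(0.0, w) != w:
            return False
        for t in ts:
            dphi_dt = (flow(t + h, w) - flow(t - h, w)) / (2 * h)  # central difference
            if abs(dphi_dt - f(flow(t, w))) > tol:
                return False
    return True

ts = [-0.5 + 0.1 * i for i in range(11)]   # sample times in J = [-1/2, 1/2]
ws = [-0.9 + 0.2 * i for i in range(10)]   # sample starting points in W = (-1, 1)
print(check_flow(ts, ws))  # True

# The curve u_w(t) = Phi(t, w) for the starting point w = 0.5:
u_w = [flow(t, 0.5) for t in ts]
```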

This will take us a short while, but then we can put it behind us and get back to differential geometry. Incidentally, the approach I will use generally follows that of Hirsch and Smale.

## Submersions

Another quick definition: we say that a smooth map $f:M\to N$ is a “submersion” if it is surjective, and if every point $p\in M$ is a regular point of $f$. Despite the similarity of the terms “immersion” and “submersion”, these are very different concepts, so be careful to keep them separate.

The nice thing about submersions is that every value $q\in N$ of $f$ is a regular value, and every one has a nonempty preimage. Thus our extension of the implicit function theorem applies to show that $f^{-1}(q)$ is an $(m-n)$-dimensional submanifold of $M$, where $m=\dim M$ and $n=\dim N$.

One obvious example of a submersion is the projection $\pi:M\times N\to M$ from a product manifold. As we’ve seen, the derivative of this projection is always a surjection. In fact, it’s a projection itself.
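Concretely, writing $m=\dim M$ and $n=\dim N$ and using product coordinates $(x^1,\dots,x^m,y^1,\dots,y^n)$ around a point $(p,q)\in M\times N$, the projection is $\pi(x,y)=x$, so its derivative is the $m\times(m+n)$ block matrix below. It has rank $m$ everywhere, and each preimage is a copy of $N$, of dimension $(m+n)-m=n$, just as the implicit function theorem predicts:

$$D\pi_{(p,q)}=\begin{pmatrix}I_m&0_{m\times n}\end{pmatrix},\qquad\pi^{-1}(p)=\{p\}\times N,\qquad\dim\pi^{-1}(p)=(m+n)-m=n$$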

Another example is the projection $\pi:\mathcal{T}M\to M$ from the tangent bundle down to its base manifold $M$. Indeed, given any tangent vector $v\in\mathcal{T}_pM$ at $p\in M$ we can pick a coordinate patch $(U,x)$ around $p$ and the corresponding patch $\pi^{-1}(U)$ of $\mathcal{T}M$. Within these coordinates we can easily calculate the derivative of $\pi$ and see that it’s just a projection onto the first $\dim M$ components, which is surjective. In this case, the preimages $\pi^{-1}(p)$ are the stalks $\mathcal{T}_pM$ of the tangent bundle.