## Some theorems about metric spaces

We need to get down a few facts about metric spaces before we can continue on our course. Firstly, as I alluded to in an earlier comment, compact metric spaces are sequentially compact: every sequence has a convergent subsequence.

To see this fact, we’ll use the fact that compact spaces are the next best thing to finite. Specifically, in a finite set any infinite sequence would have to hit one point infinitely often. Here instead, we’ll have an accumulation point $x$ in our compact metric space $X$ so that for any $\epsilon>0$ and any point $x_m$ in our sequence there is some $n>m$ with $d(x_n,x)<\epsilon$. That is, though the sequence may move away from $x$, it always comes back within $\epsilon$ of it again. Once we have an accumulation point $x$, we can find a subsequence converging to $x$ just as we found a subnet converging to any accumulation point of a net.

Let’s take our sequence $(x_n)$ and define $F_n=\overline{\{x_m\mid m\geq n\}}$, the closure of the sequence from $x_n$ onwards. Then these closed sets are nested, $F_1\supseteq F_2\supseteq F_3\supseteq\cdots$, and the intersection of any finite number of them is the smallest one, which is clearly nonempty since it contains a tail of the sequence. Then by the compactness of $X$ we see that the intersection of *all* the $F_n$ is again nonempty. Since the points in this intersection are in the closure of any tail of the sequence, they must be accumulation points.

Okay, that doesn’t *quite* work. See the comments for more details. Michael asks where I use the fact that we’re in a metric space, which was very astute. It turns out on reflection that I *did* use it, but it was hidden.

We can still say we’re looking for an accumulation point first and foremost, because if the sequence has an accumulation point there must be some subsequence converging to that point. Why not a subnet in general? Because metric spaces must be normal Hausdorff (using metric neighborhoods to separate closed sets) and first-countable! And as long as we’re first-countable (or, weaker, “sequential”) we can find a sequence converging to any limit point of a net.

What I didn’t say before is that once we find an accumulation point there will be a subsequence converging to that point. My counterexample is compact, and any sequence in it has accumulation points, but we will only be able to find sub*nets* of our sequence converging to them, not sub*sequences*. Unless we add something to assure that our space is sequential, and metric spaces do that.

We should note in passing that the special case where $X$ is a compact subspace of $\mathbb{R}^n$ is referred to as the Bolzano-Weierstrass Theorem.

Next is the Heine-Cantor theorem, which says that any continuous function $f:M\to N$ from a compact metric space $M$ to any metric space $N$ is uniformly continuous. In particular, we can use the interval $[a,b]$ as our compact metric space $M$ and the real numbers $\mathbb{R}$ as our metric space $N$ to see that any continuous function on a closed interval is uniformly continuous.

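So let’s assume that $f:M\to N$ is continuous but not uniformly continuous. Then there is some $\epsilon>0$ so that for any $\delta>0$ there are points $x$ and $y$ in $M$ with $d_M(x,y)<\delta$ but $d_N(f(x),f(y))\geq\epsilon$. In particular, we can pick $\frac{1}{n}$ as our $\delta$ and get two sequences $x_n$ and $y_n$ with $d_M(x_n,y_n)<\frac{1}{n}$ but $d_N(f(x_n),f(y_n))\geq\epsilon$. By the above theorem (applied once to the $x_n$, and then again to the matching $y_n$) we can find subsequences $x_{n_k}$ converging to $\bar{x}$ and $y_{n_k}$ converging to $\bar{y}$.

Now $d_M(x_{n_k},y_{n_k})<\frac{1}{n_k}$, which converges to $0$, and so $\bar{x}=\bar{y}$. Therefore we must have $d_N(f(x_{n_k}),f(y_{n_k}))$ also converging to $0$ by the continuity of $f$. But this can’t happen, since each of these distances must be at least $\epsilon$! Thus $f$ must have been uniformly continuous to begin with.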

## Darboux Integration

Okay, defining the integral as the limit of a net of Riemann sums is all well and good, but it’s a *huge* net, and it seems impossible to calculate with. We need a better way of getting a handle on these things. What we’ll use is a little trick for evaluating limits of nets that I haven’t mentioned yet: “cofinal sets”.

Given a directed set $(D,\preceq)$, a directed subset $S\subseteq D$ is cofinal if for every $d\in D$ there is some $s\in S$ with $s\succeq d$. Now watch what happens when we try to show that the limit of a net $\Phi:D\to X$ is a point $x$. We need to find for every neighborhood $U$ of $x$ an index $d_0$ so that for every $d\succeq d_0$ we have $\Phi(d)\in U$. But if $d_0$ is such an index, then there is some $s_0\in S$ above it, and every $s\in S$ above *that* is also above $d_0$, and so $\Phi(s)\in U$. That is, if the limit over $D$ exists, then the limit over $S$ exists and has the same value.

Let’s give a cofinal set of tagged partitions by giving a rule for picking the tags that go with any partition. Then our net consists just of partitions of the interval $[a,b]$, and the tags come for free. If the function $f$ is Riemann-integrable, then the limit over this cofinal set will be the integral. Here’s our rule: in the closed subinterval $[x_{i-1},x_i]$ pick a point $t_i$ at which $f$ attains, or at least approaches, the supremum of its values in that subinterval. If the function is continuous it will attain a maximum at our tag, and if not it’ll get close or shoot off to infinity (if there is no supremum).

Why is this cofinal? Let’s imagine a tagged partition where the tag $t_i$ is *not* chosen according to this rule. Then we can refine the partition by splitting up the $i$th strip in such a way that $f(t_i)$ is the maximum in one of the new strips, and choosing all the new tags according to the rule. Then we’ve found a good partition above the one we started with. Similarly, we can build another cofinal set by always choosing the tags where $f$ approaches an *infimum*.

When we consider a partition $P$ with points $a=x_0<x_1<\dots<x_n=b$ in the first cofinal set we can set up something closely related to the Riemann sums: the “upper Darboux sums”

$$U_P(f)=\sum_{i=1}^n M_i(x_i-x_{i-1})$$

where $M_i$ is the supremum of $f$ on the interval $[x_{i-1},x_i]$, or infinity if the value of $f$ is unbounded above here. Similarly, we can define the “lower Darboux sum”

$$L_P(f)=\sum_{i=1}^n m_i(x_i-x_{i-1})$$

where now $m_i$ is the infimum (or negative infinity). If the function is Riemann-integrable, then the limits over these cofinal sets both exist and are both equal to the Riemann integral. So we define a function to be “Darboux-integrable” if the limits of the upper and lower Darboux sums both exist and have the same value. Then the Darboux integral is defined to be this common value. Notice that if the function ever shoots off to positive or negative infinity we’ll get an infinite value for one of the terms, and we can never converge, so such functions are not Darboux-integrable.

We should notice here that given any partition $P$, the upper Darboux sum must be at least as large as any Riemann sum with that same partition, since no matter how we choose the tag $t_i$ we’ll find that $f(t_i)\leq M_i$, the supremum of $f$ on the $i$th subinterval, by definition. Similarly, the lower Darboux sum must be no larger than any Riemann sum on the same partition. Now let’s say that the upper and lower Darboux sums both converge to the same value $s$. Then given any neighborhood of $s$ (which we may as well take to be an interval) we can find a partition $P_U$ so that every upper Darboux sum over a refinement of $P_U$ is in the neighborhood, and a similar partition $P_L$ for the lower Darboux sums. Choosing a common refinement $P_R$ of both (which we can do because partitions form a *directed* set), both its upper and lower Darboux sums (and those of any of *its* refinements) will be in our neighborhood. Then we can choose any tags in $P_R$ we want, and the Riemann sum, squeezed between the lower and upper Darboux sums, will again be in the neighborhood. Thus a Darboux-integrable function is also Riemann-integrable.

So this new notion of Darboux-integrability is really the same one as Riemann-integrability, but it involves taking two limits over a much less complicated directed set. For now, we’ll just call a function which satisfies either of these two equivalent conditions “integrable” and be done with it, using whichever construction of the integral is most appropriate to our needs at the time.
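To watch the upper and lower sums close in on a common value, here is a minimal Python sketch. The helper names `darboux_sums` and `uniform_partition` are ours, not from any library, and the sketch assumes the function is nondecreasing, so that the supremum and infimum on each strip sit at the right and left endpoints. A general function would need an honest supremum on each strip, which sampling can only approximate.

```python
from fractions import Fraction

def darboux_sums(f, points):
    """Upper and lower Darboux sums of a nondecreasing f over a partition.

    Assumption: f is nondecreasing, so on each [left, right] the infimum
    is f(left) and the supremum is f(right).
    """
    strips = list(zip(points, points[1:]))
    upper = sum(f(right) * (right - left) for left, right in strips)
    lower = sum(f(left) * (right - left) for left, right in strips)
    return upper, lower

def uniform_partition(a, b, n):
    """Partition points of [a, b] cut into n equal strips, as exact fractions."""
    a, b = Fraction(a), Fraction(b)
    return [a + (b - a) * i / n for i in range(n + 1)]

def square(x):
    return x * x

# Both sums squeeze toward 1/3 as the strips get thinner.
for n in (1, 10, 100):
    upper, lower = darboux_sums(square, uniform_partition(0, 1, n))
    print(n, float(lower), float(upper))
```

For this example the gap between the two sums is exactly $\frac{1}{n}$, so any Riemann sum on the same partition is pinned down to within that much.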

## Riemann Integration

Before continuing with methods of antidifferentiation, let’s consider another geometric problem: integration. Here’s an example:

We’ve got a function $f$ whose graph is drawn in red, and we want to find the area contained between the graph, the $x$-axis, and the two blue lines at $x=a$ and $x=b$. We’ll approximate this by cutting up this interval into pieces and choosing a sample point in each piece, like so:

Now we’ve just got a bunch of rectangles, and we can add up their areas to get

$$\sum_{i=1}^n f(t_i)(x_i-x_{i-1})$$

where $f(t_i)$ is the value of the function at the $i$th sample point, and $x_i-x_{i-1}$ is the width of the $i$th strip. Now as we cut the strips thinner and thinner, our stairstep-like approximation to the function should get closer and closer to the real function, and our approximation to the area we’re interested in should get better and better.

So how can we formalize this process? First, let’s take an interval $[a,b]$ and think about how to cut it up into strips. We do this by picking a collection of points $a=x_0<x_1<\dots<x_n=b$. We get a bunch of smaller intervals $[x_{i-1},x_i]$, and in each one we pick some $t_i\in[x_{i-1},x_i]$. This structure we call a “tagged partition” of the interval $[a,b]$. We define the “mesh” of a partition to be the width of its thickest subinterval, $\max_i(x_i-x_{i-1})$, and we’ll want to somehow take this down to zero.

We can now see that the collection of all the tagged partitions of an interval forms a directed set! We say that a tagged partition $Q=((y_0,\dots,y_m),(s_1,\dots,s_m))$ is a “refinement” of a tagged partition $P=((x_0,\dots,x_n),(t_1,\dots,t_n))$ if every partition point $x_i$ is one of the $y_j$, and every tag $t_i$ is one of the $s_j$. That is, we get from $P$ to $Q$ by splitting up some of the slices of $P$ and adding new tags to the new slices. Then we define $Q\succeq P$ if $Q$ is a refinement of $P$. This makes the collection of tagged partitions into a partially-ordered set.

To show that this is a directed set, consider any two tagged partitions $P$ and $Q$, and make a new partition by using all the partition points from each one. Now look at each slice in the new partition. It can’t have more than one tag from $P$ or more than one tag from $Q$, so it has either zero, one, or two distinct tags. If it has no tags, add one. If it has one tag, do nothing. If it has two distinct tags, split it between them (notice how we’re using the topology of $\mathbb{R}$ to say we *can* make this split). At the end, we’ve got a new partition that refines *both* $P$ and $Q$. And thus we have a directed set.

Now if we have a function $f$ on $[a,b]$, we can get a net on this directed set. Given any tagged partition $P$ with partition points $x_0<x_1<\dots<x_n$ and tags $t_1,\dots,t_n$, we define the “Riemann sum”

$$S_P(f)=\sum_{i=1}^n f(t_i)(x_i-x_{i-1})$$

Finally, we say that the function $f$ is “Riemann integrable” if this net converges to a limit $s$, and in this case we define the “Riemann integral” of $f$:

$$\int_a^b f(x)\,dx=s$$

which is, at last, the area under the curve as we set out to find.
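As a rough illustration of the definitions in this section, here is a small Python sketch of a tagged partition, its mesh, and its Riemann sum. The function names are our own, and `midpoint_tagged_partition` is just one hypothetical way of choosing tags. It walks along one thin chain of ever-finer partitions rather than the whole net, so it only suggests the limit.

```python
def riemann_sum(f, points, tags):
    """Riemann sum of f over a tagged partition.

    points: partition points a = x_0 < x_1 < ... < x_n = b
    tags:   one sample point t_i in each [x_{i-1}, x_i]
    """
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for left, right, tag in zip(points, points[1:], tags):
        if not left <= tag <= right:
            raise ValueError("each tag must lie in its subinterval")
        total += f(tag) * (right - left)
    return total

def mesh(points):
    """Width of the thickest subinterval of the partition."""
    return max(right - left for left, right in zip(points, points[1:]))

def midpoint_tagged_partition(a, b, n):
    """Hypothetical helper: n equal strips, each tagged at its midpoint."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    tags = [(left + right) / 2 for left, right in zip(points, points[1:])]
    return points, tags

# The sums for f(x) = x^2 on [0, 1] drift toward 1/3 as the mesh shrinks.
for n in (4, 16, 64):
    points, tags = midpoint_tagged_partition(0.0, 1.0, n)
    print(n, mesh(points), riemann_sum(lambda x: x * x, points, tags))
```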

## Antiderivatives

One of the consequences of the mean value theorem we worked out was that two differentiable functions $f$ and $g$ on an interval $(a,b)$ differ by a constant if and only if their derivatives are the same: $f'(x)=g'(x)$ for all $x\in(a,b)$. Now let’s turn this around the other way.

We start with a function $f$ on an interval $(a,b)$ and define an “antiderivative” of $f$ to be a function $F$ on the same interval such that $F'(x)=f(x)$ for $x\in(a,b)$. What the above conclusion from the mean value theorem shows us is that there’s only one way any two solutions could differ. That is, if $F$ is some particular antiderivative of $f$ then any other antiderivative $G$ satisfies $G(x)=F(x)+C$ for some real constant $C$. So the hard bit about antiderivatives is all in finding a particular one, since the general solution to an antidifferentiation problem just involves adding an arbitrary constant corresponding to the constant we lose when we differentiate.

Some antiderivatives we can pull out right away. We know that if $f(x)=x^n$ then $f'(x)=nx^{n-1}$. Thus, turning this around, we find that $\frac{x^{n+1}}{n+1}$ is an antiderivative of $x^n$, except if $n=-1$, because then we’ll have to divide by zero. We’ll figure out what to do with this exception later.

We can also turn around some differentiation rules. For instance, since $\frac{d}{dx}\left[f+g\right]=f'+g'$, if $F$ is an antiderivative of a function $f$ and $G$ an antiderivative of $g$ then $F+G$ is an antiderivative of $f+g$. Similarly, the differentiation rule for a constant multiple tells us that $cF$ is an antiderivative of $cf$ for any real constant $c$.

Between these we can handle antidifferentiation of any polynomial $P(x)=a_0+a_1x+\dots+a_nx^n$. Each term of the polynomial is some constant times a power of $x$, so the constant multiple rule and the rule for powers of $x$ give us an antiderivative for each term. Then we can just add these antiderivatives all together. We also only have one arbitrary constant to add since we can just add together the constants for each term to get one overall constant for the whole polynomial.
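The term-by-term recipe is mechanical enough to write down. Here is a minimal Python sketch, assuming we store a polynomial as its list of coefficients with `coeffs[k]` the coefficient of $x^k$. The representation and the function names are our own choices for illustration.

```python
from fractions import Fraction

def antiderivative(coeffs, constant=0):
    """Antiderivative of a polynomial given by its coefficient list.

    coeffs[k] is the coefficient of x**k. Each term a_k * x**k becomes
    a_k / (k + 1) * x**(k + 1); `constant` is the one arbitrary constant.
    """
    return [Fraction(constant)] + [Fraction(a) / (k + 1) for k, a in enumerate(coeffs)]

def derivative(coeffs):
    """Derivative of a polynomial in the same coefficient-list form."""
    return [k * a for k, a in enumerate(coeffs)][1:]

p = [1, 0, 3]              # 1 + 3x^2
F = antiderivative(p, 5)   # 5 + x + x^3
print(F)
print(derivative(F) == p)  # differentiating gets us back where we started
```

Changing `constant` changes the antiderivative but not its derivative, which is exactly the one-constant ambiguity described above.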

## Linking Integrals

There’s a new paper out on the arXiv discussing higher-dimensional linking integrals, by two graduate students at the University of Pennsylvania. I don’t have time to really go through it right now, but at a first scan I’m really not sure what they’ve done here. It seems they’re just taking the regular Gauss integral and doing the exact same thing for higher-dimensional spheres, although in a way that’s so loaded down with notation that it obscures the fact that it’s the exact same idea.

Some people like results that are more computationally focused, and some (like me) prefer to lay bare the structure of the concepts, and derive a computational framework later. It may be that these authors are just more the former than the latter. Anyhow, I’m not certain how original it is, but my own work is shot through with “wait, you mean nobody’s written that up yet?” If they’ve found one of these obvious niches that nobody has gotten around to mining, more power to them.

## Distinguishing Maxima and Minima

From Heine-Borel we know that a continuous function $f$ on a closed interval $[a,b]$ takes a global maximum and a minimum. From Fermat we know that any local (and in particular any global) extremum occurs at a critical point: a point where $f'(x)=0$, or where $f$ has no derivative at all. But once we find these critical points how can we tell maxima from minima?

The biggest value of $f$ at a critical point is clearly the global maximum, and the smallest is just as clearly the minimum. But what about all the ones in between? Here’s where those consequences of the mean value theorem come in handy. For simplicity, let’s assume that the critical points are isolated. That is, each one has a neighborhood in which it’s the only critical point. Further, let’s assume that $f'$ is continuous wherever it exists.

Now, to the left of any critical point we’ll have a stretch where $f$ is differentiable (or else there would be another critical point there) and $f'$ is nonzero (ditto). Since the derivative is continuous, it must either be always positive or always negative on this stretch, because if it were sometimes positive and sometimes negative the intermediate value theorem would give us a point where it’s zero. If the derivative is positive, our corollaries of the mean value theorem tell us that $f$ increases as we move in towards the point, while if the derivative is negative it decreases into the critical point. Similarly, on the right we’ll have another such stretch telling us that $f$ either increases or decreases as we move away from the critical point.

So what’s a local maximum? It’s a critical point where the function increases moving into the critical point and decreases moving away! That is, if near the critical point the derivative is positive on the left and negative on the right, we’ve got ourselves a local maximum. If the derivative is positive on the right and negative on the left, it’s a local minimum. And if we find the same sign on either side, it’s neither! Notice that this is exactly what happens with the function $f(x)=x^3$ at its critical point $x=0$. Also, we don’t have to worry about where to test the sign of the derivative, because we know that it can only change signs at a critical point.

In fact, if we add a bit more to our assumptions we can get an even nicer test. Let’s assume that the function $f$ is “twice-differentiable” (that is, $f'$ is *itself* a differentiable function) on our interval. Then all the critical points happen where $f'(x)=0$. Even better now, if $f'$ changes signs as we pass through the critical point (indicating a local extremum) it’s either increasing or decreasing, and this will be reflected in *its* derivative at the critical point. If $f''(x)>0$ then our sign changes from negative to positive and we must be looking at a local minimum. On the other hand, if $f''(x)<0$ then we’ve got a local maximum. Unfortunately, if $f''(x)=0$ we don’t really get any information from this test and we have to fall back on the previous one.
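The sign test on either side of a critical point is easy to sketch in Python. This is only an illustration under the same assumptions as in the text: the critical point `c` is isolated, and we further assume the step `h` (our own parameter) is small enough that `c` is the only critical point in `[c - h, c + h]`, so one sample of the derivative on each side tells us its sign there.

```python
def classify_critical_point(fprime, c, h=1e-3):
    """First-derivative test at an isolated critical point c.

    Assumption: c is the only critical point in [c - h, c + h], so the
    sign of fprime is constant on each side and one sample each suffices.
    """
    left, right = fprime(c - h), fprime(c + h)
    if left > 0 and right < 0:
        return "local maximum"
    if left < 0 and right > 0:
        return "local minimum"
    return "neither"

print(classify_critical_point(lambda x: -2 * x, 0.0))      # f(x) = -x^2
print(classify_critical_point(lambda x: 2 * x, 0.0))       # f(x) = x^2
print(classify_critical_point(lambda x: 3 * x * x, 0.0))   # f(x) = x^3
print(classify_critical_point(lambda x: 4 * x ** 3, 0.0))  # f(x) = x^4
```

The last case is one where the second derivative vanishes at the critical point, so the second-derivative test is silent, yet the sign test still recognizes the local minimum.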

## Consequences of the Mean Value Theorem

So now that we have the mean value theorem what can we do with it? First off, we can tell something that seems intuitively obvious. We know that a constant function has the constant zero function as its derivative. It turns out that these are the *only* functions with zero derivative.

To see this, let $f$ be a differentiable function on $(a,b)$ so that $f'(x)=0$ for all $x\in(a,b)$. Let $x_1$ and $x_2$ be any points between $a$ and $b$ with $x_1<x_2$. Then $f$ restricts to a continuous function on the interval $[x_1,x_2]$ which is differentiable on the interior $(x_1,x_2)$. The differential mean value theorem then applies, and it tells us that there is some $c\in(x_1,x_2)$ with $f'(c)=\frac{f(x_2)-f(x_1)}{x_2-x_1}$. But by assumption this derivative is zero, and so $f(x_1)=f(x_2)$. Since the points were arbitrary, $f$ takes the same value at each point in $(a,b)$.

What about a function $f$ for which $f'(x)>0$ on some interval $(a,b)$? Looking at the graph it seems that the slope of all the tangent lines should be positive, and so the function should be increasing. Indeed this is the case.

Specifically we have to show that if $x_1<x_2$ for two points in $(a,b)$ then $f(x_1)<f(x_2)$. Again we look at the restriction of $f$ to a continuous function on $[x_1,x_2]$ which is differentiable on $(x_1,x_2)$. Then the mean value theorem tells us that there is some $c\in(x_1,x_2)$ with $f'(c)=\frac{f(x_2)-f(x_1)}{x_2-x_1}$. By assumption this quantity is positive, as is $x_2-x_1$, and so $f(x_2)>f(x_1)$. Similarly we can show that if $f'(x)<0$ on an interval then the function is decreasing there.

## A note on the Periodic Functions Problem

Over at The Everything Seminar, Jim Belk mentions an interesting little problem.

Show that there exist two periodic functions $f,g:\mathbb{R}\to\mathbb{R}$ whose sum is the identity function:

$$f(x)+g(x)=x$$

for all $x\in\mathbb{R}$.

He notes right off that, “Obviously the functions $f$ and $g$ can’t be continuous, since any continuous periodic function is bounded.” I’d like to explain why, in case you didn’t follow that.

If a function $f$ is periodic, that means it factors through a map to the circle, which we call $S^1$. Why? Because “periodic” with period $p$ means we can take the interval $[0,p]$ and glue one end to the other to make a circle. As we walk along the real line we walk around the circle. When we come to the end of a period in the line, that’s like getting back to where we started on the circle. Really what we’re doing is specifying a function $\tilde{f}$ on the circle and then using that function over and over again to give us a function on the real line. And if $f$ is going to be continuous, the function $\tilde{f}$ had better be as well.

Now, I assert that the circle is compact. I could do a messy proof inside the circle itself (and I probably should in the long run) but for now we can just see the circle lying in the plane as the collection of points at distance $1$ from the origin. Then this subspace of the plane is clearly bounded, and it’s not hard to show that it’s closed. The Heine-Borel theorem tells us that it’s compact!

And now since the circle is compact we know that its image under the continuous map $\tilde{f}$ must be compact as well! And since the image of $f$ is the same as the image of $\tilde{f}$, it must also be a compact subspace of $\mathbb{R}$: a closed, *bounded* interval. Neat.

## The Differential Mean Value Theorem

Let’s say we’ve got a function $f$ that’s continuous on the closed interval $[a,b]$ and differentiable on $(a,b)$. We don’t even assume the function is defined outside the interval, so we can’t really set up the limit for differentiability at the endpoints, but they don’t matter much in the end.

Anyhow, if we look at the graph of $f$ we could just draw a straight line from the point $(a,f(a))$ to the point $(b,f(b))$. The graph itself wanders away from this line and back, but the line tells us that on average we’re moving from $f(a)$ to $f(b)$ at a certain rate, namely the slope of the line. Since this is an *average* behavior, sometimes we must be going faster and sometimes slower. The differential mean value theorem says that there’s at least one point where we’re going *exactly* that fast. Geometrically, this means that the tangent line will be parallel to the secant we drew between the endpoints. In formulas we say there is a point $c\in(a,b)$ with $f'(c)=\frac{f(b)-f(a)}{b-a}$.

First let’s nail down a special case, called “Rolle’s theorem”. If $f(a)=f(b)=0$, we’re asserting that there is some point $c\in(a,b)$ with $f'(c)=0$. Since $[a,b]$ is compact and $f$ is continuous, the extreme value theorem tells us that $f$ must take a maximum and a minimum. If these are both zero, then we’re looking at the constant function $f(x)=0$, and any point $c$ in the middle satisfies $f'(c)=0$. On the other hand, if either the maximum or minimum is nonzero, then we have a local extremum at a point $c\in(a,b)$ where $f$ is differentiable (since it’s differentiable all through the open interval). Now Fermat’s theorem tells us that $f'(c)=0$ since $c$ is a local extremum! Thus Rolle’s theorem is proved.

Now for the general case. Start with the function $f$ and build from it the function $g(x)=f(x)-f(a)-\frac{f(b)-f(a)}{b-a}(x-a)$. On the graph, this corresponds to applying an “affine transformation” (which sends straight lines in the plane to other straight lines in the plane) to pull both $f(a)$ and $f(b)$ down to zero. In fact, it’s a straightforward calculation to see that $g(a)=g(b)=0$. Thus Rolle’s theorem applies and we find a point $c\in(a,b)$ with $g'(c)=0$. But applying our laws of differentiation, we see that $g'(x)=f'(x)-\frac{f(b)-f(a)}{b-a}$. And so $f'(c)=\frac{f(b)-f(a)}{b-a}$, as desired.
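The theorem only promises that such a point exists, but in nice cases we can hunt one down numerically. Here is a minimal Python sketch that looks for a zero of the derivative minus the secant slope by bisection. The function name is ours, and the sketch leans on extra assumptions the theorem doesn’t need: the derivative is continuous, it can be evaluated at the endpoints, and the difference has opposite signs at $a$ and $b$.

```python
def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Find c in (a, b) where fprime(c) equals the secant slope, by bisection.

    Assumptions (stronger than the theorem's): fprime is continuous on
    [a, b], and fprime minus the secant slope changes sign between a and b.
    """
    slope = (f(b) - f(a)) / (b - a)

    def gap(x):
        return fprime(x) - slope

    lo, hi = a, b
    if gap(lo) * gap(hi) > 0:
        raise ValueError("bisection needs a sign change; try another method")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 2]: the secant slope is 4, so we expect 3c^2 = 4.
c = mean_value_point(lambda x: x ** 3, lambda x: 3 * x * x, 0.0, 2.0)
print(c, 3 * c * c)
```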

## Fermat’s Theorem

Okay, the Heine-Borel theorem tells us that a continuous real-valued function on a compact space takes a maximum and a minimum value. In particular, this holds for functions on closed intervals. But how can we recognize a maximum or a minimum when we see one?

First of all, what we get from the Heine-Borel theorem is a *global* maximum and minimum. That is, a point $x_0\in X$ so that for any $x\in X$ we have $f(x)\leq f(x_0)$ (or $f(x)\geq f(x_0)$). We also can consider “local” maxima and minima. As you might guess from local connectedness and local compactness, a local maximum (minimum) is a global maximum (minimum) in some neighborhood $U$. For example, if $f$ is a function on some region in $\mathbb{R}$ then having a local maximum at $x_0$ means that there is some interval $(a,b)$ with $a<x_0<b$, and for every $x\in(a,b)$ we have $f(x)\leq f(x_0)$.

So a function may have a number of local maxima and minima, but they’re not all global. Still, finding local maxima and minima is an important first step. In practice there’s only a finite number of them, and we can easily pick out which of them are global by just computing the function. So what do they look like?

For functions on regions in $\mathbb{R}$, the biggest part of the answer comes from Fermat’s theorem. The theorem itself actually talks about differentiable functions, so the first thing we’ll say is that an extremum may occur at a point where the function is not differentiable (though a point of nondifferentiability is not a sure sign of being an extremum).

Now, let’s say that we have a local maximum at $x_0$ and that $f$ is differentiable at $x_0$. We can set up the difference quotient $\frac{f(x)-f(x_0)}{x-x_0}$. When we take our limit as $x$ goes to $x_0$, we can restrict to the neighborhood where $x_0$ gives a global maximum, so $f(x)-f(x_0)\leq0$. To the right of $x_0$, $x-x_0>0$, so the difference quotient is nonpositive here. To the left of $x_0$, $x-x_0<0$, so the difference quotient is nonnegative here. Then since the limit must be a limit point of both of these regions, it must be $0$. That is, $f'(x_0)=0$. And the same thing happens for local minima.

So let’s define a “critical point” of a function $f$ to be one where either $f$ isn’t differentiable or $f'(x)=0$. Then any local extremum must happen at a critical point. But not every critical point is a local extremum. The easiest example is $f(x)=x^3$, which has derivative $f'(x)=3x^2$. Then the only critical point is $x=0$, for which $f(0)=0$, but any neighborhood of $0$ has both positive and negative values of $f$, so it’s not a local maximum or minimum.

Geometrically, we should have expected as much as this. Remember that the derivative is the slope of the tangent line. At a local maximum, the function rises to the crest and falls again, and at the top the tangent line balances perfectly level with zero slope. We can see this when we draw the graph, and it provides the intuition behind Fermat’s theorem, but to speak with certainty we need the analytic definitions and the proof of the theorem.