Some theorems about metric spaces
We need to get down a few facts about metric spaces before we can continue on our course. First, as I alluded to in an earlier comment, compact metric spaces are sequentially compact — every sequence has a convergent subsequence.
To see this fact, we’ll use the fact that compact spaces are the next best thing to finite. Specifically, in a finite set any infinite sequence would have to hit one point infinitely often. Here instead, we’ll have an accumulation point $x$ in our compact metric space, so that for any $\epsilon > 0$ and any point $x_i$ in our sequence there is some $j > i$ with $d(x_j, x) < \epsilon$. That is, though the sequence may move away from $x$, it always comes back within $\epsilon$ of it again. Once we have an accumulation point $x$, we can find a subsequence converging to $x$ just as we found a subnet converging to any accumulation point of a net.
Let’s take our sequence and define $A_n = \overline{\{x_i : i \geq n\}}$ — the closure of the sequence from $x_n$ onwards. Then these closed sets are nested, $A_1 \supseteq A_2 \supseteq A_3 \supseteq \cdots$, and the intersection of any finite number of them is the smallest one, which is clearly nonempty since it contains a tail of the sequence. Then by the compactness of $X$ we see that the intersection of all the $A_n$ is again nonempty. Since the points in this intersection are in the closure of every tail of the sequence, they must be accumulation points.
Okay, that doesn’t quite work. See the comments for more details. Michael asks where I use the fact that we’re in a metric space, which was very astute. It turns out on reflection that I did use it, but it was hidden.
We can still say we’re looking for an accumulation point first and foremost, because if the sequence has an accumulation point there must be some subsequence converging to that point. Why a subsequence, and not just a subnet in general? Because metric spaces must be normal Hausdorff (using metric neighborhoods to separate closed sets) and first-countable! And as long as we’re first-countable (or, weaker, “sequential”) we can find a sequence converging to any limit point of a net.

What I didn’t say before is that once we find an accumulation point there will be a subsequence converging to that point. My counterexample is compact, and any sequence in it has accumulation points, but we will only be able to find subnets of our sequence converging to them, not subsequences. Unless we add something to ensure that our space is sequential, and metric spaces do that.
We should note in passing that the special case where $X$ is a compact subspace of $\mathbb{R}$ is referred to as the Bolzano-Weierstrass Theorem.
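The subsequence extraction is entirely constructive once an accumulation point is in hand: since the sequence returns within any $\epsilon$ of the point infinitely often, we can scan forward for a term within $1/k$ of it for each $k$ in turn. Here’s a small Python sketch of that scan (the sequence $x_n = (-1)^n + 1/n$ and its accumulation point $-1$ are my own illustrative choices, not from the post):

```python
def x(n):
    # the sequence x_n = (-1)^n + 1/n, which accumulates at both +1 and -1
    return (-1) ** n + 1 / n

def subsequence_to(accum_pt, terms, seq):
    """Greedily pick indices n_1 < n_2 < ... with |seq(n_k) - accum_pt| < 1/k.

    Such indices always exist when accum_pt is an accumulation point:
    the sequence comes back within any epsilon of it infinitely often,
    so each forward scan terminates.
    """
    indices = []
    n = 0
    for k in range(1, terms + 1):
        n += 1
        while abs(seq(n) - accum_pt) >= 1 / k:
            n += 1
        indices.append(n)
    return indices

idx = subsequence_to(-1.0, 8, x)
# indices strictly increase, and the k-th chosen term is within 1/k of -1
assert all(i < j for i, j in zip(idx, idx[1:]))
assert all(abs(x(n) + 1.0) < 1 / k for k, n in enumerate(idx, start=1))
```

The same greedy scan is exactly what the abstract argument does with nested tails; compactness is only needed to guarantee the accumulation point exists in the first place.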
Next is the Heine-Cantor theorem, which says that any continuous function $f: M_1 \to M_2$ from a compact metric space $M_1$ to any metric space $M_2$ is uniformly continuous. In particular, we can use the interval $[a,b]$ as our compact metric space $M_1$ and the real numbers $\mathbb{R}$ as our metric space $M_2$ to see that any continuous function on a closed interval is uniformly continuous.
So let’s assume that $f$ is continuous but not uniformly continuous. Then there is some $\epsilon > 0$ so that for any $\delta > 0$ there are points $x$ and $y$ in $M_1$ with $d_1(x, y) < \delta$ but $d_2(f(x), f(y)) \geq \epsilon$. In particular, we can pick $\frac{1}{n}$ as our $\delta$ and get two sequences $x_n$ and $y_n$ with $d_1(x_n, y_n) < \frac{1}{n}$ but $d_2(f(x_n), f(y_n)) \geq \epsilon$. By the above theorem we can find subsequences $x_{n_k}$ converging to $x$ and $y_{n_k}$ converging to $y$.
Now $d_1(x_{n_k}, y_{n_k}) < \frac{1}{n_k}$, which converges to $0$, and so $x = y$. Therefore we must have $d_2(f(x_{n_k}), f(y_{n_k}))$ also converging to $0$ by the continuity of $f$. But this can’t happen, since each of these distances must be at least $\epsilon$! Thus $f$ must have been uniformly continuous to begin with.
Darboux Integration
Okay, defining the integral as the limit of a net of Riemann sums is all well and good, but it’s a huge net, and it seems impossible to calculate with. We need a better way of getting a handle on these things. What we’ll use is a little trick for evaluating limits of nets that I haven’t mentioned yet: “cofinal sets”.
Given a directed set $\mathcal{D}$, a directed subset $\mathcal{D}' \subseteq \mathcal{D}$ is cofinal if for every $a \in \mathcal{D}$ there is some $b \in \mathcal{D}'$ with $b \geq a$. Now watch what happens when we try to show that the limit of a net $\Phi: \mathcal{D} \to X$ is a point $x$. We need to find, for every neighborhood $U$ of $x$, an index $a$ so that for every $c \geq a$ we have $\Phi(c) \in U$. But if $a$ is such an index, then there is some $b \in \mathcal{D}'$ above it, and every $c \in \mathcal{D}'$ above that is also above $a$, and so $\Phi(c) \in U$. That is, if the limit over $\mathcal{D}$ exists, then the limit over $\mathcal{D}'$ exists and has the same value.
Let’s give a cofinal set of tagged partitions by giving a rule for picking the tags that go with any partition. Then our net consists just of partitions of the interval $[a,b]$, and the tags come for free. If the function $f$ is Riemann-integrable, then the limit over this cofinal set will be the integral. Here’s our rule: in the closed subinterval $[x_{i-1}, x_i]$, pick a point $t_i$ so that $f(t_i)$ is the supremum of the values of $f$ in that subinterval. If the function is continuous it will attain a maximum at our tag, and if not it’ll get close or shoot off to infinity (if there is no finite supremum).
Why is this cofinal? Let’s imagine a tagged partition where some tag $t_i$ is not chosen according to this rule. Then we can refine the partition by splitting up the $i$th strip in such a way that $f(t_i)$ is the maximum in one of the new strips, and choosing all the new tags according to the rule. Then we’ve found a good partition above the one we started with. Similarly, we can build another cofinal set by always choosing the tags where $f$ approaches an infimum.
When we consider a partition in the first cofinal set we can set up something closely related to the Riemann sums: the “upper Darboux sum”
$$\sum_{i=1}^n M_i\,(x_i - x_{i-1})$$
where $M_i$ is the supremum of $f$ on the interval $[x_{i-1}, x_i]$, or infinity if the value of $f$ is unbounded above there. Similarly, we can define the “lower Darboux sum”
$$\sum_{i=1}^n m_i\,(x_i - x_{i-1})$$
where now $m_i$ is the infimum (or negative infinity). If the function is Riemann-integrable, then the limits over these cofinal sets both exist and are both equal to the Riemann integral. So we define a function to be “Darboux-integrable” if the limits of the upper and lower Darboux sums both exist and have the same value. Then the Darboux integral is defined to be this common value. Notice that if the function ever shoots off to positive or negative infinity we’ll get an infinite value for one of the terms, and we can never converge, so such functions are not Darboux-integrable.
We should notice here that given any partition $x_0 < x_1 < \dots < x_n$, the upper Darboux sum must be at least as large as any Riemann sum with that same partition, since no matter how we choose the tag $t_i$ we’ll find that $f(t_i) \leq M_i$ by definition. Similarly, the lower Darboux sum can be no larger than any Riemann sum on the same partition. Now let’s say that the upper and lower Darboux sums both converge to the same value $s$. Then given any neighborhood $U$ of $s$ we can find a partition $P_1$ so that every upper Darboux sum over a refinement of $P_1$ is in the neighborhood, and a similar partition $P_2$ for the lower Darboux sums. Choosing a common refinement $P$ of both (which we can do because partitions form a directed set), both its upper and lower Darboux sums (and those of any of its refinements) will be in our neighborhood. Then we can choose any tags in $P$ we want, and the Riemann sum will again be in the neighborhood. Thus a Darboux-integrable function is also Riemann-integrable.
So this new notion of Darboux-integrability is really the same one as Riemann-integrability, but it involves taking two limits over a much less complicated directed set. For now, we’ll just call a function which satisfies either of these two equivalent conditions “integrable” and be done with it, using whichever construction of the integral is most appropriate to our needs at the time.
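Here’s a small numeric illustration of the upper and lower Darboux sums (the example is mine: $f(x) = x^2$ on $[0,1]$, whose integral is $\frac{1}{3}$). Since this $f$ is increasing, the supremum and infimum on each slice sit at the right and left endpoints, so both sums can be computed exactly on a uniform partition:

```python
def darboux_sums(f, a, b, n):
    """Upper and lower Darboux sums of a monotone *increasing* f over the
    uniform n-piece partition of [a, b]. For such f, the supremum on each
    slice is the right-endpoint value and the infimum is the left-endpoint
    value, so no approximation of sup/inf is needed."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    upper = sum(f(xs[i]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    lower = sum(f(xs[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))
    return upper, lower

u, l = darboux_sums(lambda x: x * x, 0.0, 1.0, 1000)
assert l <= 1 / 3 <= u     # the true integral is squeezed between them
assert u - l < 0.002       # the gap is (f(b) - f(a))/n, shrinking like 1/n
```

As the partition refines, the two sums squeeze together onto the common value $\frac{1}{3}$, which is exactly the Darboux integral.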
Riemann Integration
Before continuing with methods of antidifferentiation, let’s consider another geometric problem: integration. Here’s an example:
We’ve got a function whose graph is drawn in red, and we want to find the area contained between the graph, the $x$-axis, and the two blue lines at $x = a$ and $x = b$. We’ll approximate this by cutting up this interval into $n$ pieces and choosing a sample point $t_i$ in each piece.
Now we’ve just got a bunch of rectangles, and we can add up their areas to get
$$\sum_{i=1}^n f(t_i)\,(x_i - x_{i-1})$$
where $f(t_i)$ is the value of the function at the $i$th sample point, and $x_i - x_{i-1}$ is the width of the $i$th strip. Now as we cut the strips thinner and thinner, our stairstep-like approximation to the function should get closer and closer to the real function, and our approximation to the area we’re interested in should get better and better.
So how can we formalize this process? First, let’s take an interval $[a,b]$ and think about how to cut it up into strips. We do this by picking a collection of points $a = x_0 < x_1 < \dots < x_n = b$. We get a bunch of smaller intervals $[x_{i-1}, x_i]$, and in each one we pick some $t_i \in [x_{i-1}, x_i]$. This structure we call a “tagged partition” of the interval $[a,b]$. We define the “mesh” of a partition to be the width of its thickest subinterval, $\max_{1 \leq i \leq n}(x_i - x_{i-1})$, and we’ll want to somehow take this down to zero.
We can now see that the collection of all the tagged partitions of an interval forms a directed set! We say that a tagged partition $P'$ is a “refinement” of a tagged partition $P$ if every partition point of $P$ is one of the partition points of $P'$, and every tag of $P$ is one of the tags of $P'$. That is, we get from $P$ to $P'$ by splitting up some of the slices of $P$ and adding new tags to the new slices. Then we define $P \leq P'$ if $P'$ is a refinement of $P$. This makes the collection of tagged partitions into a partially-ordered set.
To show that this is a directed set, consider any two tagged partitions $P_1$ and $P_2$, and make a new partition by using all the partition points from each one. Now look at each slice in the new partition. It can’t contain more than one $P_1$ tag or more than one $P_2$ tag, so it has either zero, one, or two distinct tags. If it has no tags, add one. If it has one tag, do nothing. If it has two distinct tags, split it between them (notice how we’re using the topology of $\mathbb{R}$ to say we can make this split). At the end, we’ve got a new partition that refines both $P_1$ and $P_2$. And thus we have a directed set.
Now if we have a function $f$ on $[a,b]$, we can get a net on this directed set. Given any tagged partition $((x_0, \dots, x_n), (t_1, \dots, t_n))$, we define the “Riemann sum”
$$\sum_{i=1}^n f(t_i)\,(x_i - x_{i-1})$$
Finally, we say that the function is “Riemann integrable” if this net converges to a limit $s$, and in this case we define the “Riemann integral” of $f$:
$$\int_a^b f(x)\,dx = s$$
which is, at last, the area under the curve we set out to find.
Antiderivatives
One of the consequences of the mean value theorem we worked out was that two differentiable functions $f$ and $g$ on an interval $(a,b)$ differ by a constant if and only if their derivatives are the same: $f'(x) = g'(x)$ for all $x \in (a,b)$. Now let’s turn this around the other way.
We start with a function $f$ on an interval $(a,b)$ and define an “antiderivative” of $f$ to be a function $F$ on the same interval such that $F'(x) = f(x)$ for $x \in (a,b)$. What the above conclusion from the mean value theorem shows us is that there’s only one way any two solutions could differ. That is, if $F$ is some particular antiderivative of $f$, then any other antiderivative $G$ satisfies $G(x) = F(x) + C$ for some real constant $C$. So the hard bit about antiderivatives is all in finding a particular one, since the general solution to an antidifferentiation problem just involves adding an arbitrary constant corresponding to the constant we lose when we differentiate.
Some antiderivatives we can pull out right away. We know that if $f(x) = x^n$ then $f'(x) = n x^{n-1}$. Thus, turning this around, we find an antiderivative of $x^n$ to be $\frac{1}{n+1}x^{n+1}$, except if $n = -1$, because then we’ll have to divide by zero. We’ll figure out what to do with this exception later.
We can also turn around some differentiation rules. For instance, since $(F + G)' = F' + G'$, then if $F$ is an antiderivative of a function $f$ and $G$ an antiderivative of $g$, then $F + G$ is an antiderivative of $f + g$. Similarly, the differentiation rule for a constant multiple tells us that $cF$ is an antiderivative of $cf$ for any real constant $c$.
Between these we can handle antidifferentiation of any polynomial $p(x) = \sum_{k=0}^n c_k x^k$. Each term of the polynomial is some constant times a power of $x$, so the constant multiple rule and the rule for powers of $x$ give us an antiderivative for each term. Then we can just add these antiderivatives all together. We also only have one arbitrary constant to add, since we can just add together the constants for each term to get one overall constant for the whole polynomial.
Linking Integrals
There’s a new paper out on the arXiv discussing higher-dimensional linking integrals, by two graduate students at the University of Pennsylvania. I don’t have time to really go through it right now, but at a first scan I’m really not sure what they’ve done here. It seems they’re just taking the regular Gauss integral and doing the exact same thing for higher-dimensional spheres, although in a way that’s so loaded down with notation that it obscures the fact that it’s the exact same idea.
Some people like results that are more computationally focused, and some (like me) prefer to lay bare the structure of the concepts, and derive a computational framework later. It may be that these authors are just more the former than the latter. Anyhow, I’m not certain how original it is, but my own work is shot through with “wait, you mean nobody’s written that up yet?” If they’ve found one of these obvious niches that nobody has gotten around to mining, more power to them.
Distinguishing Maxima and Minima
From Heine-Borel we know that a continuous function $f$ on a closed interval $[a,b]$ takes a global maximum and a global minimum. From Fermat we know that any local (and in particular any global) extremum occurs at a critical point — a point where $f'(x) = 0$, or where $f$ has no derivative at all. But once we find these critical points, how can we tell maxima from minima?
The biggest value of $f$ at a critical point is clearly the global maximum, and the smallest is just as clearly the minimum. But what about all the ones in between? Here’s where those consequences of the mean value theorem come in handy. For simplicity, let’s assume that the critical points are isolated. That is, each one has a neighborhood in which it’s the only critical point. Further, let’s assume that $f'$ is continuous wherever it exists.
Now, to the left of any critical point we’ll have a stretch where $f$ is differentiable (or else there would be another critical point there) and $f'$ is nonzero (ditto). Since the derivative is continuous, it must be either always positive or always negative on this stretch, because if it were sometimes positive and sometimes negative the intermediate value theorem would give us a point where it’s zero. If the derivative is positive, our corollaries of the mean value theorem tell us that $f$ increases as we move in towards the critical point, while if the derivative is negative it decreases into the critical point. Similarly, on the right we’ll have another such stretch telling us that $f$ either increases or decreases as we move away from the critical point.
So what’s a local maximum? It’s a critical point where the function increases moving into the critical point and decreases moving away! That is, if near the critical point the derivative is positive on the left and negative on the right, we’ve got ourselves a local maximum. If the derivative is positive on the right and negative on the left, it’s a local minimum. And if we find the same sign on either side, it’s neither! Notice that this is exactly what happens with the function $f(x) = x^3$ at its critical point. Also, we don’t have to worry about where to test the sign of the derivative, because we know that it can only change signs at a critical point.
In fact, if we add a bit more to our assumptions we can get an even nicer test. Let’s assume that the function is “twice-differentiable” — that $f'$ is itself a differentiable function — on our interval. Then all the critical points happen where $f'(x) = 0$. Even better, now if $f'$ changes signs as we pass through the critical point (indicating a local extremum) it’s either increasing or decreasing there, and this will be reflected in its derivative $f''$ at the critical point. If $f''(x_0) > 0$ then the sign of $f'$ changes from negative to positive, and we must be looking at a local minimum. On the other hand, if $f''(x_0) < 0$ then we’ve got a local maximum. Unfortunately, if $f''(x_0) = 0$ we don’t really get any information from this test, and we have to fall back on the previous one.
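The second-derivative test is mechanical enough to write down directly. A Python sketch (the function names and the example $f(x) = x^3 - 3x$, whose second derivative is $6x$, are my own choices; conveniently, $f(x) = x^3$ has the same second derivative, so one function exercises all three outcomes):

```python
def classify_critical_point(f2, x0, tol=1e-9):
    """Second-derivative test at a critical point x0 (i.e. f'(x0) = 0).

    f2 is the second derivative. Returns 'min', 'max', or 'inconclusive';
    when f''(x0) = 0 the test says nothing and we must fall back on
    checking the sign of f' on each side of x0."""
    v = f2(x0)
    if v > tol:
        return "min"    # f' goes from negative to positive
    if v < -tol:
        return "max"    # f' goes from positive to negative
    return "inconclusive"

f2 = lambda x: 6.0 * x  # second derivative of both x^3 - 3x and x^3

# f(x) = x^3 - 3x: f'(x) = 3x^2 - 3 vanishes at x = -1 and x = 1
assert classify_critical_point(f2, 1.0) == "min"
assert classify_critical_point(f2, -1.0) == "max"
# f(x) = x^3: f''(0) = 0, so the test gives no information
assert classify_critical_point(f2, 0.0) == "inconclusive"
```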
Consequences of the Mean Value Theorem
So now that we have the mean value theorem what can we do with it? First off, we can tell something that seems intuitively obvious. We know that a constant function has the constant zero function as its derivative. It turns out that these are the only functions with zero derivative.
To see this, let $f$ be a differentiable function on $(a,b)$ so that $f'(x) = 0$ for all $x \in (a,b)$. Let $c$ and $d$ be any points between $a$ and $b$ with $c < d$. Then $f$ restricts to a continuous function on the interval $[c,d]$ which is differentiable on the interior $(c,d)$. The differential mean value theorem then applies, and it tells us that there is some $x \in (c,d)$ with $f'(x) = \frac{f(d) - f(c)}{d - c}$. But by assumption this derivative is zero, and so $f(d) = f(c)$. Since the points were arbitrary, $f$ takes the same value at each point in $(a,b)$.
What about a function $f$ for which $f'(x) > 0$ on some interval $(a,b)$? Looking at the graph, it seems that the slopes of all the tangent lines should be positive, and so the function should be increasing. Indeed this is the case.
Specifically, we have to show that if $c < d$ for two points in $(a,b)$ then $f(c) < f(d)$. Again we look at the restriction of $f$ to a continuous function on $[c,d]$ which is differentiable on $(c,d)$. Then the mean value theorem tells us that there is some $x \in (c,d)$ with $f'(x) = \frac{f(d) - f(c)}{d - c}$. By assumption this quantity is positive, as is $d - c$, and so $f(d) - f(c) > 0$. Similarly, we can show that if $f'(x) < 0$ on an interval $(a,b)$ then the function is decreasing there.
A note on the Periodic Functions Problem
Over at The Everything Seminar, Jim Belk mentions an interesting little problem.
Show that there exist two periodic functions $f$ and $g$ whose sum is the identity function: $f(x) + g(x) = x$ for all $x$.

He notes right off that, “Obviously the functions $f$ and $g$ can’t be continuous, since any continuous periodic function is bounded.” I’d like to explain why, in case you didn’t follow that.
If a function $f$ is periodic, that means it factors through a map to the circle, which we call $S^1$. Why? Because “periodic” with period $p$ means we can take the interval $[0,p]$ and glue one end to the other to make a circle. As we walk along the real line we walk around the circle. When we come to the end of a period in the line, that’s like getting back to where we started on the circle. Really what we’re doing is specifying a function $\bar{f}$ on the circle and then using that function over and over again to give us a function on the real line. And if $f$ is going to be continuous, the function $\bar{f}$ had better be as well.
Now, I assert that the circle is compact. I could do a messy proof inside the circle itself (and I probably should in the long run), but for now we can just see the circle lying in the plane as the collection of points at distance $1$ from the origin. Then this subspace of the plane is clearly bounded, and it’s not hard to show that it’s closed. The Heine-Borel theorem tells us that it’s compact!
And now since the circle is compact we know that its image under the continuous map $\bar{f}$ must be compact as well! And since the image of $\bar{f}$ is the same as the image of $f$, it must also be a compact subspace of $\mathbb{R}$ — a closed, bounded interval. Neat.
The Differential Mean Value Theorem
Let’s say we’ve got a function $f$ that’s continuous on the closed interval $[a,b]$ and differentiable on $(a,b)$. We don’t even assume the function is defined outside the interval, so we can’t really set up the limit for differentiability at the endpoints, but they don’t matter much in the end.
Anyhow, if we look at the graph of $f$ we could just draw a straight line from the point $(a, f(a))$ to the point $(b, f(b))$. The graph itself wanders away from this line and back, but the line tells us that on average we’re moving from $f(a)$ to $f(b)$ at a certain rate — the slope of the line. Since this is an average behavior, sometimes we must be going faster and sometimes slower. The differential mean value theorem says that there’s at least one point where we’re going exactly that fast. Geometrically, this means that the tangent line will be parallel to the secant we drew between the endpoints. In formulas, we say there is a point $c \in (a,b)$ with
$$f'(c) = \frac{f(b) - f(a)}{b - a}$$
First let’s nail down a special case, called “Rolle’s theorem”. If $f(a) = f(b) = 0$, we’re asserting that there is some point $c \in (a,b)$ with $f'(c) = 0$. Since $[a,b]$ is compact and $f$ is continuous, the extreme value theorem tells us that $f$ must take a maximum and a minimum. If these are both zero, then we’re looking at the constant function $f(x) = 0$, and any point in the middle satisfies $f'(c) = 0$. On the other hand, if either the maximum or minimum is nonzero, then we have a local extremum at a point $c$ where $f$ is differentiable (since it’s differentiable all through the open interval). Now Fermat’s theorem tells us that $f'(c) = 0$ since $c$ is a local extremum! Thus Rolle’s theorem is proved.
Now for the general case. Start with the function $f$ and build from it the function
$$g(x) = f(x) - f(a) - \frac{f(b) - f(a)}{b - a}(x - a)$$
On the graph, this corresponds to applying an “affine transformation” (which sends straight lines in the plane to other straight lines in the plane) to pull both endpoint values $f(a)$ and $f(b)$ down to zero. In fact, it’s a straightforward calculation to see that $g(a) = g(b) = 0$. Thus Rolle’s theorem applies and we find a point $c \in (a,b)$ with $g'(c) = 0$. But applying our laws of differentiation, we see that $g'(x) = f'(x) - \frac{f(b) - f(a)}{b - a}$. And so $f'(c) = \frac{f(b) - f(a)}{b - a}$, as desired.
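The proof even suggests how to hunt for the promised point numerically: form $g'(x) = f'(x) - \frac{f(b)-f(a)}{b-a}$ and look for a sign change. A Python sketch (my own illustration; it assumes $g'$ is continuous and changes sign just once on the interval, and the example $f(x) = x^3$ on $[0,2]$ is mine):

```python
def mvt_point(f, fprime, a, b, tol=1e-10):
    """Locate a point c in (a, b) with f'(c) = (f(b) - f(a))/(b - a)
    by bisecting on g'(x) = f'(x) - slope, the derivative of the
    auxiliary function from the proof. Assumes g' changes sign once."""
    slope = (f(b) - f(a)) / (b - a)
    g = lambda x: fprime(x) - slope
    lo, hi = a, b
    if g(lo) > 0:          # orient the bracket so g(lo) <= 0 < g(hi)
        lo, hi = hi, lo
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if g(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# f(x) = x^3 on [0, 2]: the secant slope is 4, and 3c^2 = 4 at the MVT point
c = mvt_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 2.0)
assert abs(3 * c ** 2 - 4.0) < 1e-6
```

The sign change being bisected on is exactly where Rolle’s theorem lives: $g'$ can’t be single-signed on all of $(a,b)$, or $g$ couldn’t return to zero.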
Fermat’s Theorem
Okay, the Heine-Borel theorem tells us that a continuous real-valued function $f$ on a compact space $X$ takes a maximum and a minimum value. In particular, this holds for functions on closed intervals. But how can we recognize a maximum or a minimum when we see one?
First of all, what we get from the Heine-Borel theorem is a global maximum and minimum. That is, a point $x_0$ so that for any $x$ we have $f(x) \leq f(x_0)$ (or $f(x) \geq f(x_0)$ for a minimum). We can also consider “local” maxima and minima. As you might guess from local connectedness and local compactness, a local maximum (minimum) is a point $x_0$ which is a global maximum (minimum) in some neighborhood of $x_0$. For example, if $f$ is a function on some region in $\mathbb{R}$, then having a local maximum at $x_0$ means that there is some interval $(a,b)$ with $a < x_0 < b$, and for every $x \in (a,b)$ we have $f(x) \leq f(x_0)$.
So a function may have a number of local maxima and minima, but they’re not all global. Still, finding local maxima and minima is an important first step. In practice there’s only a finite number of them, and we can easily pick out which of them are global by just computing the function. So what do they look like?
For functions on regions in $\mathbb{R}$, the biggest part of the answer comes from Fermat’s theorem. The theorem itself actually talks about differentiable functions, so the first thing we’ll say is that an extremum may occur at a point where the function is not differentiable (though a point of nondifferentiability is not a sure sign of being an extremum).
Now, let’s say that we have a local maximum at $x_0$ and that $f$ is differentiable at $x_0$. We can set up the difference quotient
$$\frac{f(x) - f(x_0)}{x - x_0}$$
When we take our limit as $x$ goes to $x_0$, we can restrict to the neighborhood where $x_0$ gives a global maximum, so $f(x) - f(x_0) \leq 0$. To the right of $x_0$ we have $x - x_0 > 0$, so the difference quotient is nonpositive there. To the left of $x_0$ we have $x - x_0 < 0$, so the difference quotient is nonnegative there. Then since the limit must be a limit point of both of these regions, it must be $0$. That is, $f'(x_0) = 0$. And the same thing happens for local minima.
So let’s define a “critical point” of a function to be one where either $f$ isn’t differentiable or $f'(x_0) = 0$. Then any local extremum must happen at a critical point. But not every critical point is a local extremum. The easiest example is $f(x) = x^3$, which has derivative $f'(x) = 3x^2$. Then the only critical point is $x_0 = 0$, for which $f(0) = 0$, but any neighborhood of $0$ contains both positive and negative values of $f$, so it’s not a local maximum or minimum.
Geometrically, we should have expected as much. Remember that the derivative is the slope of the tangent line. At a local maximum, the function rises to the crest and falls again, and at the top the tangent line balances perfectly level with zero slope. We can see this when we draw the graph, and it provides the intuition behind Fermat’s theorem, but to speak with certainty we need the analytic definitions and the proof of the theorem.