The Unapologetic Mathematician

Mathematics for the interested outsider

Some theorems about metric spaces

We need to get down a few facts about metric spaces before we can continue on our course. Firstly, as I alluded to in an earlier comment, compact metric spaces are sequentially compact: every sequence has a convergent subsequence.

To see this fact, we’ll use the fact that compact spaces are the next best thing to finite. Specifically, in a finite set any infinite sequence would have to hit one point infinitely often. Here instead, we’ll have an accumulation point \xi in our compact metric space X so that for any \epsilon>0 and point x_m in our sequence there is some n\geq m with d_X(x_n,\xi)<\epsilon. That is, though the sequence may move away from \xi, it always comes back within \epsilon of it again. Once we have an accumulation point \xi, we can find a subsequence converging to \xi just as we found a subnet converging to any accumulation point of a net.

Let’s take our sequence and define F_N=\mathrm{Cl}(\{x_n, n\geq N\}) — the closure of the sequence from x_N onwards. Then these closed sets are nested F_1\supseteq F_2\supseteq...\supseteq F_N\supseteq..., and the intersection of any finite number of them is the smallest one, which is clearly nonempty since it contains a tail of the sequence. Then by the compactness of X we see that the intersection of all the F_N is again nonempty. Since the points in this intersection are in the closure of any tail of the sequence, they must be accumulation points.

Okay, that doesn’t quite work. See the comments for more details. Michael asks where I use the fact that we’re in a metric space, which was very astute. It turns out on reflection that I did use it, but it was hidden.

We can still say we’re looking for an accumulation point first and foremost, because if the sequence has an accumulation point there must be some subsequence converging to that point. Why not a subnet in general? Because metric spaces must be normal Hausdorff (using metric neighborhoods to separate closed sets) and first-countable! And as long as we’re first-countable (or, weaker, “sequential”) we can find a sequence converging to any limit point of a net.

What I didn’t say before is that once we find an accumulation point there will be a subsequence converging to that point. My counterexample is compact, and any sequence in it has accumulation points, but we will only be able to find subnets of our sequence converging to them, not subsequences. Unless we add something to assure that our space is sequential, and metric spaces do that.

We should note in passing that the special case where X is a compact subspace of \mathbb{R}^n is referred to as the Bolzano-Weierstrass Theorem.

Next is the Heine-Cantor theorem, which says that any continuous function f:M\rightarrow N from a compact metric space M to any metric space N is uniformly continuous. In particular, we can use the interval \left[a,b\right] as our compact metric space M and the real numbers \mathbb{R} as our metric space N to see that any continuous function on a closed interval is uniformly continuous.

So let’s assume that f is continuous but not uniformly continuous. Then there is some \epsilon>0 so that for any \delta>0 there are points x and y in M with d_M(x,y)<\delta but d_N(f(x),f(y))\geq\epsilon. In particular, we can pick \frac{1}{n} as our \delta and get two sequences x_n and y_n with d_M(x_n,y_n)<\frac{1}{n} but d_N(f(x_n),f(y_n))\geq\epsilon. By the above theorem we can find subsequences x_{n_k} converging to \bar{x} and y_{n_k} converging to \bar{y} (passing to a further subsequence if necessary, so that both converge along the same indices n_k).

Now d_M(x_{n_k},y_{n_k})<\frac{1}{n_k}, which converges to 0, and so \bar{x}=\bar{y}. Therefore we must have d_N(f(x_{n_k}),f(y_{n_k})) also converging to 0 by the continuity of f. But this can’t happen, since each of these distances must be at least \epsilon! Thus f must have been uniformly continuous to begin with.
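To see why compactness matters in this argument, here is a small Python sketch. It is my own illustration rather than part of the proof, and the function names are made up for the example. It takes f(x)=1/x on the non-compact interval (0,1] and builds exactly the sort of sequences the proof rules out: x_n=1/n and y_n=1/(n+1) get arbitrarily close together, while their images always stay a distance 1 apart.

```python
from fractions import Fraction

def f(x):
    """f(x) = 1/x, continuous on (0, 1] but not uniformly continuous there."""
    return 1 / x

def witness_pair(n):
    """Return (x_n, y_n) = (1/n, 1/(n+1)) as exact fractions (hypothetical helper)."""
    return Fraction(1, n), Fraction(1, n + 1)

for n in (1, 10, 100, 1000):
    x, y = witness_pair(n)
    gap_in = abs(x - y)         # d(x_n, y_n) = 1/(n(n+1)), which is below 1/n
    gap_out = abs(f(x) - f(y))  # |n - (n+1)| = 1 for every n
    print(n, gap_in, gap_out)
```

Both sequences head towards 0, which is not a point of (0,1], so there is no limit \bar{x} in the space at which to apply continuity. On a compact interval like \left[\frac{1}{10},1\right] that escape route is closed off.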

January 31, 2008 Posted by | Point-Set Topology, Topology | 10 Comments

Darboux Integration

Okay, defining the integral as the limit of a net of Riemann sums is all well and good, but it’s a huge net, and it seems impossible to calculate with. We need a better way of getting a handle on these things. What we’ll use is a little trick for evaluating limits of nets that I haven’t mentioned yet: “cofinal sets”.

Given a directed set (D,\preceq), a directed subset S is cofinal if for every d\in D there is some s\in S with s\succeq d. Now watch what happens when we try to show that the limit of a net x_d is a point x. We need to find for every neighborhood U of x an index d_0 so that for every d\succeq d_0 we have x_d\in U. But if d_0 is such an index, then there is some s_0\in S above it, and every s\in S above that is also above d_0, and so x_s\in U. That is, if the limit over D exists, then the limit over S exists and has the same value.

Let’s give a cofinal set of tagged partitions by giving a rule for picking the tags that go with any partition. Then our net consists just of partitions of the interval \left[a,b\right], and the tags come for free. If the function f is Riemann-integrable, then the limit over this cofinal set will be the integral. Here’s our rule: in the closed subinterval \left[x_{i-1},x_i\right] pick a point t_i so that \lim\limits_{x\rightarrow t_i}f(x) is the supremum of the values of f in that subinterval. If the function is continuous it will attain a maximum at our tag, and if not it’ll get close or shoot off to infinity (if there is no supremum).

Why is this cofinal? Let’s imagine a tagged partition x=((x_0,...,x_n),(t_1,...,t_n)) where t_i is not chosen according to this rule. Then we can refine the partition by splitting up the ith strip in such a way that t_i is the maximum in one of the new strips, and choosing all the new tags according to the rule. Then we’ve found a good partition above the one we started with. Similarly, we can build another cofinal set by always choosing the tags where f approaches an infimum.

When we consider a partition x in the first cofinal set we can set up something closely related to the Riemann sums: the “upper Darboux sums”

\displaystyle U_x(f)=\sum\limits_{i=1}^n M_i(x_i-x_{i-1})

where M_i is the supremum of f(x) on the interval \left[x_{i-1},x_i\right], or infinity if the value of f is unbounded above here. Similarly, we can define the “lower Darboux sum”

\displaystyle L_x(f)=\sum\limits_{i=1}^n m_i(x_i-x_{i-1})

where now m_i is the infimum (or negative infinity). If the function is Riemann-integrable, then the limits over these cofinal sets both exist and are both equal to the Riemann integral. So we define a function to be “Darboux-integrable” if the limits of the upper and lower Darboux sums both exist and have the same value. Then the Darboux integral is defined to be this common value. Notice that if the function ever shoots off to positive or negative infinity we’ll get an infinite value for one of the terms, and we can never converge, so such functions are not Darboux-integrable.
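As a rough numerical illustration of these two sums, here is a Python sketch. It makes a simplifying assumption that is not in the discussion above: the function is nondecreasing, so that on each subinterval the supremum M_i is just the value at the right endpoint and the infimum m_i is the value at the left endpoint. The helper names are hypothetical.

```python
def uniform_partition(a, b, n):
    """Partition points a = x_0 < x_1 < ... < x_n = b with equal widths."""
    return [a + (b - a) * i / n for i in range(n + 1)]

def darboux_sums_increasing(f, points):
    """Return (upper, lower) Darboux sums for a nondecreasing function f.

    Assumption: f is nondecreasing on [points[0], points[-1]], so
    M_i = f(x_i) and m_i = f(x_{i-1}) on each subinterval.
    """
    upper = 0.0
    lower = 0.0
    for left, right in zip(points, points[1:]):
        width = right - left
        upper += f(right) * width  # M_i * (x_i - x_{i-1})
        lower += f(left) * width   # m_i * (x_i - x_{i-1})
    return upper, lower

# f(x) = x^2 on [0, 1]; the two sums squeeze together as the partition is refined.
for n in (4, 16, 64, 256):
    upper, lower = darboux_sums_increasing(lambda x: x * x, uniform_partition(0.0, 1.0, n))
    print(n, lower, upper)
```

For a general function, finding each M_i and m_i is itself an optimization problem, which is why the sketch restricts itself to the monotone case.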

We should notice here that given any partition x, the upper Darboux sum must be at least as large as any Riemann sum with that same partition, since no matter how we choose the tag t_i we’ll find that f(t_i)\leq M_i by definition. Similarly, the lower Darboux sum must be no larger than any Riemann sum on the same partition. Now let’s say that the upper and lower Darboux sums both converge to the same value s. Then given any neighborhood of s (which we may take to be an interval) we can find a partition x_U so that every upper Darboux sum over a refinement of x_U is in the neighborhood, and a similar partition x_L for the lower Darboux sums. If we choose a common refinement x_R of both (which we can do because partitions form a directed set), both its upper and lower Darboux sums (and those of any of its refinements) will be in our neighborhood. Then we can choose any tags in x_R we want, and the Riemann sum, squeezed between the two Darboux sums, will again be in the neighborhood. Thus a Darboux-integrable function is also Riemann-integrable.

So this new notion of Darboux-integrability is really the same one as Riemann-integrability, but it involves taking two limits over a much less complicated directed set. For now, we’ll just call a function which satisfies either of these two equivalent conditions “integrable” and be done with it, using whichever construction of the integral is most appropriate to our needs at the time.

January 30, 2008 Posted by | Analysis, Calculus, Orders | 3 Comments

Riemann Integration

Before continuing with methods of antidifferentiation, let’s consider another geometric problem: integration. Here’s an example:

[Figure: An area to be integrated]

We’ve got a function whose graph is drawn in red, and we want to find the area contained between the graph, the x-axis, and the two blue lines at x=3 and x=7. We’ll approximate this by cutting up this interval into n pieces and choosing a sample point t_i in each piece, like so:

[Figure: Approximating the integral]

Now we’ve just got a bunch of rectangles, and we can add up their areas to get

\displaystyle\sum\limits_{i=1}^nf(x_i)\Delta_i

where f(x_i) is the value of the function at the ith sample point, and \Delta_i is the width of the ith strip. Now as we cut the strips thinner and thinner, our stairstep-like approximation to the function should get closer and closer to the real function, and our approximation to the area we’re interested in should get better and better.

So how can we formalize this process? First, let’s take an interval \left[a,b\right] and think about how to cut it up into strips. We do this by picking a collection of points a=x_0<x_1<...<x_{n-1}<x_n=b. We get a bunch of smaller intervals \left[x_{i-1},x_i\right], and in each one we pick some t_i. This structure we call a “tagged partition” of the interval \left[a,b\right]. We define the “mesh” of a partition to be the width of its thickest subinterval, \max\limits_{1\leq i\leq n}(x_i-x_{i-1}), and we’ll want to somehow take this down to zero.

We can now see that the collection of all the tagged partitions of an interval forms a directed set! We say that a tagged partition y=((y_0,...,y_m),(s_1,...,s_m)) is a “refinement” of a tagged partition x=((x_0,...,x_n),(t_1,...,t_n)) if every partition point x_i is one of the y_j, and every tag t_i is one of the s_j. That is, we get from x to y by splitting up some of the slices of x and adding new tags to the new slices. Then we define x\preceq y if y is a refinement of x. This makes the collection of tagged partitions into a partially-ordered set.

To show that this is a directed set, consider any two tagged partitions x=((x_0,...,x_n),(t_1,...,t_n)) and y=((y_0,...,y_m),(s_1,...,s_m)), and make a new partition by using all the partition points from each one. Now look at each slice in the new partition. It can’t have more than one t tag or s tag, so it has either zero, one, or two distinct tags. If it has no tags, add one. If it has one tag, do nothing. If it has two distinct tags, split it between them (notice how we’re using the topology of \mathbb{R} to say we can make this split). At the end, we’ve got a new partition that refines both of x and y. And thus we have a directed set.

Now if we have a function f on \left[a,b\right], we can get a net on this directed set. Given any tagged partition x=((x_0,...,x_n),(t_1,...,t_n)), we define the “Riemann sum”

\displaystyle f_x=\sum\limits_{i=1}^nf(t_i)(x_i-x_{i-1})

Finally, we say that the function f is “Riemann integrable” if this net converges to a limit s, and in this case we define the “Riemann integral” of f:

\displaystyle\int\limits_a^b f(x)dx=s

which is, at last, the area under the curve as we set out to find.
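The Riemann sum of a tagged partition is easy to compute directly. The following Python sketch is only an illustration: the function names, and the choice of midpoints as tags, are my own assumptions and not something the definition requires.

```python
def riemann_sum(f, points, tags):
    """Riemann sum f_x: the sum of f(t_i) * (x_i - x_{i-1}) over a tagged partition."""
    if len(tags) != len(points) - 1:
        raise ValueError("need exactly one tag per subinterval")
    total = 0.0
    for left, right, tag in zip(points, points[1:], tags):
        if not left <= tag <= right:
            raise ValueError("each tag must lie in its subinterval")
        total += f(tag) * (right - left)
    return total

def midpoint_tagged_partition(a, b, n):
    """One possible tagged partition: n equal strips, each tagged at its midpoint."""
    points = [a + (b - a) * i / n for i in range(n + 1)]
    tags = [(left + right) / 2 for left, right in zip(points, points[1:])]
    return points, tags

# Area under f(x) = x^2 between x = 3 and x = 7, with thinner and thinner strips.
for n in (4, 40, 400):
    points, tags = midpoint_tagged_partition(3.0, 7.0, n)
    print(n, riemann_sum(lambda x: x * x, points, tags))
```

Refining a uniform partition like this only walks along one path through the directed set, of course. Integrability asks for convergence over all tagged partitions, whatever tags we pick.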

January 29, 2008 Posted by | Analysis, Calculus | 17 Comments


One of the consequences of the mean value theorem we worked out was that two differentiable functions f and g on an interval (a,b) differ by a constant if and only if their derivatives are the same: f'(x)=g'(x) for all x\in(a,b). Now let’s turn this around the other way.

We start with a function f on an interval (a,b) and define an “antiderivative” of f to be a function F on the same interval such that F'(x)=f(x) for x\in(a,b). What the above conclusion from the mean value theorem shows us is that there’s only one way any two solutions could differ. That is if F is some particular antiderivative of f then any other antiderivative G satisfies G(x)=F(x)+C for some real constant C. So the hard bit about antiderivatives is all in finding a particular one, since the general solution to an antidifferentiation problem just involves adding an arbitrary constant corresponding to the constant we lose when we differentiate.

Some antiderivatives we can pull out right away. We know that if F(x)=x^n then F'(x)=nx^{n-1}. Thus, turning this around, we find that an antiderivative of f(x)=x^n is F(x)=\frac{x^{n+1}}{n+1}, except if n=-1, because then we’d have to divide by zero. We’ll figure out what to do with this exception later.

We can also turn around some differentiation rules. For instance, since \frac{d}{dx}\left[f(x)+g(x)\right]=f'(x)+g'(x) then if F is an antiderivative of a function f and G an antiderivative of g then F+G is an antiderivative of f+g. Similarly, the differentiation rule for a constant multiple tells us that cF is an antiderivative of cf for any real constant c.

Between these we can handle antidifferentiation of any polynomial P(x). Each term of the polynomial is some constant times a power of x, so the constant multiple rule and the rule for powers of x give us an antiderivative for each term. Then we can just add these antiderivatives all together. We also only have one arbitrary constant to add since we can just add together the constants for each term to get one overall constant for the whole polynomial.
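The recipe for polynomials is mechanical enough to write down as a short Python sketch. The representation is an assumption of mine for the sake of illustration: a polynomial is a list of coefficients, with coeffs[k] the coefficient of x^k, and the function names are hypothetical.

```python
from fractions import Fraction

def antiderivative(coeffs, constant=0):
    """Coefficients of an antiderivative of the polynomial with the given coefficients.

    coeffs[k] is the coefficient of x**k. Each term c*x**k becomes
    c/(k+1) * x**(k+1), and `constant` is the single arbitrary constant C.
    """
    result = [Fraction(constant)]
    for k, c in enumerate(coeffs):
        result.append(Fraction(c) / (k + 1))
    return result

def derivative(coeffs):
    """Coefficients of the derivative: c*x**k becomes k*c*x**(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

# P(x) = 1 + 2x + 3x^2
P = [1, 2, 3]
F = antiderivative(P, constant=5)
print(F)              # coefficients of one antiderivative of P
print(derivative(F))  # differentiating recovers the coefficients of P
```

Changing `constant` changes F but not its derivative, which is exactly the one-parameter freedom described above.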

January 28, 2008 Posted by | Analysis, Calculus | 5 Comments

Linking Integrals

There’s a new paper out on the arXiv discussing higher-dimensional linking integrals, by two graduate students at the University of Pennsylvania. I don’t have time to really go through it right now, but at a first scan I’m really not sure what they’ve done here. It seems they’re just taking the regular Gauss integral and doing the exact same thing for higher-dimensional spheres, although in a way that’s so loaded down with notation that it obscures the fact that it’s the exact same idea.

Some people like results that are more computationally focused, and some (like me) prefer to lay bare the structure of the concepts, and derive a computational framework later. It may be that these authors are just more the former than the latter. Anyhow, I’m not certain how original it is, but my own work is shot through with “wait, you mean nobody’s written that up yet?” If they’ve found one of these obvious niches that nobody has gotten around to mining, more power to them.

January 28, 2008 Posted by | Knot theory | 13 Comments

Distinguishing Maxima and Minima

From Heine-Borel we know that a continuous function f on a closed interval \left[a,b\right] takes a global maximum and a minimum. From Fermat we know that any local (and in particular any global) extremum occurs at a critical point: a point where f'(x)=0, or where f has no derivative at all. But once we find these critical points how can we tell maxima from minima?

The biggest value of f at a critical point is clearly the global maximum, and the smallest is just as clearly the minimum. But what about all the ones in between? Here’s where those consequences of the mean value theorem come in handy. For simplicity, let’s assume that the critical points are isolated. That is, each one has a neighborhood in which it’s the only critical point. Further, let’s assume that f'(x) is continuous wherever it exists.

Now, to the left of any critical point we’ll have a stretch where f is differentiable (or else there would be another critical point there) and f'(x) is nonzero (ditto). Since the derivative is continuous, it must either be always positive or always negative on this stretch, because if it was sometimes positive and sometimes negative the intermediate value theorem would give us a point where it’s zero. If the derivative is positive, our corollaries of the mean value theorem tell us that f increases as we move in towards the point, while if the derivative is negative it decreases into the critical point. Similarly, on the right we’ll have another such stretch telling us that f either increases or decreases as we move away from the critical point.

So what’s a local maximum? It’s a critical point where the function increases moving into the critical point and decreases moving away! That is, if near the critical point the derivative is positive on the left and negative on the right, we’ve got ourselves a local maximum. If the derivative is positive on the right and negative on the left, it’s a local minimum. And if we find the same sign on either side, it’s neither! Notice that this is exactly what happens with the function f(x)=x^3 at its critical point. Also, we don’t have to worry about where to test the sign of the derivative, because we know that it can only change signs at a critical point.

In fact, if we add a bit more to our assumptions we can get an even nicer test. Let’s assume that the function is “twice-differentiable” — that f'(x) is itself a differentiable function — on our interval. Then all the critical points happen where f'(x)=0. Even better now, if it changes signs as we pass through the critical point (indicating a local extremum) it’s either increasing or decreasing, and this will be reflected in its derivative f''(x) at the critical point. If f''(x)>0 then our sign changes from negative to positive and we must be looking at a local minimum. On the other hand, if f''(x)<0 then we’ve got a local maximum. Unfortunately, if f''(x)=0 we don’t really get any information from this test and we have to fall back on the previous one.
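The first of these tests, checking the sign of the derivative on either side, can be sketched in a few lines of Python. This is only an illustration under the assumptions from the text: the critical point is isolated and the derivative is continuous nearby. The function name and the step size h are my own choices.

```python
def classify_critical_point(fprime, c, h=1e-3):
    """Classify an isolated critical point c by the sign of f' on either side.

    Assumption: c is the only critical point in [c - h, c + h], so the sign
    of f' at c - h and at c + h is its sign on the whole stretch to each side.
    """
    left, right = fprime(c - h), fprime(c + h)
    if left > 0 and right < 0:
        return "local maximum"  # increasing into c, decreasing away from it
    if left < 0 and right > 0:
        return "local minimum"  # decreasing into c, increasing away from it
    return "neither"            # same sign on both sides

print(classify_critical_point(lambda x: 3 * x**2, 0.0))  # f(x) = x^3
print(classify_critical_point(lambda x: 2 * x, 0.0))     # f(x) = x^2
print(classify_critical_point(lambda x: 4 * x**3, 0.0))  # f(x) = x^4
```

The last example, f(x)=x^4, is one where f''(0)=0, so the second-derivative test tells us nothing and we have to fall back on the sign check.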

January 25, 2008 Posted by | Analysis, Calculus | 2 Comments

Consequences of the Mean Value Theorem

So now that we have the mean value theorem what can we do with it? First off, we can tell something that seems intuitively obvious. We know that a constant function has the constant zero function as its derivative. It turns out that these are the only functions with zero derivative.

To see this, let f be a differentiable function on (a,b) so that f'(x)=0 for all x\in(a,b). Let x_1 and x_2 be any points between a and b with x_1<x_2. Then f restricts to a continuous function on the interval \left[x_1,x_2\right] which is differentiable on the interior (x_1,x_2). The differential mean value theorem then applies, and it tells us that there is some c\in(x_1,x_2) with f'(c)=\frac{f(x_2)-f(x_1)}{x_2-x_1}. But by assumption this derivative is zero, and so f(x_2)=f(x_1). Since the points were arbitrary, f takes the same value at each point in (a,b).

What about a function f for which f'(x)>0 on some interval (a,b)? Looking at the graph it seems that the slope of all the tangent lines should be positive, and so the function should be increasing. Indeed this is the case.

Specifically we have to show that if x_2>x_1 for two points in (a,b) then f(x_2)>f(x_1). Again we look at the restriction of f to a continuous function on \left[x_1,x_2\right] which is differentiable on (x_1,x_2). Then the mean value theorem tells us that there is some c\in(x_1,x_2)\subseteq(a,b) with f'(c)=\frac{f(x_2)-f(x_1)}{x_2-x_1}. By assumption this quantity is positive, as is x_2-x_1, and so f(x_2)>f(x_1). Similarly we can show that if f'(x)<0 on an interval (a,b) then the function is decreasing there.

January 24, 2008 Posted by | Analysis, Calculus | 14 Comments

A note on the Periodic Functions Problem

Over at The Everything Seminar, Jim Belk mentions an interesting little problem.

Show that there exist two periodic functions f,g:\mathbb{R}\rightarrow\mathbb{R} whose sum is the identity function:

f(x)+g(x)=x for all x\in\mathbb{R}

He notes right off that, “Obviously the functions f and g can’t be continuous, since any continuous periodic function is bounded.” I’d like to explain why, in case you didn’t follow that.

If a function f is periodic, that means it factors through a map to the circle, which we call S^1. Why? Because “periodic” with period p means we can take the interval \left[0,p\right) and glue one end to the other to make a circle. As we walk along the real line we walk around the circle. When we come to the end of a period in the line, that’s like getting back to where we started on the circle. Really what we’re doing is specifying a function on the circle and then using that function over and over again to give us a function on the real line. And if f is going to be continuous, the function \bar{f}:S^1\rightarrow\mathbb{R} had better be as well.

Now, I assert that the circle is compact. I could do a messy proof inside the circle itself (and I probably should in the long run) but for now we can just see the circle lying in the plane \mathbb{R}^2 as the collection of points distance 1 from the origin. Then this subspace of the plane is clearly bounded, and it’s not hard to show that it’s closed. The Heine-Borel theorem tells us that it’s compact!

And now since the circle is compact we know that its image under the continuous map \bar{f} must be compact as well! And since the image of f is the same as the image of \bar{f}, it must also be a compact subspace of \mathbb{R} — a closed, bounded interval. Neat.

January 23, 2008 Posted by | Analysis, Calculus | 13 Comments

The Differential Mean Value Theorem

Let’s say we’ve got a function f that’s continuous on the closed interval \left[a,b\right] and differentiable on (a,b). We don’t even assume the function is defined outside the interval, so we can’t really set up the limit for differentiability at the endpoints, but they don’t matter much in the end.

Anyhow, if we look at the graph of f we could just draw a straight line from the point (a,f(a)) to the point (b,f(b)). The graph itself wanders away from this line and back, but the line tells us that on average we’re moving from f(a) to f(b) at a certain rate — the slope of the line. Since this is an average behavior, sometimes we must be going faster and sometimes slower. The differential mean value theorem says that there’s at least one point where we’re going exactly that fast. Geometrically, this means that the tangent line will be parallel to the secant we drew between the endpoints. In formulas we say there is a point c\in(a,b) with f'(c)=\frac{f(b)-f(a)}{b-a}.

First let’s nail down a special case, called “Rolle’s theorem”. If f(a)=0=f(b), we’re asserting that there is some point c\in(a,b) with f'(c)=0. Since \left[a,b\right] is compact and f is continuous, the extreme value theorem tells us that f must take a maximum and a minimum. If these are both zero, then we’re looking at the constant function f(x)=0, and any point in the middle satisfies f'(c)=0. On the other hand, if either the maximum or minimum is nonzero, then we have a local extremum at a point c\in(a,b) where f is differentiable (since it’s differentiable all through the open interval). Now Fermat’s theorem tells us that f'(c)=0 since c is a local extremum! Thus Rolle’s theorem is proved.

Now for the general case. Start with the function f and build from it the function g(x)=f(x)-\frac{f(b)-f(a)}{b-a}(x-a)-f(a). On the graph, this corresponds to applying an “affine transformation” (which sends straight lines in the plane to other straight lines in the plane) to pull both f(a) and f(b) down to zero. In fact, it’s a straightforward calculation to see that g(a)=0=g(b). Thus Rolle’s theorem applies and we find a point c with g'(c)=0. But applying our laws of differentiation, we see that g'(c)=f'(c)-\frac{f(b)-f(a)}{b-a}. And so f'(c)=\frac{f(b)-f(a)}{b-a}, as desired.
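The theorem only asserts that such a point c exists, but in friendly cases we can hunt one down numerically. Here is a minimal Python sketch that looks for a zero of g'(x)=f'(x)-\frac{f(b)-f(a)}{b-a} by bisection. It leans on extra assumptions that the theorem itself does not need: that f' is continuous, is defined at the endpoints, and that g' takes opposite signs at a and b. The function name is hypothetical.

```python
def mean_value_point(f, fprime, a, b, tol=1e-12):
    """Approximate a point c in (a, b) with f'(c) = (f(b) - f(a)) / (b - a).

    Assumptions (not part of the theorem): f' is continuous on [a, b] and
    f' minus the secant slope has opposite signs at the two endpoints.
    """
    slope = (f(b) - f(a)) / (b - a)

    def g_prime(x):
        # Derivative of g(x) = f(x) - slope * (x - a) - f(a)
        return fprime(x) - slope

    lo, hi = a, b
    if g_prime(lo) * g_prime(hi) > 0:
        raise ValueError("bisection needs f' - slope to change sign on [a, b]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g_prime(lo) * g_prime(mid) <= 0:
            hi = mid  # a sign change lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2

# f(x) = x^3 on [0, 2]: the secant slope is 4, so we want 3c^2 = 4.
c = mean_value_point(lambda x: x**3, lambda x: 3 * x**2, 0.0, 2.0)
print(c, 3 * c**2)
```

When the sign-change assumption fails the sketch simply gives up, even though the theorem still guarantees that some c exists.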

January 22, 2008 Posted by | Analysis, Calculus | 22 Comments

Fermat’s Theorem

Okay, the Heine-Borel theorem tells us that a continuous real-valued function f on a compact space X takes a maximum and a minimum value. In particular, this holds for functions on closed intervals. But how can we recognize a maximum or a minimum when we see one?

First of all, what we get from the Heine-Borel theorem is a global maximum and minimum. That is, a point c\in X so that for any x\in X we have f(c)\geq f(x) (or f(c)\leq f(x)). We also can consider “local” maxima and minima. As you might guess from local connectedness and local compactness, a local maximum (minimum) c is a global maximum (minimum) in some neighborhood U\in\mathcal{N}(c). For example, if f is a function on some region in \mathbb{R} then having a local maximum at c means that there is some interval (a,b) with a<c<b, and for every x\in(a,b) we have f(c)\geq f(x).

So a function may have a number of local maxima and minima, but they’re not all global. Still, finding local maxima and minima is an important first step. In practice there’s only a finite number of them, and we can easily pick out which of them are global by just computing the function. So what do they look like?

For functions on regions in \mathbb{R}, the biggest part of the answer comes from Fermat’s theorem. The theorem itself actually talks about differentiable functions, so the first thing we’ll say is that an extremum may occur at a point where the function is not differentiable (though a point of nondifferentiability is not a sure sign of being an extremum).

Now, let’s say that we have a local maximum at c and that f is differentiable at c. We can set up the difference quotient \frac{f(x)-f(c)}{x-c}. When we take our limit as x goes to c, we can restrict to the neighborhood where c gives a global maximum, so f(x)-f(c)\leq0. To the right of c, x-c>0, so the difference quotient is nonpositive here. To the left of c, x-c<0, so the difference quotient is nonnegative here. Then since the limit must be a limit point of both of these regions, it must be both nonpositive and nonnegative, and so it must be 0. That is, f'(c)=0. And the same thing happens for local minima.

So let’s define a “critical point” of a function to be one where either f isn’t differentiable or f'(c)=0. Then any local extremum must happen at a critical point. But not every critical point is a local extremum. The easiest example is f(x)=x^3, which has derivative f'(x)=3x^2. Then the only critical point is x=0, for which f(x)=0, but any neighborhood of x=0 has both positive and negative values of f(x), so it’s not a local maximum or minimum.

Geometrically, we should have expected as much as this. Remember that the derivative is the slope of the tangent line. At a local maximum, the function rises to the crest and falls again, and at the top the tangent line balances perfectly level with zero slope. We can see this when we draw the graph, and it provides the intuition behind Fermat’s theorem, but to speak with certainty we need the analytic definitions and the proof of the theorem.

January 21, 2008 Posted by | Analysis, Calculus | 7 Comments