The Unapologetic Mathematician

Mathematics for the interested outsider

Some theorems about metric spaces

We need to get down a few facts about metric spaces before we can continue on our course. Firstly, as I alluded to in an earlier comment, compact metric spaces are sequentially compact: every sequence has a convergent subsequence.

To see this fact, we’ll use the fact that compact spaces are the next best thing to finite. Specifically, in a finite set any infinite sequence would have to hit one point infinitely often. Here instead, we’ll have an accumulation point \xi in our compact metric space X so that for any \epsilon>0 and point x_m in our sequence there is some n\geq m with d_X(x_n,\xi)<\epsilon. That is, though the sequence may move away from \xi, it always comes back within \epsilon of it again. Once we have an accumulation point \xi, we can find a subsequence converging to \xi just as we found a subnet converging to any accumulation point of a net.

Let’s take our sequence and define F_N=\mathrm{Cl}(\{x_n, n\geq N\}) — the closure of the sequence from x_N onwards. Then these closed sets are nested F_1\supseteq F_2\supseteq...\supseteq F_N\supseteq..., and the intersection of any finite number of them is the smallest one, which is clearly nonempty since it contains a tail of the sequence. Then by the compactness of X we see that the intersection of all the F_N is again nonempty. Since the points in this intersection are in the closure of any tail of the sequence, they must be accumulation points.

Okay, that doesn’t quite work. See the comments for more details. Michael asks where I use the fact that we’re in a metric space, which was very astute. It turns out on reflection that I did use it, but it was hidden.

We can still say we’re looking for an accumulation point first and foremost, because if the sequence has an accumulation point there must be some subsequence converging to that point. Why not a subnet in general? Because metric spaces must be normal Hausdorff (using metric neighborhoods to separate closed sets) and first-countable! And as long as we’re first-countable (or, weaker, “sequential”) we can find a sequence converging to any limit point of a net.

What I didn’t say before is that once we find an accumulation point there will be a subsequence converging to that point. My counterexample is compact, and any sequence in it has accumulation points, but we will only be able to find subnets of our sequence converging to them, not subsequences. Unless we add something to assure that our space is sequential, and metric spaces do that.

We should note in passing that the special case where X is a compact subspace of \mathbb{R}^n is referred to as the Bolzano-Weierstrass Theorem.
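
If you’d like to watch a convergent subsequence get pulled out of a bounded real sequence, here’s a rough Python sketch. Note that it uses the classical bisection argument for Bolzano-Weierstrass on the real line rather than the nested closed sets above, and that a computer only sees finitely many terms, so “the half containing infinitely many terms” is replaced by “the half containing more of the terms we have”. That stand-in is my assumption, and it only behaves for a modest number of halvings. The sequence x_n=(-1)^n(1+\frac{1}{n}) and the name bisection_subsequence are made up for illustration.

```python
def bisection_subsequence(x, lo, hi, steps):
    """Pick indices n_1 < n_2 < ... by repeatedly halving [lo, hi]."""
    live = list(range(len(x)))      # indices whose terms lie in the current interval
    chosen, last = [], -1
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [n for n in live if x[n] <= mid]
        right = [n for n in live if x[n] > mid]
        # Finite stand-in (an assumption) for "the half with infinitely many terms".
        if len(left) >= len(right):
            live, hi = left, mid
        else:
            live, lo = right, mid
        # Take the first surviving index beyond the last one we chose.
        later = [n for n in live if n > last]
        if not later:
            break
        last = later[0]
        chosen.append(last)
    return chosen, (lo, hi)

N = 2000
x = [(-1) ** n * (1 + 1 / n) for n in range(1, N + 1)]   # x[0] is the term n = 1
chosen, (lo, hi) = bisection_subsequence(x, -2.0, 2.0, steps=10)
print([round(x[n], 4) for n in chosen], (lo, hi))
```

The chosen terms are trapped in intervals of shrinking width, which is what forces the subsequence to converge. With only 2000 terms in hand the heuristic stops being trustworthy after a dozen or so halvings.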

Next is the Heine-Cantor theorem, which says that any continuous function f:M\rightarrow N from a compact metric space M to any metric space N is uniformly continuous. In particular, we can use the interval \left[a,b\right] as our compact metric space M and the real numbers \mathbb{R} as our metric space N to see that any continuous function on a closed interval is uniformly continuous.

So let’s assume that f is continuous but not uniformly continuous. Then there is some \epsilon>0 so that for any \delta>0 there are points x and y in M with d_M(x,y)<\delta but d_N(f(x),f(y))\geq\epsilon. In particular, we can pick \frac{1}{n} as our \delta and get two sequences x_n and y_n with d_M(x_n,y_n)<\frac{1}{n} but d_N(f(x_n),f(y_n))\geq\epsilon. By the above theorem we can find a subsequence x_{n_k} converging to \bar{x} and then, passing to a further subsequence if necessary, we can assume that y_{n_k} converges as well, to some \bar{y}.

Now d_M(x_{n_k},y_{n_k})<\frac{1}{n_k}, which converges to 0, and so \bar{x}=\bar{y}. Therefore we must have d_N(f(x_{n_k}),f(y_{n_k})) also converging to 0 by the continuity of f. But this can’t happen, since each of these distances must be at least \epsilon! Thus f must have been uniformly continuous to begin with.
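
To see why compactness matters, here’s a small sketch of exactly the two sequences from the proof, for a function on a space that isn’t compact. The choice f(x)=\frac{1}{x} on (0,1] with x_n=\frac{1}{n} and y_n=\frac{1}{n+1} is my own example, not something from the argument above.

```python
from fractions import Fraction

def f(x):
    return 1 / x          # continuous on (0, 1], which is not compact

# Sequences as in the proof: d(x_n, y_n) < 1/n, yet the images stay far apart.
indices = range(2, 8)
pairs = [(Fraction(1, n), Fraction(1, n + 1)) for n in indices]
gaps = [(abs(x - y), abs(f(x) - f(y))) for x, y in pairs]
for n, (dx, dy) in zip(indices, gaps):
    print(n, dx, dy)
```

The distances d(x_n,y_n) drop below \frac{1}{n} while the images stay exactly 1 apart, so \epsilon=1 witnesses the failure of uniform continuity. The proof breaks down here because both sequences head for 0, which isn’t in the space.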

January 31, 2008 Posted by John Armstrong | Point-Set Topology, Topology | | 7 Comments

Darboux Integration

Okay, defining the integral as the limit of a net of Riemann sums is all well and good, but it’s a huge net, and it seems impossible to calculate with. We need a better way of getting a handle on these things. What we’ll use is a little trick for evaluating limits of nets that I haven’t mentioned yet: “cofinal sets”.

Given a directed set (D,\preceq), a directed subset S is cofinal if for every d\in D there is some s\in S with s\succeq d. Now watch what happens when we try to show that the limit of a net x_d is a point x. We need to find for every neighborhood U of x an index d_0 so that for every d\succeq d_0 we have x_d\in U. But if d_0 is such an index, then there is some s_0\in S above it, and every s\in S above that is also above d_0, and so x_s\in U. That is, if the limit over D exists, then the limit over S exists and has the same value.
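
Here’s a toy version of that argument in Python, under some loud assumptions: the directed set is a finite chunk of the natural numbers with the usual order, the cofinal subset is the powers of two, and the net is x_d=\frac{1}{d}. None of these choices come from the discussion above; they’re just the simplest ones I could think of.

```python
def is_cofinal(S, D, leq):
    """Check that every d in D has some s in S with d below s."""
    return all(any(leq(d, s) for s in S) for d in D)

leq = lambda d, s: d <= s
D = range(1, 1025)                       # finite stand-in for the natural numbers
S = [2 ** k for k in range(11)]          # 1, 2, 4, ..., 1024
net = lambda d: 1 / d                    # a net on D converging to 0

epsilon = 0.01
d0 = 101                                 # for every d >= d0 we have 1/d < epsilon
s0 = min(s for s in S if leq(d0, s))     # an index of S above d0
tail_in_U = all(net(s) < epsilon for s in S if leq(s0, s))
print(is_cofinal(S, D, leq), s0, tail_in_U)
```

The index s_0 found inside S does the same job for the restricted net that d_0 did for the full one.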

Let’s give a cofinal set of tagged partitions by giving a rule for picking the tags that go with any partition. Then our net consists just of partitions of the interval \left[a,b\right], and the tags come for free. If the function f is Riemann-integrable, then the limit over this cofinal set will be the integral. Here’s our rule: in the closed subinterval \left[x_{i-1},x_i\right] pick a point t_i so that \lim\limits_{x\rightarrow t_i}f(x) is the supremum of the values of f in that subinterval. If the function is continuous it will attain a maximum at our tag, and if not it’ll get close or shoot off to infinity (if there is no supremum).

Why is this cofinal? Let’s imagine a tagged partition x=((x_0,...,x_n),(t_1,...,t_n)) where t_i is not chosen according to this rule. Then we can refine the partition by splitting up the ith strip in such a way that t_i is the maximum in one of the new strips, and choosing all the new tags according to the rule. Then we’ve found a good partition above the one we started with. Similarly, we can build another cofinal set by always choosing the tags where f approaches an infimum.

When we consider a partition x in the first cofinal set we can set up something closely related to the Riemann sums: the “upper Darboux sums”

\displaystyle U_x(f)=\sum\limits_{i=1}^n M_i(x_i-x_{i-1})

where M_i is the supremum of f(x) on the interval \left[x_{i-1},x_i\right], or infinity if the value of f is unbounded above here. Similarly, we can define the “lower Darboux sum”

\displaystyle L_x(f)=\sum\limits_{i=1}^n m_i(x_i-x_{i-1})

where now m_i is the infimum (or negative infinity). If the function is Riemann-integrable, then the limits over these cofinal sets both exist and are both equal to the Riemann integral. So we define a function to be “Darboux-integrable” if the limits of the upper and lower Darboux sums both exist and have the same value. Then the Darboux integral is defined to be this common value. Notice that if the function ever shoots off to positive or negative infinity we’ll get an infinite value for one of the terms, and we can never converge, so such functions are not Darboux-integrable.
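
As a quick sanity check on these definitions, here’s a sketch computing upper and lower Darboux sums exactly. I’m assuming an increasing function, so that the supremum on each slice sits at the right endpoint and the infimum at the left, and I’m only using partitions into equal slices. The function f(x)=x^2 on \left[0,1\right] and the name darboux_sums are my own choices.

```python
from fractions import Fraction

def darboux_sums(f, a, b, n):
    """Upper and lower Darboux sums of an increasing f on n equal slices of [a, b]."""
    xs = [a + (b - a) * Fraction(i, n) for i in range(n + 1)]
    # For increasing f, the sup on [x_{i-1}, x_i] is f(x_i) and the inf is f(x_{i-1}).
    upper = sum(f(hi) * (hi - lo) for lo, hi in zip(xs, xs[1:]))
    lower = sum(f(lo) * (hi - lo) for lo, hi in zip(xs, xs[1:]))
    return upper, lower

square = lambda x: x * x
for n in (1, 10, 100):
    U, L = darboux_sums(square, Fraction(0), Fraction(1), n)
    print(n, L, U, U - L)
```

For this function the gap U-L works out to exactly \frac{1}{n}, so the two sums squeeze together, and they trap the value \frac{1}{3} between them.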

We should notice here that given any partition x, the upper Darboux sum must be at least as large as any Riemann sum with that same partition, since no matter how we choose the tag t_i we’ll find that f(t_i)\leq M_i by definition. Similarly, the lower Darboux sum can be no larger than any Riemann sum on the same partition. Now let’s say that the upper and lower Darboux sums both converge to the same value s. Then given any neighborhood of s we can find a partition x_U so that every upper Darboux sum over a refinement of x_U is in the neighborhood, and a similar partition x_L for the lower Darboux sums. If we choose a common refinement x_R of both (which we can do because partitions form a directed set), then both its upper and lower Darboux sums (and those of any of its refinements) will be in our neighborhood. Then we can choose any tags in x_R we want, and the Riemann sum, squeezed between the lower and upper Darboux sums, will again be in the neighborhood (taking the neighborhood to be an interval around s). Thus a Darboux-integrable function is also Riemann-integrable.

So this new notion of Darboux-integrability is really the same one as Riemann-integrability, but it involves taking two limits over a much less complicated directed set. For now, we’ll just call a function which satisfies either of these two equivalent conditions “integrable” and be done with it, using whichever construction of the integral is most appropriate to our needs at the time.

January 30, 2008 Posted by John Armstrong | Analysis, Calculus, Orders | | 2 Comments

Riemann Integration

Before continuing with methods of antidifferentiation, let’s consider another geometric problem: integration. Here’s an example:

An area to be integrated

We’ve got a function whose graph is drawn in red, and we want to find the area contained between the graph, the x-axis, and the two blue lines at x=3 and x=7. We’ll approximate this by cutting up this interval into n pieces and choosing a sample point t_i in each piece, like so:

Approximating the integral

Now we’ve just got a bunch of rectangles, and we can add up their areas to get

\displaystyle\sum\limits_{i=1}^nf(t_i)\Delta_i

where f(t_i) is the value of the function at the ith sample point, and \Delta_i is the width of the ith strip. Now as we cut the strips thinner and thinner, our stairstep-like approximation to the function should get closer and closer to the real function, and our approximation to the area we’re interested in should get better and better.

So how can we formalize this process? First, let’s take an interval \left[a,b\right] and think about how to cut it up into strips. We do this by picking a collection of points a=x_0<x_1<...<x_{n-1}<x_n=b. We get a bunch of smaller intervals \left[x_{i-1},x_i\right], and in each one we pick some t_i. This structure we call a “tagged partition” of the interval \left[a,b\right]. We define the “mesh” of a partition to be the width of its thickest subinterval, \max\limits_{1\leq i\leq n}(x_i-x_{i-1}), and we’ll want to somehow take this down to zero.

We can now see that the collection of all the tagged partitions of an interval forms a directed set! We say that a tagged partition y=((y_0,...,y_m),(s_1,...,s_m)) is a “refinement” of a tagged partition x=((x_0,...,x_n),(t_1,...,t_n)) if every partition point x_i is one of the y_j, and every tag t_i is one of the s_j. That is, we get from x to y by splitting up some of the slices of x and adding new tags to the new slices. Then we define x\preceq y if y is a refinement of x. This makes the collection of tagged partitions into a partially-ordered set.

To show that this is a directed set, consider any two tagged partitions x=((x_0,...,x_n),(t_1,...,t_n)) and y=((y_0,...,y_m),(s_1,...,s_m)), and make a new partition by using all the partition points from each one. Now look at each slice in the new partition. It can’t have more than one t tag or s tag, so it has either zero, one, or two distinct tags. If it has no tags, add one. If it has one tag, do nothing. If it has two distinct tags, split it between them (notice how we’re using the topology of \mathbb{R} to say we can make this split). At the end, we’ve got a new partition that refines both of x and y. And thus we have a directed set.
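
The recipe above can be sketched in Python. This is only an illustration under my own conventions: a tagged partition is a pair (points, tags) of lists, exact fractions stand in for real numbers, and a new tag for an empty slice goes at its midpoint. A tag sitting right on a partition point can land in the same new slice as its neighbors, so the sketch splits between every pair of consecutive tags in a slice rather than assuming there are at most two. The helper names are hypothetical.

```python
from fractions import Fraction as Fr

def is_tagged_partition(p):
    """Points strictly increase, and slice i carries exactly one tag lying inside it."""
    points, tags = p
    increasing = all(lo < hi for lo, hi in zip(points, points[1:]))
    one_tag_each = len(tags) == len(points) - 1
    tags_inside = all(lo <= t <= hi for lo, t, hi in zip(points, tags, points[1:]))
    return increasing and one_tag_each and tags_inside

def refines(fine, coarse):
    """Every partition point and every tag of coarse also appears in fine."""
    return set(coarse[0]) <= set(fine[0]) and set(coarse[1]) <= set(fine[1])

def common_refinement(x, y):
    points = sorted(set(x[0]) | set(y[0]))        # all partition points from both
    pending = sorted(set(x[1]) | set(y[1]))       # tags not yet placed in a slice
    new_points, new_tags = [points[0]], []
    for lo, hi in zip(points, points[1:]):
        here = [t for t in pending if lo <= t <= hi]
        pending = [t for t in pending if t > hi]
        if not here:
            here = [(lo + hi) / 2]                # no tag in this slice: add one
        for t, u in zip(here, here[1:]):
            new_tags.append(t)
            new_points.append((t + u) / 2)        # split the slice between two tags
        new_tags.append(here[-1])
        new_points.append(hi)
    return new_points, new_tags

x = ([Fr(0), Fr(1, 2), Fr(1)], [Fr(1, 4), Fr(3, 4)])
y = ([Fr(0), Fr(1, 3), Fr(1)], [Fr(1, 6), Fr(1, 2)])
z = common_refinement(x, y)
print(z)
```

In this example the slice from 0 to \frac{1}{3} picks up one tag from each partition, so it gets split between them, while the other slices already have a single tag and are left alone.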

Now if we have a function f on \left[a,b\right], we can get a net on this directed set. Given any tagged partition x=((x_0,...,x_n),(t_1,...,t_n)), we define the “Riemann sum”

\displaystyle f_x=\sum\limits_{i=1}^nf(t_i)(x_i-x_{i-1})

Finally, we say that the function f is “Riemann integrable” if this net converges to a limit s, and in this case we define the “Riemann integral” of f:

\displaystyle\int\limits_a^b f(x)dx=s

which is, at last, the area under the curve as we set out to find.
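
To get a feel for the net of Riemann sums, here’s a sketch that evaluates f_x for a few tagged partitions of \left[3,7\right]. The integrand f(x)=x^2 is a stand-in I picked, since the curve in the picture isn’t specified, and I only look at partitions into equal strips with tags at the left ends, the right ends, or the midpoints.

```python
def riemann_sum(f, points, tags):
    """f_x = sum of f(t_i) * (x_i - x_{i-1}) over a tagged partition."""
    return sum(f(t) * (hi - lo) for lo, t, hi in zip(points, tags, points[1:]))

def uniform_partition(a, b, n):
    return [a + (b - a) * i / n for i in range(n + 1)]

f = lambda x: x * x          # hypothetical integrand, not the curve from the figure
points = uniform_partition(3.0, 7.0, 1000)
left = riemann_sum(f, points, points[:-1])
right = riemann_sum(f, points, points[1:])
mid = riemann_sum(f, points, [(lo + hi) / 2 for lo, hi in zip(points, points[1:])])
print(left, mid, right)
```

All three choices of tags land near \frac{316}{3}, which is the area for this particular integrand, and they get closer as the strips get thinner.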

January 29, 2008 Posted by John Armstrong | Analysis, Calculus | | 9 Comments

Antiderivatives

One of the consequences of the mean value theorem we worked out was that two differentiable functions f and g on an interval (a,b) differ by a constant if and only if their derivatives are the same: f'(x)=g'(x) for all x\in(a,b). Now let’s turn this around the other way.

We start with a function f on an interval (a,b) and define an “antiderivative” of f to be a function F on the same interval such that F'(x)=f(x) for x\in(a,b). What the above conclusion from the mean value theorem shows us is that there’s only one way any two solutions could differ. That is if F is some particular antiderivative of f then any other antiderivative G satisfies G(x)=F(x)+C for some real constant C. So the hard bit about antiderivatives is all in finding a particular one, since the general solution to an antidifferentiation problem just involves adding an arbitrary constant corresponding to the constant we lose when we differentiate.

Some antiderivatives we can pull out right away. We know that if F(x)=x^n then F'(x)=nx^{n-1}. Thus, turning this around, we find that an antiderivative of f(x)=x^n is F(x)=\frac{x^{n+1}}{n+1}, except if n=-1, because then we’d have to divide by zero. We’ll figure out what to do with this exception later.

We can also turn around some differentiation rules. For instance, since \frac{d}{dx}\left[f(x)+g(x)\right]=f'(x)+g'(x) then if F is an antiderivative of a function f and G an antiderivative of g then F+G is an antiderivative of f+g. Similarly, the differentiation rule for a constant multiple tells us that cF is an antiderivative of cf for any real constant c.

Between these we can handle antidifferentiation of any polynomial P(x). Each term of the polynomial is some constant times a power of x, so the constant multiple rule and the rule for powers of x give us an antiderivative for each term. Then we can just add these antiderivatives all together. We also only have one arbitrary constant to add, since we can just add together the constants for each term to get one overall constant for the whole polynomial.
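
These rules are mechanical enough to write down as code. Here’s a minimal sketch, assuming we store a polynomial as its list of coefficients with coeffs[k] multiplying x^k. That representation and the function names are mine.

```python
from fractions import Fraction

def antiderivative(coeffs, C=0):
    """Coefficients of an antiderivative: c * x^k goes to c/(k+1) * x^(k+1), plus C."""
    return [Fraction(C)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]

def derivative(coeffs):
    """Coefficients of the derivative: c * x^k goes to k*c * x^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]

p = [5, 0, 3, 4]                 # 5 + 3x^2 + 4x^3
F = antiderivative(p, C=7)       # 7 + 5x + x^3 + x^4
print(F, derivative(F))
```

Differentiating F gives back p no matter which constant C we pick, which is the single arbitrary constant from the discussion above.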

January 28, 2008 Posted by John Armstrong | Analysis, Calculus | | 4 Comments

Linking Integrals

There’s a new paper out on the arXiv discussing higher-dimensional linking integrals, by two graduate students at the University of Pennsylvania. I don’t have time to really go through it right now, but at a first scan I’m really not sure what they’ve done here. It seems they’re just taking the regular Gauss integral and doing the exact same thing for higher-dimensional spheres, although in a way that’s so loaded down with notation that it obscures the fact that it’s the exact same idea.

Some people like results that are more computationally focused, and some (like me) prefer to lay bare the structure of the concepts, and derive a computational framework later. It may be that these authors are just more the former than the latter. Anyhow, I’m not certain how original it is, but my own work is shot through with “wait, you mean nobody’s written that up yet?” If they’ve found one of these obvious niches that nobody has gotten around to mining, more power to them.

January 28, 2008 Posted by John Armstrong | Knot theory | | 13 Comments

Sunday Samples 53

Those of you who have known me personally might know I’m one of those pariahs who smokes. Those of you in close contact with me know I’m quitting. Yes, the one opinion of Keith Olbermann’s that I held out on adopting finally falls. Over this weekend the median length of somatic addiction passes, but the social and ritualistic aspects linger on. Still, every day it’s just enough to get through today.

In honor of this, I’m going back to the band Lincoln and their one eponymous album for a song called “Straight”.

January 28, 2008 Posted by John Armstrong | Sunday Samples | | 1 Comment

Carnival!

No, not the Carnival of Mathematics (though it is new today). The real thing!

I would be remiss in my duties if while spending a year at Tulane I didn’t even mention Carnival. I’m living a half-block from the area of St. Charles Avenue where all the uptown parades go, which makes it that much easier.

Tonight was the Krewe of Oshun, and it was supposed to be the Krewe of Pygmalion. The former has no official website, and the latter wimped out because of the rain, so neither one gets any link love.

Anyhow, lessons learned: I need a poncho if it’s raining ’cause trying to carry even a folded umbrella while catching throws is basically a lose-lose proposition. If the attendance hadn’t been sparse I would have been out of luck. As it was, I think I did relatively well…

Carnival Loot 1

A little stuffed donkey, a Krewe of Oshun cup, a plastic bracelet (purple, around my right wrist), and of course a metric buttload of beads, which will be gold when it comes to giving review sessions. The one throw I caught that surprised me, though, was this:

Carnival Loot 2

Yes, those are panties. A cheap, green thong with a gold Mardi Gras logo on them. Seriously.

January 26, 2008 Posted by John Armstrong | Uncategorized | | No Comments

Distinguishing Maxima and Minima

From Heine-Borel we know that a continuous function f on a closed interval \left[a,b\right] takes a global maximum and a minimum. From Fermat we know that any local (and in particular any global) extremum occurs at a critical point: a point where f'(x)=0, or where f has no derivative at all. But once we find these critical points how can we tell maxima from minima?

The biggest value of f at a critical point is clearly the global maximum, and the smallest is just as clearly the minimum. But what about all the ones in between? Here’s where those consequences of the mean value theorem come in handy. For simplicity, let’s assume that the critical points are isolated. That is, each one has a neighborhood in which it’s the only critical point. Further, let’s assume that f'(x) is continuous wherever it exists.

Now, to the left of any critical point we’ll have a stretch where f is differentiable (or else there would be another critical point there) and f'(x) is nonzero (ditto). Since the derivative is continuous, it must either be always positive or always negative on this stretch, because if it was sometimes positive and sometimes negative the intermediate value theorem would give us a point where it’s zero. If the derivative is positive, our corollaries of the mean value theorem tell us that f increases as we move in towards the point, while if the derivative is negative it decreases into the critical point. Similarly, on the right we’ll have another such stretch telling us that f either increases or decreases as we move away from the critical point.

So what’s a local maximum? It’s a critical point where the function increases moving into the critical point and decreases moving away! That is, if near the critical point the derivative is positive on the left and negative on the right, we’ve got ourselves a local maximum. If the derivative is positive on the right and negative on the left, it’s a local minimum. And if we find the same sign on either side, it’s neither! Notice that this is exactly what happens with the function f(x)=x^3 at its critical point. Also, we don’t have to worry about where to test the sign of the derivative, because we know that it can only change signs at a critical point.

In fact, if we add a bit more to our assumptions we can get an even nicer test. Let’s assume that the function is “twice-differentiable” — that f'(x) is itself a differentiable function — on our interval. Then all the critical points happen where f'(x)=0. Even better now, if it changes signs as we pass through the critical point (indicating a local extremum) it’s either increasing or decreasing, and this will be reflected in its derivative f''(x) at the critical point. If f''(x)>0 then our sign changes from negative to positive and we must be looking at a local minimum. On the other hand, if f''(x)<0 then we’ve got a local maximum. Unfortunately, if f''(x)=0 we don’t really get any information from this test and we have to fall back on the previous one.
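
Both tests are easy to try out numerically. Here’s a rough sketch, assuming we’re handed the derivative (and second derivative) as functions, and that the step h is small enough that no other critical point lies within h of the one we’re testing. The examples f(x)=x^3-3x and f(x)=x^3 are my own picks.

```python
def first_derivative_test(df, c, h=1e-3):
    """Compare the sign of f' just left and just right of an isolated critical point c."""
    left, right = df(c - h), df(c + h)
    if left > 0 > right:
        return "local maximum"
    if left < 0 < right:
        return "local minimum"
    return "neither"

def second_derivative_test(d2f, c):
    """Use the sign of f''(c); says nothing when f''(c) = 0."""
    if d2f(c) > 0:
        return "local minimum"
    if d2f(c) < 0:
        return "local maximum"
    return "no information"

df = lambda x: 3 * x * x - 3       # derivative of x^3 - 3x, critical points at -1 and 1
d2f = lambda x: 6 * x
cube_df = lambda x: 3 * x * x      # derivative of x^3, critical point at 0
cube_d2f = lambda x: 6 * x

print(first_derivative_test(df, -1), first_derivative_test(df, 1))
print(second_derivative_test(d2f, -1), second_derivative_test(d2f, 1))
print(first_derivative_test(cube_df, 0), second_derivative_test(cube_d2f, 0))
```

For x^3 the second derivative vanishes at 0, so the second test tells us nothing and we fall back on the first, which reports that the critical point is neither a maximum nor a minimum.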

January 25, 2008 Posted by John Armstrong | Analysis, Calculus | | No Comments

Consequences of the Mean Value Theorem

So now that we have the mean value theorem what can we do with it? First off, we can tell something that seems intuitively obvious. We know that a constant function has the constant zero function as its derivative. It turns out that these are the only functions with zero derivative.

To see this, let f be a differentiable function on (a,b) so that f'(x)=0 for all x\in(a,b). Let x_1 and x_2 be any points between a and b with x_1<x_2. Then f restricts to a continuous function on the interval \left[x_1,x_2\right] which is differentiable on the interior (x_1,x_2). The differential mean value theorem then applies, and it tells us that there is some c\in(x_1,x_2) with f'(c)=\frac{f(x_2)-f(x_1)}{x_2-x_1}. But by assumption this derivative is zero, and so f(x_2)=f(x_1). Since the points were arbitrary, f takes the same value at each point in (a,b).

What about a function f for which f'(x)>0 on some interval (a,b)? Looking at the graph it seems that the slope of all the tangent lines should be positive, and so the function should be increasing. Indeed this is the case.

Specifically we have to show that if x_2>x_1 for two points in (a,b) then f(x_2)>f(x_1). Again we look at the restriction of f to a continuous function on \left[x_1,x_2\right] which is differentiable on (x_1,x_2). Then the mean value theorem tells us that there is some c\in(x_1,x_2)\subseteq(a,b) with f'(c)=\frac{f(x_2)-f(x_1)}{x_2-x_1}. By assumption this quantity is positive, as is x_2-x_1, and so f(x_2)>f(x_1). Similarly we can show that if f'(x)<0 on an interval (a,b) then the function is decreasing there.
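
The mean value theorem only promises that the point c exists, but in simple cases we can hunt it down. Here’s a sketch that bisects for it, under the extra assumption (mine, not the theorem’s) that f'(c) minus the slope of the chord has opposite signs at the two endpoints. The example f(x)=x^3 on \left[0,3\right] and the name mvt_point are made up for illustration.

```python
def mvt_point(f, df, x1, x2, tol=1e-12):
    """Bisect for c in (x1, x2) where df(c) equals the slope of the chord."""
    slope = (f(x2) - f(x1)) / (x2 - x1)
    g = lambda c: df(c) - slope
    lo, hi = x1, x2
    # Assumption: g changes sign between the endpoints, so bisection applies.
    assert g(lo) * g(hi) < 0, "bisection needs a sign change at the endpoints"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2, slope

c, slope = mvt_point(lambda x: x ** 3, lambda x: 3 * x ** 2, 0.0, 3.0)
print(c, slope)
```

Here the chord has slope 9, the derivative 3c^2 matches it at c=\sqrt{3}, and since that slope is positive we recover f(3)>f(0), just as the argument above says.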

January 24, 2008 Posted by John Armstrong | Analysis, Calculus | | 13 Comments

A note on the Periodic Functions Problem

Over at The Everything Seminar, Jim Belk mentions an interesting little problem.

Show that there exist two periodic functions f,g:\mathbb{R}\rightarrow\mathbb{R} whose sum is the identity function:

f(x)+g(x)=x for all x\in\mathbb{R}

He notes right off that, “Obviously the functions f and g can’t be continuous, since any continuous periodic function is bounded.” I’d like to explain why, in case you didn’t follow that.

If a function f is periodic, that means it factors through a map to the circle, which we call S^1. Why? Because “periodic” with period p means we can take the interval \left[0,p\right) and glue one end to the other to make a circle. As we walk along the real line we walk around the circle. When we come to the end of a period in the line, that’s like getting back to where we started on the circle. Really what we’re doing is specifying a function on the circle and then using that function over and over again to give us a function on the real line. And if f is going to be continuous, the function \bar{f}:S^1\rightarrow\mathbb{R} had better be as well.

Now, I assert that the circle is compact. I could do a messy proof inside the circle itself (and I probably should in the long run) but for now we can just see the circle lying in the plane \mathbb{R}^2 as the collection of points distance 1 from the origin. Then this subspace of the plane is clearly bounded, and it’s not hard to show that it’s closed. The Heine-Borel theorem tells us that it’s compact!

And now since the circle is compact we know that its image under the continuous map \bar{f} must be compact as well! And since the image of f is the same as the image of \bar{f}, it must also be a compact subspace of \mathbb{R} — a closed, bounded interval. Neat.
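
Here’s a small sketch of the factoring, assuming we record a point of the circle by its position in \left[0,p\right) and pick \bar{f} to be a sine wave with period p=2.5. Both choices are mine, as are the names wrap and periodic_extension.

```python
import math

def wrap(x, p):
    """Send a real number to its position on the circle, recorded as a point of [0, p)."""
    return x % p

def periodic_extension(f_bar, p):
    """Build f on the real line by using f_bar over and over again."""
    return lambda x: f_bar(wrap(x, p))

p = 2.5
f_bar = lambda t: math.sin(2 * math.pi * t / p)   # hypothetical function on the circle
f = periodic_extension(f_bar, p)

far_away = [f(x / 7) for x in range(-5000, 5000)]
print(min(far_away), max(far_away), f(0.3), f(0.3 + 4 * p))
```

However far we wander along the line, the values of f are just values of \bar{f} on one trip around the circle, so whatever bounds \bar{f} bounds f.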

January 23, 2008 Posted by John Armstrong | Analysis, Calculus | | 13 Comments