# The Unapologetic Mathematician

## Baire Spaces

Looking over my notes from topology it seems I completely skipped over Baire spaces. This was always one of those annoying topics that I never had much use for, partly because I didn’t do point-set topology or analysis. Also, even in my day the usual approach was a very classical and awkward one. Today I’m going to do a much more modern and streamlined one, and I can motivate it better from a measure-theoretic context to boot!

Basically, the idea of a Baire space is that it can’t be filled up by “negligible” sets. We’ve used that term in measure theory to denote a subset of a set of measure zero. But in topology we don’t have a “measure” to work with. Instead, we use the idea of a closed “nowhere dense” set — one for which there is no open set on which it is dense. The original motivation was a set like the boundary of a region; in the context of Jordan content we saw that such a set was negligible.

Clearly such a set has no interior — no open set completely contained inside — and any finite union of them is still nowhere dense. However, if we add up countably infinitely many we might have enough points to be dense on some open set. Still, we don’t want to be able to actually fill such an open set. In the measure-theoretic context, this corresponds to the way any countable union of negligible sets is still negligible.

So, let’s be more specific: a “Baire space” is one in which every countable union of closed, nowhere dense sets has empty interior. Equivalently, we can characterize Baire spaces in complementary terms: every countable intersection of dense open sets is dense. We can also use the contrapositive of the original definition: if a countable union of closed sets has an interior point, then one of the sets must itself have an interior point.

We’re interested in part of the famous “Baire category theorem” — the name is an artifact of the old, awkward approach and has nothing to do with category theory — which tells us that every complete metric space $X$ is a Baire space. Let $\{U_n\}$ be a countable collection of open dense subsets of $X$. We will show that their intersection is dense by showing that any nonempty open set $W$ has some point $x$ — the same point — in common with all the $U_n$.

Okay: since $U_1$ is dense, $U_1\cap W$ is nonempty, and it contains a point $x_1$. As the intersection of two open sets, it’s open, and so it contains an open neighborhood of $x_1$, which we can take to be an open metric ball of radius $r_1>0$. But then $B(x_1,r_1)$ is an open set, which must intersect the dense set $U_2$. This process continues, and for every $n$ we find a point $x_n$ and a radius $r_n$ so that the closed ball $\overline{B}(x_n,r_n)\subseteq B(x_{n-1},r_{n-1})\cap U_n$; shrinking the radius slightly always leaves room for the closure. We can also at each step pick $r_n<\frac{1}{n}$.

And so we come up with a sequence of points $\{x_n\}$. At each step, the ball $B(x_n,r_n)$ contains the whole tail of the sequence past $x_n$, and so all of these points are within $r_n$ of each other. Since $r_n$ gets arbitrarily small, this shows that the sequence is Cauchy, and since $X$ is complete, it must converge to a limit $x$. This point $x$ will be in each set $U_n$, since $x\in\overline{B}(x_n,r_n)\subseteq U_n$, and it’s obviously in $W$, as desired.

The other part of the Baire category theorem says that any locally compact Hausdorff space is a Baire space. In this case the proof proceeds very similarly, but with the finite intersection property for compact spaces standing in for completeness.

August 13, 2010

## Multivariable Limits

As we’ve seen, when our target is a higher-dimensional real space continuity is the same as continuity in each component. But what about when the source is such a space? It turns out that it’s not quite so simple.

One thing, at least, is unchanged. We can still say that $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ is continuous at a point $a\in\mathbb{R}^m$ if $\lim\limits_{x\to a}f(x)=f(a)$. That is, if we have a sequence $\left\{a_i\right\}_{i=0}^\infty$ of points in $\mathbb{R}^m$ (we only need to consider sequences because metric spaces are sequential) that converges to $a$, then the image sequence $\left\{f(a_i)\right\}_{i=0}^\infty$ converges to $f(a)$.

The problem is that limits themselves in higher-dimensional real spaces become a little hairy. In $\mathbb{R}$ there are really only two directions along which a sequence can converge to a given point. If we have a sequence converging from the right and another sequence converging from the left, that basically is enough to establish what the limit of the function is (and if it has one). In higher-dimensional spaces — even just in $\mathbb{R}^2$ — we have so many possible approaches to any given point that in order to avoid an infinite amount of work we have to use something like the formal definition of limits in terms of metric balls. That is

The function $f:\mathbb{R}^n\rightarrow\mathbb{R}$ has limit $L$ at the point $a$ if for every $\epsilon>0$ there is a $\delta>0$ so that $\delta>\lVert x-a\rVert>0$ implies $\lvert f(x)-L\rvert<\epsilon$.

We just consider the case with target $\mathbb{R}$ since higher-dimensional targets are just like multiple copies of this same definition, just as we saw for continuity.

Now, let’s look at a few examples of limits to get an idea for why it’s not so simple. In each case, we will be considering a function $f:\mathbb{R}^2\rightarrow\mathbb{R}$ which is bounded near $\left(0,0\right)$ (since just blowing up to infinity would be too easy to be really pathological) and even has nice limits along certain specified approaches, but which still fails to have a limit at the origin.

First off, let’s consider $\displaystyle f(x,y)=\frac{x^2-y^2}{x^2+y^2}$. If we consider approaching along the $x$-axis with the sequence $a_n=\left(\frac{1}{n},0\right)$ or $a_n=\left(-\frac{1}{n},0\right)$ we find a limit of ${1}$. However, if we approach along the $y$-axis with the sequence $a_n=\left(0,\frac{1}{n}\right)$ or $a_n=\left(0,-\frac{1}{n}\right)$ we instead find a limit of $-1$. Thus no limit exists for the function.

Next let’s try $\displaystyle f(x,y)=\frac{x^4-6x^2y^2+y^4}{x^4+2x^2y^2+y^4}$. Now the approaches along either axis above all give the limit ${1}$, so the limit of the function is ${1}$, right? Wrong! This time if we approach along the diagonal $y=x$ with the sequence $a_n=\left(\frac{1}{n},\frac{1}{n}\right)$ we get the limit $-1$. So we have to consider directions other than the coordinate axes.

What about $\displaystyle f(x,y)=\frac{x^2y}{x^4+y^2}$? Approaching along the coordinate axes we get a limit of ${0}$. Approaching along any diagonal $y=mx$ with the sequence $a_n=\left(\frac{1}{n},\frac{m}{n}\right)$ the calculations are a bit hairier but we still find a limit of ${0}$. So approaching from any direction we get the same limit, making the limit of the function ${0}$, right? Wrong again! Now if we approach along the parabola $y=x^2$ with the sequence $a_n=\left(\frac{1}{n},\frac{1}{n^2}\right)$ we find a limit of $\frac{1}{2}$, and so the limit still doesn’t exist. By this point it should be clear that if straight lines aren’t enough to simplify things then there are just far too many curves to consider, and we need some other method to establish a limit, which is where the metric ball definition comes in.
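Since checking a limit along a particular path is purely mechanical, we can sketch all three examples numerically. This is my own illustration, not part of the original argument: evaluate each function along a parametrized path at a small parameter value and see what the values are approaching.

```python
# Numerical sketch of the three pathological examples above: evaluate each
# function along a parametrized path approaching the origin.
def limit_along(f, path, t=1e-6):
    """Evaluate f along the given path at a small parameter value t."""
    x, y = path(t)
    return f(x, y)

f1 = lambda x, y: (x**2 - y**2) / (x**2 + y**2)
f2 = lambda x, y: (x**4 - 6*x**2*y**2 + y**4) / (x**4 + 2*x**2*y**2 + y**4)
f3 = lambda x, y: (x**2 * y) / (x**4 + y**2)

x_axis   = lambda t: (t, 0.0)
y_axis   = lambda t: (0.0, t)
diagonal = lambda t: (t, t)
parabola = lambda t: (t, t**2)

print(limit_along(f1, x_axis), limit_along(f1, y_axis))     # ~1 vs ~-1
print(limit_along(f2, x_axis), limit_along(f2, diagonal))   # ~1 vs ~-1
print(limit_along(f3, diagonal), limit_along(f3, parabola)) # ~0 vs ~1/2
```

Of course, agreement along finitely many paths proves nothing; only the metric ball definition can settle the question.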

Now I want to go off on a little bit of a rant here. It’s become fashionable to not teach the metric ball definition — $\epsilon$-$\delta$ proofs, as they’re often called — at the first semester calculus level. It’s not even on the Calculus AB exam. I’m not sure when this happened because I was taught them first thing when I took calculus, and it wasn’t that long between then and my first experience teaching calculus. But it’d have to have been sometime in the mid-’90s. Anyway, they don’t even teach it in most college courses anymore. And for the purposes of calculus that’s okay, since as I mentioned above you can easily get away without them when dealing with single-variable functions. Students can even survive the analogues of $\epsilon$-$\delta$ proofs that come up when dealing with convergent sequences in second-semester calculus.

The problem comes when students get to third semester calculus and multivariable functions. Now, as we’ve just seen, there’s no sure way of establishing a limit. We can in some cases establish the continuity of simple functions (like coordinate projections) and then use limit laws to build up a larger class. But this approach fails for functions superficially similar to the pathological functions listed above, but which do have limits that can be established by an $\epsilon$-$\delta$ proof. We can establish that certain limits do not exist by techniques similar to those above, but this requires some ingenuity in choosing two appropriate paths which give different results. There are one or two other methods that work in special cases, but nothing works like an $\epsilon$-$\delta$ proof.

But now we can’t teach $\epsilon$-$\delta$ proofs to these students! The method is rather more complicated when we’ve got more than one variable to work with, not least because of the more complicated distance formula. What used to happen was that students would have developed some facility with $\epsilon$-$\delta$ proofs back in first and second semester calculus, which could then be brought to bear on this new situation. But now they have no background and cannot, in general, absorb both the logical details of challenge-response $\epsilon$-$\delta$ proofs and the complications of multiple variables at the same time. And so we show them a few jury-rigged tricks and assure them that within the rest of the course they won’t have to worry about it. I’d almost rather dispense with limits entirely than present this Frankenstein’s monstrosity.

And yet, I see no sign that the tide will ever turn back. The only hope is that the movement to make statistics the capstone high-school course will gain momentum. If we can finally wrest first-semester calculus from the hands of the public school system and put all calculus students at a given college through the same three-semester track, then the more intellectually rigorous institutions might have the integrity to put proper limits back into the hands of their first semester students and not have to worry about incoming freshmen with high AP scores covering for shoddy backgrounds.

September 17, 2009

## Multivariable Continuity

Now that we have the topology of higher-dimensional real spaces in hand, we can discuss continuous functions between them. Since these are metric spaces we have our usual definition with $\epsilon$ and $\delta$ and all that:

A function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ is continuous at $x$ if and only if for each $\epsilon>0$ there is a $\delta>0$ so that $\lVert y-x\rVert<\delta$ implies $\lVert f(y)-f(x)\rVert<\epsilon$.

where the bars denote the norm in one or the other of the spaces $\mathbb{R}^m$ or $\mathbb{R}^n$ as depends on context. Again, the idea is that if we pick a metric ball around $f(x)\in\mathbb{R}^n$, we can find some metric ball around $x\in\mathbb{R}^m$ whose image is contained in the first ball.

The reason why this works, of course, is that metric balls provide a neighborhood base for our topology. But remember that last time we came up with an equivalent topology on $\mathbb{R}^n$ using a very different subbase: preimages of neighborhoods in $\mathbb{R}$ under projections. Intersections of these pre-images furnish an alternative neighborhood base. Let’s see what happens if we write down the definition of continuity in these terms:

A function $f:\mathbb{R}^m\rightarrow\mathbb{R}^n$ is continuous at $x$ if and only if for each $\epsilon=\left(\epsilon_1,\dots,\epsilon_n\right)$ with all $\epsilon_i>0$ there is a $\delta>0$ so that $\lVert y-x\rVert<\delta$ implies $\lvert\pi_i(f(y))-\pi_i(f(x))\rvert<\epsilon_i$ for all $i$.

That is, if we pick a small enough metric ball around $x\in\mathbb{R}^m$ its image will fit within the “box” which extends in the $i$th direction a distance $\epsilon_i$ on each side from the point $f(x)$.

At first blush, this might be a different notion of continuity, but it really isn’t. From what we did yesterday we know that both the boxes and the balls provide equivalent topologies on the space $\mathbb{R}^n$, and so they must give equivalent notions of continuity. In a standard multivariable calculus course, we essentially reconstruct this using handwaving about how if we can fit the image of a ball into any box we can choose a box that fits into a selected metric ball, and vice versa.

But why do we care about this equivalent statement? Because now I can define a bunch of functions $f_i=\pi_i\circ f$ so that $f_i(x)$ is the $i$th component of $f(x)$. For each of these real-valued functions, I have a definition of continuity:

A function $f:\mathbb{R}^m\rightarrow\mathbb{R}$ is continuous at $x$ if and only if for each $\epsilon>0$ there is a $\delta>0$ so that $\lVert y-x\rVert<\delta$ implies $\lvert f(y)-f(x)\rvert<\epsilon$.

So each $f_i$ is continuous if I can pick a $\delta_i$ that works with a given $\epsilon_i$. And if all the $f_i$ are continuous, I can pick the smallest of the $\delta_i$ and use it as a $\delta$ that works for each component. But then I can wrap the $\epsilon_i$ up into a vector $\epsilon=\left(\epsilon_1,\dots,\epsilon_n\right)$ and use the $\delta$ I’ve picked to satisfy the box definition of continuity for $f$ itself! Conversely, if $f$ is continuous by the box definition, then I can use the $\delta$ for a given vector $\epsilon$ to verify the continuity of each $f_i$ for the given $\epsilon_i$.

The upshot is that a function $f$ from a metric space (generalize this yourself to other metric spaces than $\mathbb{R}^m$) to $\mathbb{R}^n$ is continuous if and only if each of the component functions $f_i$ is continuous.
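To make the componentwise criterion concrete, here is a small numerical sketch with a made-up map $f:\mathbb{R}^2\rightarrow\mathbb{R}^2$ (my example, not from the post). Both components are Lipschitz, so explicit component responses $\delta_i$ can be written down, and we can check that $\delta=\min_i\delta_i$ works for the box all at once.

```python
import math

# A made-up example map f: R^2 -> R^2. Component f1 = x + y is Lipschitz with
# constant sqrt(2) in the Euclidean norm, and f2 = 2x with constant 2, so
# delta_1 = eps_1/sqrt(2) and delta_2 = eps_2/2 are valid component responses.
def f(p):
    x, y = p
    return (x + y, 2.0 * x)

def check_box(a, eps, delta, samples=200):
    """Check that sampled points within delta of a land in the eps-box around f(a)."""
    ax, ay = a
    fa = f(a)
    for i in range(samples):
        theta = 2 * math.pi * i / samples
        for r in (0.25 * delta, 0.5 * delta, 0.99 * delta):
            p = (ax + r * math.cos(theta), ay + r * math.sin(theta))
            fp = f(p)
            if not all(abs(fp[k] - fa[k]) < eps[k] for k in range(2)):
                return False
    return True

a = (1.0, -2.0)
eps = (0.1, 0.3)
deltas = (eps[0] / math.sqrt(2), eps[1] / 2.0)  # per-component responses
delta = min(deltas)                             # one delta that works for the box
print(check_box(a, eps, delta))
```

Taking the minimum of the component responses is exactly the step in the argument above; a $\delta$ that is too large fails the very first component check.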

September 16, 2009

## The Topology of Higher-Dimensional Real Spaces

As we move towards multivariable calculus, we’re going to primarily be concerned with the topological spaces $\mathbb{R}^n$ (for various values of $n$) just as in calculus we were primarily concerned with the topological space $\mathbb{R}$. As a topological space, $\mathbb{R}^n$ is just like the vector space we’ve been discussing, but now we care a lot less about the algebraic structure than we do about the notion of which points are “close to” other points.

And it turns out that $\mathbb{R}^n$ is a metric space, so all of the special things we know about metric spaces can come into play. Indeed, inner products define norms and norms on vector spaces define metrics. We can even write it down explicitly. If we write our vectors $x=\left(x_1,\dots,x_n\right)$ and $y=\left(y_1,\dots,y_n\right)$, then the distance is

\displaystyle\begin{aligned}d(x,y)=\lVert y-x\rVert&=\sqrt{\langle y-x,y-x\rangle}\\&=\sqrt{\left(y_1-x_1\right)^2+\dots+\left(y_n-x_n\right)^2}\end{aligned}

Incidentally, this is the exact same formula we’d get if we started with the metric space $\mathbb{R}$ and built up $\mathbb{R}^n$ as the product of $n$ copies.

One thing I didn’t mention back when I put together products of metric spaces is that we get the same topology as if we’d forgotten the metric and taken the product of topological spaces. This will actually be useful to us, in a way, so I’d like to explain it here.

We define the topology on a metric space by using balls of radius $\delta$ around each point to provide a subbase for the topology. On the other hand, when we have a product space we use preimages of open sets under the canonical projections to provide a subbase. To show that these generate the same topology, what we’ll do is show that the identity map from $\mathbb{R}^n$ as a product space to $\mathbb{R}^n$ as a metric space is a homeomorphism. Since it’s obviously invertible, we just need to show that it’s continuous in both directions. And we can use our subbases to do just that.

What we have to show is that each set in one subbase is open in terms of the other subbase. That is, for each point in the set we should be able to come up with a finite intersection of sets in the other subbase that contains the point, and yet fits inside the set we started with.

Okay, so consider the preimage of an open set $U\subseteq\mathbb{R}$ under the projection $\pi_i:\mathbb{R}^n\rightarrow\mathbb{R}$. That is, the collection of all $x=\left(x_1,\dots,x_n\right)$ with $x_i\in U$. Clearly since $U$ is open in the metric space $\mathbb{R}$ we can pick a radius $\delta$ so that the open interval $\left(x_i-\delta,x_i+\delta\right)$ is contained in $U$. But then the ball of radius $\delta$ in $\mathbb{R}^n$ around the point $x$ contains the point, and is itself contained in $\pi_i^{-1}(U)$: for any $y$ in this ball we have $|y_i-x_i|\leq\lVert y-x\rVert<\delta$, and so $y_i\in U$.

On the other hand, let’s take a ball of radius $\delta$ about a point $x=\left(x_1,\dots,x_n\right)$. We set $\epsilon=\sqrt{\frac{\delta^2}{n}}$ and consider the open intervals $U_i=\left(x_i-\epsilon,x_i+\epsilon\right)$. I say that the intersection of the preimages $\pi_i^{-1}(U_i)$ is contained in the ball. Indeed, if $y=\left(y_1,\dots,y_n\right)$ is in the intersection, then each coordinate satisfies $|y_i-x_i|<\epsilon$. Thus we can calculate the total distance

\displaystyle\begin{aligned}\lVert y-x\rVert&=\sqrt{\left(y_1-x_1\right)^2+\dots+\left(y_n-x_n\right)^2}\\&<\sqrt{\epsilon^2+\dots+\epsilon^2}\\&=\sqrt{\frac{\delta^2}{n}+\dots+\frac{\delta^2}{n}}\\&=\sqrt{\delta^2}=\delta\end{aligned}

and so the whole intersection must be within the ball.
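As a quick numerical sanity check (my own sketch, with an arbitrary center point and $n=3$), sampling the corners and faces of the box of half-width $\epsilon=\delta/\sqrt{n}$ confirms that they all land strictly inside the $\delta$-ball:

```python
import math
import itertools

# With eps = delta/sqrt(n), points of the open box of half-width eps around x
# should lie inside the ball of radius delta, as the calculation above shows.
n, delta = 3, 1.0
eps = delta / math.sqrt(n)
x = (0.2, -1.0, 3.5)  # an arbitrary center point

# Sample box points on a small grid of coordinate offsets (strictly inside).
offsets = [-0.99 * eps, 0.0, 0.99 * eps]
inside = all(
    math.dist(x, tuple(xi + off for xi, off in zip(x, offs))) < delta
    for offs in itertools.product(offsets, repeat=n)
)
print(inside)
```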

This approach is pretty straightforward to generalize to the case of any product of metric spaces, but I’ll leave that as an exercise.

September 15, 2009

## Products of Metric Spaces

Shortly we’re going to need a construction that’s sort of interesting in its own right.

We know about products of topological spaces. We can take products of metric spaces, too, and one method comes down to us all the way from Pythagoras.

The famous Pythagorean theorem tells us that in a right triangle the length $c$ of the side opposite the right angle stands in a certain relation to the lengths $a$ and $b$ of the other two sides: $c^2=a^2+b^2$. So let’s say we’ve got metric spaces $(M_1,d_1)$ and $(M_2,d_2)$. For the moment we’ll think of them as being perpendicular and define a distance function $d$ on $M_1\times M_2$ by

$d((x_1,x_2),(y_1,y_2))=\sqrt{d_1(x_1,y_1)^2+d_2(x_2,y_2)^2}$

The quantity inside the radical here must be nonnegative, since it’s the sum of two nonnegative numbers. Since the result needs to be nonnegative, we take the unique nonnegative square root.

Oops, I don’t think I mentioned this before. Since the function $f(x)=x^2$ has $f'(x)=2x$ as its derivative, it’s always increasing where $x$ is positive. And since the squares eventually rise above any real number we choose, its values run from zero all the way up to infinity. Now the same sort of argument as we used to construct the exponential function gives us an inverse sending any nonnegative number to a unique nonnegative square root.

Okay, that taken care of, we’ve got a distance function. It’s clearly nonnegative and symmetric. The only way for it to be zero is for the quantity in the radical to be zero, and this only happens if each of the terms $d_1(x_1,y_1)$ and $d_2(x_2,y_2)$ is zero. But since these are distance functions, that means $x_1=y_1$ and $x_2=y_2$, so $(x_1,x_2)=(y_1,y_2)$.

The last property we need is the triangle inequality. That is, for any three pairs $(x_1,x_2)$, $(y_1,y_2)$, $(z_1,z_2)$ we have the inequality

$d((x_1,x_2),(z_1,z_2))\leq d((x_1,x_2),(y_1,y_2))+d((y_1,y_2),(z_1,z_2))$

Substituting from the definition of $d$ we get the statement

$\sqrt{d_1(x_1,z_1)^2+d_2(x_2,z_2)^2}\leq\sqrt{d_1(x_1,y_1)^2+d_2(x_2,y_2)^2}+\sqrt{d_1(y_1,z_1)^2+d_2(y_2,z_2)^2}$

The triangle inequalities for $d_1$ and $d_2$ tell us that $d_1(x_1,z_1)\leq d_1(x_1,y_1)+d_1(y_1,z_1)$ and $d_2(x_2,z_2)\leq d_2(x_2,y_2)+d_2(y_2,z_2)$. So if we make these substitutions on the left, it can only increase the left side of the inequality we want. Thus if we can prove the stronger inequality

\begin{aligned}\sqrt{d_1(x_1,y_1)^2+2d_1(x_1,y_1)d_1(y_1,z_1)+d_1(y_1,z_1)^2+d_2(x_2,y_2)^2+2d_2(x_2,y_2)d_2(y_2,z_2)+d_2(y_2,z_2)^2}\\\leq\sqrt{d_1(x_1,y_1)^2+d_2(x_2,y_2)^2}+\sqrt{d_1(y_1,z_1)^2+d_2(y_2,z_2)^2}\end{aligned}

we’ll get the one we really want. Now since squaring preserves the order on the nonnegative reals, this is equivalent to

\begin{aligned}d_1(x_1,y_1)^2+2d_1(x_1,y_1)d_1(y_1,z_1)+d_1(y_1,z_1)^2+d_2(x_2,y_2)^2+2d_2(x_2,y_2)d_2(y_2,z_2)+d_2(y_2,z_2)^2\\\leq d_1(x_1,y_1)^2+d_2(x_2,y_2)^2+2\sqrt{d_1(x_1,y_1)^2+d_2(x_2,y_2)^2}\sqrt{d_1(y_1,z_1)^2+d_2(y_2,z_2)^2}+d_1(y_1,z_1)^2+d_2(y_2,z_2)^2\end{aligned}

Some cancellations later:

\begin{aligned}d_1(x_1,y_1)d_1(y_1,z_1)+d_2(x_2,y_2)d_2(y_2,z_2)\\\leq \sqrt{d_1(x_1,y_1)^2d_1(y_1,z_1)^2+d_1(x_1,y_1)^2d_2(y_2,z_2)^2+d_2(x_2,y_2)^2d_1(y_1,z_1)^2+d_2(x_2,y_2)^2d_2(y_2,z_2)^2}\end{aligned}

We square and cancel some more:

$2d_1(x_1,y_1)d_1(y_1,z_1)d_2(x_2,y_2)d_2(y_2,z_2)\leq d_1(x_1,y_1)^2d_2(y_2,z_2)^2+d_2(x_2,y_2)^2d_1(y_1,z_1)^2$

Moving these terms around we find

\begin{aligned}0\leq\left(d_1(x_1,y_1)d_2(y_2,z_2)\right)^2-2\left(d_1(x_1,y_1)d_2(y_2,z_2)\right)\left(d_2(x_2,y_2)d_1(y_1,z_1)\right)+\left(d_2(x_2,y_2)d_1(y_1,z_1)\right)^2\\=\left(d_1(x_1,y_1)d_2(y_2,z_2)-d_2(x_2,y_2)d_1(y_1,z_1)\right)^2\end{aligned}

So at the end of the day, our triangle inequality is equivalent to asking if a certain quantity squared is nonnegative, which it clearly is!
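Here is a small randomized check of the result (my own sketch, not part of the post): take $d_1$ and $d_2$ to be the ordinary distance on the real line, form the product distance, and test the triangle inequality on a batch of random triples, with a tiny slack for floating-point rounding.

```python
import math
import random

# The product metric from the Pythagorean formula, with d1 and d2 both the
# usual distance |a - b| on the real line.
def d(p, q):
    return math.sqrt(abs(p[0] - q[0]) ** 2 + abs(p[1] - q[1]) ** 2)

random.seed(0)
ok = True
for _ in range(1000):
    x, y, z = [(random.uniform(-10, 10), random.uniform(-10, 10)) for _ in range(3)]
    # triangle inequality, with a little slack for rounding error
    if d(x, z) > d(x, y) + d(y, z) + 1e-12:
        ok = False
print(ok)
```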

Now here’s the important thing at the end of all that calculation: this is just one way to get a metric on the product of two metric spaces. There are many other ones which give rise to different distance functions, but the same topology and the same uniform structure. And often it’s the topology that we’ll be most interested in.

In particular, this will give us a topology on any finite-dimensional vector space over the real numbers, but we don’t want to automatically equip that vector space with this norm unless we say so very explicitly. In fact, we don’t even want to make that same assumption about the two spaces being perpendicular to each other. The details of exactly why this is so I’ll leave until we get back to linear algebra, but I want to be clear right now that topology comes for free, but we may have good reason to use different “distances”.

August 19, 2008

## Metric Spaces are Categories!

A guest post by Tom Leinster over at The n-Category Café reminded me of an interesting fact I haven’t mentioned yet: a metric space is actually an example of an enriched category!

First we’ll need to pick out our base category $\mathcal{V}$, in which we’ll find our hom-objects. Consider the set of nonnegative real numbers with their real-number order, and add in a point called $\infty$ that’s above all the other points. This is a totally ordered set, and orders are categories. Let’s take the opposite of this category. That is, the objects of our category $\mathcal{V}$ are the points in the “interval” $\left[0,\infty\right]$, and we have an arrow $x\rightarrow y$ exactly when $x\geq y$.

This turns out to be a monoidal category, and the monoidal structure is just addition. Clearly this gives a monoid on the set of objects, but we need to check it on morphisms to see it’s functorial. But if $x_1\geq y_1$ and $x_2\geq y_2$ then $x_1+x_2\geq y_1+y_2$, and so we can see addition as a functor.

So we’ve got a monoidal category, and we can now use it to form enriched categories. Let’s keep our lives simple by considering a small $\mathcal{V}$-category $\mathcal{C}$. Here’s how the definition looks.

We have a set of objects $\mathrm{Ob}(\mathcal{C})$ that we’ll call “points” in a set $X$. Between any two points $p_1$ and $p_2$ we need a hom-object $\hom_\mathcal{C}(p_1,p_2)\in\mathrm{Ob}(\mathcal{V})$. That is, we have a function $d:X\times X\rightarrow\left[0,\infty\right]$.

For a triple $(p_1,p_2,p_3)$ of objects we need an arrow $\hom_\mathcal{C}(p_2,p_3)\otimes\hom_\mathcal{C}(p_1,p_2)\rightarrow\hom_\mathcal{C}(p_1,p_3)$. In more quotidian terms, this means that $d(p_2,p_3)+d(p_1,p_2)\geq d(p_1,p_3)$.

Also, for each point $p$ there is an arrow from the identity object of $\mathcal{V}$ to the hom-object $\hom_\mathcal{C}(p,p)$. That is, $0\geq d(p,p)$, so $d(p,p)=0$.

These conditions are the first, fourth, and half of the second conditions in the definition of a metric space! In fact, there’s a weaker notion of a “pseudometric” space, wherein the second condition is simply that $d(p,p)=0$, and so we’re almost exactly giving the definition of a pseudometric space.

The only thing we’re missing is the requirement that $d(p_1,p_2)=d(p_2,p_1)$. The case can be made (and has been, by Lawvere) that this requirement is actually extraneous, and that it’s in some sense more natural to work with “asymmetric” (pseudo)metric spaces that are exactly those given by this enriched categorical framework.
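For a concrete toy, here is a sketch (my own, in the spirit of Lawvere’s point of view) of an asymmetric distance that still satisfies the two enriched-category axioms: moving up costs, moving down is free.

```python
# A toy asymmetric "distance": d(x, y) is the cost of moving from x up to y,
# with downhill movement free. This is my illustration, not from the post.
def d(x, y):
    return max(y - x, 0.0)

points = [0.0, 1.5, -2.0, 3.25, 0.5]

# identity axiom: 0 >= d(p, p), i.e. d(p, p) == 0
identities = all(d(p, p) == 0.0 for p in points)

# composition axiom: d(p2, p3) + d(p1, p2) >= d(p1, p3)
compositions = all(
    d(p2, p3) + d(p1, p2) >= d(p1, p3)
    for p1 in points for p2 in points for p3 in points
)

# ...but symmetry fails, so this is not a metric in the classical sense
symmetric = all(d(p, q) == d(q, p) for p in points for q in points)
print(identities, compositions, symmetric)
```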

February 11, 2008

## Some theorems about metric spaces

We need to get down a few facts about metric spaces before we can continue on our course. Firstly, as I alluded to in an earlier comment, compact metric spaces are sequentially compact — every sequence has a convergent subsequence.

To see this fact, we’ll use the fact that compact spaces are the next best thing to finite. Specifically, in a finite set any infinite sequence would have to hit one point infinitely often. Here instead, we’ll have an accumulation point $\xi$ in our compact metric space $X$ so that for any $\epsilon>0$ and point $x_m$ in our sequence there is some $n\geq m$ with $d_X(x_n,\xi)<\epsilon$. That is, though the sequence may move away from $\xi$, it always comes back within $\epsilon$ of it again. Once we have an accumulation point $\xi$, we can find a subsequence converging to $\xi$ just as we found a subnet converging to any accumulation point of a net.

Let’s take our sequence and define $F_N=\mathrm{Cl}(\{x_n, n\geq N\})$ — the closure of the sequence from $x_N$ onwards. Then these closed sets are nested $F_1\supseteq F_2\supseteq\dots\supseteq F_N\supseteq\dots$, and the intersection of any finite number of them is the smallest one, which is clearly nonempty since it contains a tail of the sequence. Then by the compactness of $X$ we see that the intersection of all the $F_N$ is again nonempty. Since the points in this intersection are in the closure of any tail of the sequence, they must be accumulation points.

Okay, that doesn’t quite work. See the comments for more details. Michael asks where I use the fact that we’re in a metric space, which was very astute. It turns out on reflection that I did use it, but it was hidden.

We can still say we’re looking for an accumulation point first and foremost, because if the sequence has an accumulation point there must be some subsequence converging to that point. Why not a subnet in general? Because metric spaces must be normal Hausdorff (using metric neighborhoods to separate closed sets) and first-countable! And as long as we’re first-countable (or, weaker, “sequential”) we can find a sequence converging to any limit point of a net.

What I didn’t say before is that once we find an accumulation point there will be a subsequence converging to that point. My counterexample is compact, and any sequence in it has accumulation points, but we will only be able to find subnets of our sequence converging to them, not subsequences. Unless we add something to assure that our space is sequential, and metric spaces do that.
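The subsequence construction is easy to carry out explicitly. As a sketch (with my own example sequence, not one from the post), take $x_n=(-1)^n\left(1+\frac{1}{n}\right)$, which has accumulation points at $\pm 1$, and for each $k$ greedily pick the next index whose term lies within $\frac{1}{k}$ of the chosen accumulation point:

```python
# Extracting a subsequence converging to an accumulation point xi: for each k,
# walk forward to the next index whose term is within 1/k of xi.
def x(n):
    return (-1) ** n * (1 + 1 / n)  # accumulation points at +1 and -1

def subsequence_indices(xi, K):
    indices, n = [], 0
    for k in range(1, K + 1):
        n += 1
        while abs(x(n) - xi) >= 1 / k:
            n += 1
        indices.append(n)
    return indices

idx = subsequence_indices(1.0, 5)
print(idx)                  # the even indices 2, 4, 6, 8, 10
print([x(n) for n in idx])  # values marching down toward 1
```

The greedy search always terminates precisely because $\xi$ is an accumulation point: some later term comes back within any given $\frac{1}{k}$.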

We should note in passing that the special case where $X$ is a compact subspace of $\mathbb{R}^n$ is referred to as the Bolzano-Weierstrass Theorem.

Next is the Heine-Cantor theorem, which says that any continuous function $f:M\rightarrow N$ from a compact metric space $M$ to any metric space $N$ is uniformly continuous. In particular, we can use the interval $\left[a,b\right]$ as our compact metric space $M$ and the real numbers $\mathbb{R}$ as our metric space $N$ to see that any continuous function on a closed interval is uniformly continuous.

So let’s assume that $f$ is continuous but not uniformly continuous. Then there is some $\epsilon>0$ so that for any $\delta>0$ there are points $x$ and $y$ in $M$ with $d_M(x,y)<\delta$ but $d_N(f(x),f(y))\geq\epsilon$. In particular, we can pick $\frac{1}{n}$ as our $\delta$ and get two sequences $x_n$ and $y_n$ with $d_M(x_n,y_n)<\frac{1}{n}$ but $d_N(f(x_n),f(y_n))\geq\epsilon$. By the above theorem we can find subsequences $x_{n_k}$ converging to $\bar{x}$ and $y_{n_k}$ converging to $\bar{y}$.

Now $d_M(x_{n_k},y_{n_k})<\frac{1}{n_k}$, which converges to ${0}$, and so $\bar{x}=\bar{y}$. Therefore we must have $d_N(f(x_{n_k}),f(y_{n_k}))$ also converging to ${0}$ by the continuity of $f$. But this can’t happen, since each of these distances must be at least $\epsilon$! Thus $f$ must have been uniformly continuous to begin with.
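To see what goes wrong without compactness, here is a numerical sketch (my own example): $f(x)=x^2$ on all of $\mathbb{R}$ is continuous but not uniformly continuous, and for $\epsilon=1$ the witnesses can be written down explicitly, in exactly the shape the proof above rules out on a compact domain.

```python
# f(x) = x^2 on the whole (non-compact) real line. For eps = 1, every
# delta = 1/n admits a pair of points closer than delta whose images differ
# by at least eps, so f is not uniformly continuous on R.
f = lambda t: t * t
eps = 1.0

def bad_pair(n):
    """Points within 1/n of each other whose images differ by at least eps."""
    xn, yn = float(n), n + 1 / (2 * n)
    assert abs(xn - yn) < 1 / n
    # |f(yn) - f(xn)| = (yn - xn)(yn + xn) = 1 + 1/(4 n^2) >= 1
    return xn, yn

violations = all(abs(f(u) - f(v)) >= eps for u, v in (bad_pair(n) for n in range(1, 100)))
print(violations)
```

On a compact interval like $\left[0,1\right]$, Heine–Cantor says no such family of pairs can exist.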

January 31, 2008

## The Heine-Borel Theorem

We’ve talked about compact subspaces, particularly of compact spaces and Hausdorff spaces (and, of course, compact Hausdorff spaces). So how can we use this to understand the space $\mathbb{R}$ of real numbers, or higher-dimensional versions like $\mathbb{R}^n$?

First off, $\mathbb{R}$ is Hausdorff, which should be straightforward to prove. Unfortunately, it’s not compact. To see this, consider the open sets of the form $(-x,x)$ for all positive real numbers $x$. Given any real number $y$ we can find an $x$ with $|y|<x$, so $y\in(-x,x)$. Therefore the collection of these open intervals covers $\mathbb{R}$. But if we take any finite number of them, one will be the biggest, and so we must miss some real numbers. This open cover does not have a finite subcover, and $\mathbb{R}$ is not compact. We can similarly show that $\mathbb{R}^n$ is Hausdorff, but not compact.

So, since $\mathbb{R}^n$ is Hausdorff, any compact subset of $\mathbb{R}^n$ must be closed. But not every closed subset is compact. What else does compactness imply? Well, we can take the proof that $\mathbb{R}^n$ isn’t compact and adapt it to any subset $A\subseteq\mathbb{R}^n$. We take the collection of all open “cubes” $(-x,x)^n$ consisting of $n$-tuples of real numbers, each of which is between $-x$ and $x$, and we form open subsets of $A$ by the intersections $U_x=(-x,x)^n\cap A$. Now the only way for there to be a finite subcover of this open cover of $A$ is for there to be some $x$ so that $U_x=A$. That is, every component of every point of $A$ has absolute value less than $x$, and so we say that $A$ is “bounded”.

We see now that every compact subset of $\mathbb{R}^n$ is closed and bounded. It turns out that being closed and bounded is not only necessary for compactness, but also sufficient! To see this, we’ll show that the closed cube $\left[-x,x\right]^n$ is compact. Then a bounded set $A$ is contained in some such cube, and a closed subset of a compact space is compact. This is the Heine-Borel theorem.

In the $n=1$ case, we just need to see that the interval $\left[-x,x\right]$ is compact. Take an open cover $\{U_i\}$ of this interval, and define the set $S$ to consist of all $y\in\left[-x,x\right]$ so that a finite collection of the $U_i$ covers $\left[-x,y\right]$. Then define $t$ to be the least upper bound of $S$. Basically, $t$ is as far along the interval as we can get with a finite number of sets, and we’re hoping to show that $t=x$. Clearly it can’t go past $x$, since $S\subseteq\left[-x,x\right]$. But can it be less than $x$?

In fact it can’t, because if it were, then we can find some open set $U$ from the cover that contains $t$. As an open neighborhood of $t$, the set $U$ contains some interval $(t-\epsilon,t+\epsilon)$. Then $t-\epsilon$ must be in $S$, and so there is some finite collection of the $U_i$ which covers $\left[-x,t-\epsilon\right]$. But then we can just add in $U$ to get a finite collection of the $U_i$ which covers $\left[-x,t+\frac{\epsilon}{2}\right]$, and this contradicts the fact that $t$ is the supremum of $S$. Thus $t=x$ and there is a finite subcover of $\left[-x,x\right]$, making this closed interval compact!

Now Tychonoff’s Theorem tells us that products of closed intervals are also compact. In particular, the closed cube $\left[-x,x\right]^n\subseteq\mathbb{R}^n$ is compact. And since any closed and bounded set is contained in some such cube, it will be compact as a closed subspace of a compact space. Incidentally, since $n$ is finite, we don’t need to wave the Zorn talisman to get this invocation of the Tychonoff magic to work.

As a special case, we can look back at the one-dimensional case to see that a compact, connected subspace of $\mathbb{R}$ must be a closed interval $\left[a,b\right]$. Then we know that the image of a connected space is connected, and that the image of a compact space is compact, so the image of a closed interval under a continuous function $f:\mathbb{R}\rightarrow\mathbb{R}$ is another closed interval.

The fact that this image is an interval gave us the intermediate value theorem. The fact that it’s closed now gives us the extreme value theorem: a continuous, real-valued function $f$ on a closed interval $\left[a,b\right]$ attains a maximum and a minimum. That is, there is some $c\in\left[a,b\right]$ so that $f(c)\geq f(x)$ for all $x\in\left[a,b\right]$, and similarly there is some $d\in\left[a,b\right]$ so that $f(d)\leq f(x)$ for all $x\in\left[a,b\right]$.
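As a numerical illustration (not a proof — the grid-sampling approach here is my own stand-in for the theorem's guarantee), we can approximate the extrema that the extreme value theorem promises by sampling a continuous function on a fine grid over the closed interval:

```python
import math

def approx_extrema(f, a, b, n=100000):
    """Approximate the max and min of a continuous f on [a, b] by
    evaluating it on an evenly spaced grid of n + 1 points."""
    xs = [a + (b - a) * k / n for k in range(n + 1)]
    values = [f(x) for x in xs]
    return max(values), min(values)

# sin on [0, pi] attains its maximum 1 at pi/2 and its minimum 0
# at the endpoints; the grid approximation recovers both closely.
hi, lo = approx_extrema(math.sin, 0.0, math.pi)
```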

January 18, 2008

## Tychonoff’s Theorem

One of the biggest results in point-set topology is Tychonoff’s Theorem: the fact that the product of any family $\{X_i\}_{i\in\mathcal{I}}$ of compact spaces is again compact. Unsurprisingly, the really tough bit comes in when we look at an infinite product. Our approach will use the dual definition of compactness.

Let’s say that a collection $\mathcal{F}$ of closed sets satisfies the finite intersection hypothesis if every finite intersection of members of the collection is nonempty. Compactness is then equivalent to the statement that any collection of closed sets satisfying the finite intersection hypothesis has nonempty intersection. We can form the collection $\Omega=\{\mathcal{F}\}$ of all collections of sets satisfying the finite intersection hypothesis. This can be partially ordered by containment — $\mathcal{F}'\leq\mathcal{F}$ if every set in $\mathcal{F}'$ is also in $\mathcal{F}$.
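For a finite family of finite sets we can check the finite intersection hypothesis exhaustively. Here is a small sketch (the helper name and the two sample families are hypothetical, just to illustrate the definition):

```python
from itertools import combinations

def has_fip(collection):
    """Check the finite intersection hypothesis for a finite family
    of finite sets: every subfamily must have nonempty intersection."""
    sets = list(collection)
    for r in range(1, len(sets) + 1):
        for subfamily in combinations(sets, r):
            if not set.intersection(*subfamily):
                return False
    return True

nested = [{1, 2, 3}, {2, 3}, {3}]       # nested sets: hypothesis holds
pairwise = [{1, 2}, {2, 3}, {1, 3}]     # pairwise intersections are
                                        # nonempty, but the triple
                                        # intersection is empty
```

The second family shows why we must quantify over all finite subfamilies, not just pairs.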

Given any particular collection $\mathcal{F}$ we can find a maximal collection containing it by taking a maximal increasing chain in $\Omega$ starting at $\mathcal{F}$ and forming the union of all the collections in the chain, which still satisfies the finite intersection hypothesis. This is almost exactly the same thing as we did back when we showed that every vector space is a free module! And just like then, we need Zorn’s lemma to tell us that we can manage the trick in general, but if we look closely at how we’re going to use it we’ll see that we can get away without Zorn’s lemma for finite products.

Anyhow, this maximal collection $\mathcal{F}$ has two nice properties: it contains all of its own finite intersections, and it contains any set which intersects every set in $\mathcal{F}$. These are both true because if $\mathcal{F}$ didn’t contain one of these sets we could throw it in, making the collection strictly larger while still satisfying the finite intersection hypothesis — contradicting maximality.

Now let’s assume that $\mathcal{F}$ is a collection of closed subsets of $\prod\limits_{i\in\mathcal{I}}X_i$ satisfying the finite intersection hypothesis. We can then get a maximal collection $\mathcal{G}$ containing $\mathcal{F}$. Then given an index $i\in\mathcal{I}$ we can consider the collection $\{\overline{\pi_i(G)}\}_{G\in\mathcal{G}}$ of closed subsets of $X_i$ and see that it, too, satisfies the finite intersection hypothesis. Thus by compactness of $X_i$ the intersection of this collection is nonempty; pick a point $x_i$ in it. Since $x_i$ lies in the closure of each $\pi_i(G)$, any open neighborhood $U_i$ of $x_i$ meets every $\pi_i(G)$, so the preimage $\pi_i^{-1}(U_i)$ meets every $G\in\mathcal{G}$, and so must itself be in $\mathcal{G}$.

Okay, so let’s take the point $x_i$ for each index and consider the point $p$ in $\prod\limits_{i\in\mathcal{I}}X_i$ with $i$-th coordinate $x_i$. Then pick some set $V=\prod\limits_{i\in\mathcal{I}}V_i$ containing $p$ from the base for the product topology. For all but a finite number of the $i$, $V_i=X_i$. For that finite number where it’s smaller, $V_i$ is an open neighborhood of $x_i$, and so $\pi_i^{-1}(V_i)$ is in $\mathcal{G}$. Then $V$ is the intersection of these finitely many preimages, so it too lies in $\mathcal{G}$ — and in particular $V$ is nonempty!

Now, since $V$ is in $\mathcal{G}$, it must intersect each of the closed sets in the original collection $\mathcal{F}$. Since the only constraint on $V$ is that it be a basic open set containing $p$, every basic neighborhood of $p$ meets each set in $\mathcal{F}$, so $p$ is a limit point of each of these sets. And because they’re closed, they must contain all of their limit points. Thus $p$ lies in the intersection of all the sets in $\mathcal{F}$, which is therefore nonempty, and the product space is compact!

January 17, 2008

## The Image of a Compact Space

One of the nice things about connectedness is that it’s preserved under continuous maps. It turns out that compactness is the same way — the image of a compact space $X$ under a continuous map $f:X\rightarrow Y$ is compact.

Let’s take an open cover $\{U_i\}$ of the image $f(X)$. Since $f$ is continuous, we can take the preimage of each of these open sets to get a bunch of open sets $\{f^{-1}(U_i)\}$ in $X$. Every point of $X$ is sent by $f$ into $f(X)$, and thus into some $U_i$, so the $f^{-1}(U_i)$ form an open cover of $X$. Then we can take a finite subcover by compactness of $X$, picking out some finite collection of indices. Looking back at the $U_i$ corresponding to these indices (instead of their preimages), we get a finite subcover of $f(X)$. Thus any open cover of the image has a finite subcover, and the image is compact.
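The cover-pullback step can be sketched with finite sets standing in for open sets (the map and the cover below are made up for illustration): the preimages of a cover of $f(X)$ automatically cover $X$.

```python
def preimage(f, X, U):
    """The preimage of U under f, restricted to the domain X."""
    return {x for x in X if f(x) in U}

X = {0, 1, 2, 3}
f = lambda x: x % 2                # the image f(X) is {0, 1}
cover_of_image = [{0}, {1}]        # a "cover" of f(X)

# Pulling the cover back along f gives a cover of X.
pulled_back = [preimage(f, X, U) for U in cover_of_image]
covers_X = set().union(*pulled_back) == X
```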

January 16, 2008