## Egoroff’s Theorem

Let’s look back at what goes wrong when a sequence of functions doesn’t converge uniformly. Let $X$ be the closed unit interval $[0,1]$, and let $f_n(x)=x^n$. Pointwise, this converges to a function $f$ with $f(x)=0$ for $0\leq x<1$, and $f(1)=1$. This convergence can’t be uniform, because the uniform limit of a sequence of continuous functions is continuous.

But things only go wrong at the one point, and the singleton $\{1\}$ has measure zero. That is, the sequence $\{f_n\}$ converges almost everywhere to the function with constant value $0$. The convergence still isn’t uniform, though, because we still have a problem near $1$: even on $[0,1)$ the supremum of $x^n$ is $1$ for every $n$. But if we cut out any open patch around $1$ and only look at the interval $[0,1-\delta]$, the convergence *is* uniform, since there $x^n\leq(1-\delta)^n\to0$. We might think that this is “uniform a.e.”, but we have to cut out a set of positive measure to make it work. The set can be as small as we want, but we can’t get uniformity by just cutting out $\{1\}$.
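To make the contrast concrete, here is a minimal numerical sketch. It assumes the sequence under discussion is $f_n(x)=x^n$ on $[0,1]$ (the limit described above is consistent with that choice), and the helper name `sup_error` is purely illustrative. It estimates the largest value of $x^n$ on $[0,1-\delta]$ using a finite grid, so it is only an approximation of the true supremum.

```python
def sup_error(n, delta, samples=1001):
    """Approximate sup of |x**n - 0| over [0, 1 - delta] on a uniform grid.

    Assumption: the sequence is f_n(x) = x**n and the a.e. limit is 0.
    """
    right_end = 1.0 - delta
    xs = [right_end * k / (samples - 1) for k in range(samples)]
    return max(x ** n for x in xs)


if __name__ == "__main__":
    for n in (10, 50, 200):
        # Cutting out a patch of width 0.1 drives the error to zero,
        # while a negligible cut (1e-9) leaves it close to 1.
        print(n, sup_error(n, 0.1), sup_error(n, 1e-9))
```

For a fixed cut $\delta>0$ the error shrinks like $(1-\delta)^n$, but for any fixed $n$ it creeps back up toward $1$ as $\delta\to0$, which is exactly why no single null set can be removed to get uniformity.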

However, what we’ve seen is a general phenomenon expressed in Egoroff’s Theorem: if $E\subseteq X$ is a measurable set of finite measure, and if $\{f_n\}$ is a sequence of a.e. finite-valued measurable functions converging a.e. on $E$ to a finite-valued measurable function $f$, then for every $\epsilon>0$ there is a measurable subset $F\subseteq E$ with $\mu(F)<\epsilon$ so that $\{f_n\}$ converges uniformly to $f$ on $E\setminus F$. That is, if we have a.e. convergence we can get to uniform convergence by cutting out an arbitrarily small part of our domain.

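First off, we cut out a set of measure zero from $E$ so that $\{f_n\}$ converges pointwise to $f$ on what remains, which we still call $E$. Now for positive integers $m$ and $n$ we define the measurable sets

$$E_n^m=\bigcap_{i=n}^\infty\left\{x\in E\,\Big\vert\,\lvert f_i(x)-f(x)\rvert<\frac{1}{m}\right\}$$

As $n$ gets bigger, we’re taking the intersection of fewer and fewer sets, and so $E_1^m\subseteq E_2^m\subseteq\dots$. Since $\{f_n\}$ converges pointwise to $f$, eventually the difference $\lvert f_i(x)-f(x)\rvert$ gets down below every $\frac{1}{m}$ and stays there, and so $\bigcup_n E_n^m=E$ for every $m$. The sets $E\setminus E_n^m$ thus decrease to the empty set, and because $\mu(E)$ is finite we conclude that $\lim_{n\to\infty}\mu(E\setminus E_n^m)=0$. And so for every $m$ there is an $N(m)$ so that

$$\mu\left(E\setminus E_{N(m)}^m\right)<\frac{\epsilon}{2^m}$$

Now let’s define

$$F=\bigcup_{m=1}^\infty\left(E\setminus E_{N(m)}^m\right)$$

This is a measurable set contained in $E$, and countable subadditivity tells us that

$$\mu(F)\leq\sum_{m=1}^\infty\mu\left(E\setminus E_{N(m)}^m\right)<\sum_{m=1}^\infty\frac{\epsilon}{2^m}=\epsilon$$

We can calculate

$$E\setminus F=\bigcap_{m=1}^\infty E_{N(m)}^m$$

And so given any $m$ we take $n\geq N(m)$. Then for any $x\in E\setminus F$ we have $x\in E_{N(m)}^m$, and thus $\lvert f_n(x)-f(x)\rvert<\frac{1}{m}$. Since we can pick this $N(m)$ independently of $x$, the convergence on $E\setminus F$ is uniform.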

## Convergence Almost Everywhere

Okay, so let’s take our idea of almost everywhere and apply it to convergence of sequences of measurable functions.

Given a sequence $\{f_n\}$ of extended real-valued functions on a measure space $(X,\mathcal{S},\mu)$, we say that $\{f_n\}$ converges a.e. to the function $f$ if there is a set $E_0\subseteq X$ with $\mu(E_0)=0$ so that $\lim_{n\to\infty}f_n(x)=f(x)$ for all $x\notin E_0$. Similarly, we say that the sequence $\{f_n\}$ is Cauchy a.e. if there exists a set $E_0$ of measure zero so that $\{f_n(x)\}$ is a Cauchy sequence of real numbers for all $x\notin E_0$. That is, given $x\notin E_0$ and $\epsilon>0$ there is some natural number $N$ depending on $x$ and $\epsilon$ so that whenever $m,n\geq N$ we have $\lvert f_m(x)-f_n(x)\rvert<\epsilon$.

Because the real numbers form a complete metric space, a sequence of finite real numbers is convergent if and only if it is Cauchy, and a similar thing happens here. If a sequence $\{f_n\}$ of finite-valued functions is convergent a.e., then $\{f_n(x)\}$ converges to $f(x)$ away from a set of measure zero. Each of these sequences is thus Cauchy, and so $\{f_n\}$ is Cauchy almost everywhere. On the other hand, if $\{f_n\}$ is Cauchy a.e. then the sequences $\{f_n(x)\}$ are Cauchy away from a set of measure zero, and these sequences then converge.

We can also define what it means for a sequence of functions to converge uniformly almost everywhere. That is, there is some set $E_0$ of measure zero so that for every $\epsilon>0$ we can find a natural number $N$ so that for all $n\geq N$ and $x\notin E_0$ we have $\lvert f_n(x)-f(x)\rvert<\epsilon$. The uniformity means that $N$ is independent of $x\notin E_0$, but if we choose a different negligible $E_0$ we may have to choose different values of $N$ to get the desired control on the sequence.

As it happens, the topology defined by uniform a.e. convergence comes from a norm: the essential supremum. Using this notion of convergence makes the algebra of essentially bounded measurable functions on a measure space $X$ (with functions that agree a.e. identified) into a normed vector space. Indeed, we can check what it means for a sequence of functions $\{f_n\}$ to converge to $f$ under the essential supremum norm: for any $\epsilon>0$ there is some $N$ so that for all $n\geq N$ we have $\mathrm{ess\,sup}\lvert f_n-f\rvert<\epsilon$. Unpacking the definition of the essential supremum, this means that there is some measurable set $E_0$ with measure zero so that $\lvert f_n(x)-f(x)\rvert<\epsilon$ for all $x\notin E_0$. A priori this set depends on $n$ and $\epsilon$, but a countable union of sets of measure zero again has measure zero, so one $E_0$ works for all $n$ and all $\epsilon=\frac{1}{k}$. This is exactly what we said for uniform a.e. convergence above.

We can also turn around and define what it means for a sequence to be uniformly Cauchy almost everywhere: for any $\epsilon>0$ there is some $N$ so that for all $m,n\geq N$ we have $\mathrm{ess\,sup}\lvert f_m-f_n\rvert<\epsilon$. Unpacking again, there is some measurable set $E_0$ of measure zero so that $\lvert f_m(x)-f_n(x)\rvert<\epsilon$ for all $x\notin E_0$. It’s straightforward to check that a sequence that converges uniformly a.e. is uniformly Cauchy a.e., and vice versa. That is, the topology defined by the essential supremum norm is complete, and the algebra of essentially bounded measurable functions on a measure space is a Banach space.

## Almost Everywhere

Now we come to one of the most common terms of art in analysis: “almost everywhere”. It’s unusual in that it sounds perfectly colloquial, and yet it has a very technical meaning.

The roots of “almost everywhere” are in the notion of a negligible set. If we’re working with a measure space $(X,\mathcal{S},\mu)$ we don’t really care about subsets of sets of measure zero, and anything that happens only on such a negligible set we try to sweep under the rug. For example, let’s say we have a function $f:\mathbb{R}\to\mathbb{R}$ defined by $f(x)=0$ for all $x\neq0$, and by $f(0)=1$. Colloquially, we say that $f$ is zero “almost everywhere” because the set where it isn’t zero, the singleton $\{0\}$, has measure zero.

In general, if we have some property $P$ that can be applied to points $x\in X$, then we say $P$ is true “almost everywhere” if the set where $P$ is false is negligible. That is, if we can find some measurable set $E$ with $\mu(E)=0$ so that $P(x)$ is true for all $x\notin E$. Note that we don’t particularly care if the set where $P$ is false is itself measurable, although if $\mu$ is complete then all $\mu$-negligible sets will be measurable. This sort of language is so common in measure theory and analysis that it’s often abbreviated as “a.e.”. Older texts will say “*p.p.*” for the French equivalent “*presque partout*”. In probability theory (measure theory’s cousin) we run into “a.s.” for “almost surely”.

No matter how we say or write it, “almost everywhere” has a hidden dependence on some measure. In many cases, the measure is obvious from context, in that there’s only one measure under consideration on a given space. However, in the case where we have two measures $\mu$ and $\nu$ on the same measurable space, we may distinguish them by writing “$\mu$-almost everywhere” and “$\nu$-almost everywhere” (or “$\mu$-a.e.” and “$\nu$-a.e.”), or by explicitly stating with respect to which measure we mean.

We’ve actually seen this sort of thing in the wild before; Lebesgue’s condition can be reformulated to say that a bounded function $f$ defined on an $n$-dimensional interval is Riemann integrable on that interval if and only if $f$ is continuous almost everywhere (with respect to Lebesgue measure).

As more of a new example, we say that a function $f$ is “essentially bounded” if it is bounded almost everywhere. That is, if there is a constant $c$ and some measurable set $E$ with $\mu(E)=0$ so that $\lvert f(x)\rvert\leq c$ for all $x\notin E$. We’re willing to accept *some* points exceeding $c$, but no more than a set of measure zero. The infimum of all such essential bounds $c$ is the “essential supremum” of $\lvert f\rvert$, written $\mathrm{ess\,sup}\lvert f\rvert$.
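The difference between a supremum and an essential supremum is easiest to see on a toy measure space. The sketch below assumes a finite set of points, each carrying a nonnegative weight that plays the role of its measure; the function name `ess_sup` and this setup are illustrative assumptions, not anything from the text. On such a space the negligible sets are exactly the sets of weight-zero points, so the essential supremum is the largest $\lvert f\rvert$ over the points that actually carry mass.

```python
def ess_sup(values, weights):
    """Essential supremum of |f| on a finite weighted space.

    values[i]  -- the value of f at point i
    weights[i] -- the measure of the singleton {i} (assumed >= 0)
    Points of weight zero form a negligible set and are ignored.
    """
    charged = [abs(v) for v, w in zip(values, weights) if w > 0]
    return max(charged, default=0.0)


if __name__ == "__main__":
    values = [0.5, -2.0, 100.0]
    weights = [1.0, 0.5, 0.0]  # the point where f = 100 has measure zero
    print(max(abs(v) for v in values))  # ordinary supremum: 100.0
    print(ess_sup(values, weights))     # essential supremum: 2.0
```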

## Topological Vector Spaces, Normed Vector Spaces, and Banach Spaces

Before we move on, we want to define some structures that blend algebraic and topological notions. These are all based on vector spaces. And, particularly, we care about *infinite-dimensional* vector spaces. Finite-dimensional vector spaces are actually pretty simple, topologically. For pretty much all purposes you have a topology on your base field $\mathbb{F}$, and the vector space (which is isomorphic to $\mathbb{F}^n$ for some $n$) will get the product topology.

But for infinite-dimensional spaces the product topology is often not going to be particularly useful. For example, the space of functions $f:X\to\mathbb{R}$ is a product; we write $\mathbb{R}^X$ to mean the product of one copy of $\mathbb{R}$ for each point in $X$. Limits in this topology are “pointwise” limits of functions, but this isn’t always the most useful way to think about limits of functions. The sequence

$$f_n(x)=\begin{cases}1&\text{if }-\frac{1}{n}<x<\frac{1}{n}\\0&\text{otherwise}\end{cases}$$

converges pointwise to a function $f$ with $f(x)=0$ for $x\neq0$ and $f(0)=1$. But we will find it useful to be able to ignore this behavior at the one isolated point and say that $f_n\to0$. It’s this connection with spaces of functions that brings such infinite-dimensional topological vector spaces into the realm of “functional analysis”.

Okay, so to get a topological vector space, we take a vector space and put a (surprise!) topology on it. But not just any topology will do: remember that every point in a vector space looks pretty much like every other one. The transformation $v\mapsto v+w$ has an inverse $v\mapsto v-w$, and it only makes sense that these be homeomorphisms. And to capture this, we put a uniform structure on our space. That is, we specify what the neighborhoods are of $0$, and just translate them around to all the other points.

Now, a common way to come up with such a uniform structure is to define a norm on our vector space. That is, to define a function $v\mapsto\lVert v\rVert$ into the nonnegative real numbers satisfying the three axioms

- For all vectors $v$ and scalars $c$, we have $\lVert cv\rVert=\lvert c\rvert\lVert v\rVert$.
- For all vectors $v$ and $w$, we have $\lVert v+w\rVert\leq\lVert v\rVert+\lVert w\rVert$.
- The norm $\lVert v\rVert$ is zero if and only if the vector $v$ is the zero vector.

Notice that we need to be working over a field in which we have a notion of absolute value, so we can measure the size of scalars. We might also want to do away with the last condition and use a “seminorm”. In any event, it’s important to note that though our earlier examples of norms all came from inner products we *do not need an inner product to have a norm*. In fact, there exist norms that come from no inner product at all.
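One standard way to see that a norm comes from no inner product is the parallelogram law: a norm induced by an inner product must satisfy $\lVert v+w\rVert^2+\lVert v-w\rVert^2=2\lVert v\rVert^2+2\lVert w\rVert^2$. The sketch below checks this on $\mathbb{R}^2$ for the Euclidean norm and for the supremum norm. The supremum norm is my choice of example rather than one the text names, and all function names are illustrative.

```python
import math


def euclid_norm(v):
    """Norm induced by the usual dot product."""
    return math.sqrt(sum(c * c for c in v))


def sup_norm(v):
    """Largest absolute coordinate; satisfies all three norm axioms."""
    return max(abs(c) for c in v)


def parallelogram_gap(norm, v, w):
    """Left side minus right side of the parallelogram law; 0 if it holds."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * (norm(v) ** 2 + norm(w) ** 2)


if __name__ == "__main__":
    v, w = (1.0, 0.0), (0.0, 1.0)
    print(parallelogram_gap(euclid_norm, v, w))  # zero up to rounding
    print(parallelogram_gap(sup_norm, v, w))     # -2.0: the law fails
```

Since the supremum norm violates the law for this pair of vectors, no inner product on $\mathbb{R}^2$ can induce it.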

So if we define a norm we get a “normed vector space”. This is a metric space, with a metric function defined by $d(v,w)=\lVert v-w\rVert$. This is nice because metric spaces are first-countable, and thus sequential. That is, we can define the topology of a (semi-)normed vector space by defining exactly what it means for a sequence of vectors to converge, and in particular what it means for them to converge to zero.

Finally, if we’ve got a normed vector space, it’s a natural question to ask whether this vector space is complete. That is, we have all the pieces in place to define Cauchy sequences in our vector space, and we would like for all of these sequences to converge under our uniform structure. If this happens, so that we have a complete normed vector space, we call our structure a “Banach space”. Most of the spaces we’re concerned with in functional analysis are Banach spaces.

Again, for finite-dimensional vector spaces (at least over $\mathbb{R}$ or $\mathbb{C}$) this is all pretty easy; we can always define an inner product, and this gives us a norm. If our underlying topological field is complete, then the vector space will be as well. Even without considering a norm, convergence of sequences is just given component-by-component. But infinite-dimensional vector spaces get hairier. Since our algebraic operations only give us finite sums, we have to take some sorts of limits to even talk about most vectors in the space in the first place, and taking limits of such vectors could just complicate things further. Studying these interesting topologies and seeing how linear algebra (the study of vector spaces and linear transformations) behaves in the infinite-dimensional context is the taproot of functional analysis.

## Simple and Elementary Functions

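We now introduce two classes of functions that are very easy to work with. As usual, we’re working in some measurable space $(X,\mathcal{S})$.

First, we have the “simple functions”. Such a function is described by picking a finite number of pairwise disjoint measurable sets $\{E_i\}_{i=1}^n$ and a corresponding set of *finite* real numbers $\{\alpha_i\}_{i=1}^n$. We use these to define a function by declaring $f(x)=\alpha_i$ if $x\in E_i$, and $f(x)=0$ if $x$ is in none of the $E_i$. The very simplest example is the characteristic function $\chi_E$ of a measurable set $E$. Any other simple function can be written as

$$f(x)=\sum_{i=1}^n\alpha_i\chi_{E_i}(x)$$

Any simple function is measurable, for the preimage $f^{-1}(A)$ of a set $A$ of nonzero values is the union of all the $E_i$ corresponding to those $\alpha_i\in A$, and is thus measurable.

It’s straightforward to verify that the product and sum of any two simple functions is itself a simple function. Given functions $f=\sum_i\alpha_i\chi_{E_i}$ and $g=\sum_j\beta_j\chi_{F_j}$, we have $fg=\sum_{i,j}\alpha_i\beta_j\chi_{E_i\cap F_j}$. On each intersection $E_i\cap F_j$ the sum $f+g$ takes the value $\alpha_i+\beta_j$, while it takes the value $\alpha_i$ on the part of $E_i$ that meets no $F_j$ and the value $\beta_j$ on the part of $F_j$ that meets no $E_i$; these finitely many pieces are measurable and pairwise disjoint. It’s even easier to see that any scalar multiple of a simple function is simple: $cf=\sum_i(c\alpha_i)\chi_{E_i}$. And thus the collection of simple functions forms a subalgebra of the algebra of measurable functions.

“Elementary functions” are similar to simple functions. We slightly relax the conditions by allowing a countably infinite number of measurable sets $E_i$ and corresponding values $\alpha_i$.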

Now, why do we care about simple functions? As it happens, *every* measurable function can be approximated by simple functions! That is, given any measurable function $f$ we can find a sequence $\{f_n\}$ of *simple* functions converging pointwise to $f$.

To see this, first break up $f$ into its positive and negative parts $f^+$ and $f^-$. If we can approximate any nonnegative measurable function by a pointwise-increasing sequence of nonnegative simple functions, then we can approximate each of $f^+$ and $f^-$, and the difference of these sequences approximates $f$. So, without loss of generality, we will assume that $f$ is nonnegative.

Okay, so here’s how we’ll define the simple functions $f_n$:

$$f_n(x)=\begin{cases}\dfrac{i-1}{2^n}&\text{if }\dfrac{i-1}{2^n}\leq f(x)<\dfrac{i}{2^n}\text{ for some }i=1,\dots,n2^n\\[1ex]n&\text{if }f(x)\geq n\end{cases}$$

That is, to define $f_n$ we chop up the nonnegative real numbers below $n$ into chunks of width $\frac{1}{2^n}$, and within each of these slices we round values of $f$ down to the lower endpoint. If $f(x)\geq n$, we round all the way down to $n$. There can only ever be $n2^n+1$ values for $f_n$, and each of these corresponds to a measurable set. The value $\frac{i-1}{2^n}$ corresponds to the set

$$f^{-1}\left(\left[\frac{i-1}{2^n},\frac{i}{2^n}\right)\right)$$

while the value $n$ corresponds to the set $f^{-1}\left([n,\infty]\right)$. And thus $f_n$ is indeed a simple function.

So, does the sequence $\{f_n\}$ converge pointwise to $f$? Well, if $f(x)=\infty$, then $f_n(x)=n$ for all $n$, and these values increase to $\infty$. On the other hand, if $f(x)$ is finite then $f(x)<n$ for all sufficiently large $n$; after this point, $f_n(x)$ and $f(x)$ are both within a slice of width $\frac{1}{2^n}$, and so $0\leq f(x)-f_n(x)<\frac{1}{2^n}$. And so given a large enough $n$ we can bring $f_n(x)$ within any desired bound of $f(x)$. Thus the sequence $\{f_n\}$ increases pointwise to the function $f$.

But that’s not all! If $f$ is bounded above by some integer $N$, the sequence $\{f_n\}$ converges uniformly to $f$. Indeed, once we get to $n>N$, we cannot have $f(x)\geq n$ for any $x$. That is, for sufficiently large $n$ we *always* have $f(x)-f_n(x)<\frac{1}{2^n}$. Given an $\epsilon>0$ we pick an $n$ so that both $n>N$ and $\frac{1}{2^n}<\epsilon$, and this will guarantee $\lvert f(x)-f_n(x)\rvert<\epsilon$ for *every* $x$. That is: the convergence is uniform.
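The rounding scheme described here is short enough to write down directly. The following sketch assumes the standard dyadic construction, in which values below $n$ are rounded down to a multiple of $2^{-n}$ and values at or above $n$ are capped at $n$; the name `dyadic_approx` is illustrative, and `f` stands for any nonnegative function given as a Python callable.

```python
import math


def dyadic_approx(f, n):
    """Return the n-th simple approximation f_n of a nonnegative function f.

    Values below n are rounded down to the nearest multiple of 2**-n;
    values >= n (including math.inf) are capped at n.
    """
    def f_n(x):
        y = f(x)
        if y >= n:
            return float(n)
        return math.floor(y * 2 ** n) / 2 ** n
    return f_n


if __name__ == "__main__":
    f = lambda x: x * x
    for n in range(1, 6):
        # The approximations increase toward f(1.3), which is about 1.69.
        print(n, dyadic_approx(f, n)(1.3))
```

Each `f_n` takes at most $n2^n+1$ distinct values, and wherever $f(x)<n$ the gap $f(x)-f_n(x)$ is less than $2^{-n}$, matching the estimate used in the convergence argument.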

This is also where elementary functions come in handy. If we’re allowed to use a countably infinite number of values, we can get uniform convergence without having to ask that $f$ be bounded, so long as it is finite-valued. Indeed, instead of defining $f_n(x)=n$ for $f(x)\geq n$, just chop up *all* positive values into slices of width $\frac{1}{2^n}$. There are only a countably infinite number of such slices, and so the resulting function is elementary, if not quite simple.

## Sequences of Measurable Functions

We let $\{f_n\}_{n=1}^\infty$ be a sequence of extended real-valued measurable functions on a measurable space $X$, and ask what we can say about limits of this sequence.

First of all, the function $h(x)=\inf_n f_n(x)$ is measurable. The preimage $h^{-1}(\{\infty\})$ is the intersection of the countable collection $f_n^{-1}(\{\infty\})$. The preimage $h^{-1}(\{-\infty\})$ contains the union of the countable collection $f_n^{-1}(\{-\infty\})$, along with any points where finite values of the $f_n$ are unbounded below; all together it is the countable intersection, over positive integers $m$, of the countable unions $\bigcup_n\{x\,\vert\,f_n(x)<-m\}$. And so both of these sets are measurable, and we can restrict to the points where $h$ is finite.

So now let’s use our convenient condition. Given a real number $a$ we know that $h(x)<a$ if and only if $f_n(x)<a$ for some $n$. That is, we can write

$$\{x\,\vert\,h(x)<a\}=\bigcup_{n=1}^\infty\{x\,\vert\,f_n(x)<a\}$$

Each term on the right is measurable since each $f_n$ is a measurable function, and so the set on the left is measurable. Thus we conclude that $h$ is measurable as well.

Similarly, we find that the function $g(x)=\sup_n f_n(x)$ is measurable.

Now the functions

$$f^*(x)=\limsup_{n\to\infty}f_n(x)=\inf_{n\geq1}\sup_{i\geq n}f_i(x)\qquad f_*(x)=\liminf_{n\to\infty}f_n(x)=\sup_{n\geq1}\inf_{i\geq n}f_i(x)$$

are also measurable. Indeed, in proving that $f^*$ is measurable we can use the exact same technique as above to prove that the inner supremum is measurable; it doesn’t really depend on the supremum starting at $1$ or higher. And then the outer infimum is exactly as before. Proving $f_*$ is measurable is similar.

Now we can talk about pointwise convergence of a sequence of measurable functions. That is, for a fixed point $x$ we have the sequence $\{f_n(x)\}$ which has some limit superior $f^*(x)$ and some limit inferior $f_*(x)$. If these two coincide, then the sequence has a proper limit $\lim_{n\to\infty}f_n(x)$. But one of our lemmas tells us that the set of points where any two measurable functions coincide has a nice property: $\{x\,\vert\,f^*(x)=f_*(x)\}$ has a measurable intersection with every measurable set. And thus if we define the function $f(x)=\lim_{n\to\infty}f_n(x)$ on this subspace of $X$ for which the limit exists, the resulting function is measurable.

## Positive and Negative Parts of Functions

Now that we have sums and products to work with, we find that the maximum of $f$ and $g$ (sometimes written $f\cup g$ or $f\vee g$) and their minimum (sometimes written $f\cap g$ or $f\wedge g$) are measurable. Indeed, we can write

$$f\cup g=\frac{1}{2}\left(f+g+\lvert f-g\rvert\right)\qquad f\cap g=\frac{1}{2}\left(f+g-\lvert f-g\rvert\right)$$

and we know that absolute values of functions are measurable.

As special cases of this construction we define the “positive part” $f^+$ and “negative part” $f^-$ of an extended real-valued function $f$ as

$$f^+=f\cup0\qquad f^-=-(f\cap0)$$

The positive part is obviously just what we get if we lop off any part of $f$ that extends below $0$. The negative part is a little more subtle. First we lop off everything above $0$, but then we take the negative of this function. As a result, $f^+$ and $f^-$ are both nonnegative functions. And if $f$ is measurable, then so are $f^+$ and $f^-$. We can thus write any measurable function as the difference of two nonnegative measurable functions

$$f=f^+-f^-$$

Conversely, any function with measurable positive and negative parts is itself measurable.

This is sort of like how we found that functions of bounded variation can be written as the difference between two strictly increasing functions. In fact, if we’re loose about what we mean by “function”, and “derivative”, we could even see this fact as a decomposition of the derivative of a function of bounded variation into its positive and negative parts.

It will thus be useful to restrict attention to nonnegative measurable functions instead of general measurable functions. Many statements can be more easily proven for nonnegative measurable functions, and the results will be preserved when we take the difference of two functions. Since we can write any measurable function as the difference between two nonnegative ones, this will suffice.

It will also sometimes be useful to realize that we may write the absolute value of a function as

$$\lvert f\rvert=f^++f^-$$
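These identities are easy to sanity-check pointwise. The sketch below assumes the usual definitions of the positive part as $\max(f,0)$ and the negative part as $-\min(f,0)$, with functions modeled as Python callables; the helper names are illustrative.

```python
def positive_part(f):
    """f^+ : keep f where it is above 0, replace the rest by 0."""
    return lambda x: max(f(x), 0.0)


def negative_part(f):
    """f^- : keep f where it is below 0, then flip the sign."""
    return lambda x: -min(f(x), 0.0)


if __name__ == "__main__":
    f = lambda x: x ** 3 - x
    f_plus, f_minus = positive_part(f), negative_part(f)
    for x in (-2.0, -0.5, 0.5, 2.0):
        # Both parts are nonnegative; their difference recovers f
        # and their sum recovers |f|.
        print(x, f_plus(x), f_minus(x),
              f_plus(x) - f_minus(x) == f(x),
              f_plus(x) + f_minus(x) == abs(f(x)))
```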

## Adding and Multiplying Measurable Real-Valued Functions

One approach to the problem of adding and multiplying measurable functions on a measurable space $X$ would be to define a two-dimensional version of Borel sets and Lebesgue measure, and to tweak the definition of a measurable function to this space like we did before to treat the additive identity specially. Then we could set up products (which we will eventually do) and get a map $(f,g):X\to\mathbb{R}^2$ and compose this with the Borel map $(x,y)\mapsto x+y$ or the Borel map $(x,y)\mapsto xy$. In fact, if you’re up for it, you can go ahead and try working out this approach as an exercise.

Instead, we’ll take more of a low road towards showing that the sum and product of two measurable functions are measurable. We start with a useful lemma: if $f$ and $g$ are extended real-valued measurable functions on a measurable space $(X,\mathcal{S})$ and if $c$ is any real number, then each of the sets

$$A=\{x\,\vert\,f(x)<g(x)+c\}\qquad B=\{x\,\vert\,f(x)\leq g(x)+c\}\qquad C=\{x\,\vert\,f(x)=g(x)+c\}$$

has a measurable intersection with every measurable set. If $X$ is itself measurable, of course, this just means that these three sets are measurable.

To see this for the set $A$, consider the (countable) set $\mathbb{Q}$ of rational numbers. If $f(x)$ really is strictly less than $g(x)+c$, then there must be some rational number between them. That is, if $x\in A$ then for some $r\in\mathbb{Q}$ we have $f(x)<r$ and $r-c<g(x)$. And thus we can write $A$ as the countable union

$$A=\bigcup_{r\in\mathbb{Q}}\left(\{x\,\vert\,f(x)<r\}\cap\{x\,\vert\,r-c<g(x)\}\right)$$

By the measurability of $f$ and $g$, this is the countable union of a collection of measurable sets, and is thus measurable.

We can write $B$ as $X\setminus\{x\,\vert\,g(x)<f(x)-c\}$, and so the assertion for $B$ follows from that for $A$ (with the roles of $f$ and $g$ swapped and $-c$ in place of $c$). And we can write $C=B\setminus A$, so the statement is true for that set as well.

Anyway, now we can verify that the sum and product of two measurable extended real-valued functions are measurable as well. We first handle infinite values separately. For the product, $[fg](x)=\infty$ exactly when one factor is $\pm\infty$ and the other is nonzero with the same sign; the case where both factors are infinite is $f(x)=g(x)=\pm\infty$. Each such case is cut out by an intersection of measurable sets like $f^{-1}(\{\infty\})\cap g^{-1}(\{\infty\})$ and $f^{-1}(\{-\infty\})\cap g^{-1}(\{-\infty\})$, and so the set $(fg)^{-1}(\{\infty\})$, a finite union of such intersections, is measurable. We can handle $(fg)^{-1}(\{-\infty\})$, $(f+g)^{-1}(\{\infty\})$, and $(f+g)^{-1}(\{-\infty\})$ similarly.

So now we turn to our convenient condition for measurability. Since we’ve handled the sets where $f$ and $g$ are infinite, we can assume that they’re finite. Given a real number $a$, we find

$$\{x\,\vert\,f(x)+g(x)<a\}=\{x\,\vert\,f(x)<-g(x)+a\}$$

which is measurable by our lemma above (with $-g$ in place of $g$ and $a$ in place of $c$). Since this is true for every real number $a$, the sum $f+g$ is measurable.

To verify our assertion for the product $fg$, we turn and recall the polarization identities from when we worked with inner products. Remember, they told us that if we know how to calculate squares, we can calculate products. Something similar is true now, as we write

$$fg=\frac{1}{4}\left((f+g)^2-(f-g)^2\right)$$

We just found that the sum $f+g$ and the difference $f-g$ are measurable. And any positive integral power of a measurable function is measurable, so the squares of the sum and difference functions are measurable. And then the product $fg$ is a scalar multiple of the difference of these squares, and is thus measurable.
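As a quick check of the identity being used, here is a small sketch that rebuilds a product from sums, differences, and squares alone. It assumes the polarization-style identity $fg=\frac14\left((f+g)^2-(f-g)^2\right)$, and uses exact rational arithmetic so that the comparison is not blurred by rounding. The name `product_via_squares` is illustrative.

```python
from fractions import Fraction


def product_via_squares(f, g):
    """Pointwise product of f and g using only +, -, squaring, and scaling."""
    quarter = Fraction(1, 4)
    return lambda x: quarter * ((f(x) + g(x)) ** 2 - (f(x) - g(x)) ** 2)


if __name__ == "__main__":
    f = lambda x: Fraction(x) + 2
    g = lambda x: 3 * Fraction(x) - 1
    fg = product_via_squares(f, g)
    for x in (Fraction(-3), Fraction(0), Fraction(5, 7)):
        print(x, fg(x), fg(x) == f(x) * g(x))
```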

## Composing Real-Valued Measurable Functions II

As promised, today we come up with an example of a measurable function $f:X\to\mathbb{R}$ and a Lebesgue measurable function $g:\mathbb{R}\to\mathbb{R}$ so that the composition $g\circ f$ is not measurable. Specifically, $X$ will be the closed unit interval $[0,1]$, considered as a measurable subspace of $\mathbb{R}$.

Now, every point $x\in[0,1]$ can be written out in ternary as

$$x=.\alpha_1\alpha_2\alpha_3\dots=\sum_{i=1}^\infty\frac{\alpha_i}{3^i}\qquad\alpha_i\in\{0,1,2\}$$

We set $n$ (depending on $x$) to be the first index for which $\alpha_n=1$, and $n=\infty$ if no such index exists. Then we define the function

$$\psi(x)=\frac{1}{2^n}+\sum_{i=1}^{n-1}\frac{\alpha_i}{2^{i+1}}$$

That is, write out the number in ternary until you hit a $1$, and leave off everything after that. Change all the $2$s to $1$s, and consider the resulting string of $0$s and $1$s as a number written out in binary. The extra fraction added in the formula above comes from that first $1$. This function is often called the “Cantor function” because of its relationship to the famous Cantor set. In case it’s not apparent, the Cantor set is the collection $C$ of points that have a ternary expansion with no $1$.
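The digit-by-digit recipe translates almost directly into code. The sketch below assumes the standard Cantor function (read ternary digits up to the first $1$, turn $2$s into $1$s, and read the result in binary). It works with exact fractions and truncates after `max_digits` ternary digits, so for inputs with infinitely many nonzero digits the result is only accurate to about $2^{-\texttt{max\_digits}}$. The name `cantor_psi` is illustrative.

```python
from fractions import Fraction


def cantor_psi(x, max_digits=60):
    """Cantor function on [0, 1], computed from ternary digits of x."""
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)           # 1 = .222... in ternary
    total = Fraction(0)
    for i in range(1, max_digits + 1):
        x *= 3
        digit = int(x)               # next ternary digit: 0, 1, or 2
        x -= digit
        if digit == 1:
            # First 1: contribute a final binary 1 in place i and stop.
            return total + Fraction(1, 2 ** i)
        total += Fraction(digit // 2, 2 ** i)   # 0 -> 0, 2 -> 1
    return total


if __name__ == "__main__":
    print(cantor_psi(Fraction(1, 3)))   # 1/2
    print(cantor_psi(Fraction(1, 2)))   # 1/2
    print(cantor_psi(Fraction(2, 3)))   # 1/2
    print(float(cantor_psi(Fraction(1, 4))))
```

The three equal outputs show the function sitting constant across the whole middle third $[\frac13,\frac23]$, while $\frac14=.020202\dots$ lies in the Cantor set and is sent to (approximately) the binary number $.010101\dots=\frac13$.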

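First of all, $\psi$ is increasing from $0$ to $1$. Clearly $0=.000\dots$, so $\psi(0)=0$; and $1=.222\dots$, so $\psi(1)=1$. Given points $x=.\alpha_1\alpha_2\dots$ and $y=.\beta_1\beta_2\dots$, if $x<y$ then there is some index $j$ with $\alpha_i=\beta_i$ for $i<j$ and $\alpha_j<\beta_j$. If one of the shared digits before $j$ is a $1$, both expansions get cut off at the same place and $\psi(x)=\psi(y)$. Otherwise, if $\alpha_j=0$ and $\beta_j=1$ or $2$, then as we write out $\psi(x)$ in binary the $j$th bit is $0$, while the $j$th bit of $\psi(y)$ is $1$, and so $\psi(x)\leq\psi(y)$. On the other hand, if $\alpha_j=1$ and $\beta_j=2$, then the $j$th bit of both $\psi(x)$ and $\psi(y)$ is $1$, but $\psi(x)$ stops at that point while $\psi(y)$ may go on to pick up more bits equal to $1$. And so again $\psi(x)\leq\psi(y)$.

Maybe more surprising is the fact that $\psi$ is actually continuous! If again we have $x=.\alpha_1\alpha_2\dots$ and $y=.\beta_1\beta_2\dots$ and $\alpha_i=\beta_i$ for $i<j$, then $\psi(x)$ and $\psi(y)$ agree in their first $j-1$ bits, and we find

$$\lvert\psi(x)-\psi(y)\rvert\leq\frac{1}{2^{j-1}}$$

Thus, given an $\epsilon>0$ we can find a large enough $j$ so that $\frac{1}{2^{j-1}}<\epsilon$. Then we can pick a small enough $\delta$ so that two numbers differing by less than $\delta$ will agree to the first $j$ places in their ternary expansions, at least away from the endpoints of the ternary intervals, and so $\psi$ is continuous there. A quicker route that covers every point: $\psi$ is increasing and takes every terminating binary value in $[0,1]$, a dense set, so it has no room for a jump.

Unfortunately, $\psi$ is not *strictly* increasing. Indeed, on any stretch of $[0,1]\setminus C$, the function is actually constant! It’s interesting to note that $\psi$ manages to increase continuously from $0$ to $1$ while remaining locally constant almost everywhere. But still we’re going to need a *strictly* increasing function for our purposes. We get this by considering $\phi(x)=\frac{1}{2}\left(x+\psi(x)\right)$. This still increases continuously from $0$ to $1$, but now it’s strictly increasing.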

But as a strictly increasing continuous function from $[0,1]$ onto itself, $\phi$ has a strictly increasing continuous inverse. That is, there is a strictly increasing continuous function $f:[0,1]\to[0,1]$ such that $y=f(x)$ if and only if $x=\phi(y)$. And since it’s continuous, it’s Borel measurable, and any Borel measurable function is Lebesgue measurable.

Now, the set $\phi(C)$ is Lebesgue measurable and has positive measure. This is the collection of points of the form $\frac{1}{2}\left(x+\psi(x)\right)$ for $x\in C$. To get at this, first we consider the collection $\psi([0,1]\setminus C)$. It’s pretty straightforward to see that this consists of terminating binary expansions, which are dyadic rational numbers. This is a countable set, and $\psi$ takes just one of these values on each of the open intervals making up $[0,1]\setminus C$. On such an interval $\phi(x)$ is $\frac{x}{2}$ plus a constant, so $\phi$ cuts its length in half. These intervals have total length $1$, and consequently we find that $\mu(\phi([0,1]\setminus C))=\frac{1}{2}$. Since $\mu(\phi([0,1]))=\mu([0,1])=1$, there must be measure $\frac{1}{2}$ in $\phi(C)$ in order to make up the difference.

But now we can take a thick, non-Lebesgue measurable set $M$ whose intersection with $\phi(C)$ is itself a non-Lebesgue measurable set $M_0=M\cap\phi(C)$. However, $f(M_0)\subseteq C$, and $C$ has Lebesgue measure zero. Since every subset of a set of Lebesgue measure zero is itself Lebesgue measurable (by completeness), $E=f(M_0)$ must be Lebesgue measurable, even though $M_0=f^{-1}(E)$ is not. This is not a problem because we only ever asked that the preimage under $f$ of any *Borel* set be Lebesgue measurable.

At last, we set $g=\chi_E$, the characteristic function of this set $E$. This function is Lebesgue measurable, because the preimage of any set is one of $\emptyset$, $E$, the complement of $E$, or all of $\mathbb{R}$, all of which are Lebesgue measurable. And we’ve already established that $f$ is measurable. However, the composition $g\circ f$ is *not* measurable, since the preimage of the Borel set $\{1\}$ is

$$(g\circ f)^{-1}(\{1\})=f^{-1}\left(g^{-1}(\{1\})\right)=f^{-1}(E)=M_0$$

which is not Lebesgue measurable.

## Composing Real-Valued Measurable Functions I

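Now that we’ve tweaked our definition of a measurable real-valued function, we may have broken composability. We didn’t even say much about it when we defined the category of measurable spaces, because for most purposes it’s just like in topological spaces: given measurable functions $f:(X_1,\mathcal{S}_1)\to(X_2,\mathcal{S}_2)$ and $g:(X_2,\mathcal{S}_2)\to(X_3,\mathcal{S}_3)$ and a measurable set $E\in\mathcal{S}_3$, the measurability of $g$ tells us that $g^{-1}(E)\in\mathcal{S}_2$, and the measurability of $f$ tells us that $(g\circ f)^{-1}(E)=f^{-1}(g^{-1}(E))\in\mathcal{S}_1$.

But now we’re treating $0$ a bit differently, and so we have to be careful. I say that if $\phi$ is a Borel measurable extended-real-valued function on the extended real line so that $\phi(0)=0$, and if $f$ is a measurable extended-real-valued function on a measurable space $X$, then the composition $\phi\circ f$ is measurable. Indeed, writing $N(h)=\{x\,\vert\,h(x)\neq0\}$ for the set where a function $h$ is nonzero, if $M$ is any Borel set, then we find

$$N(\phi\circ f)\cap(\phi\circ f)^{-1}(M)=\{x\,\vert\,\phi(f(x))\in M\setminus\{0\}\}$$

Since $\phi(0)=0$, any such $x$ must have $f(x)\neq0$, and we can write

$$\{x\,\vert\,\phi(f(x))\in M\setminus\{0\}\}=\{x\,\vert\,f(x)\in\phi^{-1}(M\setminus\{0\})\setminus\{0\}\}$$

And since $\phi$ is Borel measurable we know that $\phi^{-1}(M\setminus\{0\})$ is a Borel set. We can thus continue our calculation from above

$$\{x\,\vert\,f(x)\in\phi^{-1}(M\setminus\{0\})\setminus\{0\}\}=N(f)\cap f^{-1}\left(\phi^{-1}(M\setminus\{0\})\right)$$

which is measurable by the measurability of $f$.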

This is a sufficient, but far from a necessary condition. But it does allow us to bring in various useful functions in the place of $\phi$. For any positive real number $\alpha$ we have the function $\phi(t)=\lvert t\rvert^\alpha$. If $n$ is a positive integer, we have the function $\phi(t)=t^n$. These are all continuous, which implies that they’re Borel measurable, and they send $0$ back to itself. We conclude that any positive integral power of a measurable function $f$ is measurable, as is any positive power of the absolute value of $f$.

Of course, if $X$ itself is measurable as a subset of itself, then we need no tweak to our definition and we don’t need to add the requirement that $\phi(0)=0$. Also, the converse of this theorem is definitely not true; if $X$ is measurable and $E\subseteq X$ is a non-measurable set, then the function $f=\chi_E-\chi_{E^c}$ is not measurable even though the absolute value $\lvert f\rvert=1$ is measurable.

It’s important to note here that we’re asking that $\phi$ be *Borel* measurable, because our definition of a measurable real-valued function is in terms of Borel sets in the target. Indeed, writing things out more thoroughly helps us see this: if $f:(X,\mathcal{S})\to(\mathbb{R},\mathcal{B})$ and $g:(\mathbb{R},\mathcal{L})\to(\mathbb{R},\mathcal{B})$ are measurable, where $\mathcal{B}$ and $\mathcal{L}$ are the Borel and Lebesgue measurable sets, then we can compose the functions on the underlying sets, but the target of $f$ isn’t the same measurable space as the source of $g$. There is thus no reason to believe that the composite would be measurable. And tomorrow I’ll give an example of just such a case.