Egoroff’s Theorem
Let’s look back at what goes wrong when a sequence of functions doesn’t converge uniformly. Let $X$ be the closed unit interval $[0,1]$, and let $f_n(x)=x^n$. Pointwise, this converges to a function $f$ with $f(x)=0$ for $0\leq x<1$, and $f(1)=1$. This convergence can’t be uniform, because the uniform limit of a sequence of continuous functions is continuous.
But things only go wrong at the one point, and the singleton $\{1\}$ has measure zero. That is, the sequence $\{f_n\}$ converges almost everywhere to the function with constant value $0$. The convergence still isn’t uniform, though, because we still have a problem near $1$. But if we cut out any open patch around $1$ and only look at the interval $[0,1-\epsilon]$, the convergence is uniform. We might think that this is “uniform a.e.”, but we have to cut out a set of positive measure to make it work. The set can be as small as we want, but we can’t get uniformity by just cutting out $\{1\}$.
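A quick numerical sketch of this example may help. It assumes the sequence $f_n(x)=x^n$ on $[0,1]$; the helper names `sup_error`, `terms_needed`, and `bad_point` are made up for illustration. Since $x^n$ is increasing on $[0,1]$, its supremum over $[0,1-\epsilon]$ is just $(1-\epsilon)^n$, which goes to zero, while on $[0,1)$ there is always a point where $x^n$ is still $\frac{1}{2}$.

```python
def sup_error(n, eps):
    """Supremum of |x**n - 0| over [0, 1 - eps]; x**n is increasing there."""
    return (1.0 - eps) ** n


def terms_needed(eps, tol):
    """Smallest n with sup_error(n, eps) < tol (hypothetical helper)."""
    n = 1
    while sup_error(n, eps) >= tol:
        n += 1
    return n


def bad_point(n):
    """A point x < 1 at which x**n equals 1/2, however large n is."""
    return 0.5 ** (1.0 / n)


if __name__ == "__main__":
    # The smaller the patch we cut out, the longer we wait, but we always get there.
    for eps in (0.1, 0.01, 0.001):
        print(eps, terms_needed(eps, 0.01))
    # On [0, 1) itself the error never drops below 1/2.
    print([bad_point(n) ** n for n in (10, 100, 1000)])
```

The first loop shows that one index $n$ works for every point of $[0,1-\epsilon]$ at once; the last line shows why no such index exists once only the single point $1$ is removed.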
However, what we’ve seen is a general phenomenon expressed in Egoroff’s Theorem: If $E$ is a measurable set of finite measure, and if $\{f_n\}$ is a sequence of a.e. finite-valued measurable functions converging a.e. on $E$ to a finite-valued measurable function $f$, then for every $\epsilon>0$ there is a measurable subset $F\subseteq E$ with $\mu(F)<\epsilon$ so that $\{f_n\}$ converges uniformly to $f$ on $E\setminus F$. That is, if we have a.e. convergence we can get to uniform convergence by cutting out an arbitrarily small part of our domain.
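First off, we cut out a set of measure zero from $E$ so that $\{f_n\}$ converges pointwise to $f$ on what remains. Now for each pair of positive integers $m$ and $n$ we define the measurable sets
$$E_n^m=\bigcap_{i=n}^\infty\left\{x\in E:\lvert f_i(x)-f(x)\rvert<\frac{1}{m}\right\}$$
As $n$ gets bigger, we’re taking the intersection of fewer and fewer sets, and so $E_1^m\subseteq E_2^m\subseteq\dots$. Since $\{f_n\}$ converges pointwise to $f$, eventually the difference $\lvert f_i(x)-f(x)\rvert$ gets down below every $\frac{1}{m}$, and so $\bigcup_nE_n^m=E$ for every $m$. Since $\mu(E)$ is finite, we conclude that $\lim_{n\to\infty}\mu(E\setminus E_n^m)=0$. And so for every $m$ there is an $N(m)$ so that
$$\mu\left(E\setminus E_{N(m)}^m\right)<\frac{\epsilon}{2^m}$$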
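Now let’s define
$$F=\bigcup_{m=1}^\infty\left(E\setminus E_{N(m)}^m\right)$$
This is a measurable set contained in $E$, and countable subadditivity tells us that
$$\mu(F)\leq\sum_{m=1}^\infty\mu\left(E\setminus E_{N(m)}^m\right)<\sum_{m=1}^\infty\frac{\epsilon}{2^m}=\epsilon$$
We can calculate
$$E\setminus F=E\cap\bigcap_{m=1}^\infty E_{N(m)}^m$$
And so given any $m$ we take $n\geq N(m)$. Then for any $x\in E\setminus F$ we have $x\in E_{N(m)}^m\subseteq E_n^m$, and thus $\lvert f_n(x)-f(x)\rvert<\frac{1}{m}$. Since we can pick this $N(m)$ independently of $x$, the convergence on $E\setminus F$ is uniform.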
Convergence Almost Everywhere
Okay, so let’s take our idea of almost everywhere and apply it to convergence of sequences of measurable functions.
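Given a sequence $\{f_n\}$ of extended real-valued functions on a measure space $(X,\mathcal{S},\mu)$, we say that $\{f_n\}$ converges a.e. to the function $f$ if there is a set $E_0\subseteq X$ with $\mu(E_0)=0$ so that $\lim_{n\to\infty}f_n(x)=f(x)$ for all $x\in X\setminus E_0$. Similarly, we say that the sequence $\{f_n\}$ is Cauchy a.e. if there exists a set $E_0$ of measure zero so that $\{f_n(x)\}$ is a Cauchy sequence of real numbers for all $x\in X\setminus E_0$. That is, given $x\in X\setminus E_0$ and $\epsilon>0$ there is some natural number $N$ depending on $x$ and $\epsilon$ so that whenever $m,n\geq N$ we have
$$\lvert f_m(x)-f_n(x)\rvert<\epsilon$$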
Because the real numbers form a complete metric space, being Cauchy and being convergent are equivalent: a sequence of finite real numbers is convergent if and only if it is Cauchy, and a similar thing happens here. If a sequence $\{f_n\}$ of finite-valued functions converges a.e. to a finite-valued function $f$, then $\{f_n(x)\}$ converges to $f(x)$ away from a set of measure zero. Each of these sequences $\{f_n(x)\}$ is thus Cauchy, and so $\{f_n\}$ is Cauchy almost everywhere. On the other hand, if $\{f_n\}$ is Cauchy a.e. then the sequences $\{f_n(x)\}$ are Cauchy away from a set of measure zero, and these sequences then converge.
We can also define what it means for a sequence of functions to converge uniformly almost everywhere. That is, there is some set $E_0$ of measure zero so that for every $\epsilon>0$ we can find a natural number $N$ so that for all $n\geq N$ and $x\in X\setminus E_0$ we have $\lvert f_n(x)-f(x)\rvert<\epsilon$. The uniformity means that $N$ is independent of $x\in X\setminus E_0$, but if we choose a different negligible $E_0$ we may have to choose different values of $N$ to get the desired control on the sequence.
As it happens, the topology defined by uniform a.e. convergence comes from a norm, the essential supremum; using this notion of convergence makes the algebra of essentially bounded measurable functions on a measure space $X$ into a normed vector space. Indeed, we can check what it means for a sequence of functions $\{f_n\}$ to converge to $f$ under the essential supremum norm: for any $\epsilon>0$ there is some $N$ so that for all $n\geq N$ we have $\operatorname{ess\,sup}\lvert f_n-f\rvert<\epsilon$. Unpacking the definition of the essential supremum, this means that there is some measurable set $E_0$ with measure zero so that $\lvert f_n(x)-f(x)\rvert<\epsilon$ for all $x\in X\setminus E_0$, which is exactly what we said for uniform a.e. convergence above.
We can also turn around and define what it means for a sequence to be uniformly Cauchy almost everywhere: for any $\epsilon>0$ there is some $N$ so that for all $m,n\geq N$ we have $\operatorname{ess\,sup}\lvert f_m-f_n\rvert<\epsilon$. Unpacking again, there is some measurable set $E_0$ of measure zero so that $\lvert f_m(x)-f_n(x)\rvert<\epsilon$ for all $x\in X\setminus E_0$. It’s straightforward to check that a sequence that converges uniformly a.e. is uniformly Cauchy a.e., and vice versa. That is, the topology defined by the essential supremum norm is complete, and the algebra of essentially bounded measurable functions on a measure space $X$ is a Banach space.
Almost Everywhere
Now we come to one of the most common terms of art in analysis: “almost everywhere”. It’s unusual in that it sounds perfectly colloquial, and yet it has a very technical meaning.
The roots of “almost everywhere” are in the notion of a negligible set. If we’re working with a measure space $(X,\mathcal{S},\mu)$ we don’t really care about subsets of sets of measure zero, and anything that happens only on such a negligible set we try to sweep under the rug. For example, let’s say we have a function $f:\mathbb{R}\to\mathbb{R}$ defined by $f(x)=0$ for all $x\neq0$, and by $f(0)=1$. Colloquially, we say that $f$ is zero “almost everywhere” because the set where it isn’t zero, the singleton $\{0\}$, has measure zero.
In general, if we have some property $P$ that can be applied to points $x\in X$, then we say $P$ is true “almost everywhere” if the set where $P$ is false is negligible. That is, if we can find some measurable set $E$ with $\mu(E)=0$ so that $P(x)$ is true for all $x\in X\setminus E$. Note that we don’t particularly care if the set where $P$ is false is itself measurable, although if $\mu$ is complete then all $\mu$-negligible sets will be measurable. This sort of language is so common in measure theory and analysis that it’s often abbreviated as “a.e.”. Older texts will say “p.p.” for the French equivalent “presque partout”. In probability theory (measure theory’s cousin) we run into “a.s.” for “almost surely”.
No matter how we say or write it, “almost everywhere” has a hidden dependence on some measure. In many cases, the measure is obvious from context, in that there’s only one measure under consideration on a given space. However, in the case where we have two measures $\mu$ and $\nu$ on the same measurable space, we may distinguish them by writing “$\mu$-almost everywhere” and “$\nu$-almost everywhere” (or “$\mu$-a.e.” and “$\nu$-a.e.”), or by explicitly stating which measure we mean.
We’ve actually seen this sort of thing in the wild before; Lebesgue’s condition can be reformulated to say that a bounded function $f$ defined on an $n$-dimensional interval $[a,b]$ is Riemann integrable on that interval if and only if $f$ is continuous almost everywhere (with respect to Lebesgue measure).
As a newer example, we say that a function $f$ is “essentially bounded” if it is bounded almost everywhere. That is, if there is a constant $c$ and some measurable set $E$ with $\mu(E)=0$ so that $\lvert f(x)\rvert\leq c$ for all $x\in X\setminus E$. We’re willing to accept some points exceeding $c$, but no more than a set of measure zero. The infimum of all such essential bounds is the “essential supremum” of $\lvert f\rvert$, written $\operatorname{ess\,sup}\lvert f\rvert$.
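To make the definition concrete, here is a small sketch on a toy measure space: a finite set of points, each carrying a nonnegative mass. This setting and the function name `essential_supremum` are assumptions made for illustration only. On such a space a set is negligible exactly when all of its points have mass zero, so the essential supremum simply ignores the massless points.

```python
def essential_supremum(values, weights):
    """Essential supremum of |f| on a finite set with point masses.

    values[i] is f at the i-th point and weights[i] >= 0 is that point's mass.
    Points of mass zero form a negligible set, so they are ignored.
    """
    return max((abs(v) for v, w in zip(values, weights) if w > 0), default=0.0)


if __name__ == "__main__":
    values = [1, 100, -3]
    weights = [0.5, 0.0, 0.5]  # the point where f = 100 has measure zero
    print(max(abs(v) for v in values))          # ordinary supremum of |f|
    print(essential_supremum(values, weights))  # essential supremum of |f|
```

The ordinary supremum sees the spike at the massless point, while the essential supremum does not.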
Topological Vector Spaces, Normed Vector Spaces, and Banach Spaces
Before we move on, we want to define some structures that blend algebraic and topological notions. These are all based on vector spaces. And, particularly, we care about infinite-dimensional vector spaces. Finite-dimensional vector spaces are actually pretty simple, topologically. For pretty much all purposes you have a topology on your base field $\mathbb{F}$, and the vector space (which is isomorphic to $\mathbb{F}^n$ for some $n$) will get the product topology.
But for infinite-dimensional spaces the product topology is often not going to be particularly useful. For example, the space of functions $f:X\to\mathbb{R}$ is a product; we write $\mathbb{R}^X$ to mean the product of one copy of $\mathbb{R}$ for each point in $X$. Limits in this topology are “pointwise” limits of functions, but this isn’t always the most useful way to think about limits of functions. The sequence $f_n(x)=x^n$ on $[0,1]$ converges pointwise to a function $f$ with $f(x)=0$ for $0\leq x<1$ and $f(1)=1$. But we will find it useful to be able to ignore this behavior at the one isolated point and say that $f_n\to0$. It’s this connection with spaces of functions that brings such infinite-dimensional topological vector spaces into the realm of “functional analysis”.
Okay, so to get a topological vector space, we take a vector space and put a (surprise!) topology on it. But not just any topology will do: Remember that every point in a vector space looks pretty much like every other one. The transformation $u\mapsto u+v$ has an inverse $u\mapsto u-v$, and it only makes sense that these be homeomorphisms. And to capture this, we put a uniform structure on our space. That is, we specify what the neighborhoods are of $0$, and just translate them around to all the other points.
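Now, a common way to come up with such a uniform structure is to define a norm on our vector space. That is, to define a function $v\mapsto\lVert v\rVert$ from our vector space to the nonnegative real numbers satisfying the three axioms
- For all vectors $v$ and scalars $c$, we have $\lVert cv\rVert=\lvert c\rvert\lVert v\rVert$.
- For all vectors $v$ and $w$, we have $\lVert v+w\rVert\leq\lVert v\rVert+\lVert w\rVert$.
- The norm $\lVert v\rVert$ is zero if and only if the vector $v$ is the zero vector.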
Notice that we need to be working over a field in which we have a notion of absolute value, so we can measure the size of scalars. We might also want to do away with the last condition and use a “seminorm”. In any event, it’s important to note that though our earlier examples of norms all came from inner products we do not need an inner product to have a norm. In fact, there exist norms that come from no inner product at all.
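As a hedged illustration of that last claim: any norm that comes from an inner product satisfies the parallelogram law $\lVert v+w\rVert^2+\lVert v-w\rVert^2=2\lVert v\rVert^2+2\lVert w\rVert^2$, so a norm that violates it cannot come from any inner product. The sketch below checks this for the maximum norm on $\mathbb{R}^2$; the function names are invented for this example.

```python
def max_norm(v):
    """The maximum (sup) norm of a vector given as a sequence of numbers."""
    return max(abs(c) for c in v)


def euclidean_norm(v):
    """The usual norm coming from the dot product."""
    return sum(c * c for c in v) ** 0.5


def parallelogram_defect(norm, v, w):
    """||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2; zero for inner-product norms."""
    s = [a + b for a, b in zip(v, w)]
    d = [a - b for a, b in zip(v, w)]
    return norm(s) ** 2 + norm(d) ** 2 - 2 * norm(v) ** 2 - 2 * norm(w) ** 2


if __name__ == "__main__":
    v, w = (1.0, 0.0), (0.0, 1.0)
    print(parallelogram_defect(euclidean_norm, v, w))  # zero up to rounding
    print(parallelogram_defect(max_norm, v, w))        # nonzero
```

The maximum norm still satisfies all three axioms above; it just fails the extra identity that inner products force.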
So if we define a norm we get a “normed vector space”. This is a metric space, with a metric function defined by $d(v,w)=\lVert v-w\rVert$. This is nice because metric spaces are first-countable, and thus sequential. That is, we can define the topology of a (semi-)normed vector space by defining exactly what it means for a sequence of vectors to converge, and in particular what it means for them to converge to zero.
Finally, if we’ve got a normed vector space, it’s natural to ask whether this vector space is complete. That is, we have all the pieces in place to define Cauchy sequences in our vector space, and we would like for all of these sequences to converge under our uniform structure. If this happens, that is, if we have a complete normed vector space, we call our structure a “Banach space”. Most of the spaces we’re concerned with in functional analysis are Banach spaces.
Again, for finite-dimensional vector spaces (at least over $\mathbb{R}$ or $\mathbb{C}$) this is all pretty easy; we can always define an inner product, and this gives us a norm. If our underlying topological field is complete, then the vector space will be as well. Even without considering a norm, convergence of sequences is just given component-by-component. But infinite-dimensional vector spaces get hairier. Since our algebraic operations only give us finite sums, we have to take some sorts of limits to even talk about most vectors in the space in the first place, and taking limits of such vectors could just complicate things further. Studying these interesting topologies and seeing how linear algebra (the study of vector spaces and linear transformations) behaves in the infinite-dimensional context is the taproot of functional analysis.
Simple and Elementary Functions
We now introduce two classes of functions that are very easy to work with. As usual, we’re working in some measurable space $(X,\mathcal{S})$.
First, we have the “simple functions”. Such a function is described by picking a finite number of pairwise disjoint measurable sets $\{E_i\}_{i=1}^n$ and a corresponding set of finite real numbers $\{\alpha_i\}_{i=1}^n$. We use these to define a function $f$ by declaring $f(x)=\alpha_i$ if $x\in E_i$, and $f(x)=0$ if $x$ is in none of the $E_i$. The very simplest example is the characteristic function $\chi_E$ of a measurable set $E$. Any other simple function can be written as
$$f=\sum_{i=1}^n\alpha_i\chi_{E_i}$$
Any simple function is measurable, for the preimage $f^{-1}(A)$ of a Borel set $A$ not containing $0$ is the union of all the $E_i$ corresponding to those $\alpha_i\in A$, and is thus measurable.
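It’s straightforward to verify that the product and sum of any two simple functions is itself a simple function. Given functions $f=\sum_{i=1}^m\alpha_i\chi_{E_i}$ and $g=\sum_{j=1}^n\beta_j\chi_{F_j}$, we have
$$fg=\sum_{i=1}^m\sum_{j=1}^n\alpha_i\beta_j\chi_{E_i\cap F_j}$$
and
$$f+g=\sum_{i=1}^m\sum_{j=1}^n(\alpha_i+\beta_j)\chi_{E_i\cap F_j}+\sum_{i=1}^m\alpha_i\chi_{E_i\setminus\bigcup_jF_j}+\sum_{j=1}^n\beta_j\chi_{F_j\setminus\bigcup_iE_i}$$
where the last two sums account for the points at which only one of the two functions is nonzero. It’s even easier to see that any scalar multiple of a simple function is simple: $cf=\sum_{i=1}^mc\alpha_i\chi_{E_i}$. And thus the collection of simple functions forms a subalgebra of the algebra of measurable functions.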
“Elementary functions” are similar to simple functions. We slightly relax the conditions by allowing a countably infinite number of measurable sets $E_i$ and corresponding values $\alpha_i$.
Now, why do we care about simple functions? As it happens, every measurable function can be approximated by simple functions! That is, given any measurable function $f$ we can find a sequence $\{f_n\}$ of simple functions converging pointwise to $f$.
To see this, first break $f$ up into its positive and negative parts $f^+$ and $f^-$. If we can approximate any nonnegative measurable function by a pointwise-increasing sequence of nonnegative simple functions, then we can approximate each of $f^+$ and $f^-$, and the difference of these sequences approximates $f$. So, without loss of generality, we will assume that $f$ is nonnegative.
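Okay, so here’s how we’ll define the simple functions $f_n$:
$$f_n(x)=\begin{cases}\dfrac{i-1}{2^n}&\text{if }\dfrac{i-1}{2^n}\leq f(x)<\dfrac{i}{2^n}\text{ for some }i=1,\dots,2^nn\\n&\text{if }f(x)\geq n\end{cases}$$
That is, to define $f_n$ we chop up the interval $[0,n)$ of nonnegative real numbers into $2^nn$ chunks of width $\frac{1}{2^n}$, and within each of these slices we round values of $f$ down to the lower endpoint. If $f(x)\geq n$, we round all the way down to $n$. There can only ever be $2^nn+1$ values for $f_n$, and each of these corresponds to a measurable set. The value $\frac{i-1}{2^n}$ corresponds to the set
$$f^{-1}\left(\left[\frac{i-1}{2^n},\frac{i}{2^n}\right)\right)$$
while the value $n$ corresponds to the set $f^{-1}\left([n,\infty]\right)$. And thus $f_n$ is indeed a simple function.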
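So, does the sequence $\{f_n\}$ converge pointwise to $f$? Well, if $f(x)=\infty$, then $f_n(x)=n$ for all $n$, which increases to $\infty$. On the other hand, if $k\leq f(x)<k+1$ for some nonnegative integer $k$, then $f_n(x)=n$ for all $n\leq k$; after this point, $f_n(x)$ and $f(x)$ are both within a slice of width $\frac{1}{2^n}$, and so $0\leq f(x)-f_n(x)<\frac{1}{2^n}$. And so given a large enough $n$ we can bring $f_n(x)$ within any desired bound of $f(x)$. Thus the sequence $\{f_n\}$ increases pointwise to the function $f$.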
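But that’s not all! If $f$ is bounded above by some integer $k$, the sequence $\{f_n\}$ converges uniformly to $f$. Indeed, once we get to $n>k$, we cannot have $f(x)\geq n$ for any $x$. That is, for sufficiently large $n$ we always have $0\leq f(x)-f_n(x)<\frac{1}{2^n}$. Given an $\epsilon>0$ we pick an $N$ so that both $N>k$ and $\frac{1}{2^N}<\epsilon$, and this $N$ will guarantee $\lvert f(x)-f_n(x)\rvert<\epsilon$ for every $n\geq N$ and every $x$. That is: the convergence is uniform.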
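The rounding scheme just described can be sketched pointwise in a few lines. This assumes the standard dyadic construction (values below $n$ are rounded down to a multiple of $\frac{1}{2^n}$, and values of at least $n$ are capped at $n$); the name `simple_approx` is hypothetical, and the argument `y` stands for the value $f(x)$ at a single point.

```python
import math


def simple_approx(y, n):
    """Value of the n-th dyadic approximation at a point where f takes the value y >= 0."""
    if y >= n:
        # Everything at or above n (including infinity) is rounded down to n.
        return float(n)
    # Otherwise round down to the nearest multiple of 1 / 2**n.
    return math.floor(y * 2 ** n) / 2 ** n


if __name__ == "__main__":
    for n in range(1, 8):
        approx = simple_approx(math.pi, n)
        print(n, approx, math.pi - approx)
```

Once $n$ exceeds the value being approximated, the printed error stays below $\frac{1}{2^n}$, which is the bound used in the uniform convergence argument.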
This is also where elementary functions come in handy. If we’re allowed to use a countably infinite number of values, we can get uniform convergence without having to ask that $f$ be bounded. Indeed, instead of defining $f_n(x)=n$ for $f(x)\geq n$, just chop up all positive values into slices of width $\frac{1}{2^n}$. There are only a countably infinite number of such slices, and so the resulting function $f_n$ is elementary, if not quite simple.
Sequences of Measurable Functions
We let $\{f_n\}$ be a sequence of extended real-valued measurable functions on a measurable space $X$, and ask what we can say about limits of this sequence.
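First of all, the function $g(x)=\inf\{f_n(x):n=1,2,\dots\}$ is measurable. The preimage $g^{-1}(\{-\infty\})$ is the intersection over all positive integers $k$ of the countable unions $\bigcup_n\{x:f_n(x)<-k\}$, while the preimage $g^{-1}(\{\infty\})$ is the intersection of the countable collection $\{f_n^{-1}(\{\infty\})\}$. And so both of these sets are measurable, and we can restrict to the case of finite-valued functions.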
So now let’s use our convenient condition. Given a real number $c$ we know that $g(x)<c$ if and only if $f_n(x)<c$ for some $n$. That is, we can write
$$\{x\in X:g(x)<c\}=\bigcup_{n=1}^\infty\{x\in X:f_n(x)<c\}$$
Each term on the right is measurable since each $f_n$ is a measurable function, and so the set on the left is measurable. Thus we conclude that $g$ is measurable as well.
Similarly, we find that the function $h(x)=\sup\{f_n(x):n=1,2,\dots\}$ is measurable.
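Now the functions
$$f^*(x)=\limsup_{n\to\infty}f_n(x)=\inf_{n\geq1}\sup_{m\geq n}f_m(x)$$
$$f_*(x)=\liminf_{n\to\infty}f_n(x)=\sup_{n\geq1}\inf_{m\geq n}f_m(x)$$
are also measurable. Indeed, in proving that $f^*$ is measurable we can use the exact same technique as above to prove that the inner supremum is measurable; it doesn’t really depend on the supremum starting at $1$ or higher. And then the outer infimum is exactly as before. Proving $f_*$ is measurable is similar.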
Now we can talk about pointwise convergence of a sequence of measurable functions. That is, for a fixed point $x$ we have the sequence $\{f_n(x)\}$ which has some limit superior $f^*(x)$ and some limit inferior $f_*(x)$. If these two coincide, then the sequence has a proper limit $\lim_{n\to\infty}f_n(x)=f^*(x)=f_*(x)$. But one of our lemmas tells us that the set of points where any two measurable functions coincide has a nice property: $\{x\in X:f^*(x)=f_*(x)\}$ has a measurable intersection with every measurable set. And thus if we define the function $f(x)=\lim_{n\to\infty}f_n(x)$ on this subspace of $X$ for which the limit exists, the resulting function is measurable.
Positive and Negative Parts of Functions
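Now that we have sums and products to work with, we find that the maximum of $f$ and $g$ (sometimes written $f\cup g$ or $f\vee g$) and their minimum (sometimes written $f\cap g$ or $f\wedge g$) are measurable. Indeed, we can write
$$\max(f,g)=\frac{1}{2}\left(f+g+\lvert f-g\rvert\right)\qquad\min(f,g)=\frac{1}{2}\left(f+g-\lvert f-g\rvert\right)$$
and we know that absolute values of functions are measurable.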
As special cases of this construction we define the “positive part” $f^+$ and “negative part” $f^-$ of an extended real-valued function $f$ as
$$f^+=\max(f,0)\qquad f^-=-\min(f,0)$$
The positive part is obviously just what we get if we lop off any part of $f$ that extends below $0$. The negative part is a little more subtle. First we lop off everything above $0$, but then we take the negative of this function. As a result, $f^+$ and $f^-$ are both nonnegative functions. And if $f$ is measurable, then so are $f^+$ and $f^-$. We can thus write any measurable function $f$ as the difference of two nonnegative measurable functions
$$f=f^+-f^-$$
Conversely, any function with measurable positive and negative parts is itself measurable.
This is sort of like how we found that functions of bounded variation can be written as the difference between two strictly increasing functions. In fact, if we’re loose about what we mean by “function”, and “derivative”, we could even see this fact as a decomposition of the derivative of a function of bounded variation into its positive and negative parts.
It will thus be useful to restrict attention to nonnegative measurable functions instead of general measurable functions. Many statements can be more easily proven for nonnegative measurable functions, and the results will be preserved when we take the difference of two functions. Since we can write any measurable function as the difference between two nonnegative ones, this will suffice.
It will also sometimes be useful to realize that we may write the absolute value of a function as
$$\lvert f\rvert=f^++f^-$$
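These identities are easy to sanity-check pointwise. The sketch below works with single finite values standing in for $f(x)$ and $g(x)$; the function names are invented here, and the formulas for the maximum and minimum are only meant for finite values, since they would involve $\infty-\infty$ otherwise.

```python
def positive_part(y):
    """Lop off everything below zero."""
    return max(y, 0.0)


def negative_part(y):
    """Lop off everything above zero, then flip the sign."""
    return -min(y, 0.0)


def maximum_via_abs(a, b):
    """max(a, b) written with sums and an absolute value (finite inputs only)."""
    return (a + b + abs(a - b)) / 2


def minimum_via_abs(a, b):
    """min(a, b) written with sums and an absolute value (finite inputs only)."""
    return (a + b - abs(a - b)) / 2


if __name__ == "__main__":
    for y in (-2.5, 0.0, 3.0):
        print(y, positive_part(y), negative_part(y))
```

For each sample value the two parts are nonnegative, their difference recovers the value, and their sum is its absolute value.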
Adding and Multiplying Measurable Real-Valued Functions
One approach to the problem of adding and multiplying measurable functions on a measurable space $X$ would be to define a two-dimensional version of Borel sets and Lebesgue measure, and to tweak the definition of a measurable function to this space $\mathbb{R}^2$ like we did before to treat the additive identity $(0,0)$ specially. Then we could set up products (which we will eventually do) and get a map $(f,g):X\to\mathbb{R}^2$ and compose this with the Borel map $(x,y)\mapsto x+y$ or the Borel map $(x,y)\mapsto xy$. In fact, if you’re up for it, you can go ahead and try working out this approach as an exercise.
Instead, we’ll take more of a low road towards showing that the sum and product of two measurable functions are measurable. We start with a useful lemma: if $f$ and $g$ are extended real-valued measurable functions on a measurable space $(X,\mathcal{S})$ and if $c$ is any real number, then each of the sets
$$A=\{x\in X:f(x)<g(x)+c\}$$
$$B=\{x\in X:f(x)\leq g(x)+c\}$$
$$C=\{x\in X:f(x)=g(x)+c\}$$
has a measurable intersection with every measurable set. If $X$ is itself measurable, of course, this just means that these three sets are measurable.
To see this for the set $A$, consider the (countable) set $\mathbb{Q}$ of rational numbers. If $f(x)$ really is strictly less than $g(x)+c$, then there must be some rational number $r$ between them. That is, if $x\in A$ then for some $r\in\mathbb{Q}$ we have $f(x)<r$ and $r-c<g(x)$. And thus we can write $A$ as the countable union
$$A=\bigcup_{r\in\mathbb{Q}}\left(\{x\in X:f(x)<r\}\cap\{x\in X:r-c<g(x)\}\right)$$
By the measurability of $f$ and $g$, this is the countable union of a collection of measurable sets, and is thus measurable.
We can write $B$ as $X\setminus\{x\in X:g(x)<f(x)-c\}$, and so the assertion for $B$ follows from that for $A$. And we can write $C=B\setminus A$, so the statement is true for that set as well.
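Anyway, now we can verify that the sum and product of two measurable extended real-valued functions are measurable as well. We first handle infinite values separately. For the product, $f(x)g(x)=\infty$ if and only if one of the factors is infinite and the other is nonzero with the same sign. Since each of the sets $\{f=\infty\}\cap\{g>0\}$, $\{f=-\infty\}\cap\{g<0\}$, $\{g=\infty\}\cap\{f>0\}$, and $\{g=-\infty\}\cap\{f<0\}$ is measurable, the set $(fg)^{-1}(\{\infty\})$, which is their union, is measurable. We can handle $(fg)^{-1}(\{-\infty\})$, $(f+g)^{-1}(\{\infty\})$, and $(f+g)^{-1}(\{-\infty\})$ similarly.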
So now we turn to our convenient condition for measurability. Since we’ve handled the sets where $f$ and $g$ are infinite, we can assume that they’re finite. Given a real number $c$, we find
$$\{x\in X:f(x)+g(x)<c\}=\{x\in X:f(x)<c-g(x)\}$$
which is measurable by our lemma above (with $-g$ in place of $g$). Since this is true for every real number $c$, the sum $f+g$ is measurable.
To verify our assertion for the product $fg$, we turn and recall the polarization identities from when we worked with inner products. Remember, they told us that if we know how to calculate squares, we can calculate products. Something similar is true now, as we write
$$fg=\frac{1}{4}\left((f+g)^2-(f-g)^2\right)$$
We just found that the sum $f+g$ and the difference $f-g$ are measurable. And any positive integral power of a measurable function is measurable, so the squares of the sum and difference functions are measurable. And then the product $fg$ is a scalar multiple of the difference of these squares, and is thus measurable.
Composing Real-Valued Measurable Functions II
As promised, today we come up with an example of a measurable function $f:X\to\mathbb{R}$ and a Lebesgue measurable function $g:\mathbb{R}\to\mathbb{R}$ so that the composition $g\circ f$ is not measurable. Specifically, $X$ will be the closed unit interval $[0,1]$, considered as a measurable subspace of $\mathbb{R}$.
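Now, every point $x\in[0,1]$ can be written out in ternary as
$$x=.\alpha_1\alpha_2\alpha_3\dots=\sum_{i=1}^\infty\frac{\alpha_i}{3^i}\qquad\alpha_i\in\{0,1,2\}$$
We set $n$ (depending on $x$) to be the first index for which $\alpha_n=1$, and $n=\infty$ if no such index exists. Then we define the function
$$\psi(x)=\sum_{1\leq i<n}\frac{\alpha_i}{2^{i+1}}+\frac{1}{2^n}$$
where the last term is taken to be $0$ when $n=\infty$. That is, write out the number in ternary until you hit a $1$, and leave off everything after that. Change all the $2$s to $1$s, and consider the resulting string of $0$s and $1$s as a number written out in binary. The extra fraction added in the formula above comes from that first $1$. This function is often called the “Cantor function” because of its relationship to the famous Cantor set. In case it’s not apparent, the Cantor set is the collection $C$ of points that have a ternary expansion with no $\alpha_i=1$.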
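The digit recipe is short enough to sketch directly. The function below is a hypothetical helper, not part of the original construction: it takes a finite list of ternary digits (any digits beyond the list are assumed to be $0$), reads them until the first $1$, halves each $2$ along the way, and interprets the result in binary.

```python
def cantor_psi(digits):
    """Cantor function at x = 0.d1 d2 d3 ... (ternary), from a finite digit list.

    Digits after the end of the list are assumed to be 0.
    """
    total = 0.0
    for i, alpha in enumerate(digits, start=1):
        if alpha == 1:
            # Stop at the first 1; it contributes a final binary bit in place i.
            return total + 1.0 / 2 ** i
        # A ternary digit 0 or 2 becomes the binary bit 0 or 1 in place i.
        total += alpha / 2 ** (i + 1)
    return total


if __name__ == "__main__":
    print(cantor_psi([1]))        # x = 1/3
    print(cantor_psi([2]))        # x = 2/3
    print(cantor_psi([1, 2, 2]))  # a point strictly between 1/3 and 2/3
```

All three sample points give the same value, which is the constancy on the removed middle third discussed later.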
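First of all, $\psi$ is increasing from $0$ to $1$. Clearly $0=.000\dots$, so $\psi(0)=0$; and $1=.222\dots$ so $\psi(1)=.111\dots=1$ in binary. Given points $x=.\alpha_1\alpha_2\alpha_3\dots$ and $y=.\beta_1\beta_2\beta_3\dots$, if $x<y$ then $\alpha_i=\beta_i$ for $i<j$ and $\alpha_j<\beta_j$ for some index $j$. (If one of the shared digits before position $j$ is a $1$, then $\psi(x)=\psi(y)$ and there is nothing to show.) If $\alpha_j=0$ and $\beta_j=1$ or $\beta_j=2$, then as we write out $\psi(x)$ in binary the $j$th bit is $0$, while the $j$th bit of $\psi(y)$ is $1$ and so $\psi(x)\leq\psi(y)$. On the other hand, if $\alpha_j=1$ and $\beta_j=2$, then the $j$th bit of both $\psi(x)$ and $\psi(y)$ is $1$, but $\psi(x)$ stops at that point while $\psi(y)$ may have further bits equal to $1$. And so again $\psi(x)\leq\psi(y)$.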
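Maybe more surprising is the fact that $\psi$ is actually continuous! If again we have $x=.\alpha_1\alpha_2\alpha_3\dots$ and $y=.\beta_1\beta_2\beta_3\dots$ and $\alpha_i=\beta_i$ for $i<j$, then we find
$$\lvert\psi(x)-\psi(y)\rvert\leq\frac{1}{2^{j-1}}$$
Thus, given an $\epsilon>0$ we can find a large enough $j$ so that $\frac{1}{2^{j-1}}<\epsilon$. Then we can pick a small enough $\delta$ so that two numbers differing by less than $\delta$ will agree to the first $j-1$ places in their ternary expansions, and so $\psi$ is continuous.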
Unfortunately, $\psi$ is not strictly increasing. Indeed, on any stretch of $[0,1]\setminus C$, the function $\psi$ is actually constant! It’s interesting to note that $\psi$ manages to increase continuously from $0$ to $1$ while remaining locally constant almost everywhere. But still we’re going to need a strictly increasing function for our purposes. We get this by considering $\frac{1}{2}\left(x+\psi(x)\right)$. This still increases continuously from $0$ to $1$, but now it’s strictly increasing.
But as a strictly increasing continuous function from $[0,1]$ onto itself, it has a strictly increasing continuous inverse. That is, there is a strictly increasing continuous function $f:[0,1]\to[0,1]$ such that $f(y)=x$ if and only if $y=\frac{1}{2}\left(x+\psi(x)\right)$. And since it’s continuous, it’s Borel measurable, and any Borel measurable function is Lebesgue measurable.
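Now, the set $f^{-1}(C)$ is Lebesgue measurable and has positive measure. This is the collection of points of the form $\frac{1}{2}\left(x+\psi(x)\right)$ for $x\in C$. To get at this, first we consider the collection $\psi([0,1]\setminus C)$. It’s pretty straightforward to see that this consists of numbers with terminating binary expansions, which are dyadic rationals. But this is a countable set, and countable sets have Lebesgue measure zero. On each of the open intervals making up $[0,1]\setminus C$ the function $\psi$ is constant, so $x\mapsto\frac{1}{2}\left(x+\psi(x)\right)$ carries that interval onto one of half its length. Consequently, we find that $\mu\left(f^{-1}([0,1]\setminus C)\right)=\frac{1}{2}$. Since $\mu([0,1])=1$, there must be some positive measure in $f^{-1}(C)$ in order to make up the difference.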
But now we can take a thick, non-Lebesgue measurable set whose intersection with $f^{-1}(C)$ is itself a non-Lebesgue measurable set $E$. However, $f(E)\subseteq C$, and $C$ has Lebesgue measure zero. Since every subset of a set of Lebesgue measure zero is itself Lebesgue measurable (by completeness), $M=f(E)$ must be Lebesgue measurable, even though $E=f^{-1}(M)$ is not. This is not a problem because we only ever asked that the preimage under $f$ of any Borel set be Lebesgue measurable.
At last, we set $g=\chi_M$, the characteristic function of this set $M$. This function $g$ is Lebesgue measurable, because the preimage of any set is one of $\emptyset$, $M$, $M^c$, or $\mathbb{R}$, all of which are Lebesgue measurable. And we’ve already established that $f$ is measurable. However, the composition $g\circ f$ is not measurable, since the preimage of the Borel set $\{1\}$ is
$$(g\circ f)^{-1}(\{1\})=f^{-1}(M)=E$$
which is not Lebesgue measurable.
Composing Real-Valued Measurable Functions I
Now that we’ve tweaked our definition of a measurable real-valued function, we may have broken composability. We didn’t even say much about it when we defined the category of measurable spaces, because for most purposes it’s just like in topological spaces: given measurable functions $f:(X,\mathcal{S})\to(Y,\mathcal{T})$ and $g:(Y,\mathcal{T})\to(Z,\mathcal{U})$ and a measurable set $E\in\mathcal{U}$, the measurability of $g$ tells us that $g^{-1}(E)\in\mathcal{T}$, and the measurability of $f$ tells us that $(g\circ f)^{-1}(E)=f^{-1}\left(g^{-1}(E)\right)\in\mathcal{S}$.
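But now we’re treating $\mathbb{R}$ a bit differently, and so we have to be careful. I say that if $\phi$ is a Borel measurable extended-real-valued function on the extended real line so that $\phi(0)=0$, and if $f$ is a measurable extended-real-valued function on a measurable space $X$, then the composition $\phi\circ f$ is measurable. Indeed, writing $N(f)=\{x\in X:f(x)\neq0\}$ for the set where a function $f$ is nonzero, if $M$ is any Borel set, then we find
$$N(\phi\circ f)\cap(\phi\circ f)^{-1}(M)=\{x\in X:\phi(f(x))\in M\setminus\{0\}\}=f^{-1}\left(\phi^{-1}(M\setminus\{0\})\right)$$
Since $\phi(0)=0$, we can write
$$\phi^{-1}(M\setminus\{0\})=\phi^{-1}(M\setminus\{0\})\setminus\{0\}$$
And since $\phi$ is Borel measurable we know that $\phi^{-1}(M\setminus\{0\})$ is a Borel set. We can thus continue our calculation from above
$$f^{-1}\left(\phi^{-1}(M\setminus\{0\})\setminus\{0\}\right)=N(f)\cap f^{-1}\left(\phi^{-1}(M\setminus\{0\})\right)$$
which is measurable by the measurability of $f$.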
This is a sufficient, but far from a necessary condition. But it does allow us to bring in various useful functions in the place of $\phi$. For any positive real number $\alpha$ we have the function $\phi(t)=\lvert t\rvert^\alpha$. If $n$ is a positive integer, we have the function $\phi(t)=t^n$. These are all continuous, which implies that they’re Borel measurable, and they send $0$ back to itself. We conclude that any positive integral power of a measurable function is measurable, as is any positive power of the absolute value of a measurable function.
Of course, if $X$ itself is measurable as a subset of itself, then we need no tweak to our definition and we don’t need to add the requirement that $\phi(0)=0$. Also, the converse of this theorem is definitely not true; if $E$ is a non-measurable set, then the function $f=\chi_E-\chi_{E^c}$ is not measurable even though the absolute value $\lvert f\rvert=1$ is measurable.
It’s important to note here that we’re asking that $\phi$ be Borel measurable, because our definition of a measurable real-valued function is in terms of Borel sets in the target. Indeed, writing things out more thoroughly helps us see this: if $f:(X,\mathcal{S})\to(\mathbb{R},\mathcal{B})$ and $g:(\mathbb{R},\mathcal{L})\to(\mathbb{R},\mathcal{B})$ are measurable, with $\mathcal{B}$ the Borel sets and $\mathcal{L}$ the Lebesgue measurable sets, then we can compose the functions on the underlying sets, but the target of $f$ isn’t the same measurable space as the source of $g$. There is thus no reason to believe that the composite would be measurable. And tomorrow I’ll give an example of just such a case.
