The Unapologetic Mathematician

Mathematics for the interested outsider

Egoroff’s Theorem

Let’s look back at what goes wrong when a sequence of functions doesn’t converge uniformly. Let X be the closed unit interval \left[0,1\right], and let f_n(x)=x^n. Pointwise, this converges to a function f with f(x)=0 for 0\leq x<1, and f(1)=1. This convergence can’t be uniform, because the uniform limit of a sequence of continuous functions is continuous.

But things only go wrong at the one point, and the singleton \{1\} has measure zero. That is, the sequence f_n converges almost everywhere to the function with constant value 0. The convergence still isn’t uniform, though, because we still have a problem at \{1\}. But if we cut out an arbitrarily small patch around 1 and only look at the interval \left[0,1-\epsilon\right], the convergence is uniform. We might think that this is “uniform a.e.”, but we have to cut out a set of positive measure to make it work. The set can be as small as we want, but we can’t get uniformity by just cutting out \{1\}.
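
Just to see the failure numerically, here’s a quick sketch — illustrative code of my own, not part of the argument — estimating \sup\lvert f_n(x)-f(x)\rvert by sampling a grid: near 1 the supremum stays close to 1 no matter how large n gets, while on \left[0,0.9\right] it is (0.9)^n, which dies away.

```python
# Estimate the sup of |f_n - f| = x^n on [0, right_end] by sampling a grid.
# Since x^n is increasing on [0, 1], the grid maximum sits at the right endpoint.
def sup_diff(n, right_end, samples=10000):
    return max((right_end * i / samples) ** n for i in range(samples + 1))

for n in (1, 5, 25, 125):
    # near 1 the sup barely moves; on [0, 0.9] it collapses
    print(n, sup_diff(n, 0.999), sup_diff(n, 0.9))
```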

However, what we’ve seen is a general phenomenon expressed in Egoroff’s Theorem: If E\subseteq X is a measurable set of finite measure, and if \{f_n\} is a sequence of a.e. finite-valued measurable functions converging a.e. on E to a finite-valued measurable function f, then for every \epsilon>0 there is a measurable subset F\subseteq E with \mu(F)<\epsilon so that \{f_n\} converges uniformly to f on E\setminus F. That is, if we have a.e. convergence we can get to uniform convergence by cutting out an arbitrarily small part of our domain.

First off, we cut out a set of measure zero from E so that \{f_n\} converges pointwise to f. Now we define the measurable sets

\displaystyle E_n^m=\bigcap\limits_{i=n}^\infty\left\{x\in X\bigg\vert\lvert f_i(x)-f(x)\rvert<\frac{1}{m}\right\}

As n gets bigger, we’re taking the intersection of fewer and fewer sets, and so E_1^m\subseteq E_2^m\subseteq\dots. Since \{f_n\} converges pointwise to f, for each x the difference \lvert f_i(x)-f(x)\rvert eventually gets down below every \frac{1}{m}, and so \lim_nE_n^m\supseteq E for every m. Since \mu(E)<\infty, we conclude that \lim_n\mu(E\setminus E_n^m)=0. And so for every m there is an N(m) so that

\displaystyle\mu(E\setminus E_{N(m)}^m)<\frac{\epsilon}{2^m}

Now let’s define

\displaystyle F=\bigcup\limits_{m=1}^\infty\left(E\setminus E_{N(m)}^m\right)

This is a measurable set contained in E, and countable subadditivity tells us that

\displaystyle\mu(F)=\mu\left(\bigcup\limits_{m=1}^\infty\left(E\setminus E_{N(m)}^m\right)\right)\leq\sum\limits_{m=1}^\infty\mu\left(E\setminus E_{N(m)}^m\right)<\sum\limits_{m=1}^\infty\frac{\epsilon}{2^m}=\epsilon

We can calculate

\displaystyle E\setminus F=E\cap\bigcap\limits_{m=1}^\infty E_{N(m)}^m

And so given any m we take n\geq N(m). Then for any x\in E\setminus F we have x\in E_n^m, and thus \lvert f_n(x)-f(x)\rvert<\frac{1}{m}. Since we can pick this n independently of x, the convergence on E\setminus F is uniform.
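
In the opening example f_n(x)=x^n this machinery is concrete: cutting out F=\left(1-\epsilon,1\right] (measure \epsilon) leaves uniform convergence on \left[0,1-\epsilon\right], and we can even compute an explicit threshold. Here’s a minimal sketch, assuming Lebesgue measure on \left[0,1\right]; the helper N is my own name for the threshold, not notation from the theorem.

```python
import math

# For f_n(x) = x^n on [0, 1 - eps], sup |f_n - 0| = (1 - eps)^n, so the
# smallest n with (1 - eps)^n < 1/m serves as N(m) in Egoroff's theorem.
def N(m, eps):
    return math.ceil(math.log(1.0 / m) / math.log(1.0 - eps)) + 1

eps = 0.1
for m in (1, 10, 100):
    assert (1 - eps) ** N(m, eps) < 1 / m
```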


May 17, 2010 Posted by | Analysis, Measure Theory | 1 Comment

Convergence Almost Everywhere

Okay, so let’s take our idea of almost everywhere and apply it to convergence of sequences of measurable functions.

Given a sequence \{f_n\}_{n=1}^\infty of extended real-valued functions on a measure space X, we say that f_n converges a.e. to the function f if there is a set E_0\subseteq X with \mu(E_0)=0 so that \lim\limits_{n\to\infty}f_n(x)=f(x) for all x\in{E_0}^c. Similarly, we say that the sequence f_n is Cauchy a.e. if there exists a set E_0 of measure zero so that \{f_n(x)\} is a Cauchy sequence of real numbers for all x\in{E_0}^c. That is, given x\notin E_0 and \epsilon>0 there is some natural number N depending on x and \epsilon so that whenever m,n\geq N we have \lvert f_m(x)-f_n(x)\rvert<\epsilon.

Because the real numbers \mathbb{R} form a complete metric space, being Cauchy and being convergent are equivalent — a sequence of finite real numbers is convergent if and only if it is Cauchy, and a similar thing happens here. If a sequence of finite-valued functions is convergent a.e., then \{f_n(x)\} converges to f(x) away from a set of measure zero. Each of these sequences \{f_n(x)\} is thus Cauchy, and so \{f_n\} is Cauchy almost everywhere. On the other hand, if \{f_n\} is Cauchy a.e. then the sequences \{f_n(x)\} are Cauchy away from a set of measure zero, and these sequences then converge.

We can also define what it means for a sequence of functions to converge uniformly almost everywhere. That is, there is some set E_0 of measure zero so that for every \epsilon>0 we can find a natural number N so that for all n\geq N and x\notin E_0 we have \lvert f_n(x)-f(x)\rvert<\epsilon. The uniformity means that N is independent of x\in{E_0}^c, but if we choose a different negligible E_0 we may have to choose different values of N to get the desired control on the sequence.

As it happens, the topology defined by uniform a.e. convergence comes from the essential supremum — strictly speaking a seminorm on functions, which becomes a norm once we identify functions agreeing almost everywhere. Using this notion of convergence makes the algebra of essentially bounded measurable functions on a measure space X into a normed vector space. Indeed, we can check what it means for a sequence of functions \{f_n\} to converge to f under the essential supremum norm — for any \epsilon>0 there is some N so that for all n\geq N we have \text{ess sup}(\lvert f_n-f\rvert)<\epsilon. Unpacking the definition of the essential supremum, this means that there is some measurable set E_0 with measure zero so that \lvert f_n(x)-f(x)\rvert<\epsilon for all x\notin E_0, which is exactly what we said for uniform a.e. convergence above.

We can also turn around and define what it means for a sequence to be uniformly Cauchy almost everywhere — for any \epsilon>0 there is some N so that for all m,n\geq N we have \text{ess sup}(\lvert f_m-f_n\rvert)<\epsilon. Unpacking again, there is some measurable set E_0 of measure zero so that \lvert f_m(x)-f_n(x)\rvert<\epsilon for all x\notin E_0. It’s straightforward to check that a sequence that converges uniformly a.e. is uniformly Cauchy a.e., and vice versa. That is, the topology defined by the essential supremum norm is complete, and the algebra of essentially bounded measurable functions on a measure space X is a Banach space.

May 14, 2010 Posted by | Analysis, Functional Analysis, Measure Theory | Leave a comment

Almost Everywhere

Now we come to one of the most common terms of art in analysis: “almost everywhere”. It’s unusual in that it sounds perfectly colloquial, and yet it has a very technical meaning.

The roots of “almost everywhere” are in the notion of a negligible set. If we’re working with a measure space (X,\mathcal{S},\mu) we don’t really care about subsets of sets of measure zero, and anything that happens only on such a negligible set we try to sweep under the rug. For example, let’s say we have a function defined by f(x)=0 for all x\neq0, and by f(0)=1. Colloquially, we say that f is zero “almost everywhere” because the set where it isn’t zero — the singleton \{0\} — has measure zero.

In general, if we have some property P that can be applied to points x\in X, then we say P is true “almost everywhere” if the set where P is false is negligible. That is, if we can find some measurable set E with \mu(E)=0 so that P is true for all x\notin E. Note that we don’t particularly care if the set where P is false is itself measurable, although if \mu is complete then all \mu-negligible sets will be measurable. This sort of language is so common in measure theory and analysis that it’s often abbreviated as “a.e.”. Older texts will say “p.p.” for the French equivalent “presque partout”. In probability theory (measure theory’s cousin) we run into “a.s.” for “almost surely”.

No matter how we say or write it, “almost everywhere” has a hidden dependence on some measure. In many cases, the measure is obvious from context, in that there’s only one measure under consideration on a given space. However, in the case where we have two measures \mu and \nu on the same measurable space, we may distinguish them by writing “\mu-almost everywhere” and “\nu-almost everywhere” (or “\mu-a.e.” and “\nu-a.e.”), or by explicitly stating with respect to which measure we mean.

We’ve actually seen this sort of thing in the wild before; Lebesgue’s condition can be reformulated to say that a bounded function f:[a,b]\rightarrow\mathbb{R} defined on an n-dimensional interval [a,b] is Riemann integrable on that interval if and only if f is continuous almost everywhere (with respect to Lebesgue measure).

As more of a new example, we say that a function f:X\to\mathbb{R} is “essentially bounded” if it is bounded almost everywhere. That is, if there is a constant c and some measurable set E\subseteq X with \mu(E)=0 so that \lvert f(x)\rvert\leq c for all x\notin E. We’re willing to accept some points exceeding c, but no more than a set of measure zero. The infimum of all such essential bounds is the “essential supremum” of \lvert f\rvert, written \text{ess sup}(\lvert f\rvert).
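
To get a feel for the essential supremum, here’s a toy sketch on a finite “measure space” — an illustrative assumption of mine, with weighted points and a weight-zero point standing in for a null set. The plain supremum sees the spike; the essential supremum doesn’t.

```python
# A toy "measure space": finitely many points, each with a weight; a
# weight-zero point stands in for a set of measure zero (our assumption).
def ess_sup(values, weights):
    # the essential supremum ignores points carrying no measure
    return max(v for v, w in zip(values, weights) if w > 0)

vals = [0.1, 0.2, 1.0]   # f spikes to 1.0 ...
wts  = [0.5, 0.5, 0.0]   # ... but only on a null set
print(max(vals), ess_sup(vals, wts))
```

The plain supremum is 1.0, while the essential supremum is only 0.2 — the spike sits on a set of measure zero.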

May 13, 2010 Posted by | Analysis, Measure Theory | 16 Comments

Topological Vector Spaces, Normed Vector Spaces, and Banach Spaces

Before we move on, we want to define some structures that blend algebraic and topological notions. These are all based on vector spaces. And, particularly, we care about infinite-dimensional vector spaces. Finite-dimensional vector spaces are actually pretty simple, topologically. For pretty much all purposes you have a topology on your base field \mathbb{F}, and the vector space (which is isomorphic to \mathbb{F}^n for some n) will get the product topology.

But for infinite-dimensional spaces the product topology is often not going to be particularly useful. For example, the space of functions f:X\to\mathbb{R} is a product; we write f\in\mathbb{R}^X to mean the product of one copy of \mathbb{R} for each point in X. Limits in this topology are “pointwise” limits of functions, but this isn’t always the most useful way to think about limits of functions. The sequence

\displaystyle f_n=n\chi_{\left[0,\frac{1}{n}\right]}

converges pointwise to a function f with f(x)=0 for x\neq0 and f(0)=\infty. But we will find it useful to be able to ignore this behavior at the one isolated point and say that f_n\to0. It’s this connection with spaces of functions that brings such infinite-dimensional topological vector spaces into the realm of “functional analysis”.
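
We can watch this happen pointwise; the function f_n below is just the sequence above, written as code (an illustration, nothing more).

```python
# The sequence f_n = n * chi_[0, 1/n], evaluated pointwise.
def f_n(n, x):
    return n if 0 <= x <= 1.0 / n else 0

# At any fixed x > 0 the values are eventually 0 forever; at x = 0
# the values are n, marching off to infinity.
print([f_n(n, 0.25) for n in (1, 2, 4, 8, 16)])
print([f_n(n, 0.0) for n in (1, 2, 4, 8, 16)])
```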

Okay, so to get a topological vector space, we take a vector space and put a (surprise!) topology on it. But not just any topology will do: Remember that every point in a vector space looks pretty much like every other one. The transformation u\mapsto u+v has an inverse u\mapsto u-v, and it only makes sense that these be homeomorphisms. And to capture this, we put a uniform structure on our space. That is, we specify what the neighborhoods are of 0, and just translate them around to all the other points.

Now, a common way to come up with such a uniform structure is to define a norm on our vector space. That is, to define a function v\mapsto\lVert v\rVert satisfying the three axioms

  • For all vectors v and scalars c, we have \lVert cv\rVert=\lvert c\rvert\lVert v\rVert.
  • For all vectors v and w, we have \lVert v+w\rVert\leq\lVert v\rVert+\lVert w\rVert.
  • The norm \lVert v\rVert is zero if and only if the vector v is the zero vector.

Notice that we need to be working over a field in which we have a notion of absolute value, so we can measure the size of scalars. We might also want to do away with the last condition and use a “seminorm”. In any event, it’s important to note that though our earlier examples of norms all came from inner products we do not need an inner product to have a norm. In fact, there exist norms that come from no inner product at all.

So if we define a norm we get a “normed vector space”. This is a metric space, with a metric function defined by d(u,v)=\lVert u-v\rVert. This is nice because metric spaces are first-countable, and thus sequential. That is, we can define the topology of a (semi-)normed vector space by defining exactly what it means for a sequence of vectors to converge, and in particular what it means for them to converge to zero.
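
As a quick sanity check, here’s a sketch of one such norm — the sup norm on \mathbb{R}^2 — together with the metric it induces; the asserts spot-check the axioms at a few points (an illustration, not a proof).

```python
# The sup norm on R^2 and the metric d(u, v) = ||u - v|| it induces.
def sup_norm(v):
    return max(abs(x) for x in v)

def dist(u, v):
    return sup_norm([a - b for a, b in zip(u, v)])

u, v, w = [1.0, -2.0], [0.5, 1.0], [3.0, 0.0]
assert dist(u, w) <= dist(u, v) + dist(v, w)             # triangle inequality
assert sup_norm([-3 * x for x in u]) == 3 * sup_norm(u)  # homogeneity
```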

Finally, if we’ve got a normed vector space, it’s a natural question to ask whether or not this vector space is complete or not. That is, we have all the pieces in place to define Cauchy sequences in our vector space, and we would like for all of these sequences to converge under our uniform structure. If this happens — if we have a complete normed vector space — we call our structure a “Banach space”. Most of the spaces we’re concerned with in functional analysis are Banach spaces.

Again, for finite-dimensional vector spaces (at least over \mathbb{R} or \mathbb{C}) this is all pretty easy; we can always define an inner product, and this gives us a norm. If our underlying topological field is complete, then the vector space will be as well. Even without considering a norm, convergence of sequences is just given component-by-component. But infinite-dimensional vector spaces get hairier. Since our algebraic operations only give us finite sums, we have to take some sorts of limits to even talk about most vectors in the space in the first place, and taking limits of such vectors could just complicate things further. Studying these interesting topologies and seeing how linear algebra — the study of vector spaces and linear transformations — behaves in the infinite-dimensional context is the taproot of functional analysis.

May 12, 2010 Posted by | Algebra, Analysis, Functional Analysis, Linear Algebra, Measure Theory, Topology | 9 Comments

Simple and Elementary Functions

We now introduce two classes of functions that are very easy to work with. As usual, we’re working in some measurable space (X,\mathcal{S}).

First, we have the “simple functions”. Such a function is described by picking a finite number of pairwise disjoint measurable sets \{E_i\}_{i=1}^n\subseteq\mathcal{S} and a corresponding set of finite real numbers \alpha_i. We use these to define a function by declaring f(x)=\alpha_i if x\in E_i, and f(x)=0 if x is in none of the E_i. The very simplest example is the characteristic function \chi_E of a measurable set E. Any other simple function can be written as

\displaystyle f(x)=\sum\limits_{i=1}^n\alpha_i\chi_{E_i}(x)

Any simple function is measurable, for the preimage f^{-1}(A) is the union of all the E_i corresponding to those \alpha_i\in A — together with the complement of all the E_i, in case 0\in A — and is thus measurable.

It’s straightforward to verify that the product and sum of any two simple functions is itself a simple function. First pad each collection with one more set — the complement of the union, carrying the value 0 — so that each collection partitions X. Then, given functions f=\sum\alpha_i\chi_{E_i} and g=\sum\beta_j\chi_{F_j}, we have fg=\sum\alpha_i\beta_j\chi_{E_i\cap F_j} and f+g=\sum(\alpha_i+\beta_j)\chi_{E_i\cap F_j}. It’s even easier to see that any scalar multiple of a simple function is simple — cf=\sum c\alpha_i\chi_{E_i}. And thus the collection of simple functions forms a subalgebra of the algebra of measurable functions.
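
Here’s a small sketch of the product computation, with a finite base set standing in for the measurable space (an illustrative assumption of mine):

```python
# A simple function as a list of (value, set) pairs with disjoint sets.
def simple(pairs):
    def f(x):
        for alpha, E in pairs:
            if x in E:
                return alpha
        return 0.0          # zero off all the E_i
    return f

f = simple([(2.0, {1, 2}), (5.0, {3})])
g = simple([(1.0, {2, 3})])

# The product is again simple, with value alpha_i * beta_j on E_i ∩ F_j.
fg = simple([(2.0 * 1.0, {1, 2} & {2, 3}), (5.0 * 1.0, {3} & {2, 3})])
assert all(fg(x) == f(x) * g(x) for x in (1, 2, 3, 4))
```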

“Elementary functions” are similar to simple functions. We slightly relax the conditions by allowing a countably infinite number of measurable sets E_i and corresponding values \alpha_i.

Now, why do we care about simple functions? As it happens, every measurable function can be approximated by simple functions! That is, given any measurable function f we can find a sequence f_n of simple functions converging pointwise to f.

To see this, first break f up into its positive and negative parts f^+ and f^-. If we can approximate any nonnegative measurable function by a pointwise-increasing sequence of nonnegative simple functions, then we can approximate each of f^+ and f^-, and the difference of these sequences approximates f. So, without loss of generality, we will assume that f is nonnegative.

Okay, so here’s how we’ll define the simple functions f_n:

\displaystyle f_n(x)=\left\{\begin{aligned}\frac{i-1}{2^n}\qquad&\frac{i-1}{2^n}\leq f(x)<\frac{i}{2^n},\quad i=1,\dots,n2^n\\n\qquad&n\leq f(x)\end{aligned}\right.

That is, to define f_n we chop up the nonnegative real numbers \left[0,n\right) into n2^n chunks of width \frac{1}{2^n}, and within each of these slices we round values of f down to the lower endpoint. If f(x)\geq n, we round all the way down to n. There can only ever be n2^n+1 values for f_n, and each of these corresponds to a measurable set. The value \frac{i-1}{2^n} corresponds to the set

\displaystyle f^{-1}\left(\left[\frac{i-1}{2^n},\frac{i}{2^n}\right)\right)

while the value n corresponds to the set f^{-1}\left(\left[n,\infty\right]\right). And thus f_n is indeed a simple function.

So, does the sequence \{f_n\} converge pointwise to f? Well, if f(x)=\infty, then f_n(x)=n for all n. On the other hand, if k\leq f(x)<k+1 then f_k(x)=k; after this point, f_n(x) and f(x) are both within a slice of width \frac{1}{2^n}, and so 0\leq f(x)-f_n(x)<\frac{1}{2^n}. And so given a large enough n we can bring f_n(x) within any desired bound of f(x). Thus the sequence \{f_n\} increases pointwise to the function f.
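
The construction is easy to play with in code. Here’s a sketch of the dyadic approximation above (approx is my own name for it):

```python
import math

# The dyadic construction from the text: below n, round f down to the
# nearest multiple of 1/2^n; at or above n, cut off at n.
def approx(f, n):
    def f_n(x):
        y = f(x)
        return float(n) if y >= n else math.floor(y * 2 ** n) / 2 ** n
    return f_n

f = lambda x: x * x
f3 = approx(f, 3)
assert f3(1.2) == 1.375             # 1.44 rounded down to a multiple of 1/8
assert f3(5.0) == 3.0               # 25 >= 3, so we cut off at 3
assert 0 <= f(1.2) - f3(1.2) < 1 / 8
```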

But that’s not all! If f is bounded above by some integer N, the sequence f_n converges uniformly to f. Indeed, once we get to n\geq N, the top case f(x)\geq n can only occur with f(x)=n exactly, and in every case 0\leq f(x)-f_n(x)<\frac{1}{2^n}. Given an \epsilon>0 we pick an n so that both n\geq N and \frac{1}{2^n}<\epsilon, and this n will guarantee \lvert f(x)-f_n(x)\rvert<\epsilon for every x\in X. That is: the convergence is uniform.

This is also where elementary functions come in handy. If we’re allowed to use a countably infinite number of values, we can get uniform convergence without having to ask that f be bounded. Indeed, instead of defining f_n(x)=n for f(x)\geq n, just chop up all positive values into slices of width \frac{1}{2^n}. There are only a countably infinite number of such slices, and so the resulting function f_n is elementary, if not quite simple.

May 11, 2010 Posted by | Analysis, Measure Theory | 12 Comments

Sequences of Measurable Functions

We let \{f_n\}_{n=1}^\infty be a sequence of extended real-valued measurable functions on a measurable space X, and ask what we can say about limits of this sequence.

First of all, the function g(x)=\inf\limits_{n\geq1}\{f_n(x)\} is measurable. The preimage g^{-1}(\{-\infty\}) is the countable intersection \bigcap_{k=1}^\infty\bigcup_{n=1}^\infty\{x\in X\vert f_n(x)<-k\} — the infimum is -\infty exactly when the sequence dips below every -k — while the preimage g^{-1}(\{\infty\}) is the intersection of the countable collection \left\{f_n^{-1}(\{\infty\})\right\}. And so both of these sets are measurable, and we can restrict to the case of finite-valued functions.

So now let’s use our convenient condition. Given a real number c we know that g(x)<c if and only if f_n(x)<c for some n. That is, we can write

\displaystyle\{x\in X\vert g(x)<c\}=\bigcup\limits_{n=1}^\infty\{x\in X\vert f_n(x)<c\}

Each term on the right is measurable since each f_n is a measurable function, and so the set on the left is measurable. Thus we conclude that g is measurable as well.

Similarly, we find that the function h(x)=\sup\limits_{n\geq1}\{f_n(x)\}=-\inf\limits_{n\geq1}\{-f_n(x)\} is measurable.

Now the functions

\displaystyle\begin{aligned}f^*(x)&=\limsup\limits_{n\to\infty}f_n(x)=\inf\limits_{n\geq1}\sup\limits_{m\geq n}f_m(x)\\f_*(x)&=\liminf\limits_{n\to\infty}f_n(x)=\sup\limits_{n\geq1}\inf\limits_{m\geq n}f_m(x)\end{aligned}

are also measurable. Indeed, in proving that f^* is measurable we can use the exact same technique as above to prove that the inner supremum is measurable; it doesn’t really depend on the supremum starting at 1 or higher. And then the outer infimum is exactly as before. Proving f_* is measurable is similar.
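
To see the nested sup/inf formulas in action, here’s a sketch approximating the limit superior of a concrete sequence, with a finite horizon standing in for n\to\infty (an unavoidable cheat in code):

```python
# Approximating f* = inf_n sup_{m >= n} on a finite stretch of the
# oscillating sequence a_n = (-1)^n (1 + 1/n).
def tail_sup(seq, n):
    return max(seq[n:])                 # sup over the finite tail we have

def limsup_approx(seq):
    return min(tail_sup(seq, n) for n in range(len(seq) - 1))

seq = [(-1) ** n * (1 + 1 / n) for n in range(1, 1001)]
# the sequence oscillates between roughly -1 and 1; its limsup is 1
assert abs(limsup_approx(seq) - 1) < 0.01
```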

Now we can talk about pointwise convergence of a sequence of measurable functions. That is, for a fixed point x\in X we have the sequence \{f_n(x)\} which has some limit superior f^*(x) and some limit inferior f_*(x). If these two coincide, then the sequence has a proper limit \lim\limits_{n\to\infty}f_n(x)=f^*(x)=f_*(x). But one of our lemmas tells us that the set of points where any two measurable functions coincide has a nice property: \{x\in X\vert f^*(x)=f_*(x)\} has a measurable intersection with every measurable set. And thus if we define the function f(x)=\lim\limits_{n\to\infty}f_n(x) on this subspace of X for which the limit exists, the resulting function is measurable.

May 10, 2010 Posted by | Analysis, Measure Theory | 4 Comments

Positive and Negative Parts of Functions

Now that we have sums and products to work with, we find that the maximum of f and g — sometimes written f\cup g, defined by [f\cup g](x)=\max(f(x),g(x)) — and their minimum — sometimes written f\cap g — are measurable. Indeed, we can write

\displaystyle\begin{aligned}f\cup g&=\frac{1}{2}\left(f+g+\lvert f-g\rvert\right)\\f\cap g&=\frac{1}{2}\left(f+g-\lvert f-g\rvert\right)\end{aligned}

and we know that absolute values of functions are measurable.
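
These identities are pure pointwise arithmetic, so a quick spot-check is easy:

```python
# Pointwise check of the two identities above, for arbitrary reals.
def join(a, b):   # (f ∪ g)(x)
    return 0.5 * (a + b + abs(a - b))

def meet(a, b):   # (f ∩ g)(x)
    return 0.5 * (a + b - abs(a - b))

for a, b in ((3.0, -1.0), (-2.5, -2.5), (0.0, 7.0)):
    assert join(a, b) == max(a, b)
    assert meet(a, b) == min(a, b)
```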

As special cases of this construction we define the “positive part” f^+ and “negative part” f^- of an extended real-valued function f as

\displaystyle\begin{aligned}f^+&=f\cup0\\f^-&=-\left(f\cap0\right)\end{aligned}

The positive part is obviously just what we get if we lop off any part of f that extends below 0. The negative part is a little more subtle. First we lop off everything above 0, but then we take the negative of this function. As a result, f^+ and f^- are both nonnegative functions. And if f is measurable, then so are f^+ and f^-. We can thus write any measurable function f as the difference of two nonnegative measurable functions


\displaystyle f=f^+-f^-

Conversely, any function with measurable positive and negative parts is itself measurable.

This is sort of like how we found that functions of bounded variation can be written as the difference between two strictly increasing functions. In fact, if we’re loose about what we mean by “function”, and “derivative”, we could even see this fact as a decomposition of the derivative of a function of bounded variation into its positive and negative parts.

It will thus be useful to restrict attention to nonnegative measurable functions instead of general measurable functions. Many statements can be more easily proven for nonnegative measurable functions, and the results will be preserved when we take the difference of two functions. Since we can write any measurable function as the difference between two nonnegative ones, this will suffice.

It will also be sometimes useful to realize that we may write the absolute value of a function as

\displaystyle\lvert f\rvert=f^++f^-
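
Pointwise, all of these identities are elementary; here’s a quick spot-check:

```python
# Positive and negative parts, pointwise; both are nonnegative, and
# together they recover both f and |f|.
def pos(y):
    return max(y, 0.0)

def neg(y):
    return -min(y, 0.0)

for y in (-3.0, 0.0, 2.5):
    assert pos(y) >= 0 and neg(y) >= 0
    assert pos(y) - neg(y) == y        # f = f^+ - f^-
    assert pos(y) + neg(y) == abs(y)   # |f| = f^+ + f^-
```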

May 7, 2010 Posted by | Analysis, Measure Theory | 9 Comments

Adding and Multiplying Measurable Real-Valued Functions

One approach to the problem of adding and multiplying measurable functions on a measurable space X would be to define a two-dimensional version of Borel sets and Lebesgue measure, and to tweak the definition of a measurable function to this space (\mathbb{R}^2,\mathcal{B}_2) like we did before to treat the additive identity (0,0) specially. Then we could set up products (which we will eventually do) and get a map (f,g):X\to\mathbb{R}^2 and compose this with the Borel map (x,y)\mapsto x+y or the Borel map (x,y)\mapsto xy. In fact, if you’re up for it, you can go ahead and try working out this approach as an exercise.

Instead, we’ll take more of a low road towards showing that the sum and product of two measurable functions are measurable. We start with a useful lemma: if f and g are extended real-valued measurable functions on a measurable space (X,\mathcal{S}) and if c is any real number, then each of the sets

\displaystyle\begin{aligned}A&=\left\{x\in X\vert f(x)<g(x)+c\right\}\\B&=\left\{x\in X\vert f(x)\leq g(x)+c\right\}\\C&=\left\{x\in X\vert f(x)=g(x)+c\right\}\end{aligned}

has a measurable intersection with every measurable set. If X is itself measurable, of course, this just means that these three sets are measurable.

To see this for the set A, consider the (countable) set \mathbb{Q}\subseteq\mathbb{R} of rational numbers. If f(x) really is strictly less than g(x)+c, then there must be some rational number r between them. That is, if x\in A then for some r we have f(x)<r and r-c<g(x). And thus we can write A as the countable union

\displaystyle\begin{aligned}A&=\bigcup\limits_{r\in\mathbb{Q}}\left(\left\{x\in X\vert f(x)<r\right\}\cap\left\{x\in X\vert r-c<g(x)\right\}\right)\\&=\bigcup\limits_{r\in\mathbb{Q}}\left(f^{-1}\left(\left[-\infty,r\right)\right)\cap g^{-1}\left(\left(r-c,\infty\right]\right)\right)\end{aligned}

By the measurability of f and g, this is the countable union of a collection of measurable sets, and is thus measurable.

We can write B as X\setminus\left\{x\in X\vert g(x)<f(x)-c\right\}, and so the assertion for B follows from that for A, with the roles of f and g swapped and -c in place of c. And we can write C=B\setminus A, so the statement is true for that set as well.

Anyway, now we can verify that the sum and product of two measurable extended real-valued functions are measurable as well. We first handle infinite values separately. For the product, \left[fg\right](x)=\infty if and only if one of f(x) and g(x) is infinite, the other is nonzero, and the two share the same sign. Each of the four resulting sets — such as f^{-1}(\{\infty\})\cap\{x\in X\vert g(x)>0\} — is measurable, and so the set [fg]^{-1}(\{\infty\}) — their union — is measurable. We can handle [fg]^{-1}(\{-\infty\}), [f+g]^{-1}(\{\infty\}), and [f+g]^{-1}(\{-\infty\}) similarly.

So now we turn to our convenient condition for measurability. Since we’ve handled the sets where f(x) and g(x) are infinite, we can assume that they’re finite. Given a real number c, we find

\displaystyle\left\{x\in X\vert f(x)+g(x)<c\right\}=\left\{x\in X\vert f(x)<c-g(x)\right\}

which is measurable by our lemma above (with -g in place of g). Since this is true for every real number c, the sum f+g is measurable.

To verify our assertion for the product fg, we turn and recall the polarization identities from when we worked with inner products. Remember, they told us that if we know how to calculate squares, we can calculate products. Something similar is true now, as we write

\displaystyle f(x)g(x)=\frac{1}{4}\left(\left(f(x)+g(x)\right)^2-\left(f(x)-g(x)\right)^2\right)

We just found that the sum f+g and the difference f-g are measurable. And any positive integral power of a measurable function is measurable, so the squares of the sum and difference functions are measurable. And then the product fg is a scalar multiple of the difference of these squares, and is thus measurable.
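
The identity itself is elementary arithmetic, and easy to spot-check:

```python
# The polarization-style trick: products from squares.
def product_from_squares(a, b):
    return 0.25 * ((a + b) ** 2 - (a - b) ** 2)

for a, b in ((3.0, 4.0), (-1.5, 2.0), (0.0, 5.0)):
    assert product_from_squares(a, b) == a * b
```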

May 7, 2010 Posted by | Analysis, Measure Theory | 5 Comments

Composing Real-Valued Measurable Functions II

As promised, today we come up with an example of a measurable function f:X\to\mathbb{R} and a Lebesgue measurable function \phi:\mathbb{R}\to\mathbb{R} so that the composition (\phi\circ f):X\to\mathbb{R} is not measurable. Specifically, (X,\mathcal{S}) will be the closed unit interval \left[0,1\right], considered as a measurable subspace of (\mathbb{R},\mathcal{L}).

Now, every point x\in\left[0,1\right] can be written out in ternary as

\displaystyle x=\sum\limits_{i=1}^\infty\frac{\alpha_i}{3^i}=.\alpha_1\alpha_2\alpha_3\dots

We set n (depending on x) to be the first index for which \alpha_n=1, and n=\infty if no such index exists. Then we define the function

\displaystyle\psi(x)=\sum\limits_{1\leq i<n}\frac{\alpha_i}{2^{i+1}}+\frac{1}{2^n}

That is, write out the number in ternary until you hit a 1, and leave off everything after that. Change all the 2s to 1s, and consider the resulting string of 0s and 1s as a number written out in binary. The extra fraction added in the formula above comes from that first \alpha_n=1. This function is often called the “Cantor function” because of its relationship to the famous Cantor set. In case it’s not apparent, the Cantor set is the collection C of points admitting a ternary expansion with no digit equal to 1.
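
Here’s a sketch of \psi acting on a string of ternary digits — a finite list standing in for a full expansion, so this is illustrative only:

```python
# The Cantor function from a list of ternary digits.
def psi(digits):
    out, scale = 0.0, 0.5
    for d in digits:
        if d == 1:
            return out + scale   # first 1: add 1/2^n and stop
        out += scale * (d // 2)  # a 2 becomes a binary 1
        scale /= 2
    return out

assert psi([0, 2, 2]) == 0.25 + 0.125   # .022 (ternary) -> .011 (binary)
assert psi([1, 0, 2]) == 0.5            # everything after the first 1 is dropped
```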

First of all, \psi increases from 0 to 1. Clearly 0=.000\dots, so \psi(0)=0; and 1=.222\dots so \psi(1)=1. Given points x=.\alpha_1\alpha_2\alpha_3\dots and y=.\beta_1\beta_2\beta_3\dots, if x<y then \alpha_i=\beta_i for 1\leq i<j and \alpha_j<\beta_j. If \alpha_j=0 and \beta_j=1 or \beta_j=2, then as we write out \psi(y) in binary the jth bit is 1, while the jth bit of \psi(x) is 0 and every earlier bit agrees, so \psi(x)\leq\psi(y). On the other hand, if \alpha_j=1 and \beta_j=2, then the jth bit of both \psi(x) and \psi(y) is 1, but \psi(x) stops at that point while \psi(y) may continue, and so again \psi(x)\leq\psi(y). Note that equality really can occur — every point of \left[\frac{1}{3},\frac{2}{3}\right] maps to \frac{1}{2} — which is why we only claim that \psi is increasing, not strictly so.

Maybe more surprising is the fact that \psi is actually continuous! If again we have x=.\alpha_1\alpha_2\alpha_3\dots and y=.\beta_1\beta_2\beta_3\dots and \alpha_i=\beta_i for 1\leq i<j, then we find

\displaystyle\lvert\psi(x)-\psi(y)\rvert\leq\sum\limits_{i=j}^\infty\frac{1}{2^i}=\frac{1}{2^{j-1}}

Thus, given an \epsilon>0 we can find a large enough j so that \frac{1}{2^{j-1}}<\epsilon. Then we can pick a small enough \delta so that two numbers differing by less than \delta will agree to the first j places in their ternary expansions, and so \psi is continuous.

Unfortunately, \psi might not be strictly increasing. Indeed, on any of the open intervals making up X\setminus C, the function \psi is actually constant! It’s interesting to note that \psi manages to increase continuously from 0 to 1 while remaining constant almost everywhere. But still we’re going to need a strictly increasing function for our purposes. We get this by considering y\mapsto\frac{1}{2}(y+\psi(y)). This still increases continuously from 0 to 1, but now it’s strictly increasing.

But as a strictly increasing continuous function from [0,1] to itself, it has a strictly increasing continuous inverse. That is, there is a strictly increasing continuous function f such that y=f(x) if and only if x=\frac{1}{2}(y+\psi(y)). And since it’s continuous, it’s Borel measurable, and any Borel measurable function is Lebesgue measurable.

Now, the set f^{-1}(C) is Lebesgue measurable and has positive measure. This is the collection of points of the form \frac{1}{2}(y+\psi(y)) for y\in C. To get at this, first we consider f^{-1}(X\setminus C). The complement X\setminus C is a countable union of open intervals, on each of which \psi is constant, and so y\mapsto\frac{1}{2}(y+\psi(y)) sends an interval of length \ell to an interval of length \frac{\ell}{2}. Since \mu(X\setminus C)=1, we find that \mu(f^{-1}(X\setminus C))=\frac{1}{2}. Since \mu(f^{-1}(X))=1, there must be measure \frac{1}{2} left over in f^{-1}(C) to make up the difference.

But now we can take a thick, non-Lebesgue measurable set whose intersection with f^{-1}(C) is itself a non-Lebesgue measurable set S. However, f(S)=M\subseteq C, and C has Lebesgue measure zero. Since every subset of a set of Lebesgue measure zero is itself Lebesgue measurable (by completeness), M must be Lebesgue measurable, even though f^{-1}(M)=S is not. This is not a problem because we only ever asked that the preimage under f of any Borel set be Lebesgue measurable.

At last, we set \phi=\chi_M — the characteristic function of this set M. This function \phi is Lebesgue measurable, because the preimage of any set is one of \emptyset, M, M^c or \mathbb{R}, all of which are Lebesgue measurable. And we’ve already established that f is measurable. However, the composition \phi\circ f is not measurable, since the preimage of the Borel set \{1\} is

\displaystyle(\phi\circ f)^{-1}(\{1\})=f^{-1}(\phi^{-1}(\{1\}))=f^{-1}(M)=S

which is not Lebesgue measurable.

May 5, 2010 Posted by | Analysis, Measure Theory | Leave a comment

Composing Real-Valued Measurable Functions I

Now that we’ve tweaked our definition of a measurable real-valued function, we may have broken composability. We didn’t even say much about it when we defined the category of measurable spaces, because for most purposes it’s just like in topological spaces: given measurable functions f:(X_1,\mathcal{S}_1)\to(X_2,\mathcal{S}_2) and g:(X_2,\mathcal{S}_2)\to(X_3,\mathcal{S}_3) and a measurable set M\in\mathcal{S}_3, the measurability of g tells us that g^{-1}(M)\in\mathcal{S}_2, and the measurability of f tells us that (g\circ f)^{-1}(M)=f^{-1}(g^{-1}(M))\in\mathcal{S}_1.

But now we’re treating 0 a bit differently, and so we have to be careful. I say that if \phi is a Borel measurable extended-real-valued function on the extended real line so that \phi(0)=0, and if f is a measurable extended-real-valued function on a measurable space (X,\mathcal{S}), then the composition \phi\circ f is measurable. Indeed, if M is any Borel set, then we find

\displaystyle\begin{aligned}N(\phi\circ f)\cap(\phi\circ f)^{-1}(M)&=\{x\in X\vert\phi(f(x))\in M\setminus\{0\}\}\\&=\{x\in X\vert f(x)\in\phi^{-1}(M\setminus\{0\})\}\end{aligned}

Since \phi(0)=0, we can write

\displaystyle\phi^{-1}(M\setminus\{0\})=\phi^{-1}(M\setminus\{0\})\setminus\{0\}

And since \phi is Borel measurable we know that \phi^{-1}(M\setminus\{0\}) is a Borel set. We can thus continue our calculation from above

\displaystyle\begin{aligned}\{x\in X\vert f(x)\in\phi^{-1}(M\setminus\{0\})\}&=\{x\in X\vert f(x)\in\phi^{-1}(M\setminus\{0\})\setminus\{0\}\}\\&=N(f)\cap\{x\in X\vert f(x)\in\phi^{-1}(M\setminus\{0\})\}\\&=N(f)\cap f^{-1}(\phi^{-1}(M\setminus\{0\}))\end{aligned}

which is measurable by the measurability of f.

This is a sufficient, but far from a necessary condition. But it does allow us to bring in various useful functions in the place of \phi. For any positive real number \alpha we have the function x\mapsto\lvert x\rvert^\alpha. If \alpha is a positive integer, we have the function x\mapsto x^\alpha. These are all continuous, which implies that they’re Borel measurable, and they send 0 back to itself. We conclude that any positive integral power of a measurable function is measurable, as is any positive power of the absolute value of f.

Of course, if X itself is measurable as a subset of itself, then we need not tweak our definition and we don’t need to add the requirement that \phi(0)=0. Also, the converse of this theorem is definitely not true; if E is a non-measurable set, then the function \chi_E-\chi_{E^c} is not measurable even though the absolute value \lvert\chi_E-\chi_{E^c}\rvert=1 is measurable.

It’s important to note here that we’re asking that \phi be Borel measurable, because our definition of a measurable real-valued function is in terms of Borel sets in the target. Indeed, writing things out more thoroughly helps us see this: if f:(X,\mathcal{S})\to(\mathbb{R},\mathcal{B}) and \phi:(\mathbb{R},\mathcal{L})\to(\mathbb{R},\mathcal{B}) are measurable, then we can compose the functions on the underlying sets, but the target of f isn’t the same measurable space as the source of \phi. There is thus no reason to believe that the composite would be measurable. And tomorrow I’ll give an example of just such a case.

May 4, 2010 Posted by | Analysis, Measure Theory | 3 Comments