Inclusion-Exclusion Again
Now that we know a little more about characteristic functions, let’s see how they can be used to understand the inclusion-exclusion principle. Our first pass was through a very categorified lens, which was a neat tie-in to Euler characteristics. But we want something a little more down-to-Earth.
So, let’s take a finite collection of subsets of a set
. To each one we have a corresponding idempotent function
, which is
for points in
and
elsewhere. We can take the complement of a subset in the language of idempotents as usual:
. This function is
for points not in
, and
in
.
Now let’s take and multiply all these complementary functions together to get
By DeMorgan’s Laws, the intersection of all these complements is the complement of the union of all the . Thus
Now let’s expand out the product, just like we expand out the product of any sequence of binomials. In each factor, we choose which of the or the
to contribute. First we get the term where we always choose
, then the terms where we choose exactly one of the
, then the terms where we choose exactly two of them, and so on until we choose all
of them. The sign of a term is negative when we choose an odd number of the
, and positive when we choose an even number. That is
Putting this together with the earlier expression, we find
For this gives us back our formula for the characteristic function of the intersection of two sets. For larger
it gives us the inclusion-exclusion principle in terms of characteristic functions.
Characteristic Functions as Idempotents
I just talked about characteristic functions as masks on other functions. Given a function and a subset
, we can mask the function
to the subset
by multiplying it by the characteristic function
. I want to talk a little more about these functions and how they relate to set theory.
First of all, it’s easy to recognize a characteristic function when we see one: they’re exactly the idempotent functions. That is, , and if
then
must be
for some set
. Indeed, given a real number
, we can only have
if
or
. That is,
or
for every
. So we can define
to be the set of
for which
, and then
for every
. Thus the idempotents in the algebra of real-valued functions on
correspond exactly to the subsets of
.
We can define two operations on such idempotent functions to make them into a lattice. The easier to define is the meet. Given idempotents and
we define the meet to be their product:
This function will take the value at a point
if and only if both
and
do, so this is the characteristic function of the intersection
We might hope that the join would be the sum of two idempotents, but in general this will not be another idempotent. Indeed, we can check:
We have a problem exactly when the corresponding sets have a nonempty intersection, which leads us to think that maybe this has something to do with the inclusion-exclusion principle. We’re “overcounting” the intersection by just adding, so let’s subtract it off to define
We can multiply this out to check its idempotence, or we could consider its values. If is not in
, then
, and we find
— it takes the value
if
and
otherwise. A similar calculation holds if
, which leaves only the case when
. But now
and
both take the value
, and a quick calculation shows that
does as well. This establishes that
We can push further and make this into an orthocomplemented lattice. We define the orthocomplement of an idempotent by
This function is wherever
is
, and vice-versa. That is, it’s the characteristic function of the complement
So we can take the lattice of subsets of and realize it in the nice, concrete algebra of real-valued functions on
. The objects of the lattice are exactly the idempotents of this algebra, and we can build the meet and join from the algebraic operations of addition and multiplication. In fact, we could turn around and do this for any commutative algebra to create a lattice, which would mimic the “lattice of subsets” of some “set”, which emerges from the algebra. This sort of trick is a key insight to quite a lot of modern geometry.
Complex Numbers and Polar Coordinates
Forgot to hit “publish” earlier…
So we’ve seen that the unit complex numbers can be written in the form where
denotes the (signed) angle between the point on the circle and
. We’ve also seen that this view behaves particularly nicely with respect to multiplication: multiplying two unit complex numbers just adds their angles. Today I want to extend this viewpoint to the whole complex plane.
If we start with any nonzero complex number , we can find its absolute value
. This is a positive real number which we’ll also call
. We can factor this out of
to find
. The complex number in parentheses has unit absolute value, and so we can write it as
for some
between
and
. Thus we’ve written our complex number in the form
where the positive real number is the absolute value of
, and
— a real number in the range
— is the angle
makes with the reference point
. But this is exactly how we define the polar coordinates
back in high school math courses.
Just like we saw for unit complex numbers, this notation is very well behaved with respect to multiplication. Given complex numbers and
we calculate their product:
That is, we multiply their lengths (as we already knew) and add their angles, just like before. This viewpoint also makes division simple:
In particular we see that
so multiplicative inverses are given in terms of complex conjugates and magnitudes as we already knew.
Powers (including roots) are also easy, which gives rise to easy ways to remember all those messy double- and triple-angle formulæ from trigonometry:
Other angle addition formulæ should be similarly easy to verify from this point.
In general, since we consider complex numbers multiplicatively so often it will be convenient to have this polar representation of complex numbers at hand. It will also generalize nicely, as we will see.
The Circle Group
Yesterday we saw that the unit-length complex numbers are all of the form , where
measures the oriented angle from
around to the point in question. Since the absolute value of a complex number is multiplicative, we know that the product of two unit-length complex numbers is again of unit length. We can also see this using the exponential property:
So multiplying two unit-length complex numbers corresponds to adding their angles.
That is, the complex numbers on the unit circle form a group under multiplication of complex numbers — a subgroup of the multiplicative group of the complex field — and we even have an algebraic description of this group. The function sending the real number to the point on the circle
is a homomorphism from the additive group of real numbers to the circle group. Since every point on the circle has such a representative, it’s an epimorphism. What is the kernel? It’s the collection of real numbers satisfying
that is, must be an integral multiple of
— an element of the subgroup
. So, algebraically, the circle group is the quotient
. Or, isomorphically, we can just write
.
Something important has happened here. We have in hand two distinct descriptions of the circle. One we get by putting the unit-length condition on points in the plane. The other we get by taking the real line and “wrapping” it around itself periodically. I haven’t really mentioned the topologies, but the first approach inherits the subspace topology from the topology on the complex numbers, while the second approach inherits the quotient topology from the topology on the real numbers. And it turns out that the identity map from one version of the circle to the other one is actually a homeomorphism, which further shows that the two descriptions give us “the same” result.
What’s really different between the two cases is how they generalize. I’ll probably come back to these in more detail later, but for now I’ll point out that the first approach generalizes to spheres in higher dimensions, while the second generalizes to higher-dimensional tori. Thus the circle is sometimes called the one-dimensional sphere , and sometimes called the one-dimensional torus
, and each one calls to mind a slightly different vision of the same basic object of study.
Complex Numbers and the Unit Circle
When I first talked about complex numbers there was one perspective I put off, and now need to come back to. It makes deep use of Euler’s formula, which ties exponentials and trigonometric functions together in the relation
where we’ve written for
and used the exponential property.
Remember that we have a natural basis for the complex numbers as a vector space over the reals: . If we ask that this natural basis be orthonormal, we get a real inner product on complex numbers, which in turn gives us lengths and angles. In fact, this notion of length is exactly that which we used to define the absolute value of a complex number, in order to get a topology on the field.
So what happens when we look at ? First, we can calculate its length using this inner product, getting
by the famous trigonometric identity. That is, every complex number of the form lies a unit distance from the complex number
.
In particular, is a nice reference point among such points. We can use it as a fixed post in the complex plane, and measure the angle it makes with any other point. For example, we can calculate the inner product
and thus we find that the point makes an angle
with our fixed post
, at least for
. We see that
traces a circle by increasing the angle in one direction as
increases from
to
, and increasing the angle in the other direction as
decreases from
to
. For values of
outside this range, we can use the fact that
to see that the function is periodic with period
. That is, we can add or subtract whatever multiple of
we need to move
within the range
. Thus, as
varies the point
traces out a circle of unit radius, going around and around with period
, and every point on the unit circle has a unique representative of this form with
in the given range.
Properties of Complex Numbers
Today I’ll collect a few basic properties of complex numbers.
First off, they form a vector space over the reals. We constructed them as an algebra — the quotient of the algebra of polynomials by a certain ideal — and every algebra is a vector space. So what can we say about them as a vector space? The easiest fact is that it’s two-dimensional, and it’s got a particularly useful basis.
To see this, remember that we have a basis for the algebra of polynomials, which is given by the powers of the variable. So here when we throw in the formal element , its powers form a basis of the ring
. But we have a relation, and that cuts things down a bit. Specifically, the element
is the same as the element
.
Given a polynomial in the “variable”
, we can write it as
We can peel off the constant and linear terms, and then pull out a factor of :
Now this factor of can be replaced by
, which drops the overall degree. We can continue like this, eventually rewriting any term involving higher powers of
using only constant and linear terms. That is, any complex number can be written as
, where
and
are real constants. Further, this representation is unique. This establishes the set
as a basis for
as a vector space over
.
Now the additive parts of the field structure are clear from the vector space structure here. We can write the sum of two complex numbers and
simply by adding the components:
. We get the negative of a complex number by taking the negatives of the components.
We can also write out products pretty simply, since we know the product of pairs of basis elements. The only one that doesn’t involve the unit of the algebra is . So in terms of components we can write out the product of the complex numbers above as
.
Notice here that the field of real numbers sits inside that of complex numbers, using scalar multiples of the complex unit. This is characteristic of algebras, but it’s worth pointing out here. Any real number can be considered as the complex number
. This preserves all the field structures, but it ignores the order on the real numbers. A small price to pay, but an important one in certain ways.
We also mentioned the symmetry between and
. Either one is just as valid as a square root of
as the other is, so if we go through consistently replacing
with
, and
with
, we can’t tell the difference. This leads to an automorphism of fields called “complex conjugation”. It sends the complex number
to its “conjugate”
. This preserves all the field structure — additive and multiplicative — and it fixes the real numbers sitting inside the complex numbers.
Studying this automorphism, and similar structures of other field extensions forms the core of what algebraists call “Galois theory”. I’m not going there now, but it’s a huge part of modern mathematics, and its study is ultimately the root of all of our abstract algebraic techniques. The first groups were automorphism groups shuffling around roots of polynomials.
The Complex Numbers
Yesterday we defined a field to be algebraically closed if any polynomial over the field always has exactly as many roots (counting multiplicities) as we expect from its degree. But we don’t know a single example of an algebraically complete field. Today we’ll (partially) remedy that problem.
First, remember the example we used for a polynomial with no roots over the real numbers. That is . The problem is that we have no field element whose square is
. So let’s just postulate one! This, of course, has the same advantages as those of theft over honest toil. We write our new element as
for “imaginary”, and throw it in with the rest of the real numbers
.
Okay, just like when we threw in as a new element, we can build up sums and products involving real numbers and this new element
. But there’s one big difference here: we have a relation that
must satisfy. When we use the evaluation map we must find
. And, of course, any polynomial which includes
as a factor must evaluate to
as well. But this is telling us that the kernel of the evaluation homomorphism for
contains the principal ideal
.
Can it contain anything else? If is a polynomial in the kernel, but
is not divisible by
, then Euclid’s algorithm gives us a greatest common divisor of
and
, which is a linear combination of these two, and must have degree either
or
. In the former case, we would find that the evaluation map would have to send everything — even the constant polynomials — to zero. In the latter case, we’d have a linear factor of
, which would be a root. Clearly neither of these situations can occur, so the kernel of the evaluation homomorphism at
is exactly the principal ideal
.
Now the first isomorphism theorem for rings tells us that we can impose our relation by taking the quotient ring . But what we just discussed above further goes to show that
is a maximal ideal, and the quotient of a ring by a maximal ideal is a field! Thus when we take the real numbers and adjoin a square root of
to get a ring we might call
, the result is a field. This is the field of “complex numbers”, which is more usually written
.
Now we’ve gone through a lot of work to just add one little extra element to our field, but it turns out this is all we need. Luckily enough, the complex numbers are already algebraically complete! This is very much not the case if we were to try to algebraically complete other fields (like the rational numbers). Unfortunately, the proof really is essentially analytic. It seems to be a completely algebraic statement, but remember all the messy analysis and topology that went into defining the real numbers.
Don’t worry, though. We’ll come back and prove this fact once we’ve got a bit more analysis under our belts. We’ll also talk a lot more about how to think about complex numbers. But for now all we need to know is that they’re the “algebraic closure” of the real numbers, we get them by adding a square root of that we call
, and we can use them as an example of an algebraically closed field.
One thing we can point out now, though, is the inherent duality of our situation. You see, we didn’t just add one square root of . Indeed, once we have complex numbers to work with we can factor
as
(test this by multiplying it out and imposing the relation). Then we have another root:
. This is just as much a square root of
as
was, and anything we can do with
we can do with
. That is, there’s a symmetry in play that exchanges
and
. We can pick one and work with it, but we must keep in mind that whenever we do we’re making a non-canonical choice.
The Einstein Summation Convention
Look at the formulas we were using yesterday. There’s a lot of summations in there, and a lot of big sigmas. Those get really tiring to write over and over, and they get tiring really quick. Back when Einstein was writing up his papers, he used a lot of linear transformations, and wrote them all out in matrices. Accordingly, he used a lot of those big sigmas.
When we’re typing nowadays, or when we write on a pad or on the board, this isn’t a problem. But remember that up until very recently, publications had to actually set type. Actual little pieces of metal with characters raised (and reversed!) on them would get slathered with ink and pressed to paper. Incidentally, this is why companies that produce fonts are called “type foundries”. They actually forged those metal bits with letter shapes in different styles, and sold sets of them to printers.
Now Einstein was using a lot of these big sigmas, and there were pages that had so many of them that the printer would run out! Even if they set one page at once and printed them off, they just didn’t have enough little pieces of metal with big sigmas on them to handle it. Clearly something needed to be done to cut down on demand for them.
Here we note that we’re always summing over some basis. Even if there’s no basis element in a formula — say, the formula for a matrix product — the summation is over the dimension of some vector space. We also notice that when we chose to write some of our indices as subscripts and some as superscripts, we’re always summing over one of each. We now adopt the convention that if we ever see a repeated index — once as a superscript and once as a subscript — we’ll read that as summing over an appropriate basis.
For example, when we wanted to write a vector , we had to take the basis
of
and write up the sum
but now we just write . The repeated index and the fact that we’re talking about a vector in
means we sum for
running from
to the dimension of
. Similarly we write out the value of a linear transformation on a basis vector:
. Here we determine from context that
should run from
to the dimension of
.
What about finding the coefficients of a linear transformation acting on a vector? Before we wrote this as
Where now we write the result as . Since the
are the coefficients of a vector in
,
must run from
to the dimension of
.
And similarly given linear transformations and
represented (given choices of bases) by the matrices with components
and
, the matrix of their product is then written
. Again, we determine from context that we should be summing
over a set indexing a basis for
.
One very important thing to note here is that it’s not going to matter what basis for we use here! I’m not going to prove this quite yet, but built right into this notation is the fact that the composite of the two transformations is completely independent of the choice of basis of
. Of course, the matrix of the composite still depends on the bases of
and
we pick, but the dependence on
vanishes as we take the sum.
Einstein had a slightly easier time of things: he was always dealing with four-dimensional vector spaces, so all his indices had the same range of summation. We’ve got to pay some attention here and be careful about what vector space a given index is talking about, but in the long run it saves a lot of time.
Darboux Integration
Okay, defining the integral as the limit of a net of Riemann sums is all well and good, but it’s a huge net, and it seems impossible to calculate with. We need a better way of getting a handle on these things. What we’ll use is a little trick for evaluating limits of nets that I haven’t mentioned yet: “cofinal sets”.
Given a directed set , a directed subset
is cofinal if for every
there is some
with
. Now watch what happens when we try to show that the limit of a net
is a point
. We need to find for every neighborhood
of
an index
so that for every
we have
. But if
is such an index, then there is some
above it, and every
above that is also above
, and so
. That is, if the limit over
exists, then the limit over
exists and has the same value.
Let’s give a cofinal set of tagged partitions by giving a rule for picking the tags that go with any partition. Then our net consists just of partitions of the interval , and the tags come for free. If the function
is Riemann-integrable, then the limit over this cofinal set will be the integral. Here’s our rule: in the closed subinterval
pick a point
so that
is the supremum of the values of
in that subinterval. If the function is continuous it will attain a maximum at our tag, and if not it’ll get close or shoot off to infinity (if there is no supremum).
Why is this cofinal? Let’s imagine a tagged partition where
is not chosen according to this rule. Then we can refine the partition by splitting up the
th strip in such a way that
is the maximum in one of the new strips, and choosing all the new tags according to the rule. Then we’ve found a good partition above the one we started with. Similarly, we can build another cofinal set by always choosing the tags where
approaches an infimum.
When we consider a partition in the first cofinal set we can set up something closely related to the Riemann sums: the “upper Darboux sums”
where is the supremum of
on the interval
, or infinity if the value of
is unbounded above here. Similarly, we can define the “lower Darboux sum”
where now is the infimum (or negative infinity). If the function is Riemann-integrable, then the limits over these cofinal sets both exist and are both equal to the Riemann integral. So we define a function to be “Darboux-integrable” if the limits of the upper and lower Darboux sums both exist and have the same value. Then the Darboux integral is defined to be this common value. Notice that if the function ever shoots off to positive or negative infinity we’ll get an infinite value for one of the terms, and we can never converge, so such functions are not Darboux-integrable.
We should notice here that given any partition , the upper Darboux sum must be larger than any Riemann sum with that same partition, since no matter how we choose the tag
we’ll find that
by definition. Similarly, the lower Darboux sum must be smaller than any Riemann sum on the same partition. Now let’s say that the upper and lower Darboux sums both converge to the same value
. Then given any neighborhood of
we can find a partition
so that every upper Darboux sum over a refinement of
is in the neighborhood, and a similar partition
for the lower Darboux sums. Choosing a common refinement
of both (which we can do because partitions form a directed set) both its upper and lower Darboux sums (and those of any of its refinements) will be in our neighborhood. Then we can choose any tags in
we want, and the Riemann sum will again be in the neighborhood. Thus a Darboux-integrable function is also Riemann-integrable.
So this new notion of Darboux-integrability is really the same one as Riemann-integrability, but it involves taking two limits over a much less complicated directed set. For now, we’ll just call a function which satisfies either of these two equivalent conditions “integrable” and be done with it, using whichever construction of the integral is most appropriate to our needs at the time.
Archimedean Groups and the Largest Archimedean Field
Okay, I’d promised to get back to the fact that the real numbers form the “largest” Archimedean field. More precisely, any Archimedean field is order-isomorphic to a subfield of .
There’s an interesting side note here. I was thinking about this and couldn’t quite see my way forward. So I started asking around Tulane’s math department and seeing if anyone knew. Someone pointed me towards Mike Mislove, and when I asked him, he suggested we ask Laszlo Fuchs around the corner from him. Dr. Fuchs, it turned out, did know the answer, and it was in a book he’d written himself: Partially Ordered Algebraic Systems. It’s an interesting little volume, which I may come back and mine later for more topics.
Anyhow, we’ll do this a little more generally. First let’s talk about Archimedean ordered groups a bit. In a totally-ordered group we’ll say two elements
and
are “Archimedean equivalent” (
) if there are natural numbers
and
so that
and
(here I’m using the absolute value that comes with any totally-ordered group). That is, neither one is infinitesimal with respect to the other. This can be shown to be an equivalence relation, so it chops the elements of
into equivalence classes. There are always at least two in any nontrivial group because the identity element is infinitesimal with respect to everything else. We say a group is Archimedean if there are only two Archimedean equivalence classes. That is, for any
and
other than the identity, there is a natural number
with
.
Now we have a theorem of Hölder which says that any Archimedean group is order-isomorphic to a subgroup of the real numbers with addition. In particular, we will see that any Archimedean group is commutative.
Now either has a least positive element
or it doesn’t. If it does, then
implies that
(
is the identity of the group). By the Archimedean property, any element
has an integer
so that
. Then we can multiply by
to find that
, so
. Every element is thus some power of
, and the group is isomorphic to the integers
.
On the other hand, what if given a positive we can always find a positive
with
? In this case,
may be greater than
, but in this case we can show that
, and
itself is less than
, so in either case we have an element
with
and
.
Now if two positive elements and
fail to commute then without loss of generality we can assume
. Then we pick
and choose a
to go with this
. By the Archimedean property we’ll have numbers
and
with
and
. Thus we find that
, which contradicts how we picked
. And thus
is commutative.
So we can pick some positive element and just set
. Now we need to find where to send every other element. To do this, note that for any
and any rational number
we’ll either have
or
, and both of these situations must arise by the Archimedean property. This separates the rational numbers into two nonempty collections — a cut! So we define
to be the real number specified by this cut. It’s straightforward now to show that
, and thus establish the order isomorphism.
So all Archimedean groups are just subgroups of with addition as its operation. In fact, homomorphisms of such groups are just as simple.
Say that we have a nontrivial Archimedean group , a (possibly trivial) Archimedean group
, and a homomorphism
. If
for some positive
then this is just the trivial homomorphism sending everything to zero, since for any positive
there is a natural number
so that
. In this case the homomorphism is “multiply by
“.
On the other hand, take any two positive elements and consider the quotients (in
)
and
. If they’re different (say,
) then we can pick a rational number
between them. Then
, while
, which contradicts the order-preserving property of the isomorphism! Thus we find the ratio
must be a constant
, and the homomorphism is “multiply by
“.
Now let’s move up to Archimedean rings, whose definition is the same as that for Archimedean fields. In this case, either the product of any two elements is (we have a “zero ring”) and the additive group is order-isomorphic to a subgroup of
, or the ring is order-isomorphic to a subring of
. If we have a zero ring, then the only data left is an Archimedean group, which the above discussion handles, so we’ll just assume that we have some nonzero product and show that we have an order-isomorphism with a subring of
.
So we’ve got some Archimedean ring and its additive group
. By the theorem above,
is order-isomorphic to a subgroup of
. We also know that for any positive
the operation
(the dot will denote the product in
) is an order-homomorphism from
to itself. Thus there is some non-negative real number
so that
. If we define
then the assignment
gives us an order-homomorphism from
to some group
.
Again, we must have for some non-negative real number
. If
then all multiplications in
would give zero, so
, and so the assignment is invertible. Now we see that
. Similarly, we have
, and so the function
is an order-isomorphism of rings.
In particular, a field can’t be a zero ring, and so there must be an injective order-homomorphism
. In fact, there can be only one, for if there were more than one the images would be related by multiplication by some positive
:
. But then
, and so
.
We can sum this up by saying that the real numbers are a terminal object in the category of Archimedean fields.