The Unapologetic Mathematician

Mathematics for the interested outsider

Inclusion-Exclusion Again

Now that we know a little more about characteristic functions, let’s see how they can be used to understand the inclusion-exclusion principle. Our first pass was through a very categorified lens, which was a neat tie-in to Euler characteristics. But we want something a little more down-to-Earth.

So, let’s take a finite collection \left\{S_i\right\}_{i=1}^n of subsets of a set X. To each one we have a corresponding idempotent function \chi_{S_i}, which is {1} for points in S_i and {0} elsewhere. We can take the complement of a subset in the language of idempotents as usual: 1-\chi_{S_i}=\chi_{{S_i}^c}. This function is {1} for points not in S_i, and {0} in S_i.

Now let’s take and multiply all these complementary functions together to get


By DeMorgan’s Laws, the intersection of all these complements is the complement of the union of all the S_i. Thus

\displaystyle\prod\limits_{i=1}^n\left(1-\chi_{S_i}\right)=\chi_{(S_1\cup\dots\cup S_n)^c}=1-\chi_{S_1\cup\dots\cup S_n}

Now let’s expand out the product, just like we expand out the product of any sequence of binomials. In each factor, we choose which of the {1} or the \chi_{S_i} to contribute. First we get the term where we always choose {1}, then the terms where we choose exactly one of the \chi_{S_i}, then the terms where we choose exactly two of them, and so on until we choose all n of them. The sign of a term is negative when we choose an odd number of the \chi_{S_i}, and positive when we choose an even number. That is

\displaystyle\begin{aligned}\prod\limits_{i=1}^n\left(1-\chi_{S_i}\right)&=1-\sum\limits_{1\leq i\leq n}\chi_{S_i}+\sum\limits_{1\leq i<j\leq n}\chi_{S_i}\chi_{S_j}-\sum\limits_{1\leq i<j<k\leq n}\chi_{S_i}\chi_{S_j}\chi_{S_k}+\dots+(-1)^n\chi_{S_1}\dots\chi_{S_n}\\&=1-\sum\limits_{1\leq i\leq n}\chi_{S_i}+\sum\limits_{1\leq i<j\leq n}\chi_{S_i\cap S_j}-\sum\limits_{1\leq i<j<k\leq n}\chi_{S_i\cap S_j\cap S_k}+\dots+(-1)^n\chi_{S_1\cap\dots\cap S_n}\end{aligned}

Putting this together with the earlier expression, we find

\displaystyle\chi_{S_1\cup\dots\cup S_n}=\sum\limits_{1\leq i\leq n}\chi_{S_i}-\sum\limits_{1\leq i<j\leq n}\chi_{S_i\cap S_j}+\sum\limits_{1\leq i<j<k\leq n}\chi_{S_i\cap S_j\cap S_k}-\dots+(-1)^n\chi_{S_1\cap\dots\cap S_n}

For n=2 this gives us back our formula for the characteristic function of the intersection of two sets. For larger n it gives us the inclusion-exclusion principle in terms of characteristic functions.

December 28, 2009 Posted by | Fundamentals | 1 Comment

Characteristic Functions as Idempotents

I just talked about characteristic functions as masks on other functions. Given a function f:X\rightarrow\mathbb{R} and a subset S\subseteq X, we can mask the function f to the subset S by multiplying it by the characteristic function \chi_S. I want to talk a little more about these functions and how they relate to set theory.

First of all, it’s easy to recognize a characteristic function when we see one: they’re exactly the idempotent functions. That is, {\chi_S}^2=\chi_S, and if f^2=f then f must be \chi_S for some set S. Indeed, given a real number a, we can only have a^2=a if a=0 or a=1. That is, f(x)=0 or f(x)=1 for every x. So we can define S to be the set of x\in X for which f(x)=1, and then f(x)=\chi_S(x) for every x\in X. Thus the idempotents in the algebra of real-valued functions on X correspond exactly to the subsets of X.

We can define two operations on such idempotent functions to make them into a lattice. The easier to define is the meet. Given idempotents \chi_S and \chi_T we define the meet to be their product:


This function will take the value {1} at a point x if and only if both \chi_S and \chi_T do, so this is the characteristic function of the intersection

\displaystyle\chi_S\wedge\chi_T=\chi_{S\cap T}

We might hope that the join would be the sum of two idempotents, but in general this will not be another idempotent. Indeed, we can check:

\displaystyle(\chi_S+\chi_T)^2={\chi_S}^2+{\chi_T}^2+2\chi_S\chi_T=(\chi_S+\chi_T)+2\chi_{S\cap T}

We have a problem exactly when the corresponding sets have a nonempty intersection, which leads us to think that maybe this has something to do with the inclusion-exclusion principle. We’re “overcounting” the intersection by just adding, so let’s subtract it off to define


We can multiply this out to check its idempotence, or we could consider its values. If x is not in T, then \chi_T(x)=0, and we find \chi_S\vee\chi_T=\chi_S — it takes the value {1} if x\in S and {0} otherwise. A similar calculation holds if x\notin S, which leaves only the case when x\in S\cap T. But now \chi_S(x) and \chi_T(x) both take the value {1}, and a quick calculation shows that \chi_S\vee\chi_T does as well. This establishes that

\displaystyle\chi_S\vee\chi_T=\chi_{S\cup T}

We can push further and make this into an orthocomplemented lattice. We define the orthocomplement of an idempotent by


This function is {1} wherever \chi_S is {0}, and vice-versa. That is, it’s the characteristic function of the complement

\displaystyle\neg\chi_S=\chi_{X\setminus S}

So we can take the lattice of subsets of X and realize it in the nice, concrete algebra of real-valued functions on X. The objects of the lattice are exactly the idempotents of this algebra, and we can build the meet and join from the algebraic operations of addition and multiplication. In fact, we could turn around and do this for any commutative algebra to create a lattice, which would mimic the “lattice of subsets” of some “set”, which emerges from the algebra. This sort of trick is a key insight to quite a lot of modern geometry.

December 23, 2009 Posted by | Algebra, Fundamentals, Lattices | 2 Comments

Complex Numbers and Polar Coordinates

Forgot to hit “publish” earlier…

So we’ve seen that the unit complex numbers can be written in the form e^{i\theta} where \theta denotes the (signed) angle between the point on the circle and 1+0i. We’ve also seen that this view behaves particularly nicely with respect to multiplication: multiplying two unit complex numbers just adds their angles. Today I want to extend this viewpoint to the whole complex plane.

If we start with any nonzero complex number z=a+bi, we can find its absolute value \lvert z\rvert=\sqrt{a^2+b^2}. This is a positive real number which we’ll also call r. We can factor this out of z to find z=r\left(\frac{a}{r}+\frac{b}{r}i\right). The complex number in parentheses has unit absolute value, and so we can write it as e^{i\theta} for some \theta between -\pi and \pi. Thus we’ve written our complex number in the form

\displaystyle z=re^{i\theta}

where the positive real number r is the absolute value of z, and \theta — a real number in the range \left(-\pi,\pi\right] — is the angle z makes with the reference point 1+0i. But this is exactly how we define the polar coordinates (r,\theta) back in high school math courses.

Just like we saw for unit complex numbers, this notation is very well behaved with respect to multiplication. Given complex numbers r_1e^{i\theta_1} and r_2e^{i\theta_2} we calculate their product:

\displaystyle r_1e^{i\theta_1}r_2e^{i\theta_2}=\left(r_1r_2\right)e^{i\left(\theta_1+\theta_2\right)}

That is, we multiply their lengths (as we already knew) and add their angles, just like before. This viewpoint also makes division simple:


In particular we see that


so multiplicative inverses are given in terms of complex conjugates and magnitudes as we already knew.

Powers (including roots) are also easy, which gives rise to easy ways to remember all those messy double- and triple-angle formulæ from trigonometry:



Other angle addition formulæ should be similarly easy to verify from this point.

In general, since we consider complex numbers multiplicatively so often it will be convenient to have this polar representation of complex numbers at hand. It will also generalize nicely, as we will see.

May 29, 2009 Posted by | Fundamentals, Numbers | 3 Comments

The Circle Group

Yesterday we saw that the unit-length complex numbers are all of the form e^{i\theta}, where \theta measures the oriented angle from 1+0i around to the point in question. Since the absolute value of a complex number is multiplicative, we know that the product of two unit-length complex numbers is again of unit length. We can also see this using the exponential property:

\displaystyle e^{i\theta_1}e^{i\theta_2}=e^{i(\theta_1+\theta_2)}

So multiplying two unit-length complex numbers corresponds to adding their angles.

That is, the complex numbers on the unit circle form a group under multiplication of complex numbers — a subgroup of the multiplicative group of the complex field — and we even have an algebraic description of this group. The function sending the real number \theta to the point on the circle e^{i\theta} is a homomorphism from the additive group of real numbers to the circle group. Since every point on the circle has such a representative, it’s an epimorphism. What is the kernel? It’s the collection of real numbers satisfying

\displaystyle e^{i\theta}=\cos(\theta)+i\sin(\theta)=1+0i

that is, \theta must be an integral multiple of 2\pi — an element of the subgroup 2\pi\mathbb{Z}\subseteq\mathbb{R}. So, algebraically, the circle group is the quotient \mathbb{R}/(2\pi\mathbb{Z}). Or, isomorphically, we can just write \mathbb{R}/\mathbb{Z}.

Something important has happened here. We have in hand two distinct descriptions of the circle. One we get by putting the unit-length condition on points in the plane. The other we get by taking the real line and “wrapping” it around itself periodically. I haven’t really mentioned the topologies, but the first approach inherits the subspace topology from the topology on the complex numbers, while the second approach inherits the quotient topology from the topology on the real numbers. And it turns out that the identity map from one version of the circle to the other one is actually a homeomorphism, which further shows that the two descriptions give us “the same” result.

What’s really different between the two cases is how they generalize. I’ll probably come back to these in more detail later, but for now I’ll point out that the first approach generalizes to spheres in higher dimensions, while the second generalizes to higher-dimensional tori. Thus the circle is sometimes called the one-dimensional sphere S^1, and sometimes called the one-dimensional torus T^1, and each one calls to mind a slightly different vision of the same basic object of study.

May 27, 2009 Posted by | Algebra, Fundamentals, Group theory, Numbers | 3 Comments

Complex Numbers and the Unit Circle

When I first talked about complex numbers there was one perspective I put off, and now need to come back to. It makes deep use of Euler’s formula, which ties exponentials and trigonometric functions together in the relation

\displaystyle e^{i\theta}=\cos(\theta)+i\sin(\theta)

where we’ve written e for \exp(1) and used the exponential property.

Remember that we have a natural basis for the complex numbers as a vector space over the reals: \left\{1,i\right\}. If we ask that this natural basis be orthonormal, we get a real inner product on complex numbers, which in turn gives us lengths and angles. In fact, this notion of length is exactly that which we used to define the absolute value of a complex number, in order to get a topology on the field.

So what happens when we look at e^{i\theta}? First, we can calculate its length using this inner product, getting

\displaystyle\left\lvert e^{i\theta}\right\rvert=\cos(\theta)^2+\sin(\theta)^2=1

by the famous trigonometric identity. That is, every complex number of the form e^{i\theta} lies a unit distance from the complex number {0}.

In particular, 1+0i=e^{0i} is a nice reference point among such points. We can use it as a fixed post in the complex plane, and measure the angle it makes with any other point. For example, we can calculate the inner product


and thus we find that the point e^{i\theta} makes an angle \lvert\theta\rvert with our fixed post {1}, at least for -\pi\leq\theta\leq\pi. We see that e^{i\theta} traces a circle by increasing the angle in one direction as \theta increases from {0} to \pi, and increasing the angle in the other direction as \theta decreases from {0} to -\pi. For values of \theta outside this range, we can use the fact that

\displaystyle e^{2\pi i}=\cos(2\pi)+i\sin(2\pi)=1+0i

to see that the function e^{i\theta} is periodic with period 2\pi. That is, we can add or subtract whatever multiple of 2\pi we need to move \theta within the range -\pi<\theta\leq\pi. Thus, as \theta varies the point e^{i\theta} traces out a circle of unit radius, going around and around with period 2\pi, and every point on the unit circle has a unique representative of this form with \theta in the given range.

May 26, 2009 Posted by | Fundamentals, Numbers | 3 Comments

Properties of Complex Numbers

Today I’ll collect a few basic properties of complex numbers.

First off, they form a vector space over the reals. We constructed them as an algebra — the quotient of the algebra of polynomials by a certain ideal — and every algebra is a vector space. So what can we say about them as a vector space? The easiest fact is that it’s two-dimensional, and it’s got a particularly useful basis.

To see this, remember that we have a basis for the algebra of polynomials, which is given by the powers of the variable. So here when we throw in the formal element i, its powers form a basis of the ring \mathbb{R}[i]. But we have a relation, and that cuts things down a bit. Specifically, the element i^2 is the same as the element -1.

Given a polynomial p in the “variable” i, we can write it as


We can peel off the constant and linear terms, and then pull out a factor of i^2:


Now this factor of i^2 can be replaced by -1, which drops the overall degree. We can continue like this, eventually rewriting any term involving higher powers of i using only constant and linear terms. That is, any complex number can be written as c_0+c_1i, where c_0 and c_1 are real constants. Further, this representation is unique. This establishes the set \{1,i\} as a basis for \mathbb{C} as a vector space over \mathbb{R}.

Now the additive parts of the field structure are clear from the vector space structure here. We can write the sum of two complex numbers a_1+b_1i and a_2+b_2i simply by adding the components: (a_1+a_2)+(b_1+b_2)i. We get the negative of a complex number by taking the negatives of the components.

We can also write out products pretty simply, since we know the product of pairs of basis elements. The only one that doesn’t involve the unit of the algebra is i\otimes i\mapsto-1. So in terms of components we can write out the product of the complex numbers above as (a_1a_2-b_1b_2)+(a_1b_2+a_2b_1)i.

Notice here that the field of real numbers sits inside that of complex numbers, using scalar multiples of the complex unit. This is characteristic of algebras, but it’s worth pointing out here. Any real number a can be considered as the complex number a+0i. This preserves all the field structures, but it ignores the order on the real numbers. A small price to pay, but an important one in certain ways.

We also mentioned the symmetry between i and -i. Either one is just as valid as a square root of -1 as the other is, so if we go through consistently replacing i with -i, and -i with i, we can’t tell the difference. This leads to an automorphism of fields called “complex conjugation”. It sends the complex number a+bi to its “conjugate” a-bi. This preserves all the field structure — additive and multiplicative — and it fixes the real numbers sitting inside the complex numbers.

Studying this automorphism, and similar structures of other field extensions forms the core of what algebraists call “Galois theory”. I’m not going there now, but it’s a huge part of modern mathematics, and its study is ultimately the root of all of our abstract algebraic techniques. The first groups were automorphism groups shuffling around roots of polynomials.

August 8, 2008 Posted by | Fundamentals, Numbers | 25 Comments

The Complex Numbers

Yesterday we defined a field to be algebraically closed if any polynomial over the field always has exactly as many roots (counting multiplicities) as we expect from its degree. But we don’t know a single example of an algebraically complete field. Today we’ll (partially) remedy that problem.

First, remember the example we used for a polynomial with no roots over the real numbers. That is p=X^2+1. The problem is that we have no field element whose square is -1. So let’s just postulate one! This, of course, has the same advantages as those of theft over honest toil. We write our new element as i for “imaginary”, and throw it in with the rest of the real numbers \mathbb{R}.

Okay, just like when we threw in X as a new element, we can build up sums and products involving real numbers and this new element i. But there’s one big difference here: we have a relation that i must satisfy. When we use the evaluation map we must find \mathrm{ev}(X^2+1,i)=0. And, of course, any polynomial which includes X^2+1 as a factor must evaluate to {0} as well. But this is telling us that the kernel of the evaluation homomorphism for i contains the principal ideal (X^2+1).

Can it contain anything else? If q\in\mathbb{R}[X] is a polynomial in the kernel, but q is not divisible by X^2+1, then Euclid’s algorithm gives us a greatest common divisor of q and X^2+1, which is a linear combination of these two, and must have degree either {0} or {1}. In the former case, we would find that the evaluation map would have to send everything — even the constant polynomials — to zero. In the latter case, we’d have a linear factor of X^2+1, which would be a root. Clearly neither of these situations can occur, so the kernel of the evaluation homomorphism at i is exactly the principal ideal (X^2+1).

Now the first isomorphism theorem for rings tells us that we can impose our relation by taking the quotient ring \mathbb{R}[X]/(X^2+1). But what we just discussed above further goes to show that (X^2+1) is a maximal ideal, and the quotient of a ring by a maximal ideal is a field! Thus when we take the real numbers and adjoin a square root of -1 to get a ring we might call \mathbb{R}[i], the result is a field. This is the field of “complex numbers”, which is more usually written \mathbb{C}.

Now we’ve gone through a lot of work to just add one little extra element to our field, but it turns out this is all we need. Luckily enough, the complex numbers are already algebraically complete! This is very much not the case if we were to try to algebraically complete other fields (like the rational numbers). Unfortunately, the proof really is essentially analytic. It seems to be a completely algebraic statement, but remember all the messy analysis and topology that went into defining the real numbers.

Don’t worry, though. We’ll come back and prove this fact once we’ve got a bit more analysis under our belts. We’ll also talk a lot more about how to think about complex numbers. But for now all we need to know is that they’re the “algebraic closure” of the real numbers, we get them by adding a square root of -1 that we call i, and we can use them as an example of an algebraically closed field.

One thing we can point out now, though, is the inherent duality of our situation. You see, we didn’t just add one square root of -1. Indeed, once we have complex numbers to work with we can factor X^2+1 as (X-i)(X+i) (test this by multiplying it out and imposing the relation). Then we have another root: -i. This is just as much a square root of -1 as i was, and anything we can do with i we can do with -i. That is, there’s a symmetry in play that exchanges i and -i. We can pick one and work with it, but we must keep in mind that whenever we do we’re making a non-canonical choice.

August 7, 2008 Posted by | Algebra, Fundamentals, Numbers, Polynomials, Ring theory | 10 Comments

The Einstein Summation Convention

Look at the formulas we were using yesterday. There’s a lot of summations in there, and a lot of big sigmas. Those get really tiring to write over and over, and they get tiring really quick. Back when Einstein was writing up his papers, he used a lot of linear transformations, and wrote them all out in matrices. Accordingly, he used a lot of those big sigmas.

When we’re typing nowadays, or when we write on a pad or on the board, this isn’t a problem. But remember that up until very recently, publications had to actually set type. Actual little pieces of metal with characters raised (and reversed!) on them would get slathered with ink and pressed to paper. Incidentally, this is why companies that produce fonts are called “type foundries”. They actually forged those metal bits with letter shapes in different styles, and sold sets of them to printers.

Now Einstein was using a lot of these big sigmas, and there were pages that had so many of them that the printer would run out! Even if they set one page at once and printed them off, they just didn’t have enough little pieces of metal with big sigmas on them to handle it. Clearly something needed to be done to cut down on demand for them.

Here we note that we’re always summing over some basis. Even if there’s no basis element in a formula — say, the formula for a matrix product — the summation is over the dimension of some vector space. We also notice that when we chose to write some of our indices as subscripts and some as superscripts, we’re always summing over one of each. We now adopt the convention that if we ever see a repeated index — once as a superscript and once as a subscript — we’ll read that as summing over an appropriate basis.

For example, when we wanted to write a vector v\in V, we had to take the basis \{f_j\} of V and write up the sum

\displaystyle v=\sum\limits_{j=1}^{\dim(V)}v^jf_j

but now we just write v^jf_j. The repeated index and the fact that we’re talking about a vector in V means we sum for j running from {1} to the dimension of V. Similarly we write out the value of a linear transformation on a basis vector: T(f_j)=t_j^kg_k. Here we determine from context that k should run from {1} to the dimension of W.

What about finding the coefficients of a linear transformation acting on a vector? Before we wrote this as

\displaystyle T(v)^k=\sum\limits_{j=1}^{\dim(V)}t_j^kv^j

Where now we write the result as t_j^kv^j. Since the v^j are the coefficients of a vector in V, j must run from {1} to the dimension of V.

And similarly given linear transformations S:U\rightarrow V and T:V\rightarrow W represented (given choices of bases) by the matrices with components s_i^j and t_j^k, the matrix of their product is then written s_i^jt_j^k. Again, we determine from context that we should be summing j over a set indexing a basis for V.

One very important thing to note here is that it’s not going to matter what basis for V we use here! I’m not going to prove this quite yet, but built right into this notation is the fact that the composite of the two transformations is completely independent of the choice of basis of V. Of course, the matrix of the composite still depends on the bases of U and W we pick, but the dependence on V vanishes as we take the sum.

Einstein had a slightly easier time of things: he was always dealing with four-dimensional vector spaces, so all his indices had the same range of summation. We’ve got to pay some attention here and be careful about what vector space a given index is talking about, but in the long run it saves a lot of time.

May 21, 2008 Posted by | Fundamentals, Linear Algebra | 18 Comments

Darboux Integration

Okay, defining the integral as the limit of a net of Riemann sums is all well and good, but it’s a huge net, and it seems impossible to calculate with. We need a better way of getting a handle on these things. What we’ll use is a little trick for evaluating limits of nets that I haven’t mentioned yet: “cofinal sets”.

Given a directed set (D,\preceq), a directed subset S is cofinal if for every d\in D there is some s\in S with s\succeq d. Now watch what happens when we try to show that the limit of a net x_d is a point x. We need to find for every neighborhood U of x an index d_0 so that for every d\succeq d_0 we have x_d\in U. But if d_0 is such an index, then there is some s_0\in S above it, and every s\in S above that is also above d_0, and so x_s\in U. That is, if the limit over D exists, then the limit over S exists and has the same value.

Let’s give a cofinal set of tagged partitions by giving a rule for picking the tags that go with any partition. Then our net consists just of partitions of the interval \left[a,b\right], and the tags come for free. If the function f is Riemann-integrable, then the limit over this cofinal set will be the integral. Here’s our rule: in the closed subinterval \left[x_{i-1},x_i\right] pick a point t_i so that \lim\limits_{x\rightarrow t_i}f(x) is the supremum of the values of f in that subinterval. If the function is continuous it will attain a maximum at our tag, and if not it’ll get close or shoot off to infinity (if there is no supremum).

Why is this cofinal? Let’s imagine a tagged partition x=((x_0,...,x_n),(t_1,...,t_n)) where t_i is not chosen according to this rule. Then we can refine the partition by splitting up the ith strip in such a way that t_i is the maximum in one of the new strips, and choosing all the new tags according to the rule. Then we’ve found a good partition above the one we started with. Similarly, we can build another cofinal set by always choosing the tags where f approaches an infimum.

When we consider a partition x in the first cofinal set we can set up something closely related to the Riemann sums: the “upper Darboux sums”

\displaystyle U_x(f)=\sum\limits_{i=1}^n M_i(x_i-x_{i-1})

where M_i is the supremum of f(x) on the interval \left[x_{i-1},x_i\right], or infinity if the value of f is unbounded above here. Similarly, we can define the “lower Darboux sum”

\displaystyle L_x(f)=\sum\limits_{i=1}^n m_i(x_i-x_{i-1})

where now m_i is the infimum (or negative infinity). If the function is Riemann-integrable, then the limits over these cofinal sets both exist and are both equal to the Riemann integral. So we define a function to be “Darboux-integrable” if the limits of the upper and lower Darboux sums both exist and have the same value. Then the Darboux integral is defined to be this common value. Notice that if the function ever shoots off to positive or negative infinity we’ll get an infinite value for one of the terms, and we can never converge, so such functions are not Darboux-integrable.

We should notice here that given any partition x, the upper Darboux sum must be larger than any Riemann sum with that same partition, since no matter how we choose the tag t_i we’ll find that f(t_i)\leq M_i by definition. Similarly, the lower Darboux sum must be smaller than any Riemann sum on the same partition. Now let’s say that the upper and lower Darboux sums both converge to the same value s. Then given any neighborhood of s we can find a partition x_U so that every upper Darboux sum over a refinement of x_U is in the neighborhood, and a similar partition x_L for the lower Darboux sums. Choosing a common refinement x_R of both (which we can do because partitions form a directed set) both its upper and lower Darboux sums (and those of any of its refinements) will be in our neighborhood. Then we can choose any tags in x_R we want, and the Riemann sum will again be in the neighborhood. Thus a Darboux-integrable function is also Riemann-integrable.

So this new notion of Darboux-integrability is really the same one as Riemann-integrability, but it involves taking two limits over a much less complicated directed set. For now, we’ll just call a function which satisfies either of these two equivalent conditions “integrable” and be done with it, using whichever construction of the integral is most appropriate to our needs at the time.

January 30, 2008 Posted by | Analysis, Calculus, Orders | 3 Comments

Archimedean Groups and the Largest Archimedean Field

Okay, I’d promised to get back to the fact that the real numbers form the “largest” Archimedean field. More precisely, any Archimedean field is order-isomorphic to a subfield of \mathbb{R}.

There’s an interesting side note here. I was thinking about this and couldn’t quite see my way forward. So I started asking around Tulane’s math department and seeing if anyone knew. Someone pointed me towards Mike Mislove, and when I asked him, he suggested we ask Laszlo Fuchs around the corner from him. Dr. Fuchs, it turned out, did know the answer, and it was in a book he’d written himself: Partially Ordered Algebraic Systems. It’s an interesting little volume, which I may come back and mine later for more topics.

Anyhow, we’ll do this a little more generally. First let’s talk about Archimedean ordered groups a bit. In a totally-ordered group G we’ll say two elements a and b are “Archimedean equivalent” (A\sim B) if there are natural numbers m and n so that |a|<|b|^m and |b|<|a|^n (here I’m using the absolute value that comes with any totally-ordered group). That is, neither one is infinitesimal with respect to the other. This can be shown to be an equivalence relation, so it chops the elements of G into equivalence classes. There are always at least two in any nontrivial group because the identity element is infinitesimal with respect to everything else. We say a group is Archimedean if there are only two Archimedean equivalence classes. That is, for any a and b other than the identity, there is a natural number n with |a|<|b|^n.

Now we have a theorem of Hölder which says that any Archimedean group is order-isomorphic to a subgroup of the real numbers with addition. In particular, we will see that any Archimedean group is commutative.

Now either G has a least positive element g or it doesn’t. If it does, then e\leq x<g implies that x=e (e is the identity of the group). By the Archimedean property, any element a has an integer n so that g^n\leq a<g^{n+1}. Then we can multiply by g^{-n} to find that e\leq g^{-n}a<g, so g^{-n}a=e. Every element is thus some power of g, and the group is isomorphic to the integers \mathbb{Z}\subseteq\mathbb{R}.

On the other hand, what if given a positive x we can always find a positive y with y<x? In this case, y^2 may be greater than x, but in this case we can show that (xy^{-1})^2\leq x, and xy^{-1} itself is less than x, so in either case we have an element z with e<z<x and z^2\leq x.

Now if two positive elements a and b fail to commute then without loss of generality we can assume ba<ab. Then we pick x=aba^{-1}b^{-1}>e and choose a z to go with this x. By the Archimedean property we’ll have numbers m and n with z^m\leq a<z^{m+1} and z^n\leq b<z^{n+1}. Thus we find that x<z^2, which contradicts how we picked z. And thus G is commutative.

So we can pick some positive element a\in G and just set f(a)=1\in\mathbb{R}. Now we need to find where to send every other element. To do this, note that for any b\in G and any rational number \frac{m}{n}\in\mathbb{Q} we’ll either have a^m\leq b^n or a^m\geq b^n, and both of these situations must arise by the Archimedean property. This separates the rational numbers into two nonempty collections — a cut! So we define f(b) to be the real number specified by this cut. It’s straightforward now to show that f(bc)=f(b)+f(c), and thus establish the order isomorphism.

So all Archimedean groups are just subgroups of \mathbb{R} with addition as its operation. In fact, homomorphisms of such groups are just as simple.

Say that we have a nontrivial Archimedean group A\subseteq\mathbb{R}, a (possibly trivial) Archimedean group B\subseteq\mathbb{R}, and a homomorphism f:A\rightarrow B. If f(a)=0 for some positive a\in A then this is just the trivial homomorphism sending everything to zero, since for any positive x there is a natural number n so that x<na. In this case the homomorphism is “multiply by {0}“.

On the other hand, take any two positive elements a_1,a_2\in A and consider the quotients (in \mathbb{R}) \frac{a_1}{a_2} and \frac{f(a_1)}{f(a_2)}. If they’re different (say, \frac{f(a_1)}{f(a_2)}<\frac{a_1}{a_2}) then we can pick a rational number \frac{m}{n} between them. Then nf(a_1)<mf(a_2), while ma_2<na_1, which contradicts the order-preserving property of the isomorphism! Thus we find the ratio \frac{f(a)}{a} must be a constant r>0, and the homomorphism is “multiply by r“.

Now let’s move up to Archimedean rings, whose definition is the same as that for Archimedean fields. In this case, either the product of any two elements is {0} (we have a “zero ring”) and the additive group is order-isomorphic to a subgroup of \mathbb{R}, or the ring is order-isomorphic to a subring of \mathbb{R}. If we have a zero ring, then the only data left is an Archimedean group, which the above discussion handles, so we’ll just assume that we have some nonzero product and show that we have an order-isomorphism with a subring of \mathbb{R}.

So we’ve got some Archimedean ring R and its additive group R_+. By the theorem above, R_+ is order-isomorphic to a subgroup of \mathbb{R}. We also know that for any positive a\in R the operation \lambda_a(x)=a\cdot x (the dot will denote the product in R) is an order-homomorphism from R_+ to itself. Thus there is some non-negative real number r_a so that \lambda_a(x)=r_ax. If we define r_{-a}=-r_a then the assignment a\mapsto r_a gives us an order-homomorphism from R_+ to some group S_+\subseteq\mathbb{R}.

Again, we must have r_a=sa for some non-negative real number s. If s=0 then all multiplications in R would give zero, so 0<s, and so the assignment is invertible. Now we see that a\cdot b=r_ab=sab. Similarly, we have r_{a\cdot b}=s(a\cdot b)=(sa)(sb)=r_ar_b, and so the function a\mapsto r_a is an order-isomorphism of rings.

In particular, a field \mathbb{F} can’t be a zero ring, and so there must be an injective order-homomorphism \mathbb{F}\rightarrow\mathbb{R}. In fact, there can be only one, for if there were more than one the images would be related by multiplication by some positive r\in\mathbb{R}: \phi_1(a)=r\phi_2(a). But then r\phi_2(a)\phi_2(b)=r\phi_2(a\cdot b)=\phi_1(a\cdot b)=\phi_1(a)\phi_1(b)=r^2\phi_2(a)\phi_2(b), and so r=1.

We can sum this up by saying that the real numbers \mathbb{R} are a terminal object in the category of Archimedean fields.

December 17, 2007 Posted by | Algebra, Fundamentals, Group theory, Numbers, Orders, Ring theory | 3 Comments