The Unapologetic Mathematician

Factoring Real Polynomials

Okay, we know that we can factor any complex polynomial into linear factors because the complex numbers are algebraically closed. But we also know that real polynomials can have too few roots. Now, there are a lot of fields out there that aren’t algebraically closed, and I’m not about to go through all of them. But we use the real numbers so much because of the unique position it holds by virtue of the interplay between its topology and its algebra. So it’s useful to see what we can say about real polynomials.

We start by noting that since the real numbers sit inside the complex numbers, we can consider any real polynomial as a complex polynomial. If the polynomial has a real root $r$, then the complex polynomial has a root $r+0i$. So all the real roots still show up.

Now, we might not have as many real roots as the degree would indicate. But we are sure to have as many complex roots as the degree, which will include the real roots. Some of the roots may actually be complex numbers like $a+bi$. Luckily, one really interesting thing happens here: if $a+bi$ is a root, then so is its complex conjugate $a-bi$.

Let’s write out our polynomial $c_0+c_1X+...+c_nX^n$, where all the $c_i$ are real numbers. To say that $a+bi$ is a root means that when we substitute it for ${X}$ we get the equation

$c_0+c_1(a+bi)+...+c_n(a+bi)^n=0$

Now we can take the complex conjugate of this equation

$\overline{c_0+c_1(a+bi)+...+c_n(a+bi)^n}=\overline{0}$

But complex conjugation is a field automorphism, so it preserves both addition and multiplication

$\overline{c_0}+\overline{c_1}\left(\overline{a+bi}\right)+...+\overline{c_n}\left(\overline{a+bi}\right)^n=\overline{0}$

Now since all the $c_i$ (and ${0}$) are real, complex conjugation leaves them alone. Conjugation sends $a+bi$ to $a-bi$, and so we find

$c_0+c_1(a-bi)+...+c_n(a-bi)^n=0$

So $a-bi$ is a root as well. Thus if we have a (complex) linear factor like $\left(X-(a+bi)\right)$ we’ll also have another one like $\left(X-(a-bi)\right)$. These multiply to give

\begin{aligned}\left(X-(a+bi)\right)\left(X-(a-bi)\right)=X^2-(a-bi)X-(a+bi)X+(a+bi)(a-bi)\\=X^2-(2a)X+(a^2+b^2)\end{aligned}

which is a real polynomial again.

Now let’s start with our polynomial $p$ of degree $n$. We know that over the complex numbers it has a root $\lambda$. If this root is real, then we can write

$p=(X-\lambda)\tilde{p}$

where $\tilde{p}$ is another real polynomial which has degree $n-1$. On the other hand, if $\lambda=a+bi$ is complex then $\bar{\lambda}$ is also a root of $p$, and so we can write

$p=(X-(a+bi))(X-(a-bi))\tilde{p}=(X^2-(2a)X+(a^2+b^2))\tilde{p}$

where $\tilde{p}$ is another real polynomial which has degree $n-2$.

Either way, now we can repeat our reasoning starting with $\tilde{p}$. At each step we can pull off either a linear term or a quadratic term (which can’t be factor into two real linear terms).

Thus every real polynomial factors into the product of a bunch of linear polynomials and a bunch of irreducible quadratic polynomials, and the number of linear factors plus twice the number of quadratic factors must add up to the degree of $p$. It’s not quite so nice as the situation over the complex numbers, but it’s still pretty simple. We’ll see many situations into the future where this split between two distinct real roots and a conjugate pair of complex roots (and the border case of two equal real roots) shows up with striking qualitative effects.

August 14, 2008

The Complex Numbers

Yesterday we defined a field to be algebraically closed if any polynomial over the field always has exactly as many roots (counting multiplicities) as we expect from its degree. But we don’t know a single example of an algebraically complete field. Today we’ll (partially) remedy that problem.

First, remember the example we used for a polynomial with no roots over the real numbers. That is $p=X^2+1$. The problem is that we have no field element whose square is $-1$. So let’s just postulate one! This, of course, has the same advantages as those of theft over honest toil. We write our new element as $i$ for “imaginary”, and throw it in with the rest of the real numbers $\mathbb{R}$.

Okay, just like when we threw in $X$ as a new element, we can build up sums and products involving real numbers and this new element $i$. But there’s one big difference here: we have a relation that $i$ must satisfy. When we use the evaluation map we must find $\mathrm{ev}(X^2+1,i)=0$. And, of course, any polynomial which includes $X^2+1$ as a factor must evaluate to ${0}$ as well. But this is telling us that the kernel of the evaluation homomorphism for $i$ contains the principal ideal $(X^2+1)$.

Can it contain anything else? If $q\in\mathbb{R}[X]$ is a polynomial in the kernel, but $q$ is not divisible by $X^2+1$, then Euclid’s algorithm gives us a greatest common divisor of $q$ and $X^2+1$, which is a linear combination of these two, and must have degree either ${0}$ or ${1}$. In the former case, we would find that the evaluation map would have to send everything — even the constant polynomials — to zero. In the latter case, we’d have a linear factor of $X^2+1$, which would be a root. Clearly neither of these situations can occur, so the kernel of the evaluation homomorphism at $i$ is exactly the principal ideal $(X^2+1)$.

Now the first isomorphism theorem for rings tells us that we can impose our relation by taking the quotient ring $\mathbb{R}[X]/(X^2+1)$. But what we just discussed above further goes to show that $(X^2+1)$ is a maximal ideal, and the quotient of a ring by a maximal ideal is a field! Thus when we take the real numbers and adjoin a square root of $-1$ to get a ring we might call $\mathbb{R}[i]$, the result is a field. This is the field of “complex numbers”, which is more usually written $\mathbb{C}$.

Now we’ve gone through a lot of work to just add one little extra element to our field, but it turns out this is all we need. Luckily enough, the complex numbers are already algebraically complete! This is very much not the case if we were to try to algebraically complete other fields (like the rational numbers). Unfortunately, the proof really is essentially analytic. It seems to be a completely algebraic statement, but remember all the messy analysis and topology that went into defining the real numbers.

Don’t worry, though. We’ll come back and prove this fact once we’ve got a bit more analysis under our belts. We’ll also talk a lot more about how to think about complex numbers. But for now all we need to know is that they’re the “algebraic closure” of the real numbers, we get them by adding a square root of $-1$ that we call $i$, and we can use them as an example of an algebraically closed field.

One thing we can point out now, though, is the inherent duality of our situation. You see, we didn’t just add one square root of $-1$. Indeed, once we have complex numbers to work with we can factor $X^2+1$ as $(X-i)(X+i)$ (test this by multiplying it out and imposing the relation). Then we have another root: $-i$. This is just as much a square root of $-1$ as $i$ was, and anything we can do with $i$ we can do with $-i$. That is, there’s a symmetry in play that exchanges $i$ and $-i$. We can pick one and work with it, but we must keep in mind that whenever we do we’re making a non-canonical choice.

August 7, 2008

Polynomials with Too Few Roots

Okay, we saw that roots of polynomials exactly correspond to linear factors, and that a polynomial can have at most as many roots as its degree. In fact, there’s an expectation that a polynomial of degree $n$ will have exactly $n$ roots. Today I want to talk about two ways this can fail.

First off, let’s work over the real numbers and consider the polynomial $p=X^2-2X+1$. A root of $p$ will be a real number ${x}$ so that $x^2-2x+1=0$, but a little playing around will show us that $x=1$ is the only possible solution. The degree of the polynomial is two, so we expect two roots, but we only find one. What went wrong?

Well, we know from the fact that $x=1$ is a root that $X-1$ is a factor of $p$. Our division algorithm shows us that we can write $p=(X-1)(X-1)$. The factor that gives us the root $x=1$ shows up twice! But since we’ve already counted the root once the second occurrence doesn’t do anything.

To remedy this, let’s define the “multiplicity” of a root ${x}$ to be the number of times we can evenly divide out a factor of $(X-x)$. So in our example the root ${1}$ has multiplicity $2$. When we count the roots along with their multiplicities, we get back exactly the degree.

So do multiplicities handle all our problems? Unfortunately, no. An example here over the real numbers is the polynomial $p=X^2+1$. A root of this polynomial would be a real number ${x}$ with $x^2=-1$. But since the square of any real number is nonnegative, this can’t be satisfied. So there exist polynomials with fewer roots than their degrees would indicate; even with no roots at all!

Now, some fields are well-behaved. We say that a field is “algebraically closed” if every polynomial over that field has a root ${x}$. In that case we can divide out by $(X-x)$ to get a polynomial of one degree less, which must again have a root by algebraic closure. We can keep going until we write the polynomial as the product of a bunch of linear factors — the number is the same as the degree of the polynomial — and leave one field element left once we get down to degree zero. Thus over an algebraically closed field every polynomial has exactly as many roots as its degree indicates.. if you count them with their multiplicities!

August 6, 2008

Euclid’s Algorithm for Polynomials

Consider the division algorithm. It looks a lot like division for integers, as it should. Specifically, it’s got all the features we need for the setup of Euclid’s Algorithm, and we have an analogue here!

Just like we did back then, we start with two polynomials $a$ and $b$, and we divide them to find $a=q_1b+r_1$. Then if $r_1\neq0$ we turn around and divide to find $b=q_2r_1+r_2$. We keep going until we end up with a remainder of ${0}$, at which point we can’t divide anymore, and so we stop. And we must eventually stop, because the degree of the remainders keep going down.

At the end, the last nonzero remainder will be the “greatest common divisor” of $a$ and $b$. That is, it’s a polynomial $d$ that divides both $a$ and $b$ (leaving no remainder), and any other such common divisor must itself evenly divide $d$.

Euclid’s algorithm also gives us a way of writing out the greatest common divisor as a linear combination $d=ax+by$. Just like for the integers, this marks the greatest common divisor as the linear combination with the least degree.

Now how to show all this? It’s easy: just go back to the proof of Euclid’s algorithm for integers! Everything works the same way again. We just have to stick in references to the degree where appropriate to measure the “size” of a polynomial.

In fact, if we can equip any integral domain $R$ with a degree function $\nu:R^\times\rightarrow\mathbb{N}$ (where $R^\times$ is the set of nonzero ring elements) so that we have a division algorithm like those for integers and polynomials, then we’ve got the setup for Euclid’s algorithm. In this case, we say we have a “Euclidean domain”. So the point here is that $\mathbb{F}[X]$ is Euclidean, and so it has a form of Euclid’s algorithm, and all that follows from it.

Notice that in the case of the integers we had some ambiguity about common divisors, since $\pm d$ would both work equally well. The point here is that they differ by multiplication by a unit, and so each divides the other. This sort of thing happens in the divisibility preorder for any ring. For polynomials, the units are just the nonzero elements of the base field, considered as constant (degree zero) polynomials. Thus the greatest common divisor is only determined up to a constant multiple, but we can easily ignore that little ambiguity. The “greatest” just has to be read as referring to the degree of the polynomial.

August 5, 2008 Posted by | Algebra, Polynomials, Ring theory | 1 Comment

Polynomial Division

Polynomials generally form a ring, and not a field. However, the integers also form a ring, and we have a notion of division from arithmetic. For polynomials we have a similar form of division we’ll discuss now. Instead of using the absolute value of an integer to measure its size, we’ll use the degree of a polynomial to measure its size.

Given two polynomials $a$ and $b\neq0$, we can find two more polynomials — the quotient $q$ and the remainder $r$ — so that $a=qb+r$, and $\deg(r)<\deg(b)$. These polynomials are unique. The fact that we can find them is called the “division algorithm”, even though it’s a theorem. There are various algorithms we can use to establish this fact, but one of the most straightforward is polynomial long division, which we were all taught in high school algebra.

We start by writing $\displaystyle a=0b+a$. If the degree of $a$ is less than that of $b$, we’re done. Otherwise let’s write $\deg(a)=n$ and $\deg(b)=m$. The polynomials lead off with terms $a_nX^n$ and $b_mX^m$. If we multiply $b$ by $\frac{a_n}{b_m}X^{n-m}$ its leading term becomes exactly the same as that of $a$, so when we subtract the top degree terms cancel out. That is, we write $a=\frac{a_n}{b_m}X^{n-m}b+c$, and we see that $c$ has degree less than that of $a$.

Now if the degree of $c$ is still greater than or equal to that of $b$ we can keep going. Multiply $b$ by a term so that when we subtract the product from $c$ we’ll kill off the highest remaining term and drop its degree. Since the degree of the remainder keeps going down, it must eventually drop below that of $b$. Even if the degree of $b$ is zero, we can get the remainder down to the polynomial ${0}$, which we decided to give the degree $-\infty$. This gives us our $q$ and $r$

Now for uniqueness. Let’s say we have $a=q_1b+r_1$ and $a=q_2b+r_2$, and both $r_1$ and $r_2$ have degrees less than that of $b$. We can subtract to find $(q_1-q_2)b=(r_2-r_1)$. Now the degree on the right must be less than that of $b$, since adding and subtracting polynomials can’t increase the degree beyond the larger of the two. On the left, on the other hand, the degree is at least that of $b$, with one exception: if $q_1-q_2=0$. In this case the left hand side vanishes. Since this is the only way to drop the degree of the left-hand side we must have $q_1=q_2$. And then on the right we find $r_1=r_2$.

August 4, 2008

Roots of Polynomials II

This one might have to suffice for two days. The movers are almost done loading the truck here in New Orleans, and I’ll be taking today and tomorrow to drive to Maryland.

We can actually tease out more information from the factorization we constructed yesterday. Bur first we need a little definition.

Remember that when we set up the algebra of polynomials we noted that the coefficients have to be all zero after some finite number of them. Thus there must be a greatest nonzero coefficient $c_n$. The index $n$ corresponding to this coefficient we call the “degree” of the polynomial. If all the coefficients are zero — giving the zero polynomial — then a common convention is to assign it degree $-\infty$. This actually isn’t completely arbitrary, but the details won’t concern us until later.

So, armed with this information, look at how we constructed the factorization $p=(X-x)q$. We replaced each term $c_kX^k$ in $p$ with the term $c_k(X^k-x^k)$. Then we factored out $(X-x)$ from this term, giving $c_k(X^{k-1}+X^{k-2}x+...+Xx^{k-2}+x^{k-1})$. So the highest power of $X$ that shows up in this term (with a nonzero coefficient) is $c_kX^{k-1}$. And the highest power coming from all the terms of $p$ will be $c_nX^{n-1}$. The power $X^{n-1}$ shows up only once in the expression for $q$, so there’s no way for two such terms to add together and make its coefficient turn out to be zero, and no higher power of $X$ ever shows up at all. Thus the degree of $q$ is one less than that of $p$.

So what does this gain us? Well, each time we find a root we can factor out a term like $(X-x)$, which reduces the degree by ${1}$. So if $p$ has degree $n$ there can only be at most $n$ roots!

A nonzero constant polynomial $p=c_0$ has degree ${0}$, but it also has no roots! Perfect.

A linear polynomial $p=c_0+c_1X$ has degree ${1}$, and it has exactly one root: $-\frac{c_0}{c_1}$.

Now let’s assume that our statement is true for all polynomials of degree $n$: they have $n$ or fewer roots. Then given a polynomial $p$ of degree $n+1$ either $p$ has a root $x$ or it doesn’t. If it doesn’t, then we’re already done. If it does, then we can factor $p=(X-x)q$, where $p$ has degree $n$. But then $q$ can have at most $n$ roots, and thus $p$ can have at most $n+1$ roots!

A nice little corollary of this is that if our base field $\mathbb{F}$ is infinite (like it is for the most familiar examples) then only the zero polynomial can give us the zero function when we evaluate it at various field elements. That is, if $p(x)=0$ for all $x\in\mathbb{F}$, then $p=0$. This must be true because $p$ has an infinite number of roots, and so no finite degree polynomial can possibly have that many roots. The only possibility left is the zero polynomial.

Just to be clear, though, let’s look at this one counterexample. Think about the field $\mathbb{Z}_3$ we used when we talked about Set. The polynomial $p=X^3-X$ is not the zero polynomial, but $p(x)$ is the zero function. Indeed $p(0)=0^3-0=0$, $p(1)=1^3-1=0$, and $p(2)=2^3-2=8-2=6$, which is divisible by $3$, and so is the same as ${0}$ in this field.

July 31, 2008

Roots of Polynomials I

When we consider a polynomial as a function, we’re particularly interested in those field elements $x$ so that $p(x)=0$. We call such an $x$ a “zero” or a “root” of the polynomial $p$.

One easy way to get this to happen is for $p$ to have a factor of $X-x$. Indeed, in that case if we write $p=(X-x)q$ for some other polynomial $q$ then we evaluate to find

$p(x)=(X-x)q(x)=0$

The interesting thing is that this is the only way for a root to occur, other than to have the zero polynomial. Let’s say we have the polynomial

$p=c_0+c_1X+c_2X^2+...+c_nX^n$

and let’s also say we’ve got a root $x$ so that $p(x)=0$. But that means

$0=c_0+c_1x+c_2x^2+...+c_nx^n$

This is not just a field element — it’s the zero polynomial! So we can subtract it from $p$ to find

\begin{aligned}p=\left(c_0+c_1X+c_2X^2+...+c_nX^n\right)-\left(c_0+c_1x+c_2x^2+...+c_nx^n\right)\\=c_1(X-x)+c_2(X^2-x^2)+...+c_n(X^n-x^n)\end{aligned}

Now for any $k$ we can use the identity

$X^k-x^k=(X-x)(X^{k-1}+X^{k-2}x+...+Xx^{k-2}+x^{k-1})$

to factor out $(X-x)$ from each term above. This gives the factorization we were looking for.

July 30, 2008

Polynomials

Okay, we’re going to need some other algebraic tools before we go any further into linear algebra. Specifically, we’ll need to know a few things about the algebra of polynomials. Specifically (and diverging from the polynomials discussed earlier) we’re talking about polynomials in one variable, and with coefficients in the field we’re building our vector spaces over already.

We’ll write this algebra as $\mathbb{F}[X]$, where $X$ is now not a “variable”, like it was back in high school calculus. It’s a new element of the algebra. We start with the field $\mathbb{F}$ which is trivially an algebra over itself. Then we just throw in this new thing called $X$. Then, since we want to still be an algebra over $\mathbb{F}$, we have to be able to multiply elements. Defining a scalar multiple $cX$ for each $c\in\mathbb{F}$ is a good start, but we also have to multiply $X$ by itself to get $X^2$. There’s no reason this should be anything we’ve seen yet, so it’s new. Going along, we get $X^3$, $X^4$, and so on. Each of these is a new element, and we also get scalar multiples $cX^k$, and even linear combinations:

$\displaystyle\sum\limits_{k=0}^\infty c_kX^k$

as long as there are only a finite number of nonzero terms in this sum. That is, the coefficients are all zero after some point. We customarily take $X^0=1$ — the unit of the algebra.

Note here that we’re not using the summation convention for polynomials, though we could in principle. Remember, an algebra is a vector space, and what we’ve said above establishes that the set $\{X^k\}$ constitutes a basis for this vector space!

The algebra structure can be specified by defining it on pairs of basis elements. Remember that the structure is just a bilinear multiplication, which is just a linear map from the tensor square $\mathbb{F}[X]\otimes\mathbb{F}[X]$ to $\mathbb{F}[X]$. And we know that the basis for a tensor product consists of pairs of basis elements. So we can specify this linear map on a basis and extend by linearity — bilinearity — whatever…

Anyhow, how should we define the multiplication? Simply: $\mu(X^m,X^n)=X^{m+n}$. Then the whole rest of the algebra structure is defined for us. Now this looks like adding exponents, but remember we can just as well think of these as indices on basis elements that just happen to add when we multiply corresponding basis elements. Thus we wouldn’t be out of place using the summation convention here, though we won’t for the moment.

July 28, 2008