The Unapologetic Mathematician

Mathematics for the interested outsider

Representations of a Polynomial Algebra

Sorry for the delays, but tests are killing me this week.

Okay, so let’s take the algebra of polynomials, \mathbb{F}[X], and consider its representation theory.

What is a representation of this algebra? It’s a homomorphism of \mathbb{F}-algebras \rho:\mathbb{F}[X]\rightarrow\hom_\mathbb{F}(V,V). But the algebra of polynomials satisfies a universal property! A homomorphism of \mathbb{F}-algebras out of \mathbb{F}[X] is uniquely determined by the image of the single element X, and we can pick this image freely. That is, once we pick a linear transformation T:V\rightarrow V and set \rho_T(X)=T, then we are forced to use

\displaystyle\rho_T(c_0+c_1X+c_2X^2+\dots+c_nX^n)=c_01_V+c_1T+c_2T^2+\dots+c_nT^n

for all the other polynomials. That is, representations \rho_T of \mathbb{F}[X] are in bijection with the linear transformations T\in\hom_\mathbb{F}(V,V).
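
To make that forced formula concrete, here is a minimal sketch in Python. The helper names (`mat_mul`, `rho`) and the sample matrix are illustrative choices of mine, not anything fixed by the argument; matrices are plain lists of rows with integer entries.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rho(t, coeffs):
    """Image of c_0 + c_1 X + ... + c_n X^n under rho_T, for T = t.

    coeffs lists c_0, c_1, ..., c_n. Horner's rule builds
    c_0*1_V + c_1*T + ... + c_n*T^n without forming each power separately.
    """
    n = len(t)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, t)
        for i in range(n):
            result[i][i] += c      # add c times the identity
    return result

T = [[0, 1], [-1, 0]]   # sample transformation (an assumption), with T^2 = -1
p = [1, 0, 1]           # the polynomial 1 + X^2
q = [2, 3]              # the polynomial 2 + 3X
pq = [2, 3, 2, 3]       # their product 2 + 3X + 2X^2 + 3X^3

# rho_T respects multiplication: rho_T(pq) = rho_T(p) rho_T(q)
print(rho(T, pq) == mat_mul(rho(T, p), rho(T, q)))
```

Once the image of X is chosen, nothing else is left to choose: `rho` takes only the matrix and the coefficients.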

But remember that these representations don’t live in a vacuum. No, they’re just the objects of a whole category of representations. We need to consider the morphisms between representations too!

So if S:V\rightarrow V and T:W\rightarrow W are linear transformations, what’s a morphism \phi:\rho_S\rightarrow\rho_T? It’s a linear map \phi:V\rightarrow W such that \phi\circ S=T\circ\phi:V\rightarrow W. Notice that if \phi intertwines the linear maps S and T, then it will automatically intertwine the values of \rho_S and \rho_T for every polynomial, since \phi\circ S^k=T^k\circ\phi for each power k and \phi is linear.

Rather than try to examine this condition in detail (which leads to an interesting problem in the theory of quivers, if I recall), let’s just consider which representations are isomorphic. That is, let’s decategorify this category.

So we ask that the linear map \phi:V\rightarrow W be an isomorphism, with inverse \phi^{-1}:W\rightarrow V. Then we can take the intertwining relation \phi\circ S=T\circ\phi and compose on the right with \phi^{-1} to find T=\phi\circ S\circ\phi^{-1}. But this uniquely specifies T given S and \phi. That is, given a representation \rho_S:\mathbb{F}[X]\rightarrow\hom_\mathbb{F}(V,V) and an isomorphism \phi:V\rightarrow W, there is a unique representation \rho_T:\mathbb{F}[X]\rightarrow\hom_\mathbb{F}(W,W) so that \phi:\rho_S\rightarrow\rho_T is a natural isomorphism.

And we’re drawn again to consider the special case where V=W. Now an isomorphism is just a change of basis. Representations of \mathbb{F}[X] are equivalent if they do “the same thing” to the vector space V, but just express it with different coordinates.

So here’s the upshot: the general linear group \mathrm{GL}(V) acts on the hom-set \hom_\mathbb{F}(V,V) by conjugation — basis changes. In fact, this is a representation of the group, but I’m not ready to go into that detail right now. What I can say is that the orbits of this action are in bijection with the equivalence classes of representations of \mathbb{F}[X] on V.
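
Here is a small sanity check of the conjugation picture, as a sketch under my own assumptions: the matrices S and \phi below are arbitrary examples, T is taken to be \phi\circ S\circ\phi^{-1} (so that \phi\circ S=T\circ\phi), and the helper names are hypothetical. The check is that \phi then intertwines the images of a whole polynomial, not just of X.

```python
def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def poly_at(t, coeffs):
    """Evaluate c_0 + c_1 X + ... + c_n X^n at the matrix t (Horner's rule)."""
    n = len(t)
    result = [[0] * n for _ in range(n)]
    for c in reversed(coeffs):
        result = mat_mul(result, t)
        for i in range(n):
            result[i][i] += c
    return result

S = [[2, 1], [0, 3]]           # an arbitrary transformation
phi = [[1, 1], [0, 1]]         # an invertible change of basis
phi_inv = [[1, -1], [0, 1]]    # its inverse, written out by hand

T = mat_mul(mat_mul(phi, S), phi_inv)   # T = phi S phi^{-1}

p = [5, -1, 0, 2]              # the polynomial 5 - X + 2X^3
lhs = mat_mul(phi, poly_at(S, p))       # phi after rho_S(p)
rhs = mat_mul(poly_at(T, p), phi)       # rho_T(p) after phi
print(lhs == rhs)
```

Integer matrices with determinant 1 keep every entry exact, so the comparison is a genuine equality rather than a floating-point approximation.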

October 30, 2008 Posted by | Algebra, Linear Algebra, Representation Theory | 1 Comment

The Category of Representations

Now let’s narrow back in to representations of algebras, and the special case of representations of groups, but with an eye to the categorical interpretation. So, representations are functors. And this immediately leads us to the category of such functors. The objects, recall, are functors, while the morphisms are natural transformations. Now let’s consider what, exactly, a natural transformation consists of in this case.

Let’s say we have representations \rho:A\rightarrow\hom_\mathbb{F}(V,V) and \sigma:A\rightarrow\hom_\mathbb{F}(W,W). That is, we have functors \rho and \sigma with \rho(*)=V and \sigma(*)=W (where * is the single object of A, when it’s considered as a category) and the given actions on morphisms. We want to consider a natural transformation \phi:\rho\rightarrow\sigma.

Such a natural transformation consists of a list of morphisms indexed by the objects of the category A. But A has only one object: *. Thus we only have one morphism, \phi_*, which we will just call \phi.

Now we must impose the naturality condition. For each arrow a:*\rightarrow * in A we ask that the diagram

\displaystyle\begin{array}{ccc}V&\xrightarrow{\rho(a)}&V\\\downarrow{\scriptstyle\phi}&&\downarrow{\scriptstyle\phi}\\W&\xrightarrow{\sigma(a)}&W\end{array}

commute. That is, we want \phi\circ\rho(a)=\sigma(a)\circ\phi for every algebra element a. We call such a transformation an “intertwiner” of the representations. These intertwiners are the morphisms in the category \mathbf{Rep}(A) of representations of A. If we want to be more particular about the base field, we might also write \mathbf{Rep}_\mathbb{F}(A).

Here’s another way of putting it. Think of \phi as a “translation” from V to W. If \phi is an isomorphism of vector spaces, for instance, it could be a change of basis. We want to take a transformation from the algebra A and apply it, and we also want to translate. We could first apply the transformation in V, using the representation \rho, and then translate to W. Or we could first translate from V to W and then apply the transformation, now using the representation \sigma. Our condition is that either order gives the same result, no matter which element of A we’re considering.
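
As a sketch of the "either order gives the same result" condition, here is a toy check in Python. The example is my own assumption, not taken from the discussion above: the cyclic group of order 3 acts on \mathbb{F}^3 by shifting coordinates and trivially on \mathbb{F}^1, and the candidate translations are "add up the coordinates" and "take the first coordinate".

```python
def mat_mul(a, b):
    """Product of an (m x k) and a (k x n) matrix stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def rho(g):
    """Cyclic shift on F^3: the basis vector e_s goes to e_{(s+g) mod 3}."""
    return [[1 if r == (c + g) % 3 else 0 for c in range(3)] for r in range(3)]

def sigma(g):
    """The trivial representation on F^1: every element acts as the identity."""
    return [[1]]

def intertwines(phi, rho, sigma, elements):
    """Check phi . rho(g) == sigma(g) . phi for every listed element g."""
    return all(mat_mul(phi, rho(g)) == mat_mul(sigma(g), phi) for g in elements)

phi = [[1, 1, 1]]   # add up the coordinates: F^3 -> F^1
psi = [[1, 0, 0]]   # take the first coordinate: F^3 -> F^1

print(intertwines(phi, rho, sigma, range(3)))   # translating commutes with acting
print(intertwines(psi, rho, sigma, range(3)))   # here the two orders disagree
```

Summing the coordinates does not care how they are shuffled, which is why it intertwines the two actions; picking out one coordinate does care, so it fails.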

October 28, 2008 Posted by | Category theory, Group theory, Representation Theory, Ring theory | 8 Comments

Category Representations

We’ve seen how group representations are special kinds of algebra representations. But even more general than that is the representation of a category.

A group is a special monoid, within which each element is invertible. And a monoid is just a category with a single object. Similarly, an \mathbb{F}-algebra is just like a monoid but enriched over the category of vector spaces over \mathbb{F}. That is, it’s a one-object category with an \mathbb{F}-bilinear composition. It makes sense to regard both of these structures as categories of sorts. A representation will then be a functor from one of these categories.

The clear target category is \mathbf{Vect}_\mathbb{F}. So what’s a functor \rho from, say, a group G (considered as a category) to \mathbf{Vect}_\mathbb{F}? First the single object of the category G picks out some object V\in\mathbf{Vect}_\mathbb{F}. That is, V is a vector space over \mathbb{F}. Then for each arrow g in G — each group element — we have an arrow \rho(g)\in\hom_\mathbb{F}(V,V). Since g has to be invertible, this \rho(g) must be invertible — an element of \mathrm{GL}(V).

What about an algebra? Now our source category A and our target category \mathbf{Vect}_\mathbb{F} are both enriched over \mathbf{Vect}_\mathbb{F}. It only makes sense, then, for us to consider \mathbb{F}-linear functors. Such a functor \alpha again picks out a single vector space V for the single object of A (considered as a category). Every arrow a in A gets sent to an arrow \alpha(a)\in\hom_\mathbb{F}(V,V). This mapping is linear over the field \mathbb{F}.

So what do category representations get us? Well, one thing is this: consider a combinatorial graph — a collection of “vertices” with some directed “edges” joining them. A path in the graph is a sequence of directed edges joined tip-to-tail, and the collection of all paths in the graph constitutes the “path category” of the graph (exercise: identify the identity paths). A representation of this path category is what mathematicians call a “quiver representation”, and they’re big business.
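
For readers who like to see the path category spelled out, here is a minimal sketch. The encoding of a path as a triple (source, target, tuple of edge names) and the function names are my own assumptions; note that it also gives away the exercise about identity paths.

```python
def identity(v):
    """The empty path that starts and ends at the vertex v."""
    return (v, v, ())

def edge(name, source, target):
    """A single directed edge, viewed as a path of length one."""
    return (source, target, (name,))

def then(p, q):
    """Follow path p, then path q; only defined when they join tip-to-tail."""
    if p[1] != q[0]:
        raise ValueError("paths do not join tip-to-tail")
    return (p[0], q[1], p[2] + q[2])

# A tiny graph (hypothetical): u --a--> v --b--> w
a = edge("a", "u", "v")
b = edge("b", "v", "w")
ab = then(a, b)

print(ab)
print(then(identity("u"), a) == a and then(a, identity("v")) == a)
```

A quiver representation would then assign a vector space to each vertex and a linear map to each edge, with `then` going to composition of maps.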

More interesting to me is this: the category \mathcal{T}ang of tangles (or \mathcal{OT}ang of oriented tangles, \mathcal{F}r\mathcal{T}ang of framed tangles, or \mathcal{F}r\mathcal{OT}ang of framed, oriented tangles). This is a monoidal category with duals, as is \mathbf{Vect}_\mathbb{F}, and so it only makes sense to ask that our functors respect those structures as well. We don’t ask that it send the braiding to the symmetry on \mathbf{Vect}_\mathbb{F}, since that would trivialize the structure.

So what is a representation of the category \mathcal{T}ang? It is my contention that this is nothing but a knot invariant, viewed in a more natural habitat. A little more generally, knot invariants are the restrictions to knots (and links) of functors defined on the category of tangles, which can often (always?) be decategorified — or otherwise rendered down — into representations of \mathcal{T}ang. This is my work: to translate existing knot theoretical ideas into this algebraic language, where I believe they find a better home.

October 27, 2008 Posted by | Algebra, Category theory, Linear Algebra, Representation Theory | 7 Comments

Algebra Representations

We’ve defined a representation of the group G as a homomorphism \rho:G\rightarrow\mathrm{GL}(V) for some vector space V. But where did we really use the fact that G is a group?

This leads us to the more general idea of representing a monoid M. Of course, now we don’t need the image of a monoid element to be invertible, so we may as well just consider a homomorphism of monoids \rho:M\rightarrow\hom_\mathbb{F}(V,V), where we consider this endomorphism algebra as a monoid under composition.

And, of course, once we’ve got monoids and \mathbb{F}-linearity floating around, we’re inexorably drawn (Serge would say we have an irresistible compulsion) to consider monoid objects in the category of \mathbb{F}-modules. That is: \mathbb{F}-algebras.

And, indeed, things work nicely for \mathbb{F}-algebras. We say a representation of an \mathbb{F}-algebra A is a homomorphism \rho:A\rightarrow\hom_\mathbb{F}(V,V) for some vector space V over \mathbb{F}. How else can we view such a homomorphism?

Well, it turns an algebra element into an endomorphism. And the most important thing about an endomorphism is that it does something to vectors. So given an algebra element a\in A, and a vector v\in V, we get a new vector \left[\rho(a)\right](v). And this operation is \mathbb{F}-linear in both of its variables. So we have a linear map \mathrm{ev}\circ(\rho\otimes1_V):A\otimes V\rightarrow V, built from the representation \rho and the evaluation map \mathrm{ev}. But this is just a left A-module!

In fact, the evaluation above is the counit of the adjunction between \underline{\hphantom{X}}\otimes V and the internal \hom functor \hom_\mathbb{F}(V,\underline{\hphantom{X}}). This adjunction is a natural isomorphism of \hom sets: \hom_\mathbb{F}(A\otimes V,V)\cong\hom_\mathbb{F}(A,\hom_\mathbb{F}(V,V)). That is, left A-modules are in natural bijection with representations of A. In practice, we just consider the two structures to be the same, and we talk interchangeably about modules and representations.

As it would happen, the notion of an algebra representation properly extends that of a group representation. Given any group G we can build the group algebra \mathbb{F}[G]. As a vector space, this has a basis vector e_g for each group element g\in G. We then define a multiplication on pairs of basis elements by e_{g_1}e_{g_2}=e_{g_1g_2}, and extend by bilinearity.

Now it turns out that representations of the group G and representations of the group algebra \mathbb{F}[G] are in bijection. Indeed, the basis vectors e_g are invertible in the algebra \mathbb{F}[G]. Thus, given a homomorphism \alpha:\mathbb{F}[G]\rightarrow\hom_\mathbb{F}(V,V), the linear maps \rho(g)=\alpha(e_g) must be invertible. And so we have a group representation \rho:G\rightarrow\mathrm{GL}(V). Conversely, if \rho:G\rightarrow\mathrm{GL}(V) is a representation of the group G, then we can define \alpha(e_g)=\rho(g)\in\mathrm{GL}(V)\subset\hom_\mathbb{F}(V,V) and extend by linearity to get an algebra representation \alpha:\mathbb{F}[G]\rightarrow\hom_\mathbb{F}(V,V).
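
The group algebra is easy to model directly. The following is a sketch for the cyclic group \mathbb{Z}/n written additively, with integer coefficients standing in for the field; the dictionary encoding and the function name `multiply` are assumptions made for illustration.

```python
from collections import defaultdict

def multiply(x, y, n):
    """Multiply two elements of F[Z/n], stored as {group element: coefficient}.

    Basis vectors multiply by e_g * e_h = e_{(g + h) mod n}; everything else
    follows by bilinearity.
    """
    product = defaultdict(int)
    for g, a in x.items():
        for h, b in y.items():
            product[(g + h) % n] += a * b
    # drop zero coefficients so equal elements compare equal
    return {g: c for g, c in product.items() if c != 0}

n = 3
e1 = {1: 1}            # the basis vector e_1
e2 = {2: 1}            # the basis vector e_2
x = {0: 2, 1: 1}       # 2 e_0 + e_1
y = {1: 1, 2: -1}      # e_1 - e_2

print(multiply(e1, e2, n))   # e_1 e_2 = e_0, so e_1 is invertible
print(multiply(x, y, n))
```

The first product shows a basis vector being inverted by another basis vector, which is the fact used above to land in \mathrm{GL}(V).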

So we have representations of algebras. Within that we have the special cases of representations of groups. These allow us to cast abstract algebraic structures into concrete forms, acting as transformations of vector spaces.

October 24, 2008 Posted by | Algebra, Group theory, Linear Algebra, Representation Theory, Ring theory | 3 Comments

Group Representations

We’ve now got the general linear group \mathrm{GL}(V) of all invertible linear maps from a vector space V to itself. Incidentally this lives inside the endomorphism algebra \hom_\mathbf{Vect}(V,V) of all linear transformations from V to itself. In fact, in ring-theory terms it’s the group of units of that algebra. So what can we do with it?

One of the biggest uses is to provide representations for other algebraic structures. Let’s say we’ve got some abstract group. It’s a set with some binary operation defined on it, sure, but what does it do? We’ve seen groups acting on sets before, where we interpret a group element as a permutation of an actual collection of elements. Alternatively, an action of a group G is a homomorphism from G to the group of permutations of some set S, which sits inside the monoid \hom_\mathbf{Set}(S,S) of all functions from S to itself.

Another concrete representation of a group is as symmetries of some vector space. That is, we’re interested in homomorphisms \rho:G\rightarrow\mathrm{GL}(V). A “representation” of a group G is a vector space V with such a homomorphism.

In fact, this extends the notion of a group acting on a set. Indeed, for any set S we can build the free vector space \mathbb{F}[S] with a basis vector e_s for each s\in S. Given a permutation \pi on S we get a linear map \mathbb{F}[\pi]:\mathbb{F}[S]\rightarrow\mathbb{F}[S] defined by setting \mathbb{F}[\pi](e_s)=e_{\pi(s)} and extending by linearity.

We thus get a homomorphism from the group of permutations of S to \mathrm{GL}(\mathbb{F}[S]). And then if we have a group action on S we can promote it to a representation on the vector space \mathbb{F}[S]. We call such a representation a “permutation representation”.
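
A short sketch of this construction, assuming S=\{0,1,\dots,n-1\} and a permutation stored as a list `pi` with `pi[s]` the image of s (both conventions are mine):

```python
def perm_matrix(pi):
    """Matrix of F[pi]: column s has a 1 in row pi[s], since e_s goes to e_{pi(s)}."""
    n = len(pi)
    return [[1 if pi[s] == r else 0 for s in range(n)] for r in range(n)]

def compose(pi, tau):
    """The permutation 'apply tau first, then pi'."""
    return [pi[tau[s]] for s in range(len(tau))]

def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

pi = [1, 2, 0]    # a 3-cycle
tau = [1, 0, 2]   # a transposition

# Homomorphism property: F[pi . tau] = F[pi] F[tau]
print(perm_matrix(compose(pi, tau)) == mat_mul(perm_matrix(pi), perm_matrix(tau)))
```

Each permutation matrix has exactly one 1 in every row and column, so it is invertible, with inverse the matrix of the inverse permutation.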

October 23, 2008 Posted by | Algebra, Group theory, Linear Algebra, Representation Theory | 13 Comments

General Linear Groups — Generally

Monday, we saw that the general linear groups \mathrm{GL}_n(\mathbb{F}) are matrix groups, specifically consisting of those n\times n matrices whose columns are linearly independent. But what about more general vector spaces?

Well, we know that every finite-dimensional vector space has a basis, and is thus isomorphic to \mathbb{F}^n, where n is the cardinality of the basis. So given a vector space V with a basis \{f_i\} of cardinality n, we have the isomorphism S:\mathbb{F}^n\rightarrow V defined by S(e_i)=f_i and S^{-1}(f_i)=e_i.

This isomorphism of vector spaces then induces an isomorphism of their automorphism groups. That is, \mathrm{GL}(V)\cong\mathrm{GL}_n(\mathbb{F}). Given an invertible linear transformation T:V\rightarrow V, we can conjugate it by S to get S^{-1}TS:\mathbb{F}^n\rightarrow\mathbb{F}^n. This has inverse S^{-1}T^{-1}S, and so is an element of \mathrm{GL}_n(\mathbb{F}). Thus (not unexpectedly) every invertible linear transformation from a vector space V to itself gets an invertible matrix.

But this assignment depends essentially on the arbitrary choice of the basis \{f_i\} for V. What if we choose a different basis \{\tilde{f}_i\}? Then we get a new transformation \tilde{S} and a new isomorphism of groups T\mapsto\tilde{S}^{-1}T\tilde{S}. But this gives us an inner automorphism of \mathrm{GL}_n(\mathbb{F}). Given a transformation M:\mathbb{F}^n\rightarrow\mathbb{F}^n, we get the transformation

\displaystyle\tilde{S}^{-1}SMS^{-1}\tilde{S}=\left(\tilde{S}^{-1}S\right)M\left(\tilde{S}^{-1}S\right)^{-1}

This composite \tilde{S}^{-1}S sends \mathbb{F}^n to itself, and it has an inverse. Thus changing the basis on V induces an inner automorphism of the matrix group \mathrm{GL}_n(\mathbb{F}).

Now let’s consider an invertible linear transformation T:V\rightarrow V. We have two bases for V, and thus two different matrices (two different elements of \mathrm{GL}_n(\mathbb{F})) corresponding to T: S^{-1}TS and \tilde{S}^{-1}T\tilde{S}. We get from one to the other by conjugation with \tilde{S}^{-1}S:

\displaystyle\tilde{S}^{-1}T\tilde{S}=\left(\tilde{S}^{-1}S\right)\left(S^{-1}TS\right)\left(\tilde{S}^{-1}S\right)^{-1}

And what is this transformation \tilde{S}^{-1}S? How does it act on a basis vector in \mathbb{F}^n? We calculate:

\displaystyle\left[\tilde{S}^{-1}S\right](e_j)=\tilde{S}^{-1}(f_j)=\tilde{S}^{-1}(x_j^i\tilde{f}_i)=x_j^i\tilde{S}^{-1}(\tilde{f}_i)=x_j^ie_i

where f_j=x_j^i\tilde{f}_i expresses the vectors in one basis for V in terms of those of the other. That is, the jth column of the matrix X consists of the components of f_j written in terms of the \tilde{f}_i. Similarly, the inverse matrix X^{-1}, with entries \tilde{x}_i^j, writes the \tilde{f}_i in terms of the f_j: \tilde{f}_i=\tilde{x}_i^jf_j.

It is these “change-of-basis” matrices that effect all of our, well, changes of basis. For example, say we have a vector v\in V with components v=v^jf_j. Then we can expand this:

\displaystyle v=v^jf_j=v^jx_j^i\tilde{f}_i=\left(x_j^iv^j\right)\tilde{f}_i

So our components in the new basis are \tilde{v}^i=x_k^iv^k.

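As another example, say that we have a linear transformation T:V\rightarrow V with matrix components t_i^j with respect to the basis \{f_i\}. That is, T(f_i)=t_i^jf_j. Then we can calculate:

\displaystyle T(\tilde{f}_i)=T(\tilde{x}_i^kf_k)=\tilde{x}_i^kT(f_k)=\tilde{x}_i^kt_k^lf_l=\tilde{x}_i^kt_k^lx_l^j\tilde{f}_j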

and we have the new matrix components \tilde{t}_i^j=\tilde{x}_i^kt_k^lx_l^j.
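
The index gymnastics can be checked numerically. In this sketch the matrices are arbitrary examples of mine, and the storage convention is an assumption: the upper index is the row and the lower index is the column, so `x[i][j]` holds x_j^i and `t[j][i]` holds t_i^j.

```python
n = 2
x = [[1, 2], [1, 3]]          # x[i][j] holds x_j^i (determinant 1)
x_inv = [[3, -2], [-1, 1]]    # x_inv[j][i] holds \tilde{x}_i^j, the inverse matrix
t = [[0, 1], [1, 1]]          # t[j][i] holds t_i^j in the old basis
v = [4, 5]                    # components v^k in the old basis

def apply(m, w):
    """Apply the matrix m to the column of components w."""
    return [sum(m[i][k] * w[k] for k in range(len(w))) for i in range(len(m))]

# \tilde{v}^i = x_k^i v^k
v_new = [sum(x[i][k] * v[k] for k in range(n)) for i in range(n)]

# \tilde{t}_i^j = \tilde{x}_i^k t_k^l x_l^j, stored at t_new[j][i]
t_new = [[sum(x_inv[k][i] * t[l][k] * x[j][l]
              for k in range(n) for l in range(n))
          for i in range(n)] for j in range(n)]

# Transforming then translating agrees with translating then transforming
print(apply(t_new, v_new) == apply(x, apply(t, v)))
```

With this storage convention the formula for the new components is just the matrix product X T X^{-1}, which is the conjugation described earlier.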

October 22, 2008 Posted by | Algebra, Group Examples, Linear Algebra | 12 Comments

The General Linear Groups

Not just any general linear group \mathrm{GL}(V) for any vector space V, but the particular groups \mathrm{GL}_n(\mathbb{F}). I can’t put LaTeX, or even HTML subscripts, in post titles, so this will have to do.

The general linear group \mathrm{GL}_n(\mathbb{F}) is the automorphism group of the vector space \mathbb{F}^n of n-tuples of elements of \mathbb{F}. That is, it’s the group of all invertible linear transformations sending this vector space to itself. The vector space \mathbb{F}^n comes equipped with a basis \{e_i\}, where e_i has a {1} in the ith place, and {0} elsewhere. And so we can write any such transformation as an n\times n matrix.

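Let’s look at the matrix of some invertible transformation T:

\displaystyle T=\begin{pmatrix}t_1^1&t_2^1&\cdots&t_n^1\\t_1^2&t_2^2&\cdots&t_n^2\\\vdots&\vdots&\ddots&\vdots\\t_1^n&t_2^n&\cdots&t_n^n\end{pmatrix}

How does it act on a basis element? Well, let’s consider its action on e_1:

\displaystyle\begin{pmatrix}t_1^1&t_2^1&\cdots&t_n^1\\t_1^2&t_2^2&\cdots&t_n^2\\\vdots&\vdots&\ddots&\vdots\\t_1^n&t_2^n&\cdots&t_n^n\end{pmatrix}\begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}=\begin{pmatrix}t_1^1\\t_1^2\\\vdots\\t_1^n\end{pmatrix}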

It just reads off the first column of the matrix of T. Similarly, T(e_i) will read off the ith column of the matrix of T. This works for any linear endomorphism of \mathbb{F}^n: its columns are the images of the standard basis vectors. But as we said last time, an invertible transformation must send a basis to another basis. So the columns of the matrix of T must form a basis for \mathbb{F}^n.

Checking that they’re a basis turns out to be made a little easier by the special case we’re in. The vector space has dimension n, and we’ve got n column vectors to consider. If all n are linearly independent, then the column rank of the matrix is n. Then the dimension of the image of T is n, and thus T is surjective.

On the other hand, any vector T(v) in the image of T is a linear combination of the columns of the matrix of T (use the components of v as coefficients). If these columns are linearly independent, then the only combination adding up to the zero vector has all coefficients equal to {0}. And so T(v)=0 implies v=0, and T is injective.

Thus we only need to check that the columns of the matrix of T are linearly independent to know that T is invertible.

Conversely, say we’re given a list of n linearly independent vectors f_i in \mathbb{F}^n. They must be a basis, since any linearly independent set can be completed to a basis, and a basis of \mathbb{F}^n must have exactly n elements, which we already have. Then we can use the f_i as the columns of a matrix. The corresponding transformation T has T(e_i)=f_i, and extends from there by linearity. It sends a basis to a basis, and so must be invertible.

The upshot is that we can consider this group as a group of n\times n matrices: exactly those whose columns are linearly independent.
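
The criterion is easy to test by machine. Here is a sketch over the rationals using Gaussian elimination; it computes the row rank, which equals the column rank used in the argument. The function names are my own, and exact `Fraction` arithmetic is chosen to avoid rounding.

```python
from fractions import Fraction

def rank(matrix):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    rows = [[Fraction(entry) for entry in row] for row in matrix]
    found = 0                                  # number of pivots found so far
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0),
                     None)
        if pivot is None:
            continue                           # no pivot in this column
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(found + 1, len(rows)):
            factor = rows[r][col] / rows[found][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found

def is_invertible(matrix):
    """An n x n matrix is in GL_n exactly when its n columns are independent."""
    return rank(matrix) == len(matrix)

print(is_invertible([[1, 2], [3, 4]]))   # independent columns
print(is_invertible([[1, 2], [2, 4]]))   # second column is twice the first
```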

October 20, 2008 Posted by | Algebra, Group Examples, Linear Algebra | 1 Comment

Isomorphisms of Vector Spaces

Okay, after that long digression into power series and such, I’m coming back to linear algebra. What we want to talk about now is how two vector spaces can be isomorphic. Of course, this means that they are connected by an invertible linear transformation (which preserves the addition and scalar multiplication operations):

T:V\rightarrow W

First off, to be invertible the kernel of T must be trivial. Otherwise we’d have two vectors in V mapping to the same vector in W, and we wouldn’t be able to tell which one it came from in order to invert the map. Similarly, the cokernel of T must be trivial, or we’d have missed some vectors in W, and we couldn’t tell where in V to send them under the inverse map. This tells us that the index of an isomorphism must be zero, and thus that the vector spaces must have the same dimension. This seems sort of obvious, that isomorphic vector spaces would have to have the same dimension, but you can’t be too careful.

Next we note that an isomorphism sends bases to bases. That is, if \{e_i\} is a basis for V, then the collection of f_i=T(e_i) will form a basis for W.

Since T is surjective, given any w\in W there is some v\in V with T(v)=w. But v=v^ie_i uniquely (remember the summation convention) because the e_i form a basis. Then w=T(v)=T(v^ie_i)=v^iT(e_i)=v^if_i, and so we have an expression of w as a linear combination of the f_i. The collection \{f_i\} thus spans W.

On the other hand, if we have a linear combination 0=x^if_i, then we can write 0=x^iT(e_i)=T(x^ie_i). Since T is injective we find x^ie_i=0, and thus each x^i=0, since the e_i form a basis. Thus the spanning set \{f_i\} is linearly independent, and thus forms a basis.

The converse, it turns out, is also true. If \{e_i\} is a basis of V, and \{f_i\} is a basis of W, then the map T defined by T(e_i)=f_i (and extending by linearity) is an isomorphism. Indeed, we can define an inverse straight away: T^{-1}(f_i)=e_i, and extend by linearity.

The upshot of these facts is that two vector spaces are isomorphic exactly when they have the same dimension. That is, just the same way that the cardinality of a set determines its isomorphism class in the category of sets, the dimension of a vector space determines its isomorphism class in the category of vector spaces.

Now let’s step back and consider what happens when we take any category and throw away all the morphisms that aren’t invertible. We’re left with a groupoid, and like any groupoid it falls apart into a bunch of “connected” pieces: the isomorphism classes. In this case, the isomorphism classes are given by the dimensions of the vector spaces.

Each of these connected pieces, then, is equivalent (as a groupoid) to the automorphism group of any one of its objects, and all of these automorphism groups are isomorphic to each other. In this case, we have a name for these automorphism groups.

Given any vector space V, all the interesting information about isomorphisms to or from this vector space can be summed up in the “general linear group” of V, which consists of all invertible linear maps from V to itself. We write this automorphism group as \mathrm{GL}(V).

We have a special name in the case when V is the vector space \mathbb{F}^n of n-tuples of elements of the base field \mathbb{F}. In this case we write the general linear group as \mathrm{GL}(n,\mathbb{F}) or as \mathrm{GL}_n(\mathbb{F}). Since every finite-dimensional vector space over \mathbb{F} is isomorphic to one of these (specifically, the one with n=\dim(V)), we have \mathrm{GL}(V)\cong\mathrm{GL}(n,\mathbb{F}). These particular general linear groups are thus extremely important for understanding isomorphisms of finite-dimensional vector spaces. We’ll investigate these groups as we move forward.

October 17, 2008 Posted by | Algebra, Linear Algebra | 11 Comments

Pi: A Wrap-Up

A couple months ago, in a post on World Series odds (how are those working out, Michael?), a commenter by the moniker of Kurt Osis asked a random question:

Ok now to my random question for the day. Is all human knowledge based on Pi? This just occurred to me the other day, if knowledge is based on measurement and the only objective form of measurement we have is the ratio between a circle’s circumference and diameter then is all knowledge really based on Pi?

Naturally, this sounds like just the sort of woo that I’ve decried in The Tao of Physics and The Dancing Wu Li Masters. It also smacks of “mathing up” the fuzzy ideas to give them the veneer of rigor and respectability. I’ve seen politicians do it, we’ve all seen poststructuralists do it, and there’s a lot of others that do too. And one of the very few undeniably mathy words that almost everyone knows is that blasted Greek constant \pi, so it gets called into service a lot.

Clearly, I had to nip this in the bud.

I pointed out that this idea of wrapping things up with “measurement” really gave away that this was nonsense. I cited that curvature of spacetime throws off exactly such measurements (a point I recently brought up with Todd, but he hadn’t thrown “measurement” out there himself). At that point Kurt backtracked and said that \pi was an idealization, and the measured discrepancies were knowledge. Of course I had forgotten about how slippery arguments can be with someone who only cares for the veneer of rigor.

Still I pressed onwards. I pointed out that \pi has nothing to do with any real, physical measurement. The Cabibbo angle, or the fine-structure constant — those are the real-world constants that are actually interesting because there is (as yet) no reason why they have to have the values that they do.

Then the discussion moved from an unrelated post on Michael’s weblog to an unrelated post on mine.

Again, Kurt advances the “epistemic \pi” hypothesis as if it’s remotely coherent. Now he asserts that he was “trying to think of something independent of the number system itself”, and finally I had something. Here I made my stand:

\pi is far from independent of the number system. It is what it is exactly because of the way the real number system is structured.

Then and there I decided to stop what I was working on about linear algebra. Instead, I set off on power series and how power series expansions can be used to express analytic functions. Then I showed how power series can be used to solve certain differential equations, which led us to defining the functions sine and cosine. Then I showed that the sine function must have a least positive zero, which we define to be \pi.

The assumptions that have led to the definition of \pi are just those of the real number system: we are working within the unique (up to isomorphism) largest archimedean field. There is no measurement, no knowledge, no science, and no epistemology to it. Kurt’s real question — the one he hops onto other mathematical weblogs’ unrelated comment threads to ask — is really about philosophy. He’s asking for a final answer to the entire field of epistemic research. It’s not forthcoming; not on a math weblog, not on a philosophy weblog, not anywhere. It’s been around in its current form for hundreds of years, and I don’t see a resolution on the horizon. But it certainly doesn’t lie in an accidental quirk of the real number system that society has for some reason decided to exalt far beyond its true value.

October 16, 2008 Posted by | rants | 3 Comments

Properties of the Sine and Cosine

Blaise got most of the classic properties of the sine and cosine in the comments to the last post, so I’ll crib generously from his work. As a note: I know many people write powers of the sine and cosine functions as \sin^2(x) (for example) instead of \sin(x)^2. As I tell my calculus students every year, I refuse to do that myself because that should mean \sin(\sin(x)), and I guarantee people will get confused between \sin^{-1}(x)=\arcsin(x) and \sin^{-1}(x)=\frac{1}{\sin(x)}.

First, let’s consider the function g(x)=\sin(x)^2+\cos(x)^2. We can take its derivative using the rules for derivatives of trigonometric functions from last time:

\displaystyle g'(x)=2\sin(x)\cos(x)-2\cos(x)\sin(x)=0

So this function is a constant. We easily check that g(0)=1, and so \sin(x)^2+\cos(x)^2=1.
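
As a numerical sanity check, here is a sketch that sums the power series directly rather than calling a library sine. It assumes the usual series x-\frac{x^3}{3!}+\frac{x^5}{5!}-\dots and 1-\frac{x^2}{2!}+\frac{x^4}{4!}-\dots; the cutoff of 30 terms and the sample points are arbitrary choices of mine.

```python
def sin_series(x, terms=30):
    """Partial sum of x - x^3/3! + x^5/5! - ..."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))   # next odd-power term
    return total

def cos_series(x, terms=30):
    """Partial sum of 1 - x^2/2! + x^4/4! - ..."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))   # next even-power term
    return total

for x in [-3.0, -1.0, 0.0, 0.5, 2.0, 3.0]:
    g = sin_series(x) ** 2 + cos_series(x) ** 2
    print(x, abs(g - 1.0) < 1e-12)
```

This is only evidence at a handful of points, of course; the derivative argument above is what proves the identity everywhere.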

What does this mean? It tells us that if \sin(x) and \cos(x) are the lengths of the legs of a right triangle, the hypotenuse will have length {1}. Alternately, the point with coordinates (\cos(x),\sin(x)) in the standard coordinate plane will lie on the unit circle. We haven’t talked yet about using integration to calculate the length of a path in the plane, but when we do we’ll see that the length of the arc on the circle from (1,0) to (\cos(x),\sin(x)) is exactly x.

This gives us another definition for the sine and cosine functions, one closer to the usual one people see in a trigonometry class. Given an input value x, walk that far around the unit circle, starting from the point (1,0). The coordinates of the point you end up at are the cosine and sine of x. And this gives us our “original” definitions: given a right triangle, it is similar to a right triangle whose hypotenuse has length {1}, and the sine and cosine are the lengths of the two legs.

Now, since \sin(x)^2 and \cos(x)^2 are both nonnegative, they must each be bounded above by g(x)=1. Thus -1\leq\sin(x)\leq1 and -1\leq\cos(x)\leq1. More specifically, any time that \sin(x_0)=0 we must have \cos(x_0)=\pm1.

We know that \sin(0)=0 and \cos(0)=1, so if we ever have another point t where \sin(t)=0 and \cos(t)=1 we have a period. This is because the differential equation will determine the future behavior of \sin(t+x) the same way it determined the behavior of \sin(0+x). In fact, if \sin(p)=0 and \cos(p)=-1, then the future behavior of \sin(p+x) will be exactly the negative of the behavior of \sin(x), and so eventually \sin(2p)=0 and \cos(2p)=1 again.

Admittedly, I’m sort of waving my hands here without an existence/uniqueness proof for solving differential equations. But the geometric intuition should suffice for the idea that since the function’s value and first derivative at {0} are enough to determine the function, then the specific point we know them at shouldn’t matter.

So, does the sine function have a positive zero? That is, is there some p>0 so that \sin(p)=0? If so, the lowest such one would have to have \cos(p)=-1 (because positive numbers near {0} have positive sines). The next one would then be \sin(2p)=0 with \cos(2p)=1, and the whole thing repeats with period 2p.

The function \sin(x) starts out increasing, and so \cos(x) decreases (since \cos(x)^2=1-\sin(x)^2). If \sin(x) has a maximum, then \cos(x) (its derivative) must cross zero. Then \sin(x) is decreasing, and it cannot increase again unless \cos(x) crosses zero again. But if \cos(x) crosses zero again it must have passed through a local extremum (Rolle), where its own derivative -\sin(x) vanishes, and so \sin(x) cannot increase again before it crosses zero itself.

So if we are to avoid \sin(x) having a positive zero, it must either increase to some asymptote below {1}, or it must increase to a maximum and then decrease to some asymptote below {1}. But for a function to have an asymptote it must approach a horizontal line, and its derivative must approach {0}. That is, we can only have \sin(x) approaching an asymptote at y=1, while \cos(x) approaches an asymptote at y=0.

But if \cos(x) approaches an asymptote, its derivative must also asymptotically approach {0}. But this derivative is -\sin(x), which we are assuming approaches -1! And so none of these asymptotes are possible!

So the sine function must have a positive zero: \sin(p)=0. And thus the sine and cosine (and all other solutions to this differential equation) will have period 2p.

Finally, what the heck is this value p? In point of fact, we have no way of telling. But it might come in handy, so we’ll define this number and give it a new name: \pi. Whenever we say \pi we’ll mean “the first positive zero of the sine function”.
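
Although the argument gives no closed form for p, we can still approximate it from this definition alone. The sketch below sums the sine series, scans for the first sign change, and then bisects. The step size, the tolerance, and the helper names are my own assumptions, and a coarse scan like this could in principle step over a pair of zeros; `math.pi` appears only for comparison.

```python
import math

def sin_series(x, terms=30):
    """Partial sum of x - x^3/3! + x^5/5! - ..."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def first_positive_zero(f, step=0.1, tol=1e-12):
    """Scan right from `step` until f changes sign, then bisect that interval."""
    lo = step
    while f(lo) * f(lo + step) > 0:    # same sign: no zero bracketed yet
        lo += step
    hi = lo + step
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:        # the zero lies in the left half
            hi = mid
        else:                          # otherwise it lies in the right half
            lo = mid
    return (lo + hi) / 2

approx = first_positive_zero(sin_series)
print(approx, abs(approx - math.pi) < 1e-9)
```

Nothing here measures a circle: the number falls out of the power series, which is the whole point of the definition.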

Here I want to point out that I’ve fulfilled my boast of a few months ago on some other weblog. In my tireless rant against the \pi-fetishism that infests the geek community, I told someone that \pi can be derived, ultimately, from solely the properties of the real number system. Studying this field (itself uniquely specified on algebraic and topological grounds) leads us to both differential calculus and to power series, and from there to series solutions to differential equations. One of the most natural differential equations in the world thus gives rise to the trigonometric functions, and the definition of \pi follows from their properties. There is no possible way it could be anything other than what it is when you see it from this side, while the geometric definition hinges on some very deep assumptions on the geometry of spacetime.

October 14, 2008 Posted by | Analysis, Calculus | 12 Comments

