## The Geometric Meaning of the Derivative

Now we know what the derivative of a function is, and we have some tools to help us calculate derivatives. But what does the derivative *mean*? Here’s a picture:

In green I’ve drawn a function $f$ defined on (at least) the interval $(a, b)$ of real numbers between $a$ and $b$. The specifics of the function don’t matter. In fact, having a formula around to fall back on would be detrimental to understanding what’s going on.

In red I’ve drawn the line with equation $y = f(x_0) + f'(x_0)(x - x_0)$. This describes a function $y(x)$ with two very important properties. First, when $x = x_0$ we get $y(x_0) = f(x_0)$, so the two functions take the same value there. Second, the derivative $y'(x) = f'(x_0)$ everywhere, and in particular $y'(x_0) = f'(x_0)$. That is, not only do both graphs pass through the same point above $x_0$, they’re pointing in the same direction. As they pass through the point, the line “touches” the graph of $f$, and we call it the “tangent” line after the Latin *tangere*: “to touch”.

So the derivative $f'(x_0)$ seems to describe the direction of the tangent line to the graph of $f$ at the point $x_0$. Indeed, if we change our input by adding $\Delta x$, the tangent line predicts a change in output of $f'(x_0)\Delta x$. Remember, it’s this simple relation between changes in input and changes in output that makes lines lines. But the graph of the function is not its tangent line, and the function $f$ is not the same as the function defined by $y = f(x_0) + f'(x_0)(x - x_0)$. How do they differ?

Well, we can subtract them. At $x_0$, we get a difference of $0$ because of how we define the function $y$, so let’s push away to the point $x_0 + \Delta x$. There we find a difference of $f(x_0 + \Delta x) - \left(f(x_0) + f'(x_0)\Delta x\right)$. But we saw this already in the lead-up to the chain rule! This is the function $\epsilon(\Delta x)\Delta x$, where $\lim_{\Delta x\to0}\epsilon(\Delta x) = 0$. That is, not only does the difference go to zero (the line and the graph pass through the same point), but it goes to zero fast enough that the difference divided by $\Delta x$ still goes to zero (the line and the graph point in the same direction).

Let’s try to understand why the tangent line works like this. It’s pretty difficult to draw a tangent line, except in some simple geometric circumstances. So how can we get hold of it? Well, instead of trying to draw a line that touches the graph at that point, let’s imagine drawing one that cuts through at $x_0$, and also at the nearby point $x_0 + \Delta x$. We’ll call it the “secant” line after the Latin *secare*: “to cut”. Now along this line we changed our input by $\Delta x$ and changed our output by $\Delta f = f(x_0 + \Delta x) - f(x_0)$. That is, the relationship between inputs and outputs along this secant line is just the difference quotient $\frac{\Delta f}{\Delta x}$!

We know that the derivative is the limit of the difference quotient as $\Delta x$ goes to $0$. In the same way, the tangent line is the limit of the secant lines as we pick our second point closer and closer to $x_0$, as long as our function is well-behaved. It might happen that the secants don’t approach any one tangent line, in which case our function is not differentiable at that point. In fact, that’s exactly what it means for a function to fail to be differentiable.
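To watch the secant slopes settle down numerically, here is a minimal Python sketch. The function `f` (taken to be $x^2$), the base point `x0 = 1`, and the helper name `secant_slope` are all assumptions chosen purely for illustration.

```python
def secant_slope(f, x0, dx):
    """Slope of the secant line through (x0, f(x0)) and (x0 + dx, f(x0 + dx))."""
    return (f(x0 + dx) - f(x0)) / dx


def f(x):
    # Hypothetical example function; its tangent at x0 = 1 has slope 2.
    return x * x


x0 = 1.0
# Pick the second point closer and closer to x0.
slopes = [secant_slope(f, x0, 10.0 ** -k) for k in range(1, 7)]

for k, slope in enumerate(slopes, start=1):
    print(f"dx = 1e-{k}: secant slope = {slope:.6f}")
```

For this particular `f` the secant slope works out to $2 + \Delta x$, so the printed values should creep down toward the tangent slope $2$.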

So in terms of the graph of a function, the derivative of a function at a point describes the tangent line to the graph of the function through that point. In particular, it gives us the “slope” — the constant relationship between inputs and outputs along the line.

## The Chain Rule

Today we get another rule for manipulating derivatives. Along the way we’ll see another way of viewing the definition of the derivative which will come in handy in the future.

Okay, we defined the derivative of the function $f$ at the point $x_0$ as the limit of the difference quotient:

$$f'(x_0) = \lim_{\Delta x\to0}\frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$

The point of the derivative-as-limit-of-difference-quotient is that if we adjust our input by $\Delta x$, we adjust our output “to first order” by $f'(x_0)\Delta x$. That is, the change in output is roughly the change in input times the derivative, and we have a good idea of how to control the error:

$$f(x_0 + \Delta x) = f(x_0) + f'(x_0)\Delta x + \epsilon(\Delta x)\Delta x$$

where $\epsilon$ is a function of $\Delta x$ satisfying $\lim_{\Delta x\to0}\epsilon(\Delta x) = 0$. This means the difference between the actual change in output and the change predicted by the derivative not only goes to zero as we look closer and closer to $x_0$, but it goes to zero fast enough that we can divide it by $\Delta x$ and *still* it goes to zero. (Does that make sense?)

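Okay, so now we can use this viewpoint on the derivative to look at what happens when we follow one function by another. We want to consider the composite function $f\circ g$ at the point $x_0$ where $g$ is differentiable. We’re also going to assume that $f$ is differentiable at the point $y_0 = g(x_0)$. The differentiability of $g$ at $x_0$ tells us that

$$g(x_0 + \Delta x) = g(x_0) + g'(x_0)\Delta x + \epsilon(\Delta x)\Delta x$$

and the differentiability of $f$ at $y_0$ tells us that

$$f(y_0 + \Delta y) = f(y_0) + f'(y_0)\Delta y + \eta(\Delta y)\Delta y$$

where $\lim_{\Delta x\to0}\epsilon(\Delta x) = 0$, and similarly for $\eta$. Now when we compose the functions $f$ and $g$ we take $y_0 = g(x_0)$, and $\Delta y = g'(x_0)\Delta x + \epsilon(\Delta x)\Delta x$ is exactly the value described in the first line! That is,

$$\begin{aligned}[f\circ g](x_0 + \Delta x) &= f\left(g(x_0) + g'(x_0)\Delta x + \epsilon(\Delta x)\Delta x\right)\\ &= f(g(x_0)) + f'(g(x_0))\left(g'(x_0)\Delta x + \epsilon(\Delta x)\Delta x\right) + \eta(\Delta y)\left(g'(x_0)\Delta x + \epsilon(\Delta x)\Delta x\right)\\ &= f(g(x_0)) + f'(g(x_0))g'(x_0)\Delta x + \left(f'(g(x_0))\epsilon(\Delta x) + \eta(\Delta y)g'(x_0) + \eta(\Delta y)\epsilon(\Delta x)\right)\Delta x\end{aligned}$$

The last quantity in parentheses, which we multiply by $\Delta x$, goes to zero as $\Delta x$ does. First, $\epsilon(\Delta x)$ does by assumption. Then as $\Delta x$ goes to zero, so does $\Delta y$, since $g$ must be continuous. Thus $\eta(\Delta y)$ must go to zero, and the whole quantity is then zero in the limit. This establishes that not only is $f\circ g$ differentiable at $x_0$, but that its derivative there is

$$[f\circ g]'(x_0) = f'(g(x_0))g'(x_0)$$

This means that since “to first order” we get the change in the output of $g$ by multiplying the change in its input by $g'(x_0)$, and “to first order” we get the change in the output of $f$ by multiplying the change in *its* input by $f'(g(x_0))$, we get the change in the output of their composite by multiplying first by $g'(x_0)$ and then by $f'(g(x_0))$.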
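As a quick sanity check on the rule, here is a small Python sketch comparing the product $f'(g(x_0))g'(x_0)$ with a difference quotient of the composite. The specific functions (`g(x) = 3x + 1`, `f(y) = y^2`), the point `x0 = 2`, and the step size are hypothetical choices for illustration only.

```python
def difference_quotient(func, x0, dx):
    """(func(x0 + dx) - func(x0)) / dx, the quantity whose limit is the derivative."""
    return (func(x0 + dx) - func(x0)) / dx


def g(x):
    return 3.0 * x + 1.0      # inner function, g'(x) = 3


def f(y):
    return y * y              # outer function, f'(y) = 2y


def composite(x):
    return f(g(x))            # [f o g](x) = (3x + 1)^2


x0 = 2.0
# Chain rule prediction: f'(g(x0)) * g'(x0) = (2 * 7) * 3
predicted = (2.0 * g(x0)) * 3.0
estimate = difference_quotient(composite, x0, 1e-6)

print(predicted, estimate)
```

The two numbers should agree up to an error on the order of the step size, which is the "first order" agreement the argument above describes.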

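Another way we often write the chain rule is by setting $y = g(x)$ and $z = f(y)$. Then the derivative $g'(x)$ is written $\frac{dy}{dx}$, while $f'(y)$ is written $\frac{dz}{dy}$. The chain rule then says:

$$\frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$$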

This is nice since it looks like we’re multiplying fractions. The drawback is that we have to remember in our heads where to evaluate each derivative.

Now we can take this rule and use it to find the derivative of the inverse of an invertible function $f$. More specifically, if a function $f$ is one-to-one in some neighborhood of a point $x_0$, we can find another function $f^{-1}$ whose domain is the set of values $f$ takes (the range of $f$) and so that $f^{-1}(f(x)) = x$. Then if the function $f$ is differentiable at $x_0$ and the derivative $f'(x_0)$ is not zero, the inverse function will be differentiable, with a derivative we will calculate.

First we set $y = f(x)$ and $x = f^{-1}(y)$. Then we take the derivative of the defining equation of the inverse to get $\frac{dx}{dy}\frac{dy}{dx} = 1$, which we could write even more suggestively as $\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}}$. That is, the derivative of the compositional inverse of our function is the multiplicative inverse of the derivative. But as we noted above, we have to remember where to evaluate everything. So let’s do it again in the other notation.

Since $f^{-1}(f(x)) = x$, we differentiate to find $\left[f^{-1}\right]'(f(x))f'(x) = 1$. Then we substitute $x = f^{-1}(y)$ and juggle some algebra to write

$$\left[f^{-1}\right]'(y) = \frac{1}{f'\left(f^{-1}(y)\right)}$$
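Here is a small numerical illustration of the inverse rule in Python, with everything evaluated where it should be. The choice of $f(x) = x^3$ on positive inputs, its explicit inverse (the cube root), and the point `y = 8` are assumptions made only for this sketch.

```python
def f(x):
    return x ** 3                  # one-to-one for x > 0


def f_prime(x):
    return 3.0 * x ** 2            # nonzero for x > 0


def f_inverse(y):
    return y ** (1.0 / 3.0)        # cube root, valid for y > 0


y = 8.0
# The rule: derivative of the inverse at y is 1 / f'(f_inverse(y)).
predicted = 1.0 / f_prime(f_inverse(y))

# Compare against a difference quotient of the inverse itself.
dy = 1e-6
estimate = (f_inverse(y + dy) - f_inverse(y)) / dy

print(predicted, estimate)
```

Note that `f_prime` is evaluated at `f_inverse(y)`, not at `y`; that is exactly the bookkeeping the fraction notation leaves to our heads.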

## Algebraic Laws of Differentiation

Just like we had the laws of limits we have a collection of rules to help us calculate derivatives. Let’s start with the most basic functions.

As we said while defining the derivative, any linear function $f(x) = ax + b$ has the derivative $f'(x) = a$ at each point. We’ll separate this out into two rules:

$$\frac{d}{dx}c = 0 \qquad \frac{d}{dx}x = 1$$

That is, the derivative of any constant function is the constant function $0$, and the derivative of the identity function is the constant function $1$.

The next two rules should be perfectly straightforward to establish, so we’ll skip their proofs:

$$[f + g]'(x) = f'(x) + g'(x) \qquad [cf]'(x) = cf'(x)$$

That is, the derivative of the sum of two functions is the sum of their derivatives, and the derivative of a constant multiple of a function is the same constant multiple of its derivative. In particular, we can use these rules along with the basic pieces above to recalculate the derivative $\frac{d}{dx}(ax + b) = a$.

Multiplication is a bit tougher. We might hope to simply split derivatives along products like we do for limits, and even the inventors of the calculus originally thought this would work. But look what happens for $f(x) = x^2 = x\cdot x$. If we split the derivative of this function along its product we get $1\cdot 1 = 1$. But we know that this function doesn’t always go up at this constant rate. In fact, for negative values of $x$, the function actually goes *down*. So this rule doesn’t work.

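Let’s go back to the definition of the derivative as the limit of a difference quotient:

$$[fg]'(x) = \lim_{\Delta x\to0}\frac{f(x + \Delta x)g(x + \Delta x) - f(x)g(x)}{\Delta x}$$

Now the trick we’ll use to evaluate this limit is to add and subtract $f(x + \Delta x)g(x)$ in the numerator. In effect we’re adding zero, which leaves the value alone, but the formula will be easier to work with. In particular, we can start splitting it up using the laws of limits.

$$\begin{aligned}[fg]'(x) &= \lim_{\Delta x\to0}\frac{f(x + \Delta x)g(x + \Delta x) - f(x + \Delta x)g(x) + f(x + \Delta x)g(x) - f(x)g(x)}{\Delta x}\\ &= \lim_{\Delta x\to0}f(x + \Delta x)\cdot\lim_{\Delta x\to0}\frac{g(x + \Delta x) - g(x)}{\Delta x} + \lim_{\Delta x\to0}\frac{f(x + \Delta x) - f(x)}{\Delta x}\cdot\lim_{\Delta x\to0}g(x)\end{aligned}$$

Of these four limits, the fourth is the limit of a constant, because $g(x)$ doesn’t depend on $\Delta x$. The second and third are just the definitions of $g'(x)$ and $f'(x)$. The first limit goes to $f(x)$ because, as we showed, all differentiable functions are continuous. And so we have the rule

$$[fg]'(x) = f(x)g'(x) + f'(x)g(x)$$

In general, we can take the derivative of the product of a bunch of functions by taking the derivative of each one and multiplying by the other functions, then adding up all the results. As a special case we get the “power rule”:

$$\frac{d}{dx}x^n = nx^{n-1}$$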
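For polynomials we can check the product rule exactly, with no limits in sight. The sketch below stores a polynomial as a list of coefficients (`coeffs[k]` multiplies $x^k$) and differentiates it with the power, sum, and constant multiple rules. The helper names and the two sample polynomials are hypothetical, chosen just for this illustration.

```python
def poly_derivative(coeffs):
    """Differentiate term by term: c * x**k becomes k * c * x**(k - 1)."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def poly_mul(p, q):
    """Multiply two coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


f = [1, 2]        # 1 + 2x
g = [0, 0, 3]     # 3x^2

lhs = poly_derivative(poly_mul(f, g))                   # [fg]'
rhs = poly_add(poly_mul(f, poly_derivative(g)),         # f g'
               poly_mul(poly_derivative(f), g))         # + f' g
naive = poly_mul(poly_derivative(f), poly_derivative(g))  # the tempting but wrong f' g'

print(lhs, rhs, naive)
```

Here `lhs` and `rhs` come out as the same coefficient list, while `naive` does not, which is the failure of "splitting along products" in miniature.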

If $f$ is differentiable at $x$, but doesn’t take the value $0$ there, then its reciprocal will also be differentiable. We want to calculate its derivative. We could try evaluating the limit of the difference quotient again, but instead we will proceed as follows. Define the reciprocal to be $g(x) = \frac{1}{f(x)}$. Then we have the equation $f(x)g(x) = 1$. Taking the derivative of both sides at $x$ and using the product rule we find $f'(x)g(x) + f(x)g'(x) = 0$. We can now solve this to find $g'(x) = -\frac{f'(x)g(x)}{f(x)}$, or:

$$\left[\frac{1}{f}\right]'(x) = -\frac{f'(x)}{f(x)^2}$$

Combining this with the product rule we find the “quotient rule” for $\frac{f}{g}$ wherever $g(x)\neq0$:

$$\left[\frac{f}{g}\right]'(x) = \frac{f'(x)g(x) - f(x)g'(x)}{g(x)^2}$$

Now we have all the tools in hand to take the derivative of any “rational” function, that is, a function of the form $\frac{P(x)}{Q(x)}$, where $P$ and $Q$ are polynomials. We can take the derivative of any polynomial by using the power rule, constant multiple rule, and addition rule. Then we can take the derivative of $\frac{P}{Q}$ with the quotient rule.
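To see the recipe for rational functions in action, here is a short Python sketch. The polynomials $P(x) = x^2 + 1$ and $Q(x) = x - 2$, the evaluation point, and the helper name `quotient_rule` are assumptions for illustration; the derivatives of $P$ and $Q$ are written out by hand using the power rule.

```python
def quotient_rule(f, df, g, dg, x):
    """Derivative of f/g at x, assuming g(x) != 0."""
    return (df(x) * g(x) - f(x) * dg(x)) / g(x) ** 2


def P(x):
    return x ** 2 + 1.0


def dP(x):
    return 2.0 * x          # power rule plus the constant rule


def Q(x):
    return x - 2.0


def dQ(x):
    return 1.0


def R(x):
    return P(x) / Q(x)      # the rational function itself


x = 3.0
exact = quotient_rule(P, dP, Q, dQ, x)

# Independent check: a difference quotient of R with a small step.
dx = 1e-6
estimate = (R(x + dx) - R(x)) / dx

print(exact, estimate)
```

At `x = 3` the rule gives $(6\cdot1 - 10\cdot1)/1^2 = -4$, and the difference quotient should land close to that.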

A little anecdote about the quotient rule: often it’s written in the form $d\left(\frac{u}{v}\right) = \frac{v\,du - u\,dv}{v^2}$. My first calculus teacher wrote this formula on the board, then noted that one must remember which order the top comes in or you’ll get the wrong sign. “V-D-U-D”. At this point he fake-tiptoed his way to the door, closed it gently, and in a loud stage whisper said, “Venereal Disease is an Ugly Disease”. To this day I can’t teach the quotient rule, and can barely use it myself, without remembering that line.

## Derivatives

Okay, so we’ve got one of our real-valued functions defined on some domain $D\subseteq\mathbb{R}$: $f: D\to\mathbb{R}$. Let’s start analyzing it!

We start with some point $x_0\in D$, and we can crank out the value the function takes at that point: $f(x_0)$. What we want to understand is how the value of the function changes as we change $x_0$. More specifically, we want to understand how it changes as we vary our input *continuously*. Of course, “continuous” means we’re just moving around a little bit in some neighborhood of the point we started with, and neighborhoods in $\mathbb{R}$ basically come down to open intervals. So let’s just assume that our domain $D$ is some open interval containing the point we’re looking at. If it contains an open interval around our point already we can just restrict to it, and if it doesn’t contain a neighborhood of our point then we can’t vary the input continuously, so we aren’t interested in that case.

The simplest sort of function is just a constant $f(x) = c$. In this case, the value *doesn’t* change. That’s what it means to be constant! A little more complex is a linear function $f(x) = ax + b$ for real numbers $a$ and $b$. Then if we move our point over a bit by adding an amount $\Delta x$ to it our function takes the value

$$f(x_0 + \Delta x) = a(x_0 + \Delta x) + b = (ax_0 + b) + a\Delta x = f(x_0) + a\Delta x$$

That is, adding $\Delta x$ to our input adds the constant multiple $a\Delta x$ to our output. It’s easy to understand how this sort of function changes as we change the input. We can characterize this behavior by calling the change in the output $\Delta f$, and considering the constant $\frac{\Delta f}{\Delta x} = a$.

Now, let’s consider an arbitrary continuous function. We can still tweak our input by adding $\Delta x$ to it, and now we get a new output $f(x_0 + \Delta x)$. Subtracting off $f(x_0)$ we get the change in the output: $\Delta f = f(x_0 + \Delta x) - f(x_0)$. This won’t in general be a constant multiple of $\Delta x$ like it was for the linear functions above: if we pick different values for $\Delta x$ we may get different values for $\frac{\Delta f}{\Delta x}$. But we can still ask how the changes in the input and output are related by calculating the “difference quotient” $\frac{\Delta f}{\Delta x} = \frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$. This gives us a function of the amount $\Delta x$ by which we changed our input.

Let’s look back at the difference quotient for a linear function: $\frac{a\Delta x}{\Delta x} = a$. But it’s not really the constant function $a$! There’s a hole in the function at $\Delta x = 0$, which we can patch by taking the limit $\lim_{\Delta x\to0}\frac{a\Delta x}{\Delta x}$. Since the difference quotient is $a$ everywhere around the hole, the limit exists and equals $a$.

There’s also a hole at $\Delta x = 0$ in all our difference quotient functions, and we’d love to patch them up by taking a limit just like we did above. But can we always do this? Look at the function $f(x) = |x|$ near $x_0 = 0$. For positive inputs the function just gives the input back again, for negative inputs it gives back the negative of the input, and at zero it gives back zero again. So let’s look at $\frac{|\Delta x|}{\Delta x}$. When $\Delta x$ is positive this is $1$, when $\Delta x$ is negative this is $-1$, and of course there’s a hole at $\Delta x = 0$. But now we see that there’s no limit as $\Delta x$ approaches zero, since the image of a sequence approaching from the left converges to $-1$, while the image of one approaching from the right converges to $1$. Since they don’t agree, we can’t unambiguously patch the hole.
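The disagreement between the two sides is easy to see on a computer. This is only a sketch: the helper name `difference_quotient` and the particular sequences of step sizes are choices made for illustration.

```python
def difference_quotient(f, x0, dx):
    """(f(x0 + dx) - f(x0)) / dx; undefined at dx = 0, which is the hole."""
    return (f(x0 + dx) - f(x0)) / dx


# Approach dx = 0 from the left and from the right for f(x) = |x| at x0 = 0.
from_left = [difference_quotient(abs, 0.0, -(10.0 ** -k)) for k in range(1, 6)]
from_right = [difference_quotient(abs, 0.0, 10.0 ** -k) for k in range(1, 6)]

print(from_left)
print(from_right)
```

Every entry of `from_left` is $-1$ and every entry of `from_right` is $1$, so no single value can fill the hole.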

On the other hand, maybe we *can* patch the hole by taking a limit. If we can, then we say that $f$ is “differentiable” at $x_0$, and the limit of the difference quotient is called the “derivative” of $f$ at $x_0$. We write this as

$$f'(x_0) = \lim_{\Delta x\to0}\frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$$

Another notation for the derivative that shows up is $\frac{df}{dx}\Big|_{x_0}$. This hints at the fact that as we change the point we started with we may get different values for the derivative. That is, the derivative is a new function! In analogy with continuity, we say that a function is differentiable on a region $D$ if it is differentiable (if the difference quotient has a limit) for each point $x_0\in D$. The linear functions we considered above are differentiable everywhere in $\mathbb{R}$, with $f'(x) = a$ for all $x$. On the other hand, the absolute value function is continuous everywhere, but differentiable only where $x\neq0$. In this case, the derivative is the constant $1$ when $x$ is positive and the constant $-1$ when $x$ is negative.

It’s worth pointing out that if a function is differentiable at a point then it must be continuous there. Indeed, if $\frac{f(x_0 + \Delta x) - f(x_0)}{\Delta x}$ is to have any chance at converging, we must have $\lim_{\Delta x\to0}\left(f(x_0 + \Delta x) - f(x_0)\right) = 0$, and this just asserts that the limit of $f$ at $x_0$ is its value there. So differentiability implies continuity, but continuity doesn’t imply differentiability, as we saw from the absolute value above.

## Laws of Limits

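Okay, we know how to define the limit of a function at a point in the closure of its domain. But we don’t always want to invoke the whole machinery of all sequences converging to that point or that of neighborhoods with the $\epsilon$-$\delta$ definition. Luckily, we have some shortcuts.

First off, we know that the constant function $f(x) = c$ and the identity function $f(x) = x$ are continuous and defined everywhere, so we immediately see that $\lim_{x\to x_0}c = c$ and $\lim_{x\to x_0}x = x_0$. Those are the basic functions we defined. We also defined some ways of putting functions together, and we’ll have a rule for each one telling us how to build limits for more complicated functions from limits for simpler ones.

We can multiply a function by a constant real number. If we have $\lim_{x\to x_0}f(x) = L$ then we find $\lim_{x\to x_0}cf(x) = cL$. Let’s say we’re given an error bound $\epsilon$. Then we can consider $\frac{\epsilon}{|c|}$ (taking $c\neq0$, since for $c = 0$ we just have the constant function $0$), and use the assumption about the limit of $f$ to find a $\delta$ so that $0 < |x - x_0| < \delta$ implies that $|f(x) - L| < \frac{\epsilon}{|c|}$. This, in turn, implies that $|cf(x) - cL| < \epsilon$, and so the assertion is proved.

Similarly, we can add functions. If $\lim_{x\to x_0}f_1(x) = L_1$ and $\lim_{x\to x_0}f_2(x) = L_2$, then we find $\lim_{x\to x_0}\left(f_1(x) + f_2(x)\right) = L_1 + L_2$. Here we start with an $\epsilon$ and find $\delta_1$ and $\delta_2$ so that $0 < |x - x_0| < \delta_i$ implies $|f_i(x) - L_i| < \frac{\epsilon}{2}$ for $i = 1, 2$. Then if we set $\delta$ to be the smaller of $\delta_1$ and $\delta_2$, we see that $0 < |x - x_0| < \delta$ implies $|(f_1(x) + f_2(x)) - (L_1 + L_2)| < \epsilon$.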
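The bookkeeping in the addition rule (split the error budget in half, then take the smaller tolerance) can be written down almost verbatim in Python. In this sketch the functions `f1`, `f2`, their limits at `x0 = 1`, and the tolerance functions `delta1`, `delta2` are all hypothetical examples; a tolerance function takes an error bound and returns an input tolerance that works for it.

```python
def delta_for_sum(delta1, delta2):
    """Build a tolerance function for f1 + f2 from ones that work for f1 and f2."""
    def delta(eps):
        # Give each summand half of the error budget, then take the smaller delta.
        return min(delta1(eps / 2), delta2(eps / 2))
    return delta


# Hypothetical example: f1(x) = 2x and f2(x) = 3x near x0 = 1, with limits 2 and 3.
x0, L1, L2 = 1.0, 2.0, 3.0
f1 = lambda x: 2.0 * x
f2 = lambda x: 3.0 * x
delta1 = lambda eps: eps / 2.0   # |x - 1| < eps/2 forces |2x - 2| < eps
delta2 = lambda eps: eps / 3.0   # |x - 1| < eps/3 forces |3x - 3| < eps

delta = delta_for_sum(delta1, delta2)


def worst_error(eps, samples=1000):
    """Largest sampled |(f1 + f2)(x) - (L1 + L2)| over 0 < |x - x0| < delta(eps)."""
    d = delta(eps)
    worst = 0.0
    for k in range(1, samples):
        offset = d * k / samples          # strictly between 0 and d
        for x in (x0 - offset, x0 + offset):
            worst = max(worst, abs(f1(x) + f2(x) - (L1 + L2)))
    return worst


for eps in (1.0, 0.1, 1e-3):
    print(eps, delta(eps), worst_error(eps))
```

Sampling can't prove the inequality, of course, but the worst sampled error stays below each `eps`, as the argument says it must.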

From these two we can see that the process of taking a limit at a point is linear. In particular, we also see that $\lim_{x\to x_0}\left(f_1(x) - f_2(x)\right) = L_1 - L_2$ by combining the two rules above. Similarly we can show that $\lim_{x\to x_0}f_1(x)f_2(x) = L_1L_2$, which I’ll leave to you to verify as we did the rule for addition above.

Another way to combine functions that I haven’t mentioned yet is composition. Let’s say we have functions $f: D_f\to\mathbb{R}$ and $g: D_g\to\mathbb{R}$. Then we can pick out those points $x\in D_f$ so that $f(x)\in D_g$ and call this collection $D$. Then we can apply the second function to get $g\circ f: D\to\mathbb{R}$, defined by $[g\circ f](x) = g(f(x))$. Our limit rule here is that if $g$ is continuous at $\lim_{x\to x_0}f(x)$, then $\lim_{x\to x_0}g(f(x)) = g\left(\lim_{x\to x_0}f(x)\right)$. That is, we can pull limits past continuous functions. This is just a reflection of the fact that continuous functions are exactly those which preserve limits of sequences. In particular, a continuous function equals its own limit wherever it’s defined: $\lim_{x\to x_0}f(x) = f(x_0)$.

As an application of this fact, we can check that $\frac{1}{x}$ is continuous for all nonzero $x$. Then the limit rule tells us that as long as $\lim_{x\to x_0}f(x)\neq0$, then $\lim_{x\to x_0}\frac{1}{f(x)} = \frac{1}{\lim_{x\to x_0}f(x)}$. Combining this with the rule for multiplication we see that as long as the limit of $g$ at $x_0$ is nonzero then $\lim_{x\to x_0}\frac{f(x)}{g(x)} = \frac{\lim_{x\to x_0}f(x)}{\lim_{x\to x_0}g(x)}$.

Another thing that limits play well with is the order on the real numbers. If $f(x)\geq g(x)$ on their common domain $D$ then $\lim_{x\to x_0}f(x)\geq\lim_{x\to x_0}g(x)$ as long as both limits exist. Indeed, since both limits exist we can take any sequence $x_n\in D$ converging to $x_0$. The image sequence under $f$ is always above the image sequence under $g$, and so the limits of the sequences are in the same order. Notice that we really just need $f(x)\geq g(x)$ to hold on some neighborhood of $x_0$, since we can then restrict to that neighborhood.

Similarly if we have three functions $f(x)$, $g(x)$ and $h(x)$ with $f(x)\leq g(x)\leq h(x)$ on a common domain $D$ containing a neighborhood of $x_0$, and if $\lim_{x\to x_0}f(x) = L = \lim_{x\to x_0}h(x)$, then the limit of $g$ at $x_0$ exists and is also equal to $L$. Given any sequence $x_n\in D$ converging to $x_0$, our hypothesis tells us that $f(x_n)\leq g(x_n)\leq h(x_n)$. Given any neighborhood of $L$, $f(x_n)$ and $h(x_n)$ are both within the neighborhood for sufficiently large $n$, and then so will $g(x_n)$ be in the neighborhood. Thus the image of the sequence under $g$ is “squeezed” between the images under $f$ and $h$, and converges to $L$ as well.
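A standard example of the squeeze (not one used above, so take it as an assumption of this sketch) is $g(x) = x^2\sin\frac{1}{x}$, trapped between $-x^2$ and $x^2$ near the hole at $0$. The Python below just samples one sequence converging to $0$ and checks the inequalities along it.

```python
import math


def lower(x):
    return -x * x


def upper(x):
    return x * x


def g(x):
    # Oscillates wildly near 0 and is undefined at 0 itself.
    return x * x * math.sin(1.0 / x)


# A sequence converging to 0 from both sides, never taking the value 0.
xs = [(-1) ** n / n for n in range(1, 2001)]

squeezed = all(lower(x) <= g(x) <= upper(x) for x in xs)
tail = [g(x) for x in xs[-5:]]

print(squeezed)
print(tail)
```

Both bounding functions have limit $0$ at $0$, so the tail of the `g` values is forced toward $0$ too.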

These rules for limits suffice to calculate almost all the limits that we care about without having to mess around with the raw definitions. In fact, many calculus classes these days only skim the definition if they mention it at all. We can more or less get away with this while we’re only dealing with a single real variable, but later on the full power of the definition comes in handy.

There’s one more situation I should be a little more explicit about. If we are given a function $f$ on some domain $D$ and we want to find its limit at a border point $x_0$ (which includes the case of a single-point hole in the domain) and we can extend the function to a continuous function $\bar{f}$ on a larger domain $\bar{D}$ which contains a neighborhood of the point in question, then $\lim_{x\to x_0}f(x) = \bar{f}(x_0)$. Indeed, given any sequence $x_n\in D$ converging to $x_0$ we have $f(x_n) = \bar{f}(x_n)$ (since they agree on $D$), and the limit of $\bar{f}$ is just its value at $x_0$. This extends what we did before to handle the case of $\frac{x}{x}$ at $x = 0$, and similar situations will come up over and over in the future.

## Limits of Functions

Okay, we know what it is for a net to have a limit, and then we used that to define continuity in terms of nets. Continuity just says that the function’s value is exactly what it takes to preserve convergence of nets.

But what if we have a bunch of nets and no function value? Like, if there’s a hole in our domain (as there is at $0$ for the function $\frac{x}{x}$) we certainly shouldn’t penalize this function just on a technicality of how we presented it. Well, there may be a hole in the domain, but we still have sequences in the domain that converge to where that hole is. So let’s take a domain $D\subseteq\mathbb{R}$, a function $f: D\to\mathbb{R}$, and a point $x_0\in\overline{D}$. In particular, we’re interested in what happens when $x_0$ is in the closure of $D$, but not in $D$ itself.

Now we look at all sequences $x_n\in D$ which converge to $x_0$. There’s at least one of them because $x_0\in\overline{D}$, but there may be quite a few. Each one of these sequences has an opinion on what the value of $f$ should be at $x_0$. If they all agree, then we can define the limit of the function $\lim_{x\to x_0}f(x) = \lim_{n\to\infty}f(x_n)$ where $x_n$ is any one of these sequences. In the case of $\frac{x}{x}$ we see that at every point other than $0$ our function takes the value $1$. Thus on any sequence converging to $0$ (but never taking the value $0$) the function gives the constant sequence $1, 1, 1, \ldots$. Since they all agree, we can define the limit $\lim_{x\to0}\frac{x}{x} = 1$.
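Here is the same "poll the sequences" idea as a small Python sketch. The three sequences are arbitrary choices for illustration (finitely many samples can only suggest a limit, not establish one), and Python’s own `ZeroDivisionError` plays the role of the hole in the domain.

```python
def f(x):
    return x / x   # raises ZeroDivisionError at x = 0: the hole in the domain


# A few sequences in the domain converging to 0, none of which ever hits 0.
sequences = {
    "1/n": [1.0 / n for n in range(1, 101)],
    "-1/n^2": [-1.0 / n ** 2 for n in range(1, 101)],
    "alternating": [(-1.0) ** n / n for n in range(1, 101)],
}

# Each sequence's "opinion" about the value at 0 is read off from its image.
images = {name: [f(x) for x in seq] for name, seq in sequences.items()}

for name, values in images.items():
    print(name, set(values))
```

Every image is the constant sequence of ones, so all three sequences vote for the same value at the hole.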

If a function has a limit at a hole in its domain, we can use that limit to patch up the hole. That is, if our point $x_0$ is in the closure of $D$ but not in $D$ itself, and if our function $f$ has a limit at $x_0$, then we can extend our function to $D\cup\{x_0\}$ by setting $f(x_0) = \lim_{x\to x_0}f(x)$. Just like we by default set the domain of a function to be wherever it makes sense, we will just assume that the domain has been extended to whatever boundary points the function takes a limit at.

On the other hand, we can also describe limits in terms of neighborhoods instead of sequences. Here we end up with formulas that look like those we saw when we defined continuity in metric spaces. A function $f$ has a limit $L$ at the point $x_0$ if for every $\epsilon > 0$ there is a $\delta > 0$ so that $0 < |x - x_0| < \delta$ implies $|f(x) - L| < \epsilon$. Going back and forth from this definition to the one in terms of sequences behaves just the same as going back and forth between net and neighborhood definitions of continuity.

To a certain extent we’re starting to see a little more clearly the distinct feels of the two different approaches. Using nets tells us about approaching a point in various systematic ways, and having a limit at a point tells us that we can understand the function at that point by understanding any system along which we can approach it. We can even replace the limiting point by the convergent net and say that the net *is* the point, as we did when first defining the real numbers. Using neighborhoods, on the other hand, feels more like giving error tolerances. A limit is the value the function is trying to get to, and if we’re willing to live with being wrong by $\epsilon$, there’s a way to pick a $\delta$ for how wrong our input can be and still come at least that close to the target.

## Movie news

I just heard this:

- MGM and New Line will co-finance and co-distribute two films, *The Hobbit* and a sequel to *The Hobbit*. New Line will distribute in North America and MGM will distribute internationally.
- Peter Jackson and Fran Walsh will serve as Executive Producers of two films based on *The Hobbit*. New Line will manage the production of the films, which will be shot simultaneously.
- Peter Jackson and New Line have settled all litigation relating to the *Lord of the Rings* Trilogy.

So great. We’re going to finally have a Hobbit movie. And a seq…what?

Look, I love Tolkien as much as the next guy, and in an intellectual (opp. fantasy fanboy) way. I grew up with it, I’ve dabbled in Quenya and Sindarin, I’ve read the archives of *Vinyar Tengwar*, and I wasn’t horribly disappointed by the *LotR* trilogy. But honestly, people, there’s *just not that much there* in the Hobbit. It won’t really support two movies on its own. So either they’re reeeeeeeeeally stretching the script to squeeze the money out; they’re bringing in a lot of stuff from *Unfinished Tales*, or maybe even *HoME* (unlikely given how much of *LotR* proper was cut); or they’re creating new Hobbit material out of whole cloth.

A friend of mine says that he trusts Jackson’s vision. I trusted Lucas’ vision, and see where that got us. This may be good, but buckle up just in case.

## Real-Valued Functions of a Single Real Variable

At long last we can really start getting into one of the most basic kinds of functions: those which take a real number in and spit a real number out. Quite a lot of mathematics is based on a good understanding of how to take these functions apart and put them together in different ways — to *analyze* them. And so we have the topic of “real analysis”. At our disposal we have a toolbox with various methods for calculating and dealing with these sorts of functions, which we call “calculus”. Really, all calculus is is a collection of techniques for understanding what makes these functions tick.

Sitting behind everything else, we have the real number system — the unique ordered topological field which is big enough to contain limits of all Cauchy sequences (so it’s a complete uniform space) and least upper bounds for all nonempty subsets which have any upper bounds at all (so the order is Dedekind complete), and yet small enough to exclude infinitesimals and infinites (so it’s Archimedean).

Because the properties that make the real numbers do their thing are all wrapped up in the topology, it’s no surprise that we’re really interested in continuous functions, and we have quite a lot of them. At the most basic, the constant function $f(x) = c$ for all real numbers $x$ is continuous, as is the identity function $f(x) = x$.

We also have ways of combining continuous functions, many of which are essentially inherited from the field structure on $\mathbb{R}$. We can add and multiply functions just by adding and multiplying their values, and we can multiply a function by a real number too.

Since all the nice properties of these algebraic constructions carry over from $\mathbb{R}$, this makes the collection of continuous functions into an algebra over the field of real numbers. We get additive inverses as usual in a module by multiplying by $-1$, so we have an $\mathbb{R}$-module using addition and scalar multiplication. We have a bilinear multiplication because of the distributive law holding in the ring where our functions take their values. We also have a unit for multiplication (the constant function $1$) and a commutative law for multiplication. I’ll leave you to verify that all these operations give back continuous functions when we start with continuous functions.

What we don’t have is division. Multiplicative inverses are tough because we can’t invert any function which takes the value zero anywhere. Even the reciprocal of the identity function, $\frac{1}{x}$, is very much not continuous at $x = 0$. In fact, it’s not even defined there! So how can we deal with this?

Well, the answer is sitting right there. The function is not continuous *at that point*. We have two definitions (by neighborhood systems and by nets) of what it means for a function between two topological spaces to be continuous at one point or another, and we said a function is continuous if it’s continuous at every point in its domain. So we can throw out some points and restrict our attention to a subspace where the function *is* continuous. Here, for instance, we can define a function $f: \mathbb{R}\setminus\{0\}\to\mathbb{R}$ by $f(x) = \frac{1}{x}$, and this function is continuous at each point in its domain.

So what we should really be considering is this: for each subspace $X\subseteq\mathbb{R}$ we have a collection $C^0(X)$ of those real-valued functions which are continuous on $X$. Each of these is a commutative $\mathbb{R}$-algebra, just like we saw for the collection of functions continuous on all of $\mathbb{R}$.

But we may come up with two functions over different domains that we want to work with. How do we deal with them together? Well, let’s say we have a function $f\in C^0(X)$ and another one $g\in C^0(Y)$, where $Y\subseteq X$. We may not be able to work with $g$ at the points in $X$ that aren’t in $Y$, but we can certainly work with $f$ at just those points of $X$ that happen to be in $Y$. That is, we can *restrict* the function $f$ to the function $f|_Y$. It’s the exact same function, except it’s only defined on $Y$ instead of all of $X$. This gives us a homomorphism of $\mathbb{R}$-algebras $C^0(X)\to C^0(Y)$. (If you’ve been reading along for a while, how would a category theorist say this?)

As an example, we have the identity function $x$ in $C^0(\mathbb{R})$ and the reciprocal function $\frac{1}{x}$ in $C^0(\mathbb{R}\setminus\{0\})$. We can restrict the identity function by forgetting that it has a value at $0$ to get another function $x|_{\mathbb{R}\setminus\{0\}}$, which we will *also* denote by $x$. Then we can multiply to get the function $\frac{x}{x}\in C^0(\mathbb{R}\setminus\{0\})$. Notice that the resulting function we get is *not* the constant function $1$ on $\mathbb{R}$ because it’s not defined at $0$.

Now as far as language goes, we usually drop all mention of domains and assume by default that the domain is “wherever the function makes sense”. That is, whenever we see $\frac{1}{x}$ we automatically restrict to nonzero real numbers, and whenever we combine two functions on different domains we automatically restrict to the intersection of their domains, all without explicit comment.
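If we wanted to make that silent bookkeeping explicit, one way to model it is a function that carries its domain around with it. The sketch below is only an illustration: the class name `PartialFunction`, its methods, and the use of a membership test to stand in for a domain are all hypothetical choices, and it says nothing about continuity.

```python
class PartialFunction:
    """A real-valued rule together with a membership test for its domain."""

    def __init__(self, rule, in_domain):
        self.rule = rule
        self.in_domain = in_domain

    def __call__(self, x):
        if not self.in_domain(x):
            raise ValueError(f"{x} is outside the domain")
        return self.rule(x)

    def restrict(self, in_subdomain):
        """Same rule, smaller domain."""
        return PartialFunction(
            self.rule, lambda x: self.in_domain(x) and in_subdomain(x)
        )

    def __mul__(self, other):
        """Pointwise product, defined only on the intersection of the domains."""
        return PartialFunction(
            lambda x: self.rule(x) * other.rule(x),
            lambda x: self.in_domain(x) and other.in_domain(x),
        )


identity = PartialFunction(lambda x: x, lambda x: True)
reciprocal = PartialFunction(lambda x: 1.0 / x, lambda x: x != 0)

x_over_x = identity * reciprocal   # automatically restricted to nonzero reals

print(x_over_x(2.0))               # defined away from 0
print(x_over_x.in_domain(0.0))     # the product is still not defined at 0
```

The product agrees with the constant $1$ wherever it is defined, but asking for its value at $0$ raises an error rather than returning $1$.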

We do have to be a bit careful here, though, because when we see $\frac{x}{x}$, we *also* restrict to nonzero real numbers. This is not the constant function $1$ because as it stands it’s not defined for $x = 0$. Clearly, this is a little nutty and pedantic, so tomorrow we’ll come back and see how to cope with it.

## The Orbit Method

Over at *Not Even Wrong*, there’s a discussion of David Vogan’s talks at Columbia about the “orbit method” or “orbit philosophy”. This is the view that there is (or at least there *should* be) a correspondence between unitary irreps of a Lie group $G$ and the orbits of a certain action of $G$. As Woit puts it:

> This is described as a “method” or “philosophy” rather than a theorem because it doesn’t always work, and remains poorly understood in some cases, while at the same time having shown itself to be a powerful source of inspiration in representation theory.

What he *doesn’t* say in so many words (but which I’m just rude enough to) is that the same statement applies to a lot of theoretical physics. Path integrals are, as they currently stand, *prima facie* nonsense. In some cases we’ve figured out how to make sense of them, and to give real meaning to the conceptual framework of what *should* happen. And this isn’t a bad thing. Path integrals have proven to be a powerful source of inspiration, and a lot of actual, solid mathematics and physics has come out of trying to determine what the hell they’re supposed to mean.

Where this becomes a problem is when people take the conceptual framework as literal truth rather than as the inspirational jumping-off point it properly is.

## Archimedean Groups and the Largest Archimedean Field

Okay, I’d promised to get back to the fact that the real numbers form the “largest” Archimedean field. More precisely, any Archimedean field is order-isomorphic to a subfield of $\mathbb{R}$.

There’s an interesting side note here. I was thinking about this and couldn’t quite see my way forward. So I started asking around Tulane’s math department and seeing if anyone knew. Someone pointed me towards Mike Mislove, and when I asked him, *he* suggested we ask Laszlo Fuchs around the corner from him. Dr. Fuchs, it turned out, did know the answer, and it was in a book he’d written himself: *Partially Ordered Algebraic Systems*. It’s an interesting little volume, which I may come back and mine later for more topics.

Anyhow, we’ll do this a little more generally. First let’s talk about Archimedean ordered groups a bit. In a totally-ordered group $G$ we’ll say two elements $a$ and $b$ are “Archimedean equivalent” ($a\sim b$) if there are natural numbers $m$ and $n$ so that $|a| < |b|^m$ and $|b| < |a|^n$ (here I’m using the absolute value that comes with any totally-ordered group). That is, neither one is infinitesimal with respect to the other. This can be shown to be an equivalence relation, so it chops the elements of $G$ into equivalence classes. There are always at least two in any nontrivial group because the identity element is infinitesimal with respect to everything else. We say a group is Archimedean if there are *only* two Archimedean equivalence classes. That is, for any $a$ and $b$ other than the identity, there is a natural number $n$ with $|a| < |b|^n$.

Now we have a theorem of Hölder which says that any Archimedean group is order-isomorphic to a subgroup of the real numbers with addition. In particular, we will see that any Archimedean group is commutative.

Now either $G$ has a least positive element $g$ or it doesn’t. If it does, then $e\leq x < g$ implies that $x = e$ ($e$ is the identity of the group). By the Archimedean property, any element $a$ has an integer $n$ so that $g^n\leq a < g^{n+1}$. Then we can multiply by $g^{-n}$ to find that $e\leq g^{-n}a < g$, so $a = g^n$. Every element is thus some power of $g$, and the group is isomorphic to the integers $\mathbb{Z}$.

On the other hand, what if given a positive $x$ we can always find a positive $y$ with $y < x$? In this case, $y^2$ may be greater than $x$, but in this case we can show that $(xy^{-1})^2 < x$, and $xy^{-1}$ itself is less than $x$, so in either case we have an element $z$ with $e < z < x$ and $z^2\leq x$.

Now if two positive elements $a$ and $b$ fail to commute then without loss of generality we can assume $ab > ba$. Then we pick $x = aba^{-1}b^{-1} > e$ and choose a $z$ to go with this $x$. By the Archimedean property we’ll have numbers $m$ and $n$ with $z^m\leq a < z^{m+1}$ and $z^n\leq b < z^{n+1}$. Thus we find that $x = aba^{-1}b^{-1} < z^{m+1}z^{n+1}z^{-m}z^{-n} = z^2$, which contradicts how we picked $z$. And thus $G$ is commutative.

So we can pick some positive element $a$ and just set $f(a) = 1$. Now we need to find where to send every other element. To do this, note that for any $b$ and any rational number $\frac{m}{n}$ (with $n > 0$) we’ll either have $a^m\leq b^n$ or $a^m > b^n$, and both of these situations must arise by the Archimedean property. This separates the rational numbers into two nonempty collections: a cut! So we define $f(b)$ to be the real number specified by this cut. It’s straightforward now to show that $f(bc) = f(b) + f(c)$, and thus establish the order isomorphism.
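To get a feel for the cut construction, here is a small Python sketch in one concrete Archimedean group: the positive rationals under multiplication, with the usual order. That choice of group, the base element `a = 2`, and the helper name `cut_lower_bound` are assumptions for illustration. For a fixed denominator `n` the helper finds the largest fraction with that denominator lying in the lower half of the cut, so it only brackets the image of `b` to within `1/n` unless the cut happens to be rational.

```python
from fractions import Fraction


def cut_lower_bound(a, b, n):
    """Largest m/n (m a non-negative integer) with a**m <= b**n.

    Assumes a > 1 and b >= 1 in the multiplicative group of positive rationals.
    """
    target = b ** n
    m = 0
    while a ** (m + 1) <= target:
        m += 1
    return Fraction(m, n)


a = Fraction(2)   # the element we decide to send to 1

# Powers of a land on integers exactly, and products go to sums.
f_8 = cut_lower_bound(a, Fraction(8), 10)
f_4 = cut_lower_bound(a, Fraction(4), 10)
f_32 = cut_lower_bound(a, Fraction(32), 10)

# For b = 3 the cut is irrational; we only get a bracket of width 1/100.
approx_f_3 = cut_lower_bound(a, Fraction(3), 100)

print(f_8, f_4, f_32, approx_f_3)
```

In this group the cut for `b` is just $\log_2 b$, so the order isomorphism Hölder’s theorem promises is the logarithm, and `f_8 + f_4 == f_32` is the familiar rule that logs turn products into sums.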

So all Archimedean groups are just subgroups of $\mathbb{R}$ with addition as its operation. In fact, homomorphisms of such groups are just as simple.

Say that we have a nontrivial Archimedean group $G\subseteq\mathbb{R}$, a (possibly trivial) Archimedean group $H\subseteq\mathbb{R}$, and an order-preserving homomorphism $f: G\to H$. If $f(a) = 0$ for some positive $a$ then this is just the trivial homomorphism sending everything to zero, since for any positive $b$ there is a natural number $n$ so that $b < na$, and thus $0\leq f(b)\leq nf(a) = 0$. In this case the homomorphism is “multiply by $0$”.

On the other hand, take any two positive elements $a_1, a_2\in G$ and consider the quotients (in $\mathbb{R}$) $\frac{a_1}{a_2}$ and $\frac{f(a_1)}{f(a_2)}$. If they’re different (say, $\frac{a_1}{a_2} < \frac{f(a_1)}{f(a_2)}$) then we can pick a rational number $\frac{m}{n}$ between them. Then $na_1 < ma_2$, while $nf(a_1) > mf(a_2)$, which contradicts the order-preserving property of the homomorphism! Thus we find the ratio $\frac{f(a)}{a}$ must be a constant $r$, and the homomorphism is “multiply by $r$”.

Now let’s move up to Archimedean rings, whose definition is the same as that for Archimedean fields. In this case, either the product of any two elements is $0$ (we have a “zero ring”) and the additive group is order-isomorphic to a subgroup of $\mathbb{R}$, or the ring is order-isomorphic to a subring of $\mathbb{R}$. If we have a zero ring, then the only data left is an Archimedean group, which the above discussion handles, so we’ll just assume that we have some nonzero product and show that we have an order-isomorphism with a subring of $\mathbb{R}$.

So we’ve got some Archimedean ring $R$ and its additive group $R^+$. By the theorem above, $R^+$ is order-isomorphic to a subgroup of $\mathbb{R}$. We also know that for any positive $a\in R$ the operation $x\mapsto a\cdot x$ (the dot will denote the product in $R$) is an order-homomorphism from $R^+$ to itself. Thus there is some non-negative real number $r_a$ so that $a\cdot x = r_ax$. If we define $r_{-a} = -r_a$ then the assignment $a\mapsto r_a$ gives us an order-homomorphism from $R^+$ to some group $S\subseteq\mathbb{R}$.

Again, we must have $r_a = sa$ for some non-negative real number $s$. If $s = 0$ then all multiplications in $R$ would give zero, so $s\neq0$, and so the assignment is invertible. Now we see that $r_{a\cdot b} = s(a\cdot b) = sr_ab = r_a(sb) = r_ar_b$. Similarly, we have $r_{a+b} = r_a + r_b$, and so the function $a\mapsto r_a$ is an order-isomorphism of rings.

In particular, a field $F$ can’t be a zero ring, and so there must be an injective order-homomorphism $F\to\mathbb{R}$. In fact, there can be only one, for if there were more than one the images would be related by multiplication by some positive $r$: $f_2(x) = rf_1(x)$. But then $1 = f_2(1) = rf_1(1) = r$, and so $f_1 = f_2$.

We can sum this up by saying that the real numbers are a terminal object in the category of Archimedean fields.