In the course of showing that the differential of a function at a point, if it exists at all, is unique (and thus we can say “the” differential), we showed that given an orthonormal basis we have all partial derivatives. We even have all directional derivatives, with pretty much the same proof. We replace $e_k$ with an arbitrary vector $u$, and pick the scalar $\tau$ so that $0<\lvert\tau u\rvert<\delta$. We find that $[D_uf](x)=df(x;u)$. So when a function is differentiable not only do all directional derivatives exist, they’re all given by a single linear functional applied to the direction vector. Notice that this does not hold for the pathological example we used to show that having all directional derivatives didn’t imply continuity.
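Here is a small numerical sanity check of the claim that, for a differentiable function, every directional derivative is the differential applied to the direction vector. It is only a sketch: the test function `f`, its hand-computed partials in `grad_f`, and the step size `tau` are all my own illustrative choices, not anything from the argument above.

```python
import math

def f(x, y):
    # Hypothetical smooth test function, chosen only for illustration.
    return x * x * y + math.sin(y)

def grad_f(x, y):
    # Partial derivatives of f, computed by hand.
    return (2 * x * y, x * x + math.cos(y))

def directional_derivative(func, point, u, tau=1e-6):
    # Central difference quotient along the direction u.
    x, y = point
    ux, uy = u
    forward = func(x + tau * ux, y + tau * uy)
    backward = func(x - tau * ux, y - tau * uy)
    return (forward - backward) / (2 * tau)

def differential(point, u):
    # df(x; u): the linear functional whose components are the partials, applied to u.
    gx, gy = grad_f(*point)
    return gx * u[0] + gy * u[1]

point = (1.0, 2.0)
directions = [(1.0, 0.0), (0.0, 1.0), (3.0, -4.0), (-0.5, 2.5)]
errors = [
    abs(directional_derivative(f, point, u) - differential(point, u))
    for u in directions
]
```

Every entry of `errors` should be tiny, and note that none of the directions needs to be a unit vector.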
Okay, so now does having a differential at a point imply that a function is continuous there? Remember that this was the major reason we rejected both partial and directional derivatives as insufficient as generalizations of differentiation in one variable. But, happily, it does. Firstly, we’re going to pick a basis and show that the function satisfies a Lipschitz condition (five minutes of furtive laughter in any advanced calculus class starts…. now) (it’s worse than doing quantum mechanics with bras in front of high schoolers). That is to say, there is a positive number $M$ and some neighborhood $N$ of $x$ so that if $y\in N$ but $y\neq x$, then $\lvert f(y)-f(x)\rvert<M\lvert y-x\rvert$. Or, in more conceptual terms, any displacement near enough to $x$ can only be made $M$ times bigger after running it through $f$. This gives us some control on what the function does as we move our input point around.
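So, first we take $\epsilon=1$ in the definition of the differential, to find that there is a $\delta>0$ so that if $0<\lvert t\rvert<\delta$ then
$$\lvert[f(x+t)-f(x)]-df(x;t)\rvert<\lvert t\rvert$$
We can add $\lvert df(x;t)\rvert$ to both sides and use the triangle inequality to find
$$\lvert f(x+t)-f(x)\rvert<\lvert df(x;t)\rvert+\lvert t\rvert$$
But once we pick an orthonormal basis we can write out the differential as
$$\lvert df(x;t)\rvert=\left\lvert\sum_{i=1}^n[D_if](x)\,t^i\right\rvert\leq\sum_{i=1}^n\lvert[D_if](x)\rvert\lvert t^i\rvert\leq\left(\sum_{i=1}^n\lvert[D_if](x)\rvert\right)\lvert t\rvert$$
I’ve written out the sum explicitly here because it’s necessary in the last term, where we use the fact that each $\lvert t^i\rvert\leq\lvert t\rvert$. So if we pick
$$M=1+\sum_{i=1}^n\lvert[D_if](x)\rvert$$
then we have the Lipschitz condition we want: $\lvert f(x+t)-f(x)\rvert<M\lvert t\rvert$ whenever $0<\lvert t\rvert<\delta$.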
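And then it just so happens that a Lipschitz condition implies continuity. Indeed, given an $\epsilon>0$ pick a $\delta$ small enough that both $\delta<\frac{\epsilon}{M}$, and also the ball of radius $\delta$ around $x$ fits inside the neighborhood $N$ from the Lipschitz condition. Then for $0<\lvert t\rvert<\delta$ we find
$$\lvert f(x+t)-f(x)\rvert<M\lvert t\rvert<M\delta<\epsilon$$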
and we have continuity.
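To see the Lipschitz bound in action, here is a rough numerical sketch. The function $f(x,y)=x^2+3y$, the base point $(1,2)$, and the sample radii are assumptions I have made purely for illustration; the constant is the one from the argument, $M=1+\sum_i\lvert[D_if](x)\rvert$, which comes out to $6$ here.

```python
import math

def f(x, y):
    # Hypothetical example: f(x, y) = x^2 + 3y.
    return x * x + 3 * y

x0, y0 = 1.0, 2.0
partials = (2 * x0, 3.0)                 # the two partial derivatives at (x0, y0)
M = 1 + sum(abs(p) for p in partials)    # M = 1 + sum of |partials| = 6

def lipschitz_holds(t1, t2):
    # Check |f(x + t) - f(x)| < M |t| for one nonzero displacement t.
    change = abs(f(x0 + t1, y0 + t2) - f(x0, y0))
    return change < M * math.hypot(t1, t2)

# Sample displacements on circles of shrinking radius around the point.
radii = [0.5, 0.1, 0.01, 0.001]
angles = [k * math.pi / 8 for k in range(16)]
checks = [
    lipschitz_holds(r * math.cos(a), r * math.sin(a))
    for r in radii
    for a in angles
]
```

Sampling proves nothing, of course, but every entry of `checks` coming out true is at least consistent with the bound, and for any $\epsilon$ we could then take $\delta<\epsilon/M$ as in the continuity argument.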
Now to really understand this, go back and walk it through with a function of one variable. See if you can find where the old proof that having a derivative implies continuity is sitting inside this Lipschitz condition proof.
Okay, for the moment let’s pick an orthonormal basis $\{e_i\}_{i=1}^n$ for our vector space $\mathbb{R}^n$. This gives us coordinates on the Euclidean space of points. It also gives us the dual basis $\{\eta^i\}_{i=1}^n$ of the dual space $(\mathbb{R}^n)^*$. This lets us write any linear functional $\lambda$ as a unique linear combination $\lambda=\lambda_i\eta^i$. The component $\lambda_i$ measures how much weight we give to the distance a vector extends in the $e_i$ direction.
Now if we look at a particular point $x$ we can put it into our differential and leave the second (vector) slot blank: $df(x;\,\cdot\,)$. We will also write this simply as $df(x)$, and apply it to a vector by setting the vector just to its right: $df(x)t=df(x;t)$. Now $df(x)$ is a linear functional, and we can regard $df$ as a function from our space of points to the dual of the space of displacements. We can thus write it out uniquely in components $df(x)=\lambda_i(x)\eta^i$, where each $\lambda_i$ is a function of the point $x$, but not of the displacement $t$.
We want to analyze these components. I assert that these are just the partial derivatives in terms of the orthonormal basis we’ve chosen: $\lambda_i(x)=[D_if](x)$. In particular, I’m asserting that if the differential exists, then the partial derivatives exist as well.
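By the definition of the differential, for every $\epsilon>0$ there is a $\delta>0$ so that if $0<\lvert t\rvert<\delta$, then
$$\lvert[f(x+t)-f(x)]-df(x;t)\rvert<\epsilon\lvert t\rvert$$
Now we can write out $df(x;t)$ in components
$$df(x;t)=df(x)t=\lambda_i(x)\eta^i(t)=\lambda_i(x)t^i$$
Next for a specific index $k$ we can pick $t=\tau e_k$ for some value $\tau$ with $0<\lvert\tau\rvert<\delta$. Then $\lvert t\rvert=\lvert\tau\rvert$, $t^k=\tau$, and $t^i=0$ for all the other indices $i\neq k$. Putting all these and the component representation of $df$ into the definition of the differential we find
$$\lvert[f(x+\tau e_k)-f(x)]-\lambda_k(x)\tau\rvert<\epsilon\lvert\tau\rvert$$
Dividing through by $\lvert\tau\rvert$ we find
$$\left\lvert\frac{f(x+\tau e_k)-f(x)}{\tau}-\lambda_k(x)\right\rvert<\epsilon$$
And this is exactly what we need to find that $[D_kf](x)$ exists and equals $\lambda_k(x)$.
Therefore if the function $f$ has a differential at the point $x$, then it has all partial derivatives there, and these uniquely determine the differential at that point.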
In light of our discussion of differentials, I want to make a point here that is usually glossed over in most treatments of multivariable calculus. In a very real sense, the sources and targets of our functions are not the vector spaces $\mathbb{R}^n$.
Let’s think about what we need to have a vector space. We need a way to add vectors and to multiply them by scalars. Geometrically, addition proceeds by placing vectors as arrows “tip-to-tail” and filling in the third side of the triangle. Scalar multiplication takes a vector as an arrow and stretches, shrinks, or reverses it depending on the value of the scalar. But both of these require us to think of a vector as an arrow which points from the origin to the point with coordinates given by the components of our vector.
But this makes the origin a very special point indeed. And why should we have any such special point, from a geometric perspective? We already insisted that we didn’t want to choose a basis for our space that would make some directions more special than others, so why should we have to choose a special point?
What really matters in our spaces is their topology. But we don’t want to forget all of the algebraic structure either. Some vestiges of the structure of a vector space still make sense in the absence of an origin. Indeed, we can still talk about it as an affine space, where the idea of displacement vectors between points still makes sense. And these displacement vectors will be actual vectors in $\mathbb{R}^n$. Like any torsor, this means that our space “looks like” the group (here, vector space) we use to describe displacements, but we’ve “forgotten” which point was the origin. We call the result a “Euclidean” space, since such spaces provide nice models of the axioms of Euclidean geometry.
So let’s try to be a little explicit here: we actually have two different kinds of geometric objects floating around right now. First are the points in an $n$-dimensional Euclidean space. We can’t add these points, or multiply them by scalars, but we can find a displacement vector between two of them. Such a displacement vector will be in the $n$-dimensional real vector space $\mathbb{R}^n$. When it’s convenient to speak in terms of coordinates, we first pick an (arbitrary) origin point. Now if we’re sloppy we can identify a point in the Euclidean space with its displacement vector from the origin, and thus confound the Euclidean space of points and the vector space of displacements. We can proceed to choose a basis $\{e_i\}$ of our vector space of displacements, which gives coordinates to the Euclidean space of points; the point $(x^1,\dots,x^n)$ is the one whose displacement vector from the origin is $x^ie_i$.
Now, the rant. Some multivariable calculus books are careful about not doing nonsense things like “adding” or “scalar multiplying” points, but many do exactly these sorts of things, giving the impression to students that points are vectors. Even among the texts that are careful, I don’t recall seeing any that actually go so far as to mention that a point is not a vector. When I teach the course I’m careful to point out that they’re not quite the same thing (though not in quite as much detail as this) and I go so far as to write them differently, with vector coordinates written out between angle brackets instead of parens. Without some sort of distinction being explicitly drawn between points and vectors, more students do fall into the belief that the two are the same thing, or (worse) that each is “the same thing as” a list of numbers in a coordinate representation. Within the context of a course on multivariable calculus, it’s possible to get by with these ideas, but in the long run they will have to be corrected before proceeding into more general contexts.
So, why bring this up now in particular? Because it explains the notation we use in the differential. When we write $df(x;t)$, the semicolon distinguishes between the point variable $x$ and the vector variable $t$. It becomes even more apparent when we choose coordinates and write $df(x^1,\dots,x^n;t^1,\dots,t^n)$. Notice that we only ask that $df$ act linearly on the vector variable, since “linear transformations” are defined on vector spaces, not Euclidean spaces.
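If you like to think in code, the point/vector distinction is exactly the sort of thing a type system can enforce. Here is a minimal sketch; the class names `Point` and `Vector` and their fields are hypothetical, invented only to illustrate which operations make sense and which don’t.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    """A displacement: these can be added and scaled."""
    components: tuple

    def __add__(self, other):
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(tuple(a + b for a, b in zip(self.components, other.components)))

    def __rmul__(self, scalar):
        # scalar * vector stretches, shrinks, or reverses the displacement.
        return Vector(tuple(scalar * a for a in self.components))

@dataclass(frozen=True)
class Point:
    """A location in Euclidean space: no addition, no scalar multiples."""
    coords: tuple

    def __sub__(self, other):
        # Point - Point gives the displacement vector between them.
        if not isinstance(other, Point):
            return NotImplemented
        return Vector(tuple(a - b for a, b in zip(self.coords, other.coords)))

    def __add__(self, other):
        # Point + Vector moves the point; Point + Point is not defined.
        if not isinstance(other, Vector):
            return NotImplemented
        return Point(tuple(a + b for a, b in zip(self.coords, other.components)))

p = Point((1.0, 2.0))
q = Point((4.0, 6.0))
t = q - p            # the displacement from p to q
moved = p + 2 * t    # start at p and move by twice that displacement

try:
    p + q
    point_sum_allowed = True
except TypeError:
    point_sum_allowed = False
```

Trying to “add” two points raises a `TypeError`, which is just the complaint I’d like students to hear in their heads.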
Okay, partial derivatives don’t work as an extension of differentiation to higher-dimensional spaces. Even generalizing them to directional derivatives doesn’t give us what we want. What we need is not just the separate existence of a bunch of directional derivatives, but a single object which gives us all directional derivatives at once. To find it, let’s look back at the derivative in one dimension.
If we know the derivative $f'(x)$ of a function $f$ at a point $x$, we can use it to build a close linear approximation to the function near that point. This is what we mean when we say that the derivative is the slope of the tangent line. It says that if we move away from $x$ by an amount $t$, we can approximate the change in the function’s value
$$f(x+t)-f(x)\approx f'(x)t$$
I’m going to write the part on the right-hand side as one function: $df(x;t)=f'(x)t$. We use a semicolon here to distinguish the very different roles that $x$ and $t$ play. Before the semicolon we pick a point at which to approximate $f$. After the semicolon we pick a displacement from our starting point. The “differential” $df$ approximates how much the function will change from its value at $x$ when we move away by a displacement $t$. Importantly, for a fixed value of $x$ this approximation is a linear function of $t$. This is obvious here, since once we pick $x$, the value of $df(x;t)$ is determined by multiplying $t$ by some real number, and multiplication of real numbers is linear.
What’s less obvious is also more important: the differential approximates the difference $f(x+t)-f(x)$. By this, we can’t just mean that the distance between the two goes to zero as $t$ does. This is obvious, since both of them must themselves go to zero by linearity and continuity, respectively. No, we want them to agree more closely than that. We want something better than just continuity.
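I say that if $f$ has a finite derivative at $x$ (so the differential exists), then for every $\epsilon>0$ there is a $\delta>0$ so that if $0<\lvert t\rvert<\delta$ we have
$$\lvert[f(x+t)-f(x)]-df(x;t)\rvert<\epsilon\lvert t\rvert$$
That is, not only does the difference get small (as the limit property would say), but it gets small even faster than $\lvert t\rvert$ does. And indeed this is the case. We can divide both sides by $\lvert t\rvert$, which (since $t$ is small) magnifies the difference on the left side.
$$\left\lvert\frac{f(x+t)-f(x)}{t}-f'(x)\right\rvert<\epsilon$$
But if we can always find a neighborhood where this inequality holds, we have exactly the statement of the limit
$$\lim_{t\to0}\frac{f(x+t)-f(x)}{t}=f'(x)$$
which is exactly what it means for $f'(x)$ to be the derivative of $f$ at $x$.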
So for a single-variable function, having a derivative — the limit of a difference quotient — is equivalent to being differentiable — having a differential. And it’s differentials that generalize nicely.
Now let $f$ be a real-valued function defined on some open region $S$ in $\mathbb{R}^n$. The differential of $f$ at a point $x\in S$, if it exists, is a function $df$ satisfying the properties
- The function $df$ takes two variables in $\mathbb{R}^n$. The values $df(x;t)$ are defined for every value of $t\in\mathbb{R}^n$, and for some region of $x$ values containing the point under consideration. Typically, we’ll be looking for it to be defined in the same region $S$.
- The differential is linear in the second variable. That is, given two vectors $s$ and $t$ in $\mathbb{R}^n$, and real scalars $a$ and $b$, we must have
  $$df(x;as+bt)=a\,df(x;s)+b\,df(x;t)$$
- The differential closely approximates the change in the value of $f$ as we move away from the point $x$, in the sense that for every $\epsilon>0$ there is a $\delta>0$ so that if $0<\lvert t\rvert<\delta$ we have
  $$\lvert[f(x+t)-f(x)]-df(x;t)\rvert<\epsilon\lvert t\rvert$$
I’m not making any sort of assertion about whether or not such a function exists, or under what conditions it exists. More subtly, I’m not yet making any assertion that if such a function exists it is unique. All I’m saying for the moment is that having this sort of linear approximation to the function near $x$ is the right generalization of the one-variable notion of differentiability.
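To get a feel for the approximation property, here is a quick numerical sketch. The function $f(x,y)=x^2y$, the point $(1,2)$, and the candidate differential `df` (built from partials I computed by hand) are all assumptions chosen for illustration. The quantity to watch is the error divided by $\lvert t\rvert$, which should shrink as $t$ does.

```python
import math

def f(x, y):
    # Hypothetical example function.
    return x * x * y

def df(point, t):
    # Candidate differential at (x, y): linear in t, built from hand-computed partials.
    x, y = point
    return 2 * x * y * t[0] + x * x * t[1]

point = (1.0, 2.0)
direction = (0.6, 0.8)  # a unit vector, scaled by h below

def relative_error(h):
    # |[f(x + t) - f(x)] - df(x; t)| / |t| for the displacement t = h * direction.
    t = (h * direction[0], h * direction[1])
    change = f(point[0] + t[0], point[1] + t[1]) - f(*point)
    return abs(change - df(point, t)) / math.hypot(*t)

ratios = [relative_error(10.0 ** -k) for k in range(1, 6)]
```

The entries of `ratios` fall off roughly in proportion to $\lvert t\rvert$, which is the “gets small even faster” behavior the third property asks for.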
Okay, now let’s generalize away from partial derivatives. The conceptual problem there was picking a bunch of specific directions as our basis, and restricting ourselves to that basis. So instead, let’s pick any direction at all, or even more generally than that.
Given a vector $u\in\mathbb{R}^n$, we define the directional derivative of the function $f$ in the direction of $u$ by
$$[D_uf](x)=\lim_{t\to0}\frac{f(x+tu)-f(x)}{t}$$
It’s common to omit the brackets I’ve written in here, but that doesn’t make it as clear that we have a new function $D_uf$, and we’re asking for its value at $x$. Instead, $D_uf(x)$ can suggest that we’re applying $D_u$ to the value $f(x)$. It’s also common to restrict $u$ to be a unit vector, which is then used as a representative vector for all of those pointing in the same direction. I find that to be a needless hindrance, but others may disagree.
Anyhow, this looks a lot like our familiar derivative. Indeed, if we’re working in $\mathbb{R}^1$ and we set $u=1$ we recover our regular derivative. And we have the same sort of interpretation: if we move a little bit in the direction of $u$ then we can approximate the change in $f$
$$f(x+tu)-f(x)\approx[D_uf](x)t$$
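Now, does the existence of these limits guarantee the continuity of $f$ at $x$? No, not even the existence of all directional derivatives at a point assures us that the function will be continuous at that point. Indeed, we can consider another of our pathological cases
$$f(x,y)=\frac{x^2y}{x^4+y^2}$$
and patch it by defining $f(0,0)=0$. We take the directional derivative at $(0,0)$ using the direction vector $u=(u_x,u_y)$
$$[D_uf](0,0)=\lim_{t\to0}\frac{f(tu_x,tu_y)}{t}=\lim_{t\to0}\frac{t^3u_x^2u_y}{t\left(t^4u_x^4+t^2u_y^2\right)}=\lim_{t\to0}\frac{u_x^2u_y}{t^2u_x^4+u_y^2}$$
If $u_y=0$ then we find $[D_uf](0,0)=0$, while if $u_y\neq0$ we find $[D_uf](0,0)=\frac{u_x^2}{u_y}$. But we know that this function can’t be continuous, since if we approach the origin along the parabola $y=x^2$ we get a limit of $\frac{1}{2}$ instead of $0$.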
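Here is a sketch that checks this example numerically, taking the function to be $x^2y/(x^4+y^2)$ patched with the value $0$ at the origin. The helper names, the step size, and the sample directions are my own choices. It compares difference quotients at the origin against the closed form $u_x^2/u_y$, evaluates the function along the parabola $y=x^2$, and tests whether the directional derivative is additive in the direction.

```python
def f(x, y):
    # The patched pathological example: x^2 y / (x^4 + y^2), with f(0, 0) = 0.
    if x == 0 and y == 0:
        return 0.0
    return x * x * y / (x ** 4 + y * y)

def quotient_at_origin(u, t=1e-6):
    # Difference quotient [f(t u) - f(0, 0)] / t for a small t.
    return f(t * u[0], t * u[1]) / t

def predicted(u):
    # Closed form: 0 if u_y = 0, otherwise u_x^2 / u_y.
    return 0.0 if u[1] == 0 else u[0] ** 2 / u[1]

directions = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -1.0), (1.0, 2.0)]
quotients = [quotient_at_origin(u) for u in directions]
expected = [predicted(u) for u in directions]

# Along the parabola y = x^2 the function sits at 1/2 all the way in to the origin.
parabola_values = [f(x, x * x) for x in (0.1, 0.01, 0.001)]

# Additivity test: compare D_u + D_v with D_(u+v) for u = (1, 1), v = (1, 2).
additive = predicted((1.0, 1.0)) + predicted((1.0, 2.0)) == predicted((2.0, 3.0))
```

The last line comes out false, so no single linear functional can be producing these directional derivatives.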
Again, the problem is that directional derivatives imply continuity along straight lines in various directions, but even continuity along every straight line through the point isn’t enough to assure continuity as a function of two variables, let alone more. We need something even stronger than directional derivatives.
On the other hand, directional derivatives are definitely stronger than partial derivatives. First of all, we haven’t had to make any choice of an orthonormal basis. But if we do have an orthonormal basis $\{e_i\}$ at hand, we find that partial derivatives are just particular directional derivatives
$$[D_{e_k}f](x)=\lim_{t\to0}\frac{f(x+te_k)-f(x)}{t}=\lim_{t\to0}\frac{f(x^1,\dots,x^k+t,\dots,x^n)-f(x^1,\dots,x^n)}{t}=[D_kf](x^1,\dots,x^n)$$
Incidentally, I’ve done two things here worth noting. First of all, I’ve gone back to using superscript indices for vector components. This allows the second thing, which is the transition from writing a function as taking one vector variable $f(x)$ to rewriting the vector in terms of the basis at hand $f(x^ie_i)$ to writing the function as taking $n$ real variables $f(x^1,\dots,x^n)$. I know that some people don’t like superscript indices and the summation convention, but they’ll be standard when we get to more general spaces later, so we may as well get used to them now. Luckily, when we really understand something we shouldn’t have to pick coordinates, and indices only come into play when we do pick coordinates. Thus all the really meaningful statements shouldn’t have many indices to confuse us.
So yesterday we noted that the big conceptual problem with partial derivatives is that they’re highly dependent on a choice of basis. Before we generalize away from this, let’s note a few choices that we are going to make.
First of all, we’re going to assume our space comes with a positive-definite inner product, but it doesn’t really matter which one. We’re choosing a positive-definite form with signature $(n,0)$ instead of a form with some negative-definite or even degenerate portion (where we’d get $-1$s or $0$s along the diagonal in an orthonormal basis) because we want every direction to behave the same as every other direction. More general signatures will come up when we talk about more general spaces. But we do want to be able to talk in terms of lengths and angles.
Now this doesn’t mean we’ve chosen a basis. We can choose one, but there’s a whole family of other equally valid choices related by orthogonal transformations. Ideally, we should define things which don’t depend on this choice at all. If we must make a choice in our definitions, the results should be independent of the choice. Often, this will amount to the existence of some action of the orthogonal group on our structure, and the invariance of the results under this action.
Definitions which don’t depend on the choice are related to what physicists mean when they say something is “manifestly coordinate-free”, since we don’t even have to mention coordinates to make our definitions. Those which depend on a choice, but are later shown to be independent of that choice are a lesser, but acceptable, alternative. Notice also that this avoidance of choices echoes the exact same motives when we preferred the language of linear transformations on vector spaces to the language of matrices acting on ordered tuples of numbers.
But, again, we have made a choice of some inner product. But this doesn’t matter, because all positive-definite inner products “look the same”, in the sense that if we pick an orthonormal basis for each of two distinct inner products, there’s going to be a general linear transformation which takes the one basis to the other, and which thus takes the one form to the other. That is, the forms are congruent. So as long as we have some inner product, any inner product, to talk about lengths and angles, and to translate between vectors and covectors, we’re fine.
Okay, we want to move towards some analogue of the derivative of a function that applies to functions of more than one variable. For the moment we’ll stick to single real outputs. As a goal, we want “differentiability” to be a refinement of the idea of smoothness started with “continuity”, so an important check is that it’s a stronger condition. That is, a differentiable function should be continuous.
For functions with a single real input we defined the derivative of the function $f$ at the point $x$ by the limit of the difference quotient
$$f'(x)=\lim_{t\to0}\frac{f(x+t)-f(x)}{t}$$
The problem here is that for vector inputs we can’t “divide” by the vector $t$. So we need some other way around this problem.
Our first attempt may be familiar from calculus classes: we’ll just look at one variable at a time. That is, if we have a function $f$ of $n$ real variables $x_1,\dots,x_n$ and we keep all of them fixed except the $k$th one, we can try to take the limit
$$[D_kf](x_1,\dots,x_n)=\lim_{t\to0}\frac{f(x_1,\dots,x_k+t,\dots,x_n)-f(x_1,\dots,x_n)}{t}$$
That is, we fix down the values of all the other variables and get a function of the single remaining variable. We then take the single-variable derivative as normal.
The first problem here is that having these partial derivatives, even having a partial derivative for each variable, doesn’t make a function continuous. Let’s look at the second pathological example of a limit we discussed, patched by defining $f(0,0)=0$:
$$f(x,y)=\frac{xy}{x^2+y^2}$$
If we consider the point $(0,0)$, we can calculate both partial derivatives here. First we fix $y=0$ and find $f(x,0)=0$. Thus it’s easy to check that $[D_1f](0,0)=0$. Similarly, we can fix $x=0$ to find $f(0,y)=0$, and thus that $[D_2f](0,0)=0$. So both partial derivatives exist at $(0,0)$, but the function doesn’t even have a limit there, much less one which equals its value.
The problem is the same one we saw in the case of multivariable limits: we can’t take a limit as one input point approaches another along a single path and just blithely expect that it’s going to mean anything. Here we’re just picking out two paths towards the same point and establishing that the function is continuous when we restrict to those paths, which doesn’t establish continuity in general.
There’s a deeper problem with partial derivatives, though. Implicit in the whole set-up is choosing a basis of our space. To write $f$ as a function of $n$ real variables instead of one $n$-dimensional vector variable means picking a basis. In practice we often have no problem with this. Indeed, many problems come to us in terms of a collection of variables which we bind together to make a single vector variable. But in principle, anything with any geometric meaning should be independent of artificial choices of coordinates. We can’t even talk about partial derivatives without making such a choice, and so they clearly don’t get to the heart of any sensible notion of “differentiability”.
As we’ve seen, when our target is a higher-dimensional real space continuity is the same as continuity in each component. But what about when the source is such a space? It turns out that it’s not quite so simple.
One thing, at least, is unchanged. We can still say that $f$ is continuous at a point $a$ if $\lim_{x\to a}f(x)=f(a)$. That is, if we have a sequence $\{a_k\}$ of points in $\mathbb{R}^m$ (we only need to consider sequences because metric spaces are sequential) that converges to $a$, then the image of this sequence $\{f(a_k)\}$ converges to $f(a)$.
The problem is that limits themselves in higher-dimensional real spaces become a little hairy. In $\mathbb{R}$ there are really only two directions along which a sequence can converge to a given point. If we have a sequence converging from the right and another sequence converging from the left, that basically is enough to establish what the limit of the function is (and if it has one). In higher-dimensional spaces (even just in $\mathbb{R}^2$) we have so many possible approaches to any given point that in order to avoid an infinite amount of work we have to use something like the formal definition of limits in terms of metric balls. That is
The function $f:\mathbb{R}^m\to\mathbb{R}$ has limit $L$ at the point $a$ if for every $\epsilon>0$ there is a $\delta>0$ so that $0<\lvert x-a\rvert<\delta$ implies $\lvert f(x)-L\rvert<\epsilon$.
We just consider the case with target $\mathbb{R}$ since higher-dimensional targets are just like multiple copies of this same definition, just as we saw for continuity.
Now, let’s look at a few examples of limits to get an idea for why it’s not so simple. In each case, we will be considering a function $f:\mathbb{R}^2\to\mathbb{R}$ which is bounded near $(0,0)$ (since just blowing up to infinity would be too easy to be really pathological) and even has nice limits along certain specified approaches, but which still fails to have a limit at the origin.
First off, let’s consider $f(x,y)=\frac{x^2}{x^2+y^2}$. If we consider approaching along the $x$-axis with the sequence $\left(\frac{1}{n},0\right)$ or $\left(-\frac{1}{n},0\right)$ we find a limit of $1$. However, if we approach along the $y$-axis with the sequence $\left(0,\frac{1}{n}\right)$ or $\left(0,-\frac{1}{n}\right)$ we instead find a limit of $0$. Thus no limit exists for the function.
Next let’s try $f(x,y)=\frac{xy}{x^2+y^2}$. Now the approaches along either axis above all give the limit $0$, so the limit of the function is $0$, right? Wrong! This time if we approach along the diagonal $y=x$ with the sequence $\left(\frac{1}{n},\frac{1}{n}\right)$ we get the limit $\frac{1}{2}$. So we have to consider directions other than the coordinate axes.
What about $f(x,y)=\frac{x^2y}{x^4+y^2}$? Approaching along the coordinate axes we get a limit of $0$. Approaching along any diagonal $y=mx$ with the sequence $\left(\frac{1}{n},\frac{m}{n}\right)$ the calculations are a bit hairier but we still find a limit of $0$. So approaching from any direction we get the same limit, making the limit of the function $0$, right? Wrong again! Now if we approach along the parabola $y=x^2$ with the sequence $\left(\frac{1}{n},\frac{1}{n^2}\right)$ we find a limit of $\frac{1}{2}$, and so the limit still doesn’t exist. By this point it should be clear that if straight lines aren’t enough to simplify things then there are just far too many curves to consider, and we need some other method to establish a limit, which is where the metric ball definition comes in.
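A few lines of code make the path-dependence easy to poke at. This sketch takes the three examples to be $x^2/(x^2+y^2)$, $xy/(x^2+y^2)$, and $x^2y/(x^4+y^2)$; the names `f1`, `f2`, `f3`, the particular line $y=3x$, and the choice of a single far-out term `n` are assumptions of mine. It evaluates one late term of each approaching sequence rather than proving any limit.

```python
def f1(x, y):
    return x * x / (x * x + y * y)

def f2(x, y):
    return x * y / (x * x + y * y)

def f3(x, y):
    return x * x * y / (x ** 4 + y * y)

n = 10 ** 4  # index of a far-out term in each sequence

along_paths = {
    "f1 on x-axis": f1(1 / n, 0.0),            # sequence tends to 1
    "f1 on y-axis": f1(0.0, 1 / n),            # sequence tends to 0
    "f2 on x-axis": f2(1 / n, 0.0),            # sequence tends to 0
    "f2 on diagonal": f2(1 / n, 1 / n),        # sequence tends to 1/2
    "f3 on line y = 3x": f3(1 / n, 3 / n),     # sequence tends to 0
    "f3 on parabola": f3(1 / n, 1 / n ** 2),   # sequence tends to 1/2
}
```

Try other slopes and other curves; the third function keeps looking innocent along every straight line, which is exactly why picking paths can never establish that a limit exists.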
Now I want to go off on a little bit of a rant here. It’s become fashionable to not teach the metric ball definition ($\epsilon$-$\delta$ proofs, as they’re often called) at the first semester calculus level. It’s not even on the Calculus AB exam. I’m not sure when this happened because I was taught them first thing when I took calculus, and it wasn’t that long between then and my first experience teaching calculus. But it’d have to have been sometime in the mid-’90s. Anyway, they don’t even teach it in most college courses anymore. And for the purposes of calculus that’s okay, since as I mentioned above you can easily get away without them when dealing with single-variable functions. They can even survive the analogues of $\epsilon$-$\delta$ proofs that come up when dealing with convergent sequences in second-semester calculus.
The problem comes when students get to third semester calculus and multivariable functions. Now, as we’ve just seen, there’s no sure way of establishing a limit. We can in some cases establish the continuity of simple functions (like coordinate projections) and then use limit laws to build up a larger class. But this approach fails for functions superficially similar to the pathological functions listed above, but which do have limits which can be established by an $\epsilon$-$\delta$ proof. We can establish that certain limits do not exist by techniques similar to those above, but this requires some ingenuity in choosing two appropriate paths which give different results. There are one or two other methods that work in special cases, but nothing works like an $\epsilon$-$\delta$ proof.
But now we can’t teach $\epsilon$-$\delta$ proofs to these students! The method is rather more complicated when we’ve got more than one variable to work with, not least because of the more complicated distance formula to work with. What used to happen was that students would have developed some facility with $\epsilon$-$\delta$ proofs back in first and second semester calculus, which could then be brought to bear on this new situation. But now they have no background and cannot, in general, absorb both the logical details of challenge-response $\epsilon$-$\delta$ proofs and the complications of multiple variables at the same time. And so we show them a few jury-rigged tricks and assure them that within the rest of the course they won’t have to worry about it. I’d almost rather dispense with limits entirely than present this Frankenstein’s monstrosity.
And yet, I see no sign that the tide will ever turn back. The only hope is that the movement to make statistics the capstone high-school course will gain momentum. If we can finally wrest first-semester calculus from the hands of the public school system and put all calculus students at a given college through the same three-semester track, then the more intellectually rigorous institutions might have the integrity to put proper limits back into the hands of their first semester students and not have to worry about incoming freshmen with high AP scores covering for shoddy backgrounds.
Now that we have the topology of higher-dimensional real spaces in hand, we can discuss continuous functions between them. Since these are metric spaces we have our usual definition with $\epsilon$ and $\delta$ and all that:
A function $f:\mathbb{R}^m\to\mathbb{R}^n$ is continuous at $x$ if and only if for each $\epsilon>0$ there is a $\delta>0$ so that $\lvert y-x\rvert<\delta$ implies $\lvert f(y)-f(x)\rvert<\epsilon$.
where the bars denote the norm in one or the other of the spaces $\mathbb{R}^m$ or $\mathbb{R}^n$ as depends on context. Again, the idea is that if we pick a metric ball around $f(x)$, we can find some metric ball around $x$ whose image is contained in the first ball.
The reason why this works, of course, is that metric balls provide a neighborhood base for our topology. But remember that last time we came up with an equivalent topology on $\mathbb{R}^n$ using a very different subbase: preimages of neighborhoods in $\mathbb{R}$ under projections. Intersections of these preimages furnish an alternative neighborhood base. Let’s see what happens if we write down the definition of continuity in these terms:
A function $f:\mathbb{R}^m\to\mathbb{R}^n$ is continuous at $x$ if and only if for each $\epsilon=(\epsilon_1,\dots,\epsilon_n)$ with all $\epsilon_i>0$ there is a $\delta>0$ so that $\lvert y-x\rvert<\delta$ implies $\lvert\pi_i(f(y))-\pi_i(f(x))\rvert<\epsilon_i$ for all $i$.
That is, if we pick a small enough metric ball around $x$ its image will fit within the “box” which extends in the $i$th direction a distance $\epsilon_i$ on each side from the point $f(x)$.
At first blush, this might be a different notion of continuity, but it really isn’t. From what we did yesterday we know that both the boxes and the balls provide equivalent topologies on the space $\mathbb{R}^n$, and so they must give equivalent notions of continuity. In a standard multivariable calculus course, we essentially reconstruct this using handwaving about how if we can fit the image of a ball into any box we can choose a box that fits into a selected metric ball, and vice versa.
But why do we care about this equivalent statement? Because now I can define a bunch of functions $f_i=\pi_i\circ f$ so that $f_i(x)$ is the $i$th component of $f(x)$. For each of these real-valued functions, I have a definition of continuity:
A function $f_i:\mathbb{R}^m\to\mathbb{R}$ is continuous at $x$ if and only if for each $\epsilon_i>0$ there is a $\delta_i>0$ so that $\lvert y-x\rvert<\delta_i$ implies $\lvert f_i(y)-f_i(x)\rvert<\epsilon_i$.
So each $f_i$ is continuous if I can pick a $\delta_i$ that works with a given $\epsilon_i$. And if all the $f_i$ are continuous, I can pick the smallest of the $\delta_i$ and use it as a $\delta$ that works for each component. But then I can wrap the $\epsilon_i$ up into a vector $\epsilon$ and use the $\delta$ I’ve picked to satisfy the box definition of continuity for $f$ itself! Conversely, if $f$ is continuous by the box definition, then I must be able to use the $\delta$ for a given vector $\epsilon$ to verify the continuity of each $f_i$ for the given $\epsilon_i$.
The upshot is that a function from a metric space (generalize this yourself to other metric spaces than $\mathbb{R}^m$) to $\mathbb{R}^n$ is continuous if and only if each of the component functions $f_i$ is continuous.
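The “pick the smallest $\delta_i$” step is easy to act out in code. The following is only a sketch under assumptions of my own: a hypothetical map from $\mathbb{R}$ to $\mathbb{R}^2$ with components $2x$ and $x^2$, examined at the point $1$, with component deltas I worked out by hand.

```python
def f(x):
    # Hypothetical map from R to R^2 with components f_1(x) = 2x and f_2(x) = x^2.
    return (2 * x, x * x)

x0 = 1.0

def delta_1(eps):
    # |2y - 2 x0| < eps whenever |y - x0| < eps / 2.
    return eps / 2

def delta_2(eps):
    # Near x0 = 1: |y^2 - 1| = |y - 1| |y + 1| < 3 |y - 1| once |y - 1| < 1.
    return min(1.0, eps / 3)

def delta_for_box(eps_vector):
    # The smallest component delta works for every component at once.
    return min(delta_1(eps_vector[0]), delta_2(eps_vector[1]))

eps_vector = (0.1, 0.02)
delta = delta_for_box(eps_vector)

# Sample points strictly inside the ball of radius delta around x0.
samples = [x0 + delta * k / 10 for k in range(-9, 10)]
in_box = all(
    abs(fy - fx) < eps
    for y in samples
    for fy, fx, eps in zip(f(y), f(x0), eps_vector)
)
```

Every sampled point of the ball lands inside the box, as the argument says it must.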
As we move towards multivariable calculus, we’re going to primarily be concerned with the topological spaces $\mathbb{R}^n$ (for various values of $n$) just as in calculus we were primarily concerned with the topological space $\mathbb{R}$. As a topological space, $\mathbb{R}^n$ is just like the vector space we’ve been discussing, but now we care a lot less about the algebraic structure than we do about the notion of which points are “close to” other points.
And it turns out that $\mathbb{R}^n$ is a metric space, so all of the special things we know about metric spaces can come into play. Indeed, inner products define norms and norms on vector spaces define metrics. We can even write it down explicitly. If we write our vectors $x=(x_1,\dots,x_n)$ and $y=(y_1,\dots,y_n)$, then the distance is
$$d(x,y)=\sqrt{(y_1-x_1)^2+\dots+(y_n-x_n)^2}$$
Incidentally, this is the exact same formula we’d get if we started with the metric space $\mathbb{R}$ and built up $\mathbb{R}^n$ as the product of $n$ copies.
One thing I didn’t mention back when I put together products of metric spaces is that we get the same topology as if we’d forgotten the metric and taken the product of topological spaces. This will actually be useful to us, in a way, so I’d like to explain it here.
We define the topology on a metric space by using balls of radius $\delta$ around each point to provide a subbase for the topology. On the other hand, when we have a product space we use preimages of open sets under the canonical projections $\pi_i$ to provide a subbase. To show that these generate the same topology, what we’ll do is show that the identity map from $\mathbb{R}^n$ as a product space to $\mathbb{R}^n$ as a metric space is a homeomorphism. Since it’s obviously invertible, we just need to show that it’s continuous in both directions. And we can use our subbases to do just that.
What we have to show is that each set in one subbase is open in terms of the other subbase. That is, for each point in the set we should be able to come up with a finite intersection of sets in the other subbase that contains the point, and yet fits inside the set we started with.
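Okay, so consider the preimage $\pi_i^{-1}(U)$ of an open set $U\subseteq\mathbb{R}$ under the projection $\pi_i$. That is, the collection of all $x$ with $x_i\in U$. Clearly since $U$ is open in the metric space $\mathbb{R}$ we can pick a radius $\delta$ so that the open interval $(x_i-\delta,x_i+\delta)$ is contained in $U$. But then the ball of radius $\delta$ in $\mathbb{R}^n$ around the point $x$ contains the point, and is itself contained in $\pi_i^{-1}(U)$, for if $y_i\notin U$ for some other point $y$, then $y$ cannot possibly be within the ball of radius $\delta$ around $x$.
On the other hand, let’s take a ball of radius $\delta$ about a point $x$. We set $\delta'=\frac{\delta}{\sqrt{n}}$ and consider the open intervals $U_i=(x_i-\delta',x_i+\delta')$. I say that the intersection of the preimages $\pi_i^{-1}(U_i)$ is contained in the ball. Indeed, if $y$ is in the intersection, each coordinate is less than $\delta'$ away from the corresponding coordinate of the center. Thus we can calculate the total distance
$$d(x,y)=\sqrt{\sum_{i=1}^n(y_i-x_i)^2}<\sqrt{n\delta'^2}=\sqrt{n}\,\delta'=\delta$$
and so the whole intersection must be within the ball.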
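As a quick check on the box-inside-ball half of the argument, here is a sketch that measures the farthest corners of a box of half-width $\delta/\sqrt{n}$ around a center. The center, the radius, and the 0.999 safety factor are my own assumptions; the factor is there only because the corners of the closed box sit exactly on the sphere, while the open box of the argument stays strictly inside.

```python
import itertools
import math

def box_corners(center, half_width):
    # All 2^n corners of the box with the given half-width around the center.
    offsets = itertools.product((-half_width, half_width), repeat=len(center))
    return [tuple(c + o for c, o in zip(center, off)) for off in offsets]

center = (1.0, -2.0, 0.5)
delta = 0.3
n = len(center)

# Shrink slightly below delta / sqrt(n) so the closed corners stay strictly inside the open ball.
delta_prime = 0.999 * delta / math.sqrt(n)

corners = box_corners(center, delta_prime)
farthest = max(math.dist(center, corner) for corner in corners)
```

The corners are the points of the box farthest from the center, so if they are inside the ball then the whole box is.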
This approach is pretty straightforward to generalize to the case of any product of metric spaces, but I’ll leave that as an exercise.