The Chain Rule
Today we get another rule for manipulating derivatives. Along the way we’ll see another way of viewing the definition of the derivative which will come in handy in the future.
Okay, we defined the derivative of the function $f$ at the point $x_0$ as the limit of the difference quotient:
$$f'(x_0)=\lim_{\Delta x\to 0}\frac{f(x_0+\Delta x)-f(x_0)}{\Delta x}$$
The point of the derivative-as-limit-of-difference-quotient is that if we adjust our input by $\Delta x$, we adjust our output “to first order” by $f'(x_0)\Delta x$. That is, the change in output is roughly the change in input times the derivative, and we have a good idea of how to control the error:
$$f(x_0+\Delta x)=f(x_0)+\left(f'(x_0)+\epsilon(\Delta x)\right)\Delta x$$
where $\epsilon$ is a function of $\Delta x$ satisfying $\lim_{\Delta x\to 0}\epsilon(\Delta x)=0$. This means the difference between the actual change in output and the change predicted by the derivative not only goes to zero as we look closer and closer to $x_0$, but it goes to zero fast enough that we can divide it by $\Delta x$ and still it goes to zero. (Does that make sense?)
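To see the error term shrink in practice, here is a minimal Python sketch. The choices $f=\sin$ and $x_0=1$ are assumptions made purely for illustration, and the helper name `epsilon` is hypothetical; it just solves the first-order formula for $\epsilon(\Delta x)$.

```python
import math

def epsilon(f, fprime, x0, dx):
    """Error term defined by f(x0 + dx) = f(x0) + (f'(x0) + epsilon(dx)) * dx."""
    return (f(x0 + dx) - f(x0)) / dx - fprime(x0)

# Illustrative choices (not from the text): f = sin, f' = cos, x0 = 1.
x0 = 1.0
steps = [10.0 ** (-k) for k in range(1, 6)]
errors = [abs(epsilon(math.sin, math.cos, x0, dx)) for dx in steps]

# Print how the size of the error term changes as dx shrinks.
for dx, err in zip(steps, errors):
    print(f"dx = {dx:.0e}   |epsilon(dx)| = {err:.3e}")
```

For this smooth example the printed values should shrink roughly in proportion to $\Delta x$, which is the "goes to zero even after dividing by $\Delta x$" behavior described above.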
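Okay, so now we can use this viewpoint on the derivative to look at what happens when we follow one function by another. We want to consider the composite function $f\circ g$ at the point $x_0$ where $g$ is differentiable. We’re also going to assume that $f$ is differentiable at the point $g(x_0)$. The differentiability of $g$ at $x_0$ tells us that
$$g(x_0+\Delta x)=g(x_0)+\left(g'(x_0)+\epsilon(\Delta x)\right)\Delta x$$
and the differentiability of $f$ at $y_0$ tells us that
$$f(y_0+\Delta y)=f(y_0)+\left(f'(y_0)+\eta(\Delta y)\right)\Delta y$$
where $\lim_{\Delta x\to 0}\epsilon(\Delta x)=0$, and similarly for $\eta$. Now when we compose the functions $f$ and $g$ we set $y_0=g(x_0)$, and $\Delta y=g(x_0+\Delta x)-g(x_0)=\left(g'(x_0)+\epsilon(\Delta x)\right)\Delta x$ is exactly the value described in the first line! That is,
$$\begin{aligned}f(g(x_0+\Delta x))&=f\left(g(x_0)+\Delta y\right)\\&=f(g(x_0))+\left(f'(g(x_0))+\eta(\Delta y)\right)\left(g'(x_0)+\epsilon(\Delta x)\right)\Delta x\\&=f(g(x_0))+f'(g(x_0))\,g'(x_0)\,\Delta x+\left(f'(g(x_0))\,\epsilon(\Delta x)+\eta(\Delta y)\left(g'(x_0)+\epsilon(\Delta x)\right)\right)\Delta x\end{aligned}$$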
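The last quantity in parentheses which we multiply by $\Delta x$ goes to zero as $\Delta x$ does. First, $\epsilon(\Delta x)$ does by assumption. Then as $\Delta x$ goes to zero, so does $\Delta y=g(x_0+\Delta x)-g(x_0)$, since $g$ must be continuous. Thus $\eta(\Delta y)$ must go to zero (taking $\eta(0)=0$, so that nothing breaks if $\Delta y$ happens to vanish), and the whole quantity is then zero in the limit. This establishes that not only is $f\circ g$ differentiable at $x_0$, but that its derivative there is
$$(f\circ g)'(x_0)=f'(g(x_0))\,g'(x_0)$$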
This means that since “to first order” we get the change in the output of $g$ by multiplying the change in its input by $g'(x_0)$, and “to first order” we get the change in the output of $f$ by multiplying the change in its input by $f'(g(x_0))$, we get the change in the output of their composite by multiplying first by $g'(x_0)$ and then by $f'(g(x_0))$.
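The chain rule formula can be checked numerically. The following is only a sketch under assumed choices: $g(x)=x^2$, $f(y)=\sin y$, and $x_0=0.7$ are picked for illustration and are not from the text, and `numerical_derivative` is a hypothetical helper using a central difference quotient rather than a true limit.

```python
import math

def numerical_derivative(h, x, dx=1e-6):
    """Central difference quotient approximating h'(x)."""
    return (h(x + dx) - h(x - dx)) / (2 * dx)

# Illustrative choices (not from the text): g(x) = x**2 and f(y) = sin(y).
def g(x):
    return x * x

def g_prime(x):
    return 2 * x

def f(y):
    return math.sin(y)

def f_prime(y):
    return math.cos(y)

def composite(x):
    """The composite function f(g(x))."""
    return f(g(x))

x0 = 0.7
# Chain rule: multiply first by g'(x0), then by f' evaluated at g(x0).
chain_rule_value = f_prime(g(x0)) * g_prime(x0)
# Independent estimate straight from the composite function.
numerical_value = numerical_derivative(composite, x0)

print(chain_rule_value, numerical_value)
```

The two printed numbers should agree to several decimal places; any remaining gap comes from the finite step size in the difference quotient.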
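Another way we often write the chain rule is by setting $y=g(x)$ and $z=f(y)$. Then the derivative $g'(x)$ is written $\frac{dy}{dx}$, while $f'(y)$ is written $\frac{dz}{dy}$. The chain rule then says:
$$\frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx}$$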
This is nice since it looks like we’re multiplying fractions. The drawback is that we have to remember in our heads where to evaluate each derivative.
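A small Python sketch can make the evaluation-point drawback concrete. It assumes $y=x^2$ and $z=\sin y$ (illustrative choices, not from the text) and compares the fraction-style product with the factor for $z$ evaluated at the right place, $y=g(x)$, against the same product with that factor mistakenly evaluated at $x$.

```python
import math

# Illustrative choices (not from the text): y = g(x) = x**2 and z = f(y) = sin(y).
def dy_dx(x):
    return 2 * x

def dz_dy(y):
    return math.cos(y)

x = 0.7
y = x ** 2

# Correct: the factor dz/dy is evaluated at y = g(x).
correct = dz_dy(y) * dy_dx(x)
# Mistaken: the factor dz/dy is evaluated at x instead of y.
mistaken = dz_dy(x) * dy_dx(x)

# Reference value from a central difference quotient of z(x) = sin(x**2).
h = 1e-6
reference = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)

print(correct, mistaken, reference)
```

Only the first product should match the reference value; the second differs noticeably, even though both look like the same product of fractions on paper.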
Now we can take this rule and use it to find the derivative of the inverse of an invertible function $f$. More specifically, if a function $f$ is one-to-one in some neighborhood of a point $x_0$, we can find another function $f^{-1}$ whose domain is the set of values $f$ takes (the range of $f$) and so that $f^{-1}(f(x))=x$. Then if the function is differentiable at $x_0$ and the derivative $f'(x_0)$ is not zero, the inverse function will be differentiable, with a derivative we will calculate.
First we set $y=f(x)$ and $x=f^{-1}(y)$. Then we take the derivative of the defining equation of the inverse to get $\frac{dx}{dy}\frac{dy}{dx}=1$, which we could write even more suggestively as $\frac{dx}{dy}=\frac{1}{\frac{dy}{dx}}$. That is, the derivative of the compositional inverse of our function is the multiplicative inverse of the derivative. But as we noted above, we have to remember where to evaluate everything. So let’s do it again in the other notation.
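Since $f^{-1}(f(x))=x$, we differentiate to find $\left(f^{-1}\right)'(f(x))\,f'(x)=1$. Then we substitute $y=f(x)$ and juggle some algebra to write
$$\left(f^{-1}\right)'(y)=\frac{1}{f'\left(f^{-1}(y)\right)}$$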
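As a sanity check on the inverse-derivative formula, here is a minimal Python sketch. It assumes $f=\exp$, so that $f^{-1}=\log$ and $f'=\exp$; these choices and the helper name `inverse_derivative` are illustrative and not part of the text.

```python
import math

def inverse_derivative(f_prime, f_inverse, y):
    """Derivative of the inverse at y: 1 / f'(f^{-1}(y))."""
    return 1.0 / f_prime(f_inverse(y))

# Illustrative choice (not from the text): f = exp, so f' = exp and f^{-1} = log.
y0 = 2.5
formula_value = inverse_derivative(math.exp, math.log, y0)

# Independent estimate: central difference quotient of log at y0.
h = 1e-6
numerical_value = (math.log(y0 + h) - math.log(y0 - h)) / (2 * h)

print(formula_value, numerical_value)
```

Here the formula reduces to $1/y$, the familiar derivative of the logarithm, and the difference quotient should agree with it closely. Note that $f'$ is evaluated at $f^{-1}(y)$, not at $y$.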