The Unapologetic Mathematician

Mathematics for the interested outsider

The Chain Rule

Today we get another rule for manipulating derivatives. Along the way we’ll see another way of viewing the definition of the derivative which will come in handy in the future.

Okay, we defined the derivative of the function f at the point x as the limit of the difference quotient:
\displaystyle f'(x)=\lim\limits_{\Delta x\rightarrow0}\frac{f(x+\Delta x)-f(x)}{\Delta x}
The point of the derivative-as-limit-of-difference-quotient is that if we adjust our input by \Delta x, we adjust our output “to first order” by f'(x)\Delta x. That is, the change in output is roughly the change in input times the derivative, and we have a good idea of how to control the error:
\displaystyle\left(f(x+\Delta x)-f(x)\right)-f'(x)\Delta x=\epsilon(\Delta x)\Delta x
where \epsilon is a function of \Delta x satisfying \lim\limits_{\Delta x\rightarrow0}\epsilon(\Delta x)=0. This means the difference between the actual change in output and the change predicted by the derivative not only goes to zero as we look closer and closer to x, but it goes to zero fast enough that we can divide it by \Delta x and still it goes to zero. (Does that make sense?)
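
To make the error term concrete, here is a small numerical sketch. The choice of f=\sin at x=1 is purely illustrative (it is not from the discussion above), and the helper name `epsilon` is just a label for the quantity \epsilon(\Delta x) obtained by dividing the leftover error by \Delta x.

```python
import math

def epsilon(f, f_prime, x, dx):
    """Leftover error (f(x+dx) - f(x) - f'(x)*dx), divided by dx."""
    return (f(x + dx) - f(x) - f_prime(x) * dx) / dx

# Illustrative assumption: f = sin, so f' = cos, evaluated at x = 1.
x = 1.0
steps = [10.0 ** -k for k in range(1, 6)]
errors = [abs(epsilon(math.sin, math.cos, x, dx)) for dx in steps]

for dx, err in zip(steps, errors):
    print(f"dx = {dx:.0e}   |epsilon(dx)| = {err:.3e}")
```

If the derivative is right, the printed values should shrink along with \Delta x, which is exactly the statement that the error goes to zero even after dividing by \Delta x.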

Okay, so now we can use this viewpoint on the derivative to look at what happens when we follow one function by another. We want to consider the composite function f\circ g at the point x_0 where f is differentiable. Here f\circ g means “first f, then g”, so it sends x to g(f(x)). We’re also going to assume that g is differentiable at the point y_0=f(x_0). The differentiability of f at x_0 tells us that
\displaystyle\left(f(x_0+\Delta x)-f(x_0)\right)=f'(x_0)\Delta x+\epsilon(\Delta x)\Delta x
and the differentiability of g at y_0 tells us that
\displaystyle\left(g(y_0+\Delta y)-g(y_0)\right)=g'(y_0)\Delta y+\eta(\Delta y)\Delta y
where \lim\limits_{\Delta x\rightarrow0}\epsilon(\Delta x)=0, and similarly for \eta. Now when we compose the functions f and g we set y_0=f(x_0), and \Delta y is exactly the value described in the first line: \Delta y=f(x_0+\Delta x)-f(x_0). That is,
\displaystyle \left[f\circ g\right](x_0+\Delta x)-\left[f\circ g\right](x_0)=g(f(x_0)+f'(x_0)\Delta x+\epsilon(\Delta x)\Delta x)-g(f(x_0))=
\displaystyle g'(f(x_0))\left(f'(x_0)\Delta x+\epsilon(\Delta x)\Delta x\right)+\eta(\Delta y)\left(f'(x_0)\Delta x+\epsilon(\Delta x)\Delta x\right)=
\displaystyle g'(f(x_0))f'(x_0)\Delta x+\left(g'(f(x_0))\epsilon(\Delta x)+\eta(\Delta y)\left(f'(x_0)+\epsilon(\Delta x)\right)\right)\Delta x

The last quantity in parentheses which we multiply by \Delta x goes to zero as \Delta x does. First, \epsilon(\Delta x) does by assumption. Then as \Delta x goes to zero, so does \Delta y, since f is differentiable at x_0 and so must be continuous there. Thus \eta(\Delta y) must go to zero (we set \eta(0)=0 to cover the case where \Delta y vanishes), while f'(x_0)+\epsilon(\Delta x) stays bounded, and the whole quantity goes to zero in the limit. This establishes that not only is f\circ g differentiable at x_0, but that its derivative there is
\displaystyle\left[f\circ g\right]'(x_0)=\frac{d}{dx}g(f(x))\bigg|_{x=x_0}=g'(f(x_0))f'(x_0)
This means that since “to first order” we get the change in the output of f by multiplying the change in its input by f'(x_0), and “to first order” we get the change in the output of g by multiplying the change in its input by g'(y_0), we get the change in the output of their composite by multiplying first by f'(x_0) and then by g'(y_0)=g'(f(x_0)).
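
As a sanity check, the formula can be compared against a finite-difference estimate. This is only a sketch under assumed choices: f(x)=x^2 and g(y)=\sin y are illustrative, as are the point x_0=0.7 and the helper `central_difference`. Following the convention above, f is applied first and g second.

```python
import math

def f(x):
    return x * x             # inner function, applied first

def f_prime(x):
    return 2.0 * x

def g(y):
    return math.sin(y)       # outer function, applied second

def g_prime(y):
    return math.cos(y)

def composite(x):
    return g(f(x))

def central_difference(func, x, h=1e-6):
    """Symmetric difference quotient, a numerical stand-in for the derivative."""
    return (func(x + h) - func(x - h)) / (2.0 * h)

x0 = 0.7
predicted = g_prime(f(x0)) * f_prime(x0)      # g'(f(x_0)) f'(x_0)
measured = central_difference(composite, x0)  # numerical estimate
print(predicted, measured)
```

The two printed numbers should agree to several decimal places. Note that g' is evaluated at f(x_0), not at x_0.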

Another way we often write the chain rule is by setting y=f(x) and z=g(y). Then the derivative f'(x) is written \frac{dy}{dx}, while g'(y) is written \frac{dz}{dy}. The chain rule then says:
\displaystyle \frac{dz}{dx}=\frac{dz}{dy}\frac{dy}{dx}
This is nice since it looks like we’re multiplying fractions. The drawback is that we have to remember in our heads where to evaluate each derivative.
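
A short worked example may help with the bookkeeping. The functions y=3x+1 and z=y^2 are chosen here only for illustration. The thing to watch is that \frac{dz}{dy} gets evaluated at y=3x+1.

```latex
y = 3x+1, \qquad z = y^2
\frac{dy}{dx} = 3, \qquad \frac{dz}{dy} = 2y
\frac{dz}{dx} = \frac{dz}{dy}\bigg|_{y=3x+1}\cdot\frac{dy}{dx} = 2(3x+1)\cdot 3 = 18x+6
```

Expanding z=(3x+1)^2=9x^2+6x+1 and differentiating directly gives the same 18x+6.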

Now we can take this rule and use it to find the derivative of the inverse of an invertible function f. More specifically, if a function f is one-to-one in some neighborhood of a point x_0, we can find another function f^{-1} whose domain is the set of values f takes (the range of f) and so that f(f^{-1}(x))=x=f^{-1}(f(x)). Then if the function is differentiable at x_0 and the derivative f'(x_0) is not zero, the inverse function will be differentiable at f(x_0), with a derivative we will calculate.

First we set y=f(x) and x=f^{-1}(y). Then we take the derivative of the defining equation of the inverse to get \frac{df^{-1}}{dy}\frac{df}{dx}=1, which we could write even more suggestively as \frac{dx}{dy}\frac{dy}{dx}=1. That is, the derivative of the compositional inverse of our function is the multiplicative inverse of the derivative. But as we noted above, we have to remember where to evaluate everything. So let’s do it again in the other notation.

Since f^{-1}(f(x))=x, we differentiate to find \left[f^{-1}\right]'(f(x))f'(x)=1. Then we substitute x=f^{-1}(y) and juggle some algebra to write
\displaystyle\left[f^{-1}\right]'(y)=\frac{1}{f'(f^{-1}(y))}
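
The inverse formula can also be checked numerically. In this sketch the function f(x)=x^3+x is an illustrative assumption (it is increasing everywhere, so it has an inverse), and `f_inverse` is a hypothetical helper that inverts it by bisection rather than by any closed form.

```python
def f(x):
    return x ** 3 + x        # strictly increasing, so invertible

def f_prime(x):
    return 3.0 * x ** 2 + 1.0

def f_inverse(y, lo=-10.0, hi=10.0, iterations=200):
    """Solve f(x) = y by bisection; assumes f(lo) <= y <= f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

y0 = 2.0                     # f(1) = 2, so f_inverse(2) should be close to 1
h = 1e-5
measured = (f_inverse(y0 + h) - f_inverse(y0 - h)) / (2.0 * h)
predicted = 1.0 / f_prime(f_inverse(y0))   # 1 / f'(f^{-1}(y))
print(predicted, measured)
```

Both numbers should come out close to 1/4, since f'(1)=4. The derivative of f is evaluated at f^{-1}(y), which is the bookkeeping the fraction notation hides.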

December 27, 2007 | Analysis, Calculus