The Unapologetic Mathematician

Mathematics for the interested outsider

The Chain Rule

Since the components of the differential are given by partial derivatives, and partial derivatives (like all single-variable derivatives) are linear, it’s straightforward to see that the differential operator is linear as well. That is, if f:\mathbb{R}^m\rightarrow\mathbb{R}^n and g:\mathbb{R}^m\rightarrow\mathbb{R}^n are two functions, both of which are differentiable at a point x, and a and b are real constants, then the linear combination af+bg is also differentiable at x, and the differential is given by

\displaystyle d(af+bg)=adf+bdg
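As a quick numerical sanity check of this linearity, here is a small Python sketch. The maps f and g, the point, and the constants are hypothetical examples chosen for illustration. We estimate the differential of af+bg with central differences and compare it against adf+bdg built from the exact differentials.

```python
import math

def jacobian(F, x, h=1e-6):
    """Central-difference Jacobian of F at x, returned as a list of rows."""
    cols = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    # transpose the list of columns into a list of rows
    return [list(row) for row in zip(*cols)]

# Hypothetical example maps f, g: R^2 -> R^2 with their exact differentials.
def f(x):
    return [x[0] * x[1], math.sin(x[0])]

def df(x):
    return [[x[1], x[0]], [math.cos(x[0]), 0.0]]

def g(x):
    return [x[0] ** 2, x[1] + math.exp(x[0])]

def dg(x):
    return [[2 * x[0], 0.0], [math.exp(x[0]), 1.0]]

a, b = 2.0, -3.0
x0 = [0.5, 1.5]

# Numerical differential of the linear combination a f + b g ...
lhs = jacobian(lambda x: [a * p + b * q for p, q in zip(f(x), g(x))], x0)
# ... against a df + b dg, computed entry by entry from the exact differentials.
rhs = [[a * p + b * q for p, q in zip(rf, rg)] for rf, rg in zip(df(x0), dg(x0))]
err = max(abs(u - v) for ru, rv in zip(lhs, rhs) for u, v in zip(ru, rv))
print(err)
```

The printed error should be at the level of finite-difference noise, not a proof of anything, but a useful check that the formula is being read correctly.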

There is usually no product of vectors in \mathbb{R}^n, so there is usually no analogue of the product rule and certainly none of the quotient rule. We can ignore those for now.

But we do have a higher-dimensional analogue for the chain rule. If we have a function g:X\rightarrow\mathbb{R}^n defined on some open region X\subseteq\mathbb{R}^m and another function f:Y\rightarrow\mathbb{R}^p defined on a region Y\subseteq\mathbb{R}^n that contains the image g(X), then we can compose them to get a single function f\circ g:X\rightarrow\mathbb{R}^p defined by \left[f\circ g\right](x)=f(g(x)). And if g is differentiable at a point x and f is differentiable at the image point g(x), then the composite function is differentiable at x.

First of all, what should the differential be? Remember that the differential dg(x) is a linear transformation that takes displacements t\in\mathbb{R}^m from the point x and turns them into displacements dg(x)t\in\mathbb{R}^n from the point g(x). Then, at the image point y=g(x), the differential df(y) is a linear transformation that takes displacements s\in\mathbb{R}^n from the point y and turns them into displacements df(y)s\in\mathbb{R}^p from the point f(y). Putting these together, we get a composite linear transformation df(g(x))dg(x) that takes displacements t\in\mathbb{R}^m from the point x and turns them into displacements df(g(x))dg(x)t\in\mathbb{R}^p from the point f(g(x)). I assert that this composite transformation is exactly the differential of the composite function.

Just as a sanity check, what happens when we look at single-variable real-valued functions? In this case, df(y) and dg(x) are both linear transformations from one-dimensional spaces to other one-dimensional spaces. That is, they’re represented as 1\times1 matrices that just multiply by their single real entry. So the composite of the two transformations is given by the 1\times1 matrix whose entry is the product of the two matrices’ entries. In other words, in one variable the differentials look like single real numbers f'(y) and g'(x), and their composite is given by multiplication: f'(g(x))g'(x). This is exactly the one-variable chain rule. To understand multiple variables we have to move from products of real numbers to compositions of linear transformations, which will be products of real matrices.
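The one-variable case can be checked numerically in a few lines. This Python sketch assumes the hypothetical choices f=\sin and g(x)=x^2+1, so that f'(g(x))g'(x)=\cos(x^2+1)\cdot2x, and compares that product against a central-difference estimate of the derivative of the composite.

```python
import math

def derivative(F, x, h=1e-6):
    # central-difference approximation to F'(x)
    return (F(x + h) - F(x - h)) / (2 * h)

f = math.sin                  # outer function, f'(y) = cos(y)
g = lambda x: x * x + 1.0     # inner function, g'(x) = 2x
x = 0.7

numeric = derivative(lambda t: f(g(t)), x)
chain = math.cos(g(x)) * (2 * x)   # f'(g(x)) * g'(x): product of 1x1 "matrices"
print(abs(numeric - chain))
```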

Okay, so let’s verify that d\left[f\circ g\right](x)=df(g(x))dg(x) does indeed act as a differential for f\circ g. It’s clearly a linear transformation between the appropriate two spaces of displacements. What we need to verify is that it gives a good approximation. That is, for every \epsilon>0 there is a \delta>0 so that if \delta>\lVert t\rVert>0 we have

\displaystyle\left\lVert\left[f(g(x+t))-f(g(x))\right]-df(g(x))dg(x)t\right\rVert<\epsilon\lVert t\rVert
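To see what this condition asks for in a concrete case, the following Python sketch uses hypothetical maps g:\mathbb{R}^2\rightarrow\mathbb{R}^2 and f:\mathbb{R}^2\rightarrow\mathbb{R} with known differentials. It forms the candidate df(g(x))dg(x), then computes the size of the remainder divided by \lVert t\rVert for shrinking displacements t along a fixed unit direction. If the candidate is the differential, these ratios should head to zero.

```python
import math

# Hypothetical example maps (not from the post): g: R^2 -> R^2, f: R^2 -> R.
def g(x):
    return [x[0] * x[1], x[0] + x[1] ** 2]

def dg(x):
    # exact Jacobian of g, a 2x2 matrix (rows = components of g)
    return [[x[1], x[0]], [1.0, 2 * x[1]]]

def f(y):
    return y[0] ** 2 + math.sin(y[1])

def df(y):
    # exact differential of f, a 1x2 row
    return [2 * y[0], math.cos(y[1])]

x = [1.0, 0.5]
# chain-rule candidate d[f o g](x) = df(g(x)) dg(x), a 1x2 row
row, J = df(g(x)), dg(x)
D = [sum(row[k] * J[k][j] for k in range(2)) for j in range(2)]

direction = [0.6, 0.8]  # a unit vector
ratios = []
for s in [1e-1, 1e-2, 1e-3, 1e-4]:
    t = [s * d for d in direction]   # so ||t|| = s
    xt = [x[0] + t[0], x[1] + t[1]]
    remainder = f(g(xt)) - f(g(x)) - (D[0] * t[0] + D[1] * t[1])
    ratios.append(abs(remainder) / s)
print(ratios)
```

Each ratio should be roughly a tenth of the one before, which is what we expect when the remainder is of second order in \lVert t\rVert.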

First of all, since f is differentiable at g(x), given \tilde{\epsilon}>0 there is a \tilde{\delta}>0 so that if \tilde{\delta}>\lVert s\rVert>0 we have

\displaystyle\left\lVert\left[f(g(x)+s)-f(g(x))\right]-df(g(x))s\right\rVert<\tilde{\epsilon}\lVert s\rVert

Now since g is differentiable at x it satisfies a Lipschitz condition there. We showed this for real-valued functions, and extending the result to \mathbb{R}^n-valued functions is straightforward by applying it to each component function. That is, there is some radius r_1>0 and a constant M>0 so that if r_1>\lVert t\rVert>0 we have the inequality \lVert g(x+t)-g(x)\rVert<M\lVert t\rVert. In other words, g cannot stretch displacements by more than a factor of M as long as the displacements are small enough.
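Here is a small Python illustration of such a bound, under assumptions of our own: a hypothetical g:\mathbb{R}^2\rightarrow\mathbb{R}^2, a radius r_1=0.1, and the choice of M as the Frobenius norm of dg(x) plus one (the Frobenius norm bounds the operator norm, and the extra 1 leaves room for higher-order terms). It samples finitely many displacements inside the radius, so it is evidence for the bound rather than a proof.

```python
import math

def g(x):  # hypothetical example g: R^2 -> R^2
    return [x[0] * x[1], x[0] + x[1] ** 2]

x = [1.0, 0.5]
dg = [[x[1], x[0]], [1.0, 2 * x[1]]]   # exact Jacobian of g at x
# Frobenius norm of dg(x) bounds its operator norm; add 1 for slack
M = math.sqrt(sum(a * a for row in dg for a in row)) + 1.0
r1 = 0.1

gx = g(x)
worst = 0.0   # largest observed stretch factor ||g(x+t)-g(x)|| / ||t||
for k in range(64):                 # 64 directions around the unit circle
    theta = 2 * math.pi * k / 64
    for j in range(1, 5):           # radii r1/5, 2r1/5, 3r1/5, 4r1/5, all < r1
        rho = r1 * j / 5
        t = [rho * math.cos(theta), rho * math.sin(theta)]   # ||t|| = rho
        diff = [a - b for a, b in zip(g([x[0] + t[0], x[1] + t[1]]), gx)]
        worst = max(worst, math.hypot(*diff) / rho)
print(worst, M)
```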

Now r_1 may be smaller than \frac{\tilde{\delta}}{M} already, but just in case let’s shrink it until it is. Then we know that

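\displaystyle\lVert g(x+t)-g(x)\rVert<M\lVert t\rVert<Mr_1<M\frac{\tilde{\delta}}{M}=\tilde{\delta}

so we can use this difference as a displacement s from g(x). (If the difference happens to be zero, the left side of the inequality below is zero while \tilde{\epsilon}M\lVert t\rVert is positive, so the final bound still holds.) We find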

\displaystyle\begin{aligned}\left\lVert\left[f(g(x+t))-f(g(x))\right]-df(g(x))\left(g(x+t)-g(x)\right)\right\rVert&<\tilde{\epsilon}\lVert g(x+t)-g(x)\rVert\\&<\tilde{\epsilon}M\lVert t\rVert\end{aligned}

Now we’re going to find a constant N\geq0 and a radius r_2 so that

\displaystyle\left\lVert df(g(x))\left(g(x+t)-g(x)\right)-df(g(x))dg(x)t\right\rVert\leq\tilde{\epsilon}nN\lVert t\rVert

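whenever r_2>\lVert t\rVert>0. Once this is established, we are done. Indeed, adding and subtracting df(g(x))\left(g(x+t)-g(x)\right) and using the triangle inequality, the quantity we want to control is less than \tilde{\epsilon}M\lVert t\rVert+\tilde{\epsilon}nN\lVert t\rVert=\tilde{\epsilon}(M+nN)\lVert t\rVert. So given an \epsilon>0 we can set \tilde{\epsilon}=\frac{\epsilon}{M+nN} and let \delta be the smaller of the two resulting radii r_1 and r_2. Within this smaller radius, the desired inequality holds.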

To get this result, we choose orthonormal coordinates on the space \mathbb{R}^n. We can then use these coordinates to write, summing over the repeated index i,

\displaystyle df(g(x))\left(\left[g(x+t)-g(x)\right]-dg(x)t\right)=\left[D_if\right](g(x))\left(\left[g^i(x+t)-g^i(x)\right]-dg^i(x)t\right)

But since each of the several g^i is differentiable we can pick our radius r_2 so that all of the inequalities

\displaystyle\left\lvert\left[g^i(x+t)-g^i(x)\right]-dg^i(x)t\right\rvert<\tilde{\epsilon}\lVert t\rVert

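hold for r_2>\lVert t\rVert>0. Then we let N be the largest of the norms \left\lVert\left[D_if\right](g(x))\right\rVert of the partial derivatives. The triangle inequality bounds the norm of the sum above by n terms, each at most N\tilde{\epsilon}\lVert t\rVert, which gives the estimate \tilde{\epsilon}nN\lVert t\rVert, and we’re done.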
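The coordinate step can be made concrete in Python. In this sketch a hypothetical 2\times3 matrix A stands in for df(g(x)), so its i-th column plays the role of \left[D_if\right](g(x)), and a hypothetical vector v stands in for the error \left[g(x+t)-g(x)\right]-dg(x)t. The code checks that Av is the sum of columns weighted by the components v^i, and that \lVert Av\rVert\leq nN\max_i\lvert v^i\rvert with N the largest column norm.

```python
import math

# Hypothetical 2x3 matrix standing in for df(g(x)): column i is [D_i f](g(x)).
A = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
# Hypothetical vector standing in for [g(x+t) - g(x)] - dg(x) t in R^3.
v = [0.3, -0.1, 0.2]
n = len(v)

# Direct matrix-vector product.
direct = [sum(A[k][i] * v[i] for i in range(n)) for k in range(len(A))]

# The same vector as a sum of columns weighted by the components v^i.
columns = [[A[k][i] for k in range(len(A))] for i in range(n)]
expanded = [0.0] * len(A)
for i in range(n):
    for k in range(len(A)):
        expanded[k] += columns[i][k] * v[i]

# The estimate ||A v|| <= n * N * max_i |v^i|, with N the largest column norm.
N = max(math.sqrt(sum(c * c for c in col)) for col in columns)
lhs = math.sqrt(sum(c * c for c in direct))
bound = n * N * max(abs(c) for c in v)
print(direct, expanded, lhs, bound)
```

In the proof each \lvert v^i\rvert is below \tilde{\epsilon}\lVert t\rVert, which turns this bound into the one we needed.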

Thus when g is differentiable at x and f is differentiable at g(x), then the composite f\circ g is differentiable at x, and the differential of the composite function is given by

\displaystyle d\left[f\circ g\right](x)=df(g(x))dg(x)

the composite of the differentials, considered as linear transformations.
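As a numerical check of the full statement, this Python sketch uses hypothetical maps g:\mathbb{R}^2\rightarrow\mathbb{R}^3 and f:\mathbb{R}^3\rightarrow\mathbb{R}^2, so the shapes line up as a (2\times3)(3\times2) product. It compares a central-difference Jacobian of f\circ g with the matrix product of the exact Jacobians df(g(x)) and dg(x).

```python
import math

def jacobian(F, x, h=1e-6):
    """Central-difference Jacobian of F at x, as a list of rows."""
    cols = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        cols.append([(a - b) / (2 * h) for a, b in zip(F(xp), F(xm))])
    return [list(row) for row in zip(*cols)]

def matmul(A, B):
    # product of an (r x k) and a (k x c) matrix, both given as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Hypothetical example maps: g: R^2 -> R^3 and f: R^3 -> R^2.
def g(x):
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

def dg(x):  # exact 3x2 Jacobian of g
    return [[x[1], x[0]], [math.cos(x[0]), 0.0], [0.0, 2 * x[1]]]

def f(y):
    return [y[0] + y[1] * y[2], math.exp(y[0]) - y[2]]

def df(y):  # exact 2x3 Jacobian of f
    return [[1.0, y[2], y[1]], [math.exp(y[0]), 0.0, -1.0]]

x = [0.4, 1.2]
numeric = jacobian(lambda t: f(g(t)), x)   # 2x2, from finite differences
chain = matmul(df(g(x)), dg(x))            # (2x3)(3x2) = 2x2
err = max(abs(a - b) for ra, rb in zip(numeric, chain) for a, b in zip(ra, rb))
print(err)
```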

October 7, 2009 - Posted by | Analysis, Calculus

20 Comments »

  1. Another crisp, fine exposition. My wife is still annoyed by a time over 20 years ago when we went to a Pasadena restaurant with a younger alumnus friend of mine, whom I’d guided into Mathematical Music Theory (where he got his double degree at UC Santa Cruz and published in IEEE and other venues later), and played in a rock band together. He had a girlfriend with him who, once she found that my wife was a professor, kept interrupting to demand an explanation of the Chain Rule. My wife has taught that, but this was a social event, not a colloquium. Where was I? Oh, right, nicely done! I still think that you should find a coauthor to make this into a textbook.

    Comment by Jonathan Vos Post | October 7, 2009 | Reply

    • The textbook idea has merit, but wouldn’t it have to be rather large? Also the blog format has the advantage that people can ask questions and get answers. Btw John: has anyone come thru with the syntactic reasoners article? If not, send me your email addie and I can provide.

      Comment by Avery Andrews | October 8, 2009 | Reply

      • Charles sent along a copy, thanks.

        As for the book, I’m more inclined towards a lay-audience book, at least at first. I’ve thought about the subject, but again I need to find someone with a bit more of a background in the history and philosophy of mathematics to help keep me honest.

        I’ll be more comfortable searching for someone once I get more settled in terms of an actual job. Unemployment isn’t conducive to the peace of mind I’d need to get that ball rolling.

        Comment by John Armstrong | October 8, 2009 | Reply

        • Will your lay-audience book include the category theory you’ve discussed here? I think that could be one of the distinguishing features– there exist plenty of calculus and linear algebra books, but not many lay-audience books talking about category theory.

          Actually, a nowhere-dense introduction to category theory by itself would, I think, make the book worthwhile.

          Comment by Akhil Mathew | October 10, 2009 | Reply

          • Probably not. As I’ve mentioned before, the one I’m thinking of seeks to answer the more philosophical question, “What Is A Number?”

            Comment by John Armstrong | October 10, 2009 | Reply

            • Well but it seems to me that categorifying the integers is part of the answer to that. & if a book is ‘nowhere dense’, it is likely to be so heavy that just looking at it will make your wrists hurt

              Comment by Avery Andrews | October 11, 2009 | Reply

              • I mean more like an expansion on the standard content in the first couple days of an advanced calculus class, from Peano axioms through the continuum.

                Comment by John Armstrong | October 11, 2009 | Reply

            • I sometimes fantasize about math ebooks that would initially give you a fairly crisp statement of what is true and why, but would then expand to more discursive text when you clicked on a ‘please explain’ icon. Perhaps also some kind of rating/reward system to encourage people to try to figure out as much as possible for themselves.

              Comment by Avery Andrews | October 11, 2009 | Reply

  2. A while ago you were nice enough to give props out to my podcast Combinations and Permutations and I am now back (I have been back a few times to read of course but this is my first comment) to let you know that I have a 2nd podcast where I am interviewing mathematicians. The first episode where I interview Gary Chartrand is now up at Strongly Connected Components. Of course I would like you to listen but much more I would love to now have you as a guest on my new podcast. If this is interesting to you please email me at sccmathpodcast@gmail.com

    Comment by Combinations and Permutations | October 8, 2009 | Reply

  3. […] Invariant Rule An immediate corollary of the chain rule is another piece of “syntactic […]

    Pingback by Cauchy’s Invariant Rule « The Unapologetic Mathematician | October 8, 2009 | Reply

  4. Re: philosophical question, “What Is A Number?”

    Excerpt from GENE515 (by Jonathan Vos Post)
    This partly answers a question asked of me when I was
    21 years old, by my doctoral thesis advisor Oliver G.
    Selfridge [Father of Machine Perception]. When I say “Man” I am echoing an older
    text, and not excluding Woman.

    Excerpt from GENE515

    What is Man, that he may know Number? What is Number
    that it may be known by Man?

    As we are mathematicians, we are in the image of our
    creator, The Mathematician, who has other attributes
    beyond our comprehension, and is Transfinite.

    He freely gives us this world, and the cosmos beyond,
    and the flora and fauna over which to be stewards, and
    our fellow human beings to love, which is in the image
    of His love, which is transfinite.

    We have free will, and for those of us who choose to
    be mathematicians, he gives us the integers as toys,
    in which is His book coded.

    We play with those toys, some of us in solitude, some
    of us playing together. And when we put aside childish
    things, behold, we still have the gift of Number, and
    they are more than first we knew.

    Eureka!, and Aha!, and knowing what Mozart meant when
    he said that he did not write music, but it was
    already there and he plucked it from thin air as it
    blew past. And what Ramanujan said was given him by a
    Goddess, And what Gauss could see as a child, and
    Riemann in the looking glass of Primes, and Galois by
    candlelight in the brief hours before his fatal duel.

    Euclid, alone, has looked on beauty bare. But we
    mathematicians today are not alone, far from it,
    cradled in the same Web woven of Number, binary and
    octal and hex, decimal and alphanumeric, vector and
    raster, and more in cables, trunks, and as wifi in the
    very air about us.

    By knowing Number more deeply, we more deeply know
    ourselves, and our Creator.

    Every word begins and ends with the empty word; the
    empty word begins and ends with itself.

    Comment by Jonathan Vos Post | October 10, 2009 | Reply

  5. […] Differential Operators Because of the chain rule and Cauchy’s invariant rule, we know that we can transform differentials along with […]

    Pingback by Transforming Differential Operators « The Unapologetic Mathematician | October 12, 2009 | Reply

  6. […] with the function . This function is clearly differentiable, with constant derivative . And so the chain rule tells us […]

    Pingback by Taylor’s Theorem « The Unapologetic Mathematician | October 20, 2009 | Reply

  7. […] like we said when discussing the chain rule, the differential at the point defines a linear transformation from the -dimensional space of […]

    Pingback by The Jacobian « The Unapologetic Mathematician | November 11, 2009 | Reply

  8. […] differentiable functions on two open regions and in , with , and let be their composite. Then the chain rule tells us […]

    Pingback by The Jacobian of a Composition « The Unapologetic Mathematician | November 12, 2009 | Reply

  9. […] derivative is identically zero as well. But since the are composite functions we can also use the chain rule to evaluate these partial derivatives. We […]

    Pingback by Extrema with Constraints II « The Unapologetic Mathematician | November 27, 2009 | Reply

  10. […] Then we can use the chain rule: […]

    Pingback by Differentiating Partial Integrals « The Unapologetic Mathematician | January 14, 2010 | Reply

  11. […] the multivariable chain rule. We […]

    Pingback by Coordinate Vectors Span Tangent Spaces « The Unapologetic Mathematician | March 31, 2011 | Reply

  12. […] in the ball the whole segment for is contained within the ball. We define a function and use the chain rule to […]

    Pingback by Continuously Differentiable Functions are Locally Lipschitz « The Unapologetic Mathematician | May 4, 2011 | Reply

  13. […] that , , and . The chain rule lets us then […]

    Pingback by What Does the Bracket Measure? (part 2) « The Unapologetic Mathematician | June 27, 2011 | Reply

