The Unapologetic Mathematician

Cauchy’s Invariant Rule

An immediate corollary of the chain rule is another piece of “syntactic sugar”.

If we have functions $g:X\rightarrow\mathbb{R}^n$ and $f:Y\rightarrow\mathbb{R}^p$ for some open regions $X\subseteq\mathbb{R}^m$ and $Y\subseteq\mathbb{R}^n$ so that the image $g(X)$ is contained in $Y$, we can compose the two functions to get a new function $f\circ g:X\rightarrow\mathbb{R}^p$. In terms of formulas, we can choose coordinates $y^i$ on $\mathbb{R}^n$ and write out both the function $f(y^1,\dots,y^n)$ and the component functions $g^1(x),\dots,g^n(x)$. We get a formula for $\left[f\circ g\right](x)$ by substituting $g^i(x)$ for $y^i$ in the formula for $f$; that is, we write $y^i=g^i(x)$.

The language there seems a little convoluted, so I’d like to give an example. We might define a function $f(x,y)=e^{x^2+y^2}$ for all points $(x,y)$ in the plane $\mathbb{R}^2$. This is all well and good, but we might want to talk about the function in polar coordinates. To this end, we may define $x=r\cos(\theta)$ and $y=r\sin(\theta)$. These are the component functions describing a transformation $g$ from the region $(r,\theta)\in(0,\infty)\times(-\pi,\pi)\subseteq\mathbb{R}^2$ into the region where $(x,y)\neq(0,0)$ (its image is the plane with the nonpositive $x$-axis removed). We can substitute $r\cos(\theta)$ for $x$ and $r\sin(\theta)$ for $y$ in our formula for $f$ to get a new function $f\circ g$ with formula

$\displaystyle f(g(r,\theta))=e^{r^2\cos(\theta)^2+r^2\sin(\theta)^2}=e^{r^2}$

This much is straightforward. The thing is, now we want to take differentials. What Cauchy’s invariant rule tells us is that we can calculate the differential of $f\circ g$ by not only substituting $g^i(x)$ for $y^i$, but also substituting $dg^i(x;t)$ for $s^i$ in the formula for $df(y;s)$. That is, if $h=f\circ g$ then we have the equality

$\displaystyle dh(x;t)=df(g^1(x),\dots,g^n(x);dg^1(x;t),\dots,dg^n(x;t))$

In our particular example, we can easily calculate the differential of $f$ using our first formula:

$\displaystyle df(x,y)=2xe^{x^2+y^2}dx+2ye^{x^2+y^2}dy$

or using our second formula:

$\displaystyle df(r,\theta)=2re^{r^2}dr$

We want to call both of these simply $df$. But can we do so unambiguously? Indeed, if $x=r\cos(\theta)$ then we find

$\displaystyle dx=\cos(\theta)dr-r\sin(\theta)d\theta$

and if $y=r\sin(\theta)$ then we find

$\displaystyle dy=\sin(\theta)dr+r\cos(\theta)d\theta$

We substitute these into our formula for $df(x,y)$ to find

$\displaystyle\begin{aligned}df(r,\theta)&=2r\cos(\theta)e^{r^2\cos(\theta)^2+r^2\sin(\theta)^2}\left(\cos(\theta)dr-r\sin(\theta)d\theta\right)+2r\sin(\theta)e^{r^2\cos(\theta)^2+r^2\sin(\theta)^2}\left(\sin(\theta)dr+r\cos(\theta)d\theta\right)\\&=2r\cos(\theta)e^{r^2}\left(\cos(\theta)dr-r\sin(\theta)d\theta\right)+2r\sin(\theta)e^{r^2}\left(\sin(\theta)dr+r\cos(\theta)d\theta\right)\\&=2r\cos(\theta)e^{r^2}\cos(\theta)dr+2r\sin(\theta)e^{r^2}\sin(\theta)dr-2r\cos(\theta)e^{r^2}r\sin(\theta)d\theta+2r\sin(\theta)e^{r^2}r\cos(\theta)d\theta\\&=\left(2r\cos(\theta)^2e^{r^2}+2r\sin(\theta)^2e^{r^2}\right)dr+\left(-2r^2\cos(\theta)\sin(\theta)e^{r^2}+2r^2\cos(\theta)\sin(\theta)e^{r^2}\right)d\theta\\&=2re^{r^2}dr\end{aligned}$

just the same as if we calculated directly from the formula in terms of $r$ and $\theta$.
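As a quick numerical sanity check of this example, here is a minimal Python sketch. The function names (`df_cartesian`, `dg`, `dh_by_substitution`, `dh_polar`) and the sample points are illustrative choices of mine, not part of the argument; the formulas inside them are copied from the ones above. It compares the differential obtained by substituting $dx$ and $dy$ into $df(x,y)$ against the direct formula $2re^{r^2}dr$.

```python
import math

def df_cartesian(x, y, dx, dy):
    """df(x, y; dx, dy) = 2x e^(x^2+y^2) dx + 2y e^(x^2+y^2) dy."""
    e = math.exp(x**2 + y**2)
    return 2 * x * e * dx + 2 * y * e * dy

def g(r, theta):
    """Polar-to-Cartesian transformation: (r, theta) -> (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def dg(r, theta, dr, dtheta):
    """Differentials dx and dy of the component functions of g."""
    dx = math.cos(theta) * dr - r * math.sin(theta) * dtheta
    dy = math.sin(theta) * dr + r * math.cos(theta) * dtheta
    return dx, dy

def dh_by_substitution(r, theta, dr, dtheta):
    """Take df in terms of (x, y), then substitute g and dg."""
    x, y = g(r, theta)
    dx, dy = dg(r, theta, dr, dtheta)
    return df_cartesian(x, y, dx, dy)

def dh_polar(r, theta, dr, dtheta):
    """Differentiate e^(r^2) directly: 2 r e^(r^2) dr (no dtheta term)."""
    return 2 * r * math.exp(r**2) * dr

# Arbitrary sample points with r > 0 and -pi < theta < pi.
samples = [(0.5, 0.3, 1.0, 0.0), (1.2, -2.0, 0.7, -0.4), (2.0, 3.0, -1.5, 2.5)]
for r, theta, dr, dtheta in samples:
    lhs = dh_by_substitution(r, theta, dr, dtheta)
    rhs = dh_polar(r, theta, dr, dtheta)
    print(math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9))
```

The two routes should agree up to floating-point rounding at every sample point, including those with a nonzero $d\theta$ component, where the two $d\theta$ terms cancel.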

That is, we can substitute our formulæ for the coordinate functions $y^i=g^i(x)$ before taking the differential in terms of $x$, or we can take the differential in terms of $y$ and then substitute our formulæ for the coordinate functions $y^i=g^i(x)$ and their differentials $dy^i=dg^i(x;t)$ into the result. Either way, we end up in the same place, so we don’t have to worry about ending up with two (or more!) “different” differentials of $f$.

So, how do we verify this using the chain rule? Just write the differentials out using partial derivatives. For example, we know that

$\displaystyle df(y;s^1,\dots,s^n)=\frac{\partial f}{\partial y^i}\biggr\vert_ys^i$

and so on. So, performing our substitutions we can find:

$\displaystyle\begin{aligned}df(g(x);dg^1(x;t),\dots,dg^n(x;t))&=\frac{\partial f}{\partial y^i}\biggr\vert_{y=g(x)}dg^i(x;t)\\&=\frac{\partial f}{\partial y^i}\biggr\vert_{y=g(x)}\frac{\partial g^i}{\partial x^j}\biggr\vert_xt^j\\&=\frac{\partial\left[f\circ g\right]}{\partial x^j}\biggr\vert_xt^j\\&=d\left[f\circ g\right](x;t)\end{aligned}$

The important part here is the passage from products of two partial derivatives to single partial derivatives of $f\circ g$. This works out because when we consider differentials as linear transformations, the matrix entries are the partial derivatives. The composition of the linear transformations $df(g(x))$ and $dg(x)$ is given by the product of these matrices, and the entries of the resulting matrix must (by uniqueness) be the partial derivatives of the composite function.
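For the polar-coordinate example, this matrix product can be written out explicitly. As a sketch, take the matrix of $df$ to be the row of partial derivatives of $f$, evaluated at $(x,y)=g(r,\theta)$, and the matrix of $dg$ to be the $2\times2$ matrix of partial derivatives of $x=r\cos(\theta)$ and $y=r\sin(\theta)$:

$\displaystyle\begin{pmatrix}2r\cos(\theta)e^{r^2}&2r\sin(\theta)e^{r^2}\end{pmatrix}\begin{pmatrix}\cos(\theta)&-r\sin(\theta)\\\sin(\theta)&r\cos(\theta)\end{pmatrix}=\begin{pmatrix}2re^{r^2}&0\end{pmatrix}$

The entries of the product are exactly the partial derivatives of $e^{r^2}$ with respect to $r$ and $\theta$, matching the coefficients of $dr$ and $d\theta$ found earlier.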

October 8, 2009 - Posted by | Analysis, Calculus
