# The Unapologetic Mathematician

## Cauchy’s Invariant Rule

An immediate corollary of the chain rule is another piece of “syntactic sugar”.

If we have functions $g:X\rightarrow\mathbb{R}^n$ and $f:Y\rightarrow\mathbb{R}^p$ for some open regions $X\subseteq\mathbb{R}^m$ and $Y\subseteq\mathbb{R}^n$ so that the image $g(X)$ is contained in $Y$, we can compose the two functions to get a new function $f\circ g:X\rightarrow\mathbb{R}^p$. In terms of formulas, we can choose coordinates $y^i$ on $\mathbb{R}^n$ and write out both the function $f(y^1,\dots,y^n)$ and the component functions $g^1(x),\dots,g^n(x)$. We get a formula for $\left[f\circ g\right](x)$ by substituting $g^i(x)$ for $y^i$ in the formula for $f$, a substitution we abbreviate by writing $y^i=g^i(x)$.

The language there seems a little convoluted, so I’d like to give an example. We might define a function $f(x,y)=e^{x^2+y^2}$ for all points $(x,y)$ in the plane $\mathbb{R}^2$. This is all well and good, but we might want to talk about the function in polar coordinates. To this end, we may define $x=r\cos(\theta)$ and $y=r\sin(\theta)$. These are the component functions describing a transformation $g$ from the region $(r,\theta)\in(0,\infty)\times(-\pi,\pi)\subseteq\mathbb{R}^2$ into the region where $(x,y)\neq(0,0)$ (its image is the plane with the nonpositive $x$-axis removed). We can substitute $r\cos(\theta)$ for $x$ and $r\sin(\theta)$ for $y$ in our formula for $f$ to get a new function $f\circ g$ with formula

$\displaystyle f(g(r,\theta))=e^{r^2\cos(\theta)^2+r^2\sin(\theta)^2}=e^{r^2}$
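The substitution can be spot-checked numerically. The following is only a minimal sketch in Python using the standard library; the function names (`g`, `f`, `f_polar`, `max_relative_gap`) and the sample points are illustrative choices, not anything fixed by the discussion above. It compares $f(g(r,\theta))$ against $e^{r^2}$ at a few points.

```python
import math

def g(r, theta):
    """Polar-to-Cartesian transformation: returns the pair (x, y)."""
    return r * math.cos(theta), r * math.sin(theta)

def f(x, y):
    """The example function f(x, y) = exp(x^2 + y^2)."""
    return math.exp(x**2 + y**2)

def f_polar(r, theta):
    """The substituted formula exp(r^2); theta drops out entirely."""
    return math.exp(r**2)

def max_relative_gap(samples):
    """Largest relative difference between f(g(r, theta)) and exp(r^2)."""
    gap = 0.0
    for r, theta in samples:
        composed = f(*g(r, theta))
        direct = f_polar(r, theta)
        gap = max(gap, abs(composed - direct) / direct)
    return gap

# Arbitrary sample points with r > 0 and -pi < theta < pi.
samples = [(0.5, 0.3), (1.0, -2.0), (1.7, 3.0), (0.1, 0.0)]
print(max_relative_gap(samples))  # expected to be at the level of rounding error
```

Agreement at a handful of points is of course not a proof; the identity $\cos(\theta)^2+\sin(\theta)^2=1$ is what actually does the work.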

This much is straightforward. The thing is, now we want to take differentials. What Cauchy’s invariant rule tells us is that we can calculate the differential of $f\circ g$ by not only substituting $g^i(x)$ for $y^i$, but also substituting $dg^i(x;t)$ for $s^i$ in the formula for $df(y;s)$. That is, if $h=f\circ g$ then we have the equivalence

$\displaystyle dh(x;t)=df(g^1(x),\dots,g^n(x);dg^1(x;t),\dots,dg^n(x;t))$
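As a sanity check, consider the special case $m=n=p=1$ (the particular functions below are arbitrary illustrative choices). Here $df(y;s)=f'(y)s$ and $dg(x;t)=g'(x)t$, so the rule reads

$\displaystyle dh(x;t)=f'(g(x))g'(x)t$

which is the familiar single-variable chain rule. For instance, with $f(y)=\sin(y)$ and $g(x)=x^2$ we get

$\displaystyle dh(x;t)=\cos(x^2)\cdot2x\,t$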

In our particular example, we can easily calculate the differential of $f$ using our first formula:

$\displaystyle df(x,y)=2xe^{x^2+y^2}dx+2ye^{x^2+y^2}dy$

or using our second formula:

$\displaystyle df(r,\theta)=2re^{r^2}dr$

We want to call both of these simply $df$. But can we do so unambiguously? Indeed, if $x=r\cos(\theta)$ then we find

$\displaystyle dx=\cos(\theta)dr-r\sin(\theta)d\theta$

and if $y=r\sin(\theta)$ then we find

$\displaystyle dy=\sin(\theta)dr+r\cos(\theta)d\theta$

We substitute these into our formula for $df(x,y)$ to find

$\displaystyle\begin{aligned}df(r,\theta)&=2r\cos(\theta)e^{r^2\cos(\theta)^2+r^2\sin(\theta)^2}\left(\cos(\theta)dr-r\sin(\theta)d\theta\right)+2r\sin(\theta)e^{r^2\cos(\theta)^2+r^2\sin(\theta)^2}\left(\sin(\theta)dr+r\cos(\theta)d\theta\right)\\&=2r\cos(\theta)e^{r^2}\left(\cos(\theta)dr-r\sin(\theta)d\theta\right)+2r\sin(\theta)e^{r^2}\left(\sin(\theta)dr+r\cos(\theta)d\theta\right)\\&=2r\cos(\theta)e^{r^2}\cos(\theta)dr+2r\sin(\theta)e^{r^2}\sin(\theta)dr-2r\cos(\theta)e^{r^2}r\sin(\theta)d\theta+2r\sin(\theta)e^{r^2}r\cos(\theta)d\theta\\&=\left(2r\cos(\theta)^2e^{r^2}+2r\sin(\theta)^2e^{r^2}\right)dr+\left(-2r^2\cos(\theta)\sin(\theta)e^{r^2}+2r^2\cos(\theta)\sin(\theta)e^{r^2}\right)d\theta\\&=2re^{r^2}dr\end{aligned}$

just the same as if we calculated directly from the formula in terms of $r$ and $\theta$.
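The same comparison can be run numerically. This is a hedged sketch rather than a proof: the function names and the sample values of $r$, $\theta$, $dr$, and $d\theta$ are assumptions made for illustration. It evaluates the Cartesian formula for $df$ after substituting the polar expressions for $x$, $y$, $dx$, and $dy$, and compares the result with $2re^{r^2}dr$.

```python
import math

def df_cartesian(x, y, dx, dy):
    """df(x, y; dx, dy) = 2x e^{x^2+y^2} dx + 2y e^{x^2+y^2} dy."""
    e = math.exp(x**2 + y**2)
    return 2 * x * e * dx + 2 * y * e * dy

def df_polar(r, theta, dr, dtheta):
    """df(r, theta; dr, dtheta) = 2r e^{r^2} dr; the dtheta term vanishes."""
    return 2 * r * math.exp(r**2) * dr

def df_by_substitution(r, theta, dr, dtheta):
    """Substitute x, y, dx, dy in polar terms into the Cartesian formula."""
    x = r * math.cos(theta)
    y = r * math.sin(theta)
    dx = math.cos(theta) * dr - r * math.sin(theta) * dtheta
    dy = math.sin(theta) * dr + r * math.cos(theta) * dtheta
    return df_cartesian(x, y, dx, dy)

# Arbitrary base point and displacement, chosen only for illustration.
r, theta, dr, dtheta = 1.2, 0.7, 0.3, -0.5
print(df_by_substitution(r, theta, dr, dtheta))
print(df_polar(r, theta, dr, dtheta))
```

The two printed values should agree up to floating-point rounding, and a displacement with $dr=0$ should give a result that is zero up to rounding, reflecting the cancellation of the $d\theta$ terms.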

That is, we can substitute our formulæ for the coordinate functions $y^i=g^i(x)$ before taking the differential in terms of $x$, or we can take the differential in terms of $y$ and then substitute our formulæ for the coordinate functions $y^i=g^i(x)$ and their differentials $dy^i=dg^i(x)$ into the result. Either way, we end up in the same place, so we don’t have to worry about ending up with two (or more!) “different” differentials of $f$.

So, how do we verify this using the chain rule? Just write the differentials out using partial derivatives. For example, we know that

$\displaystyle df(y;s^1,\dots,s^n)=\frac{\partial f}{\partial y^i}\biggr\vert_ys^i$

and so on. So, performing our substitutions we can find:

$\displaystyle\begin{aligned}df(g(x);dg^1(x;t),\dots,dg^n(x;t))&=\frac{\partial f}{\partial y^i}\biggr\vert_{y=g(x)}dg^i(x;t)\\&=\frac{\partial f}{\partial y^i}\biggr\vert_{y=g(x)}\frac{\partial g^i}{\partial x^j}\biggr\vert_xt^j\\&=\frac{\partial\left[f\circ g\right]}{\partial x^j}\biggr\vert_xt^j\\&=d\left[f\circ g\right](x;t)\end{aligned}$

The important part here is the passage from products of two partial derivatives to single partial derivatives of $f\circ g$. This works out because when we consider differentials as linear transformations, the matrix entries are the partial derivatives. The composition of the linear transformations $df(g(x))$ and $dg(x)$ is given by the product of these matrices, and the entries of the resulting matrix must (by uniqueness) be the partial derivatives of the composite function.
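To make the matrix picture concrete, here is a small sketch in plain Python for the polar example. Everything in it is an illustrative assumption: the helper names, the hand-rolled `matmul`, the base point, and the step size are not taken from the text. It multiplies the $1\times2$ matrix of partials of $f$ by the $2\times2$ matrix of partials of $g$ and compares the product with central-difference estimates of the partials of $f\circ g$.

```python
import math

def g(r, theta):
    """Polar-to-Cartesian transformation: returns the pair (x, y)."""
    return (r * math.cos(theta), r * math.sin(theta))

def jacobian_g(r, theta):
    """2x2 matrix of partials of (x, y) with respect to (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def jacobian_f(x, y):
    """1x2 matrix of partials of f(x, y) = exp(x^2 + y^2)."""
    e = math.exp(x**2 + y**2)
    return [[2 * x * e, 2 * y * e]]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def jacobian_composite_numeric(r, theta, step=1e-6):
    """Central-difference partials of f(g(r, theta)) in r and theta."""
    def h(r_, theta_):
        x, y = g(r_, theta_)
        return math.exp(x**2 + y**2)
    d_r = (h(r + step, theta) - h(r - step, theta)) / (2 * step)
    d_theta = (h(r, theta + step) - h(r, theta - step)) / (2 * step)
    return [[d_r, d_theta]]

# Arbitrary base point, chosen only for illustration.
r, theta = 0.9, 1.1
product = matmul(jacobian_f(*g(r, theta)), jacobian_g(r, theta))
numeric = jacobian_composite_numeric(r, theta)
print(product)  # analytically this is [[2 r e^{r^2}, 0]]
print(numeric)  # finite-difference estimate of the same matrix
```

The finite-difference estimate only approximates the true partials, so the two matrices should agree to several digits rather than exactly.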


October 8, 2009 - Posted by | Analysis, Calculus
