# The Unapologetic Mathematician

## Transforming Differential Operators

Because of the chain rule and Cauchy’s invariant rule, we know that we can transform differentials along with functions. For example, if we write

$\displaystyle\begin{aligned}x&=r\cos(\theta)\\y&=r\sin(\theta)\end{aligned}$

we can write the differentials of $x$ and $y$ in terms of the differentials of $r$ and $\theta$:

$\displaystyle\begin{aligned}dx&=\cos(\theta)dr-r\sin(\theta)d\theta\\dy&=\sin(\theta)dr+r\cos(\theta)d\theta\end{aligned}$

It turns out that the chain rule also tells us how to rewrite differential operators under a change of variables, but these go in the other direction. That is, we can write the differential operators $\frac{\partial}{\partial r}$ and $\frac{\partial}{\partial\theta}$ in terms of the operators $\frac{\partial}{\partial x}$ and $\frac{\partial}{\partial y}$.

First of all, let’s write down the differential of $f$ in terms of $x$ and $y$ and in terms of $r$ and $\theta$:

$\displaystyle\begin{aligned}df&=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy\\df&=\frac{\partial f}{\partial r}dr+\frac{\partial f}{\partial\theta}d\theta\end{aligned}$

Now, in the first of these, we can rewrite $dx$ and $dy$ in terms of $dr$ and $d\theta$:

$\displaystyle\begin{aligned}df&=\frac{\partial f}{\partial x}\left(\cos(\theta)dr-r\sin(\theta)d\theta\right)+\frac{\partial f}{\partial y}\left(\sin(\theta)dr+r\cos(\theta)d\theta\right)\\&=\frac{\partial f}{\partial x}\cos(\theta)dr-\frac{\partial f}{\partial x}r\sin(\theta)d\theta+\frac{\partial f}{\partial y}\sin(\theta)dr+\frac{\partial f}{\partial y}r\cos(\theta)d\theta\\&=\left(\cos(\theta)\frac{\partial f}{\partial x}+\sin(\theta)\frac{\partial f}{\partial y}\right)dr+\left(-r\sin(\theta)\frac{\partial f}{\partial x}+r\cos(\theta)\frac{\partial f}{\partial y}\right)d\theta\end{aligned}$

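Now by the uniqueness of the differential, we can compare the coefficients of $dr$ and $d\theta$ here with those in the expression for $df$ in terms of $r$ and $\theta$, and read off the partial derivatives of $f$ with respect to $r$ and $\theta$: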

$\displaystyle\begin{aligned}\frac{\partial f}{\partial r}&=\cos(\theta)\frac{\partial f}{\partial x}+\sin(\theta)\frac{\partial f}{\partial y}\\\frac{\partial f}{\partial\theta}&=-r\sin(\theta)\frac{\partial f}{\partial x}+r\cos(\theta)\frac{\partial f}{\partial y}\end{aligned}$

Finally, we pull all mention of $f$ out of our notation and just write out the differential operators.

$\displaystyle\begin{aligned}\frac{\partial}{\partial r}&=\cos(\theta)\frac{\partial}{\partial x}+\sin(\theta)\frac{\partial}{\partial y}\\\frac{\partial}{\partial\theta}&=-r\sin(\theta)\frac{\partial}{\partial x}+r\cos(\theta)\frac{\partial}{\partial y}\end{aligned}$
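A quick numerical sanity check of these two operator formulas can be sketched in Python. This is only an illustration: the test function `f`, its hand-computed partials `f_x` and `f_y`, the helper names, and the sample point are all assumptions chosen for the example, and the finite-difference step is a rough choice rather than a tuned one.

```python
import math

def f(x, y):
    # Hypothetical smooth test function; any differentiable f would do.
    return x**3 * y + math.sin(y)

def f_x(x, y):
    # Partial derivative of f with respect to x, computed by hand.
    return 3 * x**2 * y

def f_y(x, y):
    # Partial derivative of f with respect to y, computed by hand.
    return x**3 + math.cos(y)

def f_polar(r, theta):
    # The same function after substituting x = r cos(theta), y = r sin(theta).
    return f(r * math.cos(theta), r * math.sin(theta))

def central_diff(func, t, h=1e-6):
    # Central finite-difference approximation to the derivative of func at t.
    return (func(t + h) - func(t - h)) / (2 * h)

def polar_partials_numeric(r, theta):
    # Differentiate the substituted function directly in r and theta.
    d_r = central_diff(lambda s: f_polar(s, theta), r)
    d_theta = central_diff(lambda s: f_polar(r, s), theta)
    return d_r, d_theta

def polar_partials_formula(r, theta):
    # Apply the operator formulas to the x- and y-partials of f.
    x, y = r * math.cos(theta), r * math.sin(theta)
    d_r = math.cos(theta) * f_x(x, y) + math.sin(theta) * f_y(x, y)
    d_theta = -r * math.sin(theta) * f_x(x, y) + r * math.cos(theta) * f_y(x, y)
    return d_r, d_theta

if __name__ == "__main__":
    r, theta = 1.7, 0.6
    print("numeric:", polar_partials_numeric(r, theta))
    print("formula:", polar_partials_formula(r, theta))
```

The two printed pairs should agree up to the finite-difference error, which is a check on the coefficients and signs rather than a proof.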

Now we’re done rewriting, but for good form we should express these coefficients in terms of $x$ and $y$.

$\displaystyle\begin{aligned}\frac{\partial}{\partial r}&=\frac{x}{\sqrt{x^2+y^2}}\frac{\partial}{\partial x}+\frac{y}{\sqrt{x^2+y^2}}\frac{\partial}{\partial y}\\\frac{\partial}{\partial\theta}&=-y\frac{\partial}{\partial x}+x\frac{\partial}{\partial y}\end{aligned}$

It’s important to note that there’s really no difference between these last two steps. The first one uses the variables $r$ and $\theta$ while the second uses the variables $x$ and $y$, but they express the exact same functions, given the original substitutions above.
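As a small worked check (the choice of $f$ here is just an illustration), take $f=xy$, which in polar coordinates is $f=r^2\cos(\theta)\sin(\theta)$. Applying the operators written in terms of $x$ and $y$ gives

$\displaystyle\begin{aligned}\frac{\partial f}{\partial r}&=\frac{x}{\sqrt{x^2+y^2}}y+\frac{y}{\sqrt{x^2+y^2}}x=\frac{2xy}{\sqrt{x^2+y^2}}=2r\cos(\theta)\sin(\theta)\\\frac{\partial f}{\partial\theta}&=-y\cdot y+x\cdot x=x^2-y^2=r^2\left(\cos^2(\theta)-\sin^2(\theta)\right)\end{aligned}$

which agrees with differentiating $r^2\cos(\theta)\sin(\theta)$ directly with respect to $r$ and $\theta$.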

More generally, let’s say we have a vector-valued function $g:\mathbb{R}^m\rightarrow\mathbb{R}^n$ defining a substitution

$\displaystyle\begin{aligned}y^1&=g^1(x^1,\dots,x^m)\\&\vdots\\y^n&=g^n(x^1,\dots,x^m)\end{aligned}$

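Cauchy’s invariant rule tells us that this gives rise to a substitution for differentials. In the last equality on each line we use the summation convention: the repeated index $i$ is summed from $1$ to $m$.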

$\displaystyle\begin{aligned}dy^1=dg^1(x^1,\dots,x^m)&=\frac{\partial g^1}{\partial x^1}dx^1+\dots+\frac{\partial g^1}{\partial x^m}dx^m=\frac{\partial g^1}{\partial x^i}dx^i\\&\vdots\\dy^n=dg^n(x^1,\dots,x^m)&=\frac{\partial g^n}{\partial x^1}dx^1+\dots+\frac{\partial g^n}{\partial x^m}dx^m=\frac{\partial g^n}{\partial x^i}dx^i\end{aligned}$

We can play it a little loose and write this out in matrix notation:

$\displaystyle\begin{pmatrix}dy^1\\\vdots\\dy^n\end{pmatrix}=\begin{pmatrix}\frac{\partial g^1}{\partial x^1}&\dots&\frac{\partial g^1}{\partial x^m}\\\vdots&\ddots&\vdots\\\frac{\partial g^n}{\partial x^1}&\dots&\frac{\partial g^n}{\partial x^m}\end{pmatrix}\begin{pmatrix}dx^1\\\vdots\\dx^m\end{pmatrix}$
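For instance, in the polar example the roles are played by $(x^1,x^2)=(r,\theta)$ and $(y^1,y^2)=(x,y)$, and the matrix equation reads

$\displaystyle\begin{pmatrix}dx\\dy\end{pmatrix}=\begin{pmatrix}\cos(\theta)&-r\sin(\theta)\\\sin(\theta)&r\cos(\theta)\end{pmatrix}\begin{pmatrix}dr\\d\theta\end{pmatrix}$

which just repackages the two formulas for $dx$ and $dy$ from the start of the post.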

Now if we have a function $f$ in terms of the $y$ variables, we can use the substitution above to write it as a function of the $x$ variables. We can write the differential of $f$ in terms of each set of variables:

$\displaystyle\begin{aligned}df&=\frac{\partial f}{\partial y^j}dy^j\\df&=\frac{\partial f}{\partial x^i}dx^i\end{aligned}$

Next we use the substitutions of the differentials to rewrite the first form as

$\displaystyle df=\frac{\partial f}{\partial y^j}\frac{\partial g^j}{\partial x^i}dx^i$

Then uniqueness of the differential allows us to match up the coefficients of each $dx^i$ and write the partial derivatives with respect to the $x$ variables in terms of those with respect to the $y$ variables:

$\displaystyle\frac{\partial f}{\partial x^i}=\frac{\partial g^j}{\partial x^i}\frac{\partial f}{\partial y^j}$

It is in this form that the chain rule is most often introduced, or the similar form

$\displaystyle\frac{\partial f}{\partial x^i}=\frac{\partial y^j}{\partial x^i}\frac{\partial f}{\partial y^j}$
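The summed form of the chain rule can be checked numerically with a short Python sketch. Everything specific here is an assumption made for illustration: the substitution `g` from two variables to three, the function `f`, the helper names, and the finite-difference step. It compares the right-hand side, summed over $j$, against differentiating the composite directly.

```python
import math

def g(x):
    # Hypothetical substitution g: R^2 -> R^3, giving the y variables.
    u, v = x
    return [u * v, u + v, math.sin(u)]

def f(y):
    # Hypothetical function of the three y variables.
    return y[0] * y[1] + y[2] ** 2

def partial(func, point, i, h=1e-6):
    # Central finite-difference partial derivative in the i-th coordinate.
    up = list(point)
    up[i] += h
    down = list(point)
    down[i] -= h
    return (func(up) - func(down)) / (2 * h)

def chain_rule_partials(x):
    # Right-hand side: sum over j of (dy^j/dx^i) * (df/dy^j), evaluated at y = g(x).
    y = g(x)
    result = []
    for i in range(len(x)):
        total = 0.0
        for j in range(len(y)):
            dyj_dxi = partial(lambda p: g(p)[j], x, i)
            df_dyj = partial(f, y, j)
            total += dyj_dxi * df_dyj
        result.append(total)
    return result

def direct_partials(x):
    # Left-hand side: differentiate the composite f(g(x)) directly.
    return [partial(lambda p: f(g(p)), x, i) for i in range(len(x))]

if __name__ == "__main__":
    x0 = [0.8, -1.3]
    print("chain rule:", chain_rule_partials(x0))
    print("direct:    ", direct_partials(x0))
```

The explicit double loop mirrors the index notation: the outer index $i$ is free, while the inner index $j$ is the one being summed.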

And now we can remove mention of $f$ from the formulæ and speak directly in terms of the operators

$\displaystyle\frac{\partial}{\partial x^i}=\frac{\partial y^j}{\partial x^i}\frac{\partial}{\partial y^j}$

Again, we can play it a little loose and write this in matrix notation

$\displaystyle\begin{pmatrix}\frac{\partial}{\partial x^1}\\\vdots\\\frac{\partial}{\partial x^m}\end{pmatrix}=\begin{pmatrix}\frac{\partial y^1}{\partial x^1}&\dots&\frac{\partial y^n}{\partial x^1}\\\vdots&\ddots&\vdots\\\frac{\partial y^1}{\partial x^m}&\dots&\frac{\partial y^n}{\partial x^m}\end{pmatrix}\begin{pmatrix}\frac{\partial}{\partial y^1}\\\vdots\\\frac{\partial}{\partial y^n}\end{pmatrix}$

This is very similar to the substitution for differentials written in matrix notation. The differences are that we transform from $y$-derivations to $x$-derivations instead of from $x$-differentials to $y$-differentials, and the two substitution matrices are the transposes of each other. Those who have been following closely (or who have some background in differential geometry) should start to see the importance of this latter fact, but for now we’ll consider this a statement about formulas and methods of calculation. We’ll come to the deeper geometric meaning when we come through again in a wider context.
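In the polar example, with $(x^1,x^2)=(r,\theta)$ and $(y^1,y^2)=(x,y)$, the operator substitution reads

$\displaystyle\begin{pmatrix}\frac{\partial}{\partial r}\\\frac{\partial}{\partial\theta}\end{pmatrix}=\begin{pmatrix}\cos(\theta)&\sin(\theta)\\-r\sin(\theta)&r\cos(\theta)\end{pmatrix}\begin{pmatrix}\frac{\partial}{\partial x}\\\frac{\partial}{\partial y}\end{pmatrix}$

The first row holds the coefficients of $dr$ in the formulas for $dx$ and $dy$, and the second row holds the coefficients of $d\theta$, so this matrix is the transpose of the one that carries $dr$ and $d\theta$ to $dx$ and $dy$.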

October 12, 2009 - Posted by | Analysis, Calculus

1. In the book version of this blog, the co-author would then fill in the History of Mathematics in Differential Operators, covering Heaviside, Hilbert, von Neumann, and a huge cast of clever Mathematicians.

Comment by Jonathan Vos Post | October 13, 2009 | Reply

2. The book I’m thinking of wouldn’t come near this stuff.

Comment by John Armstrong | October 13, 2009 | Reply

5. I have found it very helpful. Thanks for the post.

Comment by Far Westerner | January 1, 2017 | Reply