The Unapologetic Mathematician

Mathematics for the interested outsider

The Gradient Vector

The most common first approach to differential calculus in more than one variable starts by defining partial derivatives and directional derivatives, as we did. But instead of defining the differential, it simply collects the partial derivatives together as the components of a vector called the gradient of f, and written \nabla f.

We showed that these partial derivatives are the components of the differential (when it exists), and so there should be some connection between the two concepts. And indeed there is.

As a bilinear form, our inner product defines an isomorphism from the space of displacements to its dual space. This isomorphism sends the basis vector e_i to the dual basis vector dx^i, since we can check both

\displaystyle dx^i(v)=dx^i(v^je_j)=v^j\delta_j^i=v^i


\displaystyle\langle e_i,v\rangle=\langle e_i,v^je_j\rangle=v^j\delta_j^i=v^i

That is, the linear functional dx^i is the same as the linear functional \langle e_i,\underline{\hphantom{x}}\rangle.

So under this isomorphism the differential df=\frac{\partial f}{\partial x^i}dx^i corresponds to the vector

\displaystyle\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x^1}\\\vdots\\\frac{\partial f}{\partial x^n}\end{pmatrix}

We can remove the function from this notation to write the operator \nabla on its own as

\displaystyle\nabla=\begin{pmatrix}\frac{\partial}{\partial x^1}\\\vdots\\\frac{\partial}{\partial x^n}\end{pmatrix}

We also write the gradient vector at a given point as \nabla f(x), where we have to remember to parse this as evaluating a function \nabla f at the point x rather than as applying the operator \nabla to the value f(x).
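To make the components of \nabla f(x) concrete, here is a minimal numerical sketch in Python. It approximates each partial derivative by a central difference and collects them into a list. The helper name `gradient`, the step size `h`, and the sample function f(x,y)=x^2+3xy are all assumptions chosen for illustration; none of them come from the discussion above.

```python
def gradient(f, x, h=1e-6):
    """Approximate the gradient of f at the point x (a list of floats).

    Each component is a central-difference estimate of one partial
    derivative. The step size h is an illustrative choice.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h   # step along the i-th basis vector e_i
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def f(p):
    # Hypothetical sample function f(x, y) = x^2 + 3xy
    x, y = p
    return x ** 2 + 3 * x * y


# The exact gradient is (2x + 3y, 3x), which is (8, 3) at the point (1, 2).
print(gradient(f, [1.0, 2.0]))
```

Note that `gradient(f, x)` evaluates the function \nabla f at the point x, in line with the parsing convention just described.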

Now, under our approach the differential is more fundamental and more useful than the gradient vector. However, there is some meaning to the geometric interpretation of the gradient as a displacement vector.

First of all, let’s ask that u be a unit vector. Then we can calculate the directional derivative \left[D_uf\right](x)=df(x;u). But the linear functional given by the differential df(x) is the same as the linear functional \langle\nabla f(x),\underline{\hphantom{x}}\rangle. Thus we also find that \left[D_uf\right](x)=\langle\nabla f(x),u\rangle. And we can interpret this inner product in terms of the length of \nabla f and the angle \theta between \nabla f and u:

\left[D_uf\right](x)=\lVert\nabla f(x)\rVert\cos(\theta)

since the length of u is 1 by assumption.
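As a sanity check, we can compare the three expressions for the directional derivative numerically. The sketch below is an illustration under stated assumptions: the sample function f(x,y)=x^2+3xy, its hand-computed gradient, the point (1,2), the unit vector u=(0.6,0.8), and all helper names are hypothetical choices. The angle \theta is measured independently with `atan2`, which only works in the plane.

```python
import math


def f(p):
    # Hypothetical sample function f(x, y) = x^2 + 3xy
    x, y = p
    return x ** 2 + 3 * x * y


def grad_f(p):
    # Gradient of the sample function, computed by hand: (2x + 3y, 3x)
    x, y = p
    return [2 * x + 3 * y, 3 * x]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def directional_derivative(f, p, u, h=1e-6):
    # Central-difference estimate of the derivative of f at p along u
    forward = [pi + h * ui for pi, ui in zip(p, u)]
    backward = [pi - h * ui for pi, ui in zip(p, u)]
    return (f(forward) - f(backward)) / (2 * h)


p = [1.0, 2.0]
u = [0.6, 0.8]  # a unit vector: 0.36 + 0.64 = 1
g = grad_f(p)

# Angle between the gradient and u, from their polar angles in the plane
theta = math.atan2(g[1], g[0]) - math.atan2(u[1], u[0])

numeric = directional_derivative(f, p, u)  # difference quotient along u
inner = dot(g, u)                          # <grad f(x), u>
geometric = norm(g) * math.cos(theta)      # |grad f(x)| cos(theta)

print(numeric, inner, geometric)  # all three are approximately 7.2
```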

The cosine term has a maximum value of 1 when u points in the same direction as \nabla f so that \theta=0. That is, as long as \nabla f(x) is nonzero, the direction that gives us the largest directional derivative is the direction of \nabla f. And then we can calculate the rate at which the function increases in this direction as the length of the gradient \lVert\nabla f\rVert.
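We can watch this happen by brute force. The following sketch sweeps unit vectors u=(\cos t,\sin t) around the circle and records where the inner product with a fixed gradient is largest. The gradient (8,3) is an assumed example value (it is the gradient of the hypothetical function x^2+3xy at the point (1,2)), and the grid of 3600 angles is an arbitrary choice.

```python
import math

g = [8.0, 3.0]  # assumed example gradient, not taken from the text
steps = 3600    # number of sample directions around the unit circle

# Pair each directional derivative <g, u(t)> with its angle t and keep the
# pair with the largest derivative.
best_value, best_angle = max(
    (g[0] * math.cos(t) + g[1] * math.sin(t), t)
    for t in (2 * math.pi * k / steps for k in range(steps))
)

gradient_angle = math.atan2(g[1], g[0])  # direction of the gradient
gradient_length = math.hypot(g[0], g[1])  # length of the gradient

print(best_angle, gradient_angle)   # agree to within the grid spacing
print(best_value, gradient_length)  # agree to within a small grid error
```

The best sampled direction lines up with the gradient, and the largest directional derivative matches the gradient's length, up to the resolution of the grid.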

So for most purposes we’ll stick to using the differential, but in practice it’s often useful to think of the gradient vector to get some geometric intuition about what the differential means.

October 5, 2009 | Analysis, Calculus