The Unapologetic Mathematician

Mathematics for the interested outsider

The Gradient Vector

The most common first approach to differential calculus in more than one variable starts by defining partial derivatives and directional derivatives, as we did. But instead of defining the differential, it simply collects the partial derivatives together as the components of a vector called the gradient of f, and written \nabla f.

We showed that these partial derivatives are the components of the differential (when it exists), and so there should be some connection between the two concepts. And indeed there is.

As a bilinear form, our inner product defines an isomorphism from the space of displacements to its dual space. This isomorphism sends the basis vector e_i to the dual basis vector dx^i, since we can check both

\displaystyle dx^i(v)=dx^i(v^je_j)=v^j\delta_j^i=v^i

and

\displaystyle\langle e_i,v\rangle=\langle e_i,v^je_j\rangle=v^j\delta_j^i=v^i

That is, the linear functional dx^i is the same as the linear functional \langle e_i,\underline{\hphantom{x}}\rangle.
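
Numerically, this identification is easy to see. Here’s a tiny sketch (my own illustration, not from the post) in the standard basis of \mathbb{R}^3:

```python
import numpy as np

# A tiny numerical illustration: with the standard inner product and the
# standard orthonormal basis, pairing e_i against a vector v reads off the
# component v^i, which is exactly what the dual basis functional dx^i does.
e = np.eye(3)                      # rows are the basis vectors e_1, e_2, e_3
v = np.array([2.0, -1.0, 5.0])     # v = v^j e_j
print([float(e[i] @ v) for i in range(3)])  # [2.0, -1.0, 5.0], the v^i
```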

So under this isomorphism the differential df=\frac{\partial f}{\partial x^i}dx^i corresponds to the vector

\displaystyle\nabla f=\begin{pmatrix}\frac{\partial f}{\partial x^1}\\\vdots\\\frac{\partial f}{\partial x^n}\end{pmatrix}
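
To make this concrete, here’s a hedged sketch (the sample function f(x,y)=x^2y and the step size are my own choices) that assembles the gradient numerically, one partial derivative per component:

```python
import numpy as np

# Sketch: build grad f as the vector of partial derivatives, each one
# approximated by a central difference. Sample f and h are my choices.
def f(p):
    x, y = p
    return x**2 * y

def numerical_gradient(f, p, h=1e-6):
    p = np.asarray(p, dtype=float)
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = h
        grad[i] = (f(p + step) - f(p - step)) / (2 * h)
    return grad

print(numerical_gradient(f, [1.0, 2.0]))  # ~[4.0, 1.0], matching (2xy, x^2)
```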

We can remove the function from this notation to write the operator \nabla on its own as

\displaystyle\nabla=\begin{pmatrix}\frac{\partial}{\partial x^1}\\\vdots\\\frac{\partial}{\partial x^n}\end{pmatrix}

We also write the gradient vector at a given point as \nabla f(x), where we have to remember to parse this as evaluating a function \nabla f at the point x rather than as applying the operator \nabla to the value f(x).

Now, under our approach the differential is more fundamental and more useful than the gradient vector. However, the geometric interpretation of the gradient as a displacement vector does carry some real meaning.

First of all, let’s ask that u be a unit vector. Then we can calculate the directional derivative \left[D_uf\right](x)=df(x;u). But the linear functional given by the differential df(x) is the same as the linear functional \langle\nabla f(x),\underline{\hphantom{x}}\rangle. Thus we also find that \left[D_uf\right](x)=\langle\nabla f(x),u\rangle. And we can interpret this inner product in terms of the length of \nabla f and the angle \theta between \nabla f and u:

\left[D_uf\right](x)=\lVert\nabla f(x)\rVert\cos(\theta)

since the length of u is automatically 1.

The cosine term has a maximum value of 1 when u points in the same direction as \nabla f, so that \theta=0. That is, the direction that gives us the largest directional derivative is the direction of \nabla f. And then we can calculate the rate at which the function increases in this direction as the length of the gradient \lVert\nabla f\rVert.
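
For instance (a sketch using the same sample function as above, with my own choices throughout): taking f(x,y)=x^2y at the point (1,2), where \nabla f=(4,1), and sweeping u around the unit circle, the directional derivative \langle\nabla f,u\rangle peaks exactly when u points along \nabla f, with peak value \lVert\nabla f\rVert:

```python
import numpy as np

# grad f(1, 2) = (4, 1) for the sample f(x, y) = x^2 y.
grad = np.array([4.0, 1.0])
thetas = np.linspace(0.0, 2.0 * np.pi, 3600)
directions = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # unit vectors
rates = directions @ grad                  # <grad f, u> for every direction u
best = directions[np.argmax(rates)]
print(best, grad / np.linalg.norm(grad))   # both ~(0.970, 0.243)
print(rates.max(), np.linalg.norm(grad))   # both ~sqrt(17) ~ 4.123
```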

So for most purposes we’ll stick to using the differential, but in practice it’s often useful to think of the gradient vector to get some geometric intuition about what the differential means.

October 5, 2009 | Analysis, Calculus

Examples and Notation

Okay, let’s do some simple examples of differentials, which will lead to some notational “syntactic sugar”.

First of all, if we pick an orthonormal basis \left\{e_i\right\}_{i=1}^n we can write any point as x=x^ie_i. This gives us n nice functions to consider: x^i:\mathbb{R}^n\rightarrow\mathbb{R} is the function that takes a point and returns its ith coordinate. This is actually a subtle point that’s worth considering deeply. We’re used to thinking of x^i as a variable, standing in for some real number. I’m saying that we want to consider it as a function in its own right. In a way, this just extends what we did when we considered polynomials as functions: everything we can do algebraically with abstract “variables” we can also do with the specific “functions” x^i.

Analytically, though, we can ask how the function x^i behaves as we move our input point around. It’s easy to find the partial derivatives. If k\neq i then

\displaystyle\left[D_kx^i\right](x)=\lim\limits_{t\to0}\frac{x^i(x+te_k)-x^i(x)}{t}=\lim\limits_{t\to0}\frac{0}{t}=0

since moving in the e_k direction doesn’t change the ith component. On the other hand, if k=i then

\displaystyle\left[D_kx^i\right](x)=\lim\limits_{t\to0}\frac{x^i(x+te_k)-x^i(x)}{t}=\lim\limits_{t\to0}\frac{t}{t}=1

since moving a distance t in the e_k direction adds exactly t to the ith component. That is, we can write D_kx^i=\delta_k^i — the Kronecker delta.
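
A quick numerical check (again, an illustration of my own) confirms this Kronecker delta pattern by applying finite differences to the coordinate functions directly:

```python
import numpy as np

# The partial derivatives of the coordinate functions recover the
# Kronecker delta: D_k x^i = delta_k^i.
def coordinate(i):
    return lambda p: p[i]            # x^i: read off the ith coordinate

def partial(f, p, k, h=1e-6):
    e_k = np.zeros(len(p))
    e_k[k] = 1.0
    return (f(p + h * e_k) - f(p)) / h

p = np.array([3.0, -1.0, 2.0])
print([[round(partial(coordinate(i), p, k)) for k in range(3)]
       for i in range(3)])
# [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```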

Of course, since 0 and 1 are both constant, they’re clearly continuous everywhere. Thus by the condition we worked out yesterday the differential of x^i exists, and we find

\displaystyle dx^i(x;t)=\delta_k^it^k=t^i

We can also write the differential as a linear functional dx^i(x). Since this takes a vector t and returns its ith component, it is exactly the dual basis element \eta^i. That is, once we pick an orthonormal basis for our vector space of displacements, we can actually write the dual basis of linear functionals as the differentials dx^i. And from now on that’s exactly what we’ll do.

So, for example, let’s say we’ve got a differentiable function f:\mathbb{R}^n\rightarrow\mathbb{R}. Then we can write its differential as a linear functional

df(x)=\left[D_1f\right](x)dx^1+\dots+\left[D_nf\right](x)dx^n=\left[D_if\right](x)dx^i
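
As a sketch of this formula in code (the numerical differentiation scheme and the sample function are my own assumptions), we can package df(x) as an honest linear functional on displacement vectors:

```python
import numpy as np

# Sketch: df(x) as a linear functional t |-> D_i f(x) t^i, with the partials
# approximated by central differences. Sample f and point are my choices.
def differential(f, x, h=1e-6):
    x = np.asarray(x, dtype=float)
    partials = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                         for e in np.eye(len(x))])
    return lambda t: float(partials @ np.asarray(t, dtype=float))

f = lambda p: p[0]**2 * p[1]
df = differential(f, [1.0, 2.0])
print(df([1.0, 0.0]), df([0.0, 1.0]))  # ~4.0, ~1.0: coefficients of dx^1, dx^2
print(df([0.5, -2.0]))                 # ~4*0.5 + 1*(-2) = 0.0, by linearity
```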

In the one-dimensional case, we write df(x)=f'(x)dx, leading us to the standard Leibniz notation

\displaystyle\frac{df}{dx}=f'

If we have to evaluate this function, we use an “evaluation bar” \frac{df}{dx}\bigr\vert_{x}=f'(x), or \frac{df}{dx}\bigr\vert_{x=a}=f'(a) telling us to substitute a for x in the formula for \frac{df}{dx}. We also can write the operator that takes in a function and returns its derivative by simply removing the function from this Leibniz notation: \frac{d}{dx}.
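
As a small illustration of this notation (using sympy as an off-the-shelf symbolic tool; the sample function is my own), we can compute the formula for \frac{df}{dx} and then substitute a value for x, just as the evaluation bar indicates:

```python
import sympy as sp

# Sketch of the evaluation bar: compute df/dx as a formula, then
# substitute x = 2 into that formula.
x = sp.symbols('x')
f = x**3
fprime = sp.diff(f, x)       # 3*x**2, the formula for df/dx
print(fprime.subs(x, 2))     # 12, i.e. df/dx evaluated at x = 2
```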

Now when it comes to more than one variable, we can’t just “divide” by one of the differentials dx^i, but we’re going to use something like this notation to read off the coefficient anyway. In order to remind us that we’re not really dividing and that there are other variables floating around, we replace the d with a curly version: \partial. Then we can write the partial derivative

\displaystyle\frac{\partial f}{\partial x^i}=D_if

and the whole differential as

\displaystyle df=\frac{\partial f}{\partial x^1}dx^1+\dots+\frac{\partial f}{\partial x^n}dx^n=\frac{\partial f}{\partial x^i}dx^i
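
For a concrete instance (again leaning on sympy; the function is my own choice), the coefficients of dx and dy in df can be read off symbolically:

```python
import sympy as sp

# Symbolic sketch: for f(x, y) = x^2 y, the differential is
# df = (df/dx) dx + (df/dy) dy = 2xy dx + x^2 dy.
x, y = sp.symbols('x y')
f = x**2 * y
print(sp.diff(f, x), sp.diff(f, y))  # 2*x*y and x**2
```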

Notice here that when we see an upper index in the denominator of this notation, we consider it to be a lower index. Similarly, if we find a lower index in the denominator, we’ll consider it to be like an upper index for the purposes of the summation convention. We can even incorporate evaluation bars

\displaystyle df(a)=\frac{\partial f}{\partial x^1}\biggr\vert_{x=a}dx^1+\dots+\frac{\partial f}{\partial x^n}\biggr\vert_{x=a}dx^n=\frac{\partial f}{\partial x^i}\biggr\vert_{x=a}dx^i

or strip out the function altogether to write the “differential operator”

\displaystyle d=\frac{\partial}{\partial x^1}dx^1+\dots+\frac{\partial}{\partial x^n}dx^n=\frac{\partial}{\partial x^i}dx^i

October 2, 2009 | Analysis, Calculus

An Existence Condition for the Differential

To this point we’ve seen what happens when a function f does have a differential at a given point x, but we haven’t yet seen any conditions that tell us that any such function df(x;t) exists. We know from the uniqueness proof that if it does exist, then given an orthonormal basis we have all partial derivatives, and the differential must be given by the formula

\displaystyle df(x;t)=\left[D_if\right](x)t^i

where D_if is the partial derivative of f in the ith coordinate direction. This is clearly linear in the displacement t, so all that remains is to see whether the inequality in the definition of the differential can be satisfied.

We must have all partial derivatives to write down this formula, but their mere existence can’t be sufficient for differentiability: if it were, then having all partial derivatives would imply continuity, and we know that it doesn’t. What will be sufficient is to ask that not only do all partial derivatives exist at x, but that they themselves are continuous there. Note, though, that I’m not asserting that this condition is necessary for a function to be differentiable. Indeed, it’s possible to construct differentiable functions whose partial derivatives all exist, but are not continuous at x. This is an example of the way that analysis tends to be shot through with “counterexamples”, as Michael was talking about recently.

Okay, so let’s assume that all these partial derivatives D_if exist and are continuous at x. We have to show that for any \epsilon>0 there is some \delta>0 so that if \delta>\lVert t\rVert>0 we have the inequality

\displaystyle\left\lvert\left[f(x+t)-f(x)\right]-\left[D_if\right](x)t^i\right\rvert<\epsilon\lVert t\rVert

We’re going to take the difference f(x+t)-f(x) and break it into n terms, each of which will approximate one of the partial derivative terms.

First off, since each D_if is continuous at x, there is some \delta so that if \lVert t\rVert<\delta then \lvert\left[D_if\right](x+t)-\left[D_if\right](x)\rvert<\frac{\epsilon}{n}. In fact, there’s a \delta for each index i, but we can just take the smallest of all these, and that one will work for each index. From this point on, we’ll assume that \lVert t\rVert is actually less than \frac{\delta}{2}. We’ll write t=\lambda u, where u is a unit vector and \lambda is a scalar so that \lvert\lambda\rvert=\lVert t\rVert<\frac{\delta}{2}. We’ll also write u in terms of our orthonormal basis u=u^ie_i.

Now we can build up our displacement direction u step-by-step as a sequence of vectors v_0=0, v_1=u^1e_1, and so on, stepping in the kth direction on the kth step: v_k=v_{k-1}+u^ke_k (not summing on k here). So we can break up the difference of function values as

\displaystyle f(x+\lambda u)-f(x)=\sum\limits_{k=1}^n\left[f(x+\lambda v_{k-1}+\lambda u^ke_k)-f(x+\lambda v_{k-1})\right]

So now each step only changes the kth coordinate, and the points at each end both lie within the ball of radius \frac{\delta}{2} around x, since each v_k is no longer than u, which has unit length. To look closer at the step from f(x+\lambda v_{k-1}) to f(x+\lambda v_{k-1}+\lambda u^ke_k), we introduce a new function of one real variable:

\displaystyle g(\alpha)=f(x+\lambda v_{k-1}+\alpha e_k)

for -\lvert\lambda u^k\rvert\leq\alpha\leq\lvert\lambda u^k\rvert. This lets us write our step as g(\lambda u^k)-g(0). It turns out that everywhere in this closed interval, the function g is differentiable! Indeed, we have

\displaystyle\frac{g(\alpha+h)-g(\alpha)}{h}=\frac{f(x+\lambda v_{k-1}+\alpha e_k+he_k)-f(x+\lambda v_{k-1}+\alpha e_k)}{h}

So as h goes to zero, we find g'(\alpha)=\left[D_kf\right](x+\lambda v_{k-1}+\alpha e_k), which exists because we’re in a small enough ball around x. Now the mean value theorem can be brought to bear, which says

\displaystyle g(\lambda u^k)-g(0)=\lambda u^kg'(\alpha_k)

for some -\lvert\lambda u^k\rvert\leq\alpha_k\leq\lvert\lambda u^k\rvert. And now the difference of function values can be written

\displaystyle\begin{aligned}f(x+t)-f(x)&=\lambda\sum\limits_{k=1}^nu^k\left[D_kf\right](x+\lambda v_{k-1}+\alpha_ke_k)\\&=\sum\limits_{k=1}^n\left[D_kf\right](x)t^k+\lambda\sum\limits_{k=1}^nu^k\left[\left[D_kf\right](x+\lambda v_{k-1}+\alpha_ke_k)-\left[D_kf\right](x)\right]\end{aligned}

since t^k=\lambda u^k.

Now \lVert\lambda v_{k-1}+\alpha_ke_k\rVert\leq\lvert\lambda\rvert+\lvert\lambda u^k\rvert\leq2\lvert\lambda\rvert<\delta, and so we find that each of these differences of partial derivative evaluations is less than \frac{\epsilon}{n}. And thus

\displaystyle\left\lvert\left[f(x+t)-f(x)\right]-\sum\limits_{k=1}^n\left[D_kf\right](x)t^k\right\rvert<\lvert\lambda\rvert\epsilon=\epsilon\lVert t\rVert

which establishes the inequality we need.
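
As a numerical illustration of the theorem (the sample function, base point, and direction are my own assumptions), a function with continuous partial derivatives really does have an approximation error that shrinks faster than \lVert t\rVert:

```python
import numpy as np

# f(x, y) = sin(x) e^y has continuous partials everywhere, so the error
# |f(x + t) - f(x) - D_i f(x) t^i| should be o(||t||).
f = lambda p: np.sin(p[0]) * np.exp(p[1])
x = np.array([0.5, -0.3])
grad = np.array([np.cos(x[0]) * np.exp(x[1]),   # exact partials of f at x
                 np.sin(x[0]) * np.exp(x[1])])

u = np.array([3.0, 4.0]) / 5.0                  # a fixed unit direction
for lam in [1e-1, 1e-2, 1e-3, 1e-4]:
    t = lam * u
    err = abs(f(x + t) - f(x) - grad @ t)
    print(lam, err / np.linalg.norm(t))         # the ratio tends to 0
```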

October 1, 2009 | Analysis, Calculus
