The Unapologetic Mathematician

Mathematics for the interested outsider

Classifying Critical Points

So let’s say we’ve got a critical point of a multivariable function f:X\rightarrow\mathbb{R}. That is, a point a\in X where the differential df(a) vanishes. We want something like the second derivative test that might tell us more about the behavior of the function near that point, and to identify (some) local maxima and minima. We’ll assume here that f is twice continuously differentiable in some region S around a.

The analogue of the second derivative for multivariable functions is the second differential d^2f(x). This function assigns to every point a bilinear function of two displacement vectors u and v, and it measures the rate at which the directional derivative in the direction of v is changing as we move in the direction of u. That is,

\displaystyle d^2f(x;u,v)=\left[D_u\left(D_vf\right)\right](x)

If we choose coordinates on X given by an orthonormal basis \{e_i\}_{i=1}^n, we can write the second differential in terms of coordinates

\displaystyle d^2f(x)=\frac{\partial^2f}{\partial x^i\partial x^j}dx^idx^j

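Here we sum over the repeated indices i and j. The matrix of second partial derivatives that shows up as the coefficients is often called the “Hessian” of f at the point x.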
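As a rough numerical illustration of the Hessian, here is a minimal sketch that approximates the matrix of second partials with central differences. The helper name `numerical_hessian`, the step size `h`, and the sample function f(x,y)=xy are all assumptions made for this example; nothing here is specific to any particular library beyond NumPy arrays.

```python
import numpy as np

def numerical_hessian(f, x, h=1e-4):
    """Approximate the matrix of second partials of f at x.

    Hypothetical helper for illustration: f takes a 1-D array and
    returns a float; h is an assumed finite-difference step size.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            e_i = np.zeros(n)
            e_i[i] = h
            e_j = np.zeros(n)
            e_j[j] = h
            # Central difference of a central difference: D_i(D_j f).
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h * h)
    return H

# Sample function f(x, y) = x*y, evaluated at the origin.
H_xy = numerical_hessian(lambda p: p[0] * p[1], [0.0, 0.0])
```

For f(x,y)=xy the exact second partials are 0 on the diagonal and 1 off it, so `H_xy` should come out close to that matrix, and it should be symmetric up to rounding, as Clairaut’s theorem predicts for a C^2 function.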

As I said above, this is a bilinear form. Further, Clairaut’s theorem tells us that it’s a symmetric form. Then the spectral theorem tells us that we can find an orthonormal basis with respect to which the Hessian is actually diagonal, and the diagonal entries are the eigenvalues of the matrix.
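To make the diagonalization concrete, here is a small sketch using NumPy’s `numpy.linalg.eigh`, which is meant for symmetric matrices. The matrix used is the Hessian of the sample function f(x,y)=xy at the origin, chosen here purely as an illustration.

```python
import numpy as np

# Hessian of the illustrative function f(x, y) = x*y at the origin.
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])

# eigh returns eigenvalues in ascending order, and the columns of Q
# form an orthonormal basis of eigenvectors.
eigenvalues, Q = np.linalg.eigh(H)

# Rewriting the bilinear form in the eigenvector basis diagonalizes it.
D = Q.T @ H @ Q
```

In this example the eigenvalues are -1 and 1, and the new basis vectors point along the diagonals y=-x and y=x, which are exactly the directions in which the graph of xy curves down and up.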

So let’s go back and assume we’re working with such a basis, chosen at the particular point x we care about. This means that our second partial derivatives at that point are particularly simple. We find that for i\neq j we have

\displaystyle\frac{\partial^2f}{\partial x^i\partial x^j}=0

and for i=j, the second partial derivative is an eigenvalue

\displaystyle\frac{\partial^2f}{{\partial x^i}^2}=\lambda_i

which we can assume (without loss of generality) are nondecreasing. That is, \lambda_1\leq\lambda_2\leq\dots\leq\lambda_n.

Now, if all of these eigenvalues are positive at a critical point a, then the Hessian is positive-definite. That is, given any direction v we have d^2f(a;v,v)>0. On the other hand, if all of the eigenvalues are negative, the Hessian is negative definite; given any direction v we have d^2f(a;v,v)<0. In the former case, we’ll find that f has a local minimum in a neighborhood of a, and in the latter case we’ll find that f has a local maximum there. If some eigenvalues are negative and others are positive, then the function has a mixed behavior at a we’ll call a “saddle” (sketch the graph of f(x,y)=xy near (0,0) to see why). And if any eigenvalues are zero, all sorts of weird things can happen, though at least if we can find one positive and one negative eigenvalue we know that the critical point can’t be a local extremum.
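The case analysis above can be sketched as a short function. The name `classify_critical_point`, the tolerance `tol`, and the returned strings are assumptions made for this example; deciding when a computed eigenvalue counts as “zero” is a numerical judgment call that the math itself doesn’t settle.

```python
import numpy as np

def classify_critical_point(hessian, tol=1e-9):
    """Classify a critical point from the eigenvalues of its Hessian.

    Hypothetical helper: `hessian` is a symmetric matrix, and `tol` is
    an assumed threshold below which an eigenvalue is treated as zero.
    """
    eigenvalues = np.linalg.eigvalsh(np.asarray(hessian, dtype=float))
    has_positive = bool(np.any(eigenvalues > tol))
    has_negative = bool(np.any(eigenvalues < -tol))
    if has_positive and has_negative:
        # Not a local extremum, even if some other eigenvalues vanish.
        return "saddle"
    if np.all(eigenvalues > tol):
        return "local minimum"
    if np.all(eigenvalues < -tol):
        return "local maximum"
    # Some eigenvalue is (numerically) zero and the rest share a sign.
    return "inconclusive"

kind = classify_critical_point([[0.0, 1.0], [1.0, 0.0]])  # Hessian of x*y
```

The “inconclusive” branch is the one where weird things can happen: the second differential alone doesn’t determine the behavior there.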

We remember that the determinant of a diagonal matrix is the product of its eigenvalues, so if the determinant of the Hessian is nonzero then either we have a local maximum, we have a local minimum, or we have some form of well-behaved saddle. These behaviors we call “generic” critical points, since if we “wiggle” the function a bit (while maintaining a critical point at a) the Hessian determinant will stay nonzero. If the Hessian determinant is zero, wiggling the function a little will make it nonzero, and so this sort of critical point is not generic. This is the sort of unstable situation analogous to a failure of the second derivative test. Unfortunately, the analogy doesn’t extend, in that the sign of the Hessian determinant isn’t instantly meaningful. In two dimensions a positive determinant means both eigenvalues have the same sign, denoting a local maximum or a local minimum, while a negative determinant means the eigenvalues have different signs, denoting a saddle. This much is included in multivariable calculus courses, although usually without a clear explanation why it works.
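To see why the sign of the determinant alone isn’t enough once we leave two dimensions, compare two diagonal Hessians in three dimensions (made-up examples for illustration, coming from the functions \frac{1}{2}(x^2+y^2+z^2) and \frac{1}{2}(-x^2-y^2+z^2) at the origin):

\displaystyle\mathrm{diag}(1,1,1)\qquad\mathrm{diag}(-1,-1,1)

Both have determinant 1\cdot1\cdot1=(-1)(-1)(1)=1>0, but the first is positive-definite and gives a local minimum, while the second has eigenvalues of both signs and gives a saddle.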

So, given a direction vector v so that d^2f(a;v,v)>0, then since f is in C^2(S), there will be some neighborhood N of a, which we can take to be a ball, so that d^2f(x;v,v)>0 for all x\in N. In particular, there will be some range of t so that b=a+tv\in N. For any such point we can use Taylor’s theorem with m=2, along with the fact that the first-order term df(a;tv) vanishes at the critical point a, to tell us that

\displaystyle f(b)-f(a)=\frac{1}{2}d^2f(\xi;tv,tv)=\frac{t^2}{2}d^2f(\xi;v,v)

for some \xi\in[a,b]\subseteq N. And from this we see that f(b)>f(a) for every b\in N with b-a=tv and t\neq0. A similar argument shows that if d^2f(a;v,v)<0 then f(b)<f(a) for any b\neq a near a in the direction of v.

Now if the Hessian is positive-definite then every direction v from a gives us d^2f(a;v,v)>0. Since the unit directions form a compact set and d^2f is continuous, a single neighborhood of a works for all of them at once, and so every point b\neq a near a satisfies f(b)>f(a). If the Hessian is negative-definite, then every point b\neq a near a satisfies f(b)<f(a). And if the Hessian has both positive and negative eigenvalues then within any neighborhood we can find some directions in which f(b)>f(a) and some in which f(b)<f(a).
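Here is a quick numerical check of that last claim, again using the sample function f(x,y)=xy with its critical point at the origin. The helper `change_along` and the particular step sizes are assumptions made for this sketch.

```python
def f(x, y):
    # Illustrative function with a saddle at the origin.
    return x * y

a = (0.0, 0.0)  # the critical point

def change_along(direction, t):
    """Return f(b) - f(a) for b = a + t * direction."""
    bx = a[0] + t * direction[0]
    by = a[1] + t * direction[1]
    return f(bx, by) - f(*a)

steps = (0.1, 0.01, -0.01)

# Along v = (1, 1) the second differential is positive ...
up = [change_along((1.0, 1.0), t) for t in steps]
# ... and along v = (1, -1) it is negative.
down = [change_along((1.0, -1.0), t) for t in steps]
```

Every entry of `up` is positive and every entry of `down` is negative, no matter how small the step or which way along the line we move, which is the mixed behavior we called a saddle.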

November 24, 2009 | Analysis, Calculus

   
