The Unapologetic Mathematician

Mathematics for the interested outsider

Extrema with Constraints II

As we said last time, we have an idea for a necessary condition for finding local extrema subject to constraints of a certain form. To be explicit, we assume that f:X\rightarrow\mathbb{R} is continuously differentiable on an open region X\subset\mathbb{R}^n, and we also assume that g:X\rightarrow\mathbb{R}^m is a continuously differentiable vector-valued function on the same region (and that m<n). We use g to define the region A\subseteq X consisting of those points x\in X so that g(x)=0. Now if a\in A is a point with a neighborhood N so that for all x\in N\cap A we have f(x)\leq f(a) (or f(x)\geq f(a) for all such x). And, finally, we assume that the m\times m determinant

\displaystyle\det\left(\frac{\partial g^i}{\partial x^j}\right)

is nonzero at a. Then we have reason to believe that there will exist real numbers \lambda_1,\dots,\lambda_m, one for each component of the constraint function, so that the n equations

\displaystyle\frac{\partial f}{\partial x^j}+\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^j}=0\qquad(j=1,\dots,n)

are satisfied at a.

Well, first off we can solve the first m equations and determine the \lambda_j right off. Rewrite them as

\displaystyle\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^j}\bigg\vert_a=-\frac{\partial f}{\partial x^j}\bigg\vert_a\qquad(j=1,\dots,m)

a system of m equations in the m unknowns \lambda_i. Since the matrix has a nonzero determinant (by assumption) we can solve this system uniquely to determine the \lambda_i. What’s left is to verify that this choice of the \lambda_i also satisfies the remaining n-m equations.

To take care of this, we’ll write t^k=x^{m+k}, so we can write the point x=(x^1,\dots,x^n) as (x';t)=(x^1,\dots,x^m;t^1,\dots,t^{n-m}) and particularly a=(a';b). Now we can invoke the implicit function theorem! We find an n-m-dimensional neighborhood T of b and a unique continuously differentiable function h:T\rightarrow\mathbb{R}^m so that h(b)=a' and g(h(t);t)=0 for all t\in T. Without loss of generality, we can choose T so that h(t)\in N\cap A, where N is the neighborhood from the assumptions above.

This is the parameterization we discussed last time, and we can now substitute these functions into the function f. That is, we can define

\displaystyle\begin{aligned}F(t^1,\dots,t^{n-m})&=f\left(h^1(t^1,\dots,t^{n-m}),\dots,h^m(t^1,\dots,t^{n-m});t^1,\dots,t^{n-m}\right)\\G^i(t^1,\dots,t^{n-m})&=g^i\left(h^1(t^1,\dots,t^{n-m}),\dots,h^m(t^1,\dots,t^{n-m});t^1,\dots,t^{n-m}\right)\qquad(i=1,\dots,m)\end{aligned}

Or if we define H(t)=(h(t);t) we can say this more succinctly as F(t)=f(H(t)) and G^i(t)=g^i(H(t)).

Anyhow, now all of these G^i are identically zero on T as a consequence of the implicit function theorem, and so each partial derivative \frac{\partial G^i}{\partial t^j} is identically zero as well. But since the G^i are composite functions we can also use the chain rule to evaluate these partial derivatives. We find

\displaystyle\begin{aligned}0&=\frac{\partial G^i}{\partial t^j}\\&=\sum\limits_{k=1}^n\frac{\partial g^i}{\partial x^k}\frac{\partial H^k}{\partial t^j}\\&=\sum\limits_{k=1}^m\frac{\partial g^i}{\partial x^k}\frac{\partial h^k}{\partial t^j}+\sum\limits_{k=m+1}^n\frac{\partial g^i}{\partial x^k}\frac{\partial t^{k-m}}{\partial t^j}\\&=\sum\limits_{k=1}^m\frac{\partial g^i}{\partial x^k}\frac{\partial h^k}{\partial t^j}+\sum\limits_{k=1}^{n-m}\frac{\partial g^i}{\partial x^{k+m}}\delta_j^k\\&=\sum\limits_{k=1}^m\frac{\partial g^i}{\partial x^k}\frac{\partial h^k}{\partial t^j}+\frac{\partial g^i}{\partial x^{j+m}}\end{aligned}

Similarly, since F has a local minimum (as a function of the t^j) at b we must find its partial derivatives zero at that point. That is

\displaystyle0=\frac{\partial F}{\partial t^j}\bigg\vert_{t=b}=\sum\limits_{k=1}^m\frac{\partial f}{\partial x^k}\bigg\vert_{x=H(b)}\frac{\partial h^k}{\partial t^j}\bigg\vert_{t=b}+\frac{\partial f}{\partial x^{j+m}}\bigg\vert_{x=H(b)}

Now let’s take the previous equation involving g^i, evaluate it at t=b, multiply it by \lambda_i, sum over i, and add it to this latest equation. We find

\displaystyle0=\sum\limits_{k=1}^m\left[\frac{\partial f}{\partial x^k}\bigg\vert_{x=H(b)}+\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^k}\bigg\vert_{x=H(b)}\right]\frac{\partial h^k}{\partial j}\bigg\vert_{t=b}+\frac{\partial f}{\partial x^{j+m}}\bigg\vert_{x=H(b)}+\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^{j+m}}\bigg\vert_{x=H(b)}

Now, the expression in brackets is zero because that’s actually how we defined the \lambda_i way back at the start of the proof! And then what remains is exactly the equations we need to complete the proof.

November 27, 2009 Posted by | Analysis, Calculus | 2 Comments