# The Unapologetic Mathematician

## Extrema with Constraints II

As we said last time, we have an idea for a necessary condition for finding local extrema subject to constraints of a certain form. To be explicit, we assume that $f:X\rightarrow\mathbb{R}$ is continuously differentiable on an open region $X\subset\mathbb{R}^n$, and we also assume that $g:X\rightarrow\mathbb{R}^m$ is a continuously differentiable vector-valued function on the same region (and that $m). We use $g$ to define the region $A\subseteq X$ consisting of those points $x\in X$ so that $g(x)=0$. Now if $a\in A$ is a point with a neighborhood $N$ so that for all $x\in N\cap A$ we have $f(x)\leq f(a)$ (or $f(x)\geq f(a)$ for all such $x$). And, finally, we assume that the $m\times m$ determinant $\displaystyle\det\left(\frac{\partial g^i}{\partial x^j}\right)$

is nonzero at $a$. Then we have reason to believe that there will exist real numbers $\lambda_1,\dots,\lambda_m$, one for each component of the constraint function, so that the $n$ equations $\displaystyle\frac{\partial f}{\partial x^j}+\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^j}=0\qquad(j=1,\dots,n)$

are satisfied at $a$.

Well, first off we can solve the first $m$ equations and determine the $\lambda_j$ right off. Rewrite them as $\displaystyle\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^j}\bigg\vert_a=-\frac{\partial f}{\partial x^j}\bigg\vert_a\qquad(j=1,\dots,m)$

a system of $m$ equations in the $m$ unknowns $\lambda_i$. Since the matrix has a nonzero determinant (by assumption) we can solve this system uniquely to determine the $\lambda_i$. What’s left is to verify that this choice of the $\lambda_i$ also satisfies the remaining $n-m$ equations.

To take care of this, we’ll write $t^k=x^{m+k}$, so we can write the point $x=(x^1,\dots,x^n)$ as $(x';t)=(x^1,\dots,x^m;t^1,\dots,t^{n-m})$ and particularly $a=(a';b)$. Now we can invoke the implicit function theorem! We find an $n-m$-dimensional neighborhood $T$ of $b$ and a unique continuously differentiable function $h:T\rightarrow\mathbb{R}^m$ so that $h(b)=a'$ and $g(h(t);t)=0$ for all $t\in T$. Without loss of generality, we can choose $T$ so that $h(t)\in N\cap A$, where $N$ is the neighborhood from the assumptions above.

This is the parameterization we discussed last time, and we can now substitute these functions into the function $f$. That is, we can define \displaystyle\begin{aligned}F(t^1,\dots,t^{n-m})&=f\left(h^1(t^1,\dots,t^{n-m}),\dots,h^m(t^1,\dots,t^{n-m});t^1,\dots,t^{n-m}\right)\\G^i(t^1,\dots,t^{n-m})&=g^i\left(h^1(t^1,\dots,t^{n-m}),\dots,h^m(t^1,\dots,t^{n-m});t^1,\dots,t^{n-m}\right)\qquad(i=1,\dots,m)\end{aligned}

Or if we define $H(t)=(h(t);t)$ we can say this more succinctly as $F(t)=f(H(t))$ and $G^i(t)=g^i(H(t))$.

Anyhow, now all of these $G^i$ are identically zero on $T$ as a consequence of the implicit function theorem, and so each partial derivative $\frac{\partial G^i}{\partial t^j}$ is identically zero as well. But since the $G^i$ are composite functions we can also use the chain rule to evaluate these partial derivatives. We find \displaystyle\begin{aligned}0&=\frac{\partial G^i}{\partial t^j}\\&=\sum\limits_{k=1}^n\frac{\partial g^i}{\partial x^k}\frac{\partial H^k}{\partial t^j}\\&=\sum\limits_{k=1}^m\frac{\partial g^i}{\partial x^k}\frac{\partial h^k}{\partial t^j}+\sum\limits_{k=m+1}^n\frac{\partial g^i}{\partial x^k}\frac{\partial t^{k-m}}{\partial t^j}\\&=\sum\limits_{k=1}^m\frac{\partial g^i}{\partial x^k}\frac{\partial h^k}{\partial t^j}+\sum\limits_{k=1}^{n-m}\frac{\partial g^i}{\partial x^{k+m}}\delta_j^k\\&=\sum\limits_{k=1}^m\frac{\partial g^i}{\partial x^k}\frac{\partial h^k}{\partial t^j}+\frac{\partial g^i}{\partial x^{j+m}}\end{aligned}

Similarly, since $F$ has a local minimum (as a function of the $t^j$) at $b$ we must find its partial derivatives zero at that point. That is $\displaystyle0=\frac{\partial F}{\partial t^j}\bigg\vert_{t=b}=\sum\limits_{k=1}^m\frac{\partial f}{\partial x^k}\bigg\vert_{x=H(b)}\frac{\partial h^k}{\partial t^j}\bigg\vert_{t=b}+\frac{\partial f}{\partial x^{j+m}}\bigg\vert_{x=H(b)}$

Now let’s take the previous equation involving $g^i$, evaluate it at $t=b$, multiply it by $\lambda_i$, sum over $i$, and add it to this latest equation. We find $\displaystyle0=\sum\limits_{k=1}^m\left[\frac{\partial f}{\partial x^k}\bigg\vert_{x=H(b)}+\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^k}\bigg\vert_{x=H(b)}\right]\frac{\partial h^k}{\partial j}\bigg\vert_{t=b}+\frac{\partial f}{\partial x^{j+m}}\bigg\vert_{x=H(b)}+\sum\limits_{i=1}^m\lambda_i\frac{\partial g^i}{\partial x^{j+m}}\bigg\vert_{x=H(b)}$

Now, the expression in brackets is zero because that’s actually how we defined the $\lambda_i$ way back at the start of the proof! And then what remains is exactly the equations we need to complete the proof.

November 27, 2009 - Posted by | Analysis, Calculus Comment by Gilbert Bernstein | November 28, 2009 | Reply Comment by Gilbert Bernstein | November 28, 2009 | Reply