The Unapologetic Mathematician

Mathematics for the interested outsider

Extrema with Constraints I

We can consider the problem of maximizing or minimizing a function, as we have been, but insisting that our solution satisfy some constraint.

For instance, we might have a function f:\mathbb{R}^3\rightarrow\mathbb{R} to maximize, but we’re only concerned with unit-length vectors on S^2\subseteq\mathbb{R}^3. More generally, we’ll be concerned with constraints imposed by setting a function g equal to zero. In the example, we might set g(x,y,z)=x^2+y^2+z^2-1. If we want to impose more conditions, we can make g a vector-valued function with as many components as constraint functions we want to set equal to zero.
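As an illustration of stacking constraints (a pairing of my own choosing, not one we’ll need below), if we wanted to single out just the equator of the sphere we could use the two component functions

\displaystyle\begin{aligned}g^1(x,y,z)&=x^2+y^2+z^2-1\\g^2(x,y,z)&=z\end{aligned}

whose common zeroes are exactly the unit vectors lying in the plane z=0.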

Now, we might be able to parameterize the collection of points satisfying g(x)=0. In the example, we could use the usual parameterization of the sphere by latitude and longitude, writing

\displaystyle\begin{aligned}x&=\sin(\theta)\cos(\phi)\\y&=\sin(\theta)\sin(\phi)\\z&=\cos(\theta)\end{aligned}

where I’ve used the physicists’ convention on the variables instead of the common one in multivariable calculus classes. Then we could plug these expressions for x, y, and z into our function f, and get a composite function of the variables \phi and \theta, which we can then attack with the tools from the last couple days, being careful about when we can and can’t trust Cauchy’s invariant rule, since the second differential can transform oddly.
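To see the sort of thing that happens, take a function simple enough to work by hand (my own choice of example): the height function f(x,y,z)=z. Composing with the parameterization gives

\displaystyle f(\sin(\theta)\cos(\phi),\sin(\theta)\sin(\phi),\cos(\theta))=\cos(\theta)

whose differential -\sin(\theta)d\theta vanishes exactly at \theta=0 and \theta=\pi. These are the north and south poles, just where we’d expect the height on the sphere to take its maximum and minimum.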

Even setting aside the care that Cauchy’s invariant rule demands, it may be impossible, or at least extremely difficult, to parameterize the surface explicitly. At least we do know that such a parameterization will often exist. Indeed, the implicit function theorem tells us that if we have m continuously differentiable constraint functions g^i whose common zeroes describe a collection of points in an n-dimensional space \mathbb{R}^n, and the m\times m determinant

\displaystyle\det\left(\frac{\partial g^i}{\partial x^j}\right)

is nonzero at some point a\in\mathbb{R}^n satisfying g(a)=0 (here i and j each run from 1 to m), then we can “solve” these equations for the first m variables as functions of the last n-m. This gives us exactly such a parameterization, and in principle we could use it. But the calculations get amazingly painful.
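To see what the theorem asserts in the sphere example: here n=3 and m=1, and the determinant is just the single partial derivative \frac{\partial g}{\partial x^1}=2x. Away from the circle where x=0 this is nonzero, and indeed we can solve for the first variable in terms of the last two:

\displaystyle x=\pm\sqrt{1-y^2-z^2}

with the sign chosen to match the point we’re working near.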

Instead, we want to think about this problem another way. We want to consider a point a\in\mathbb{R}^n satisfying g(a)=0 which has a neighborhood N so that for all x\in N satisfying g(x)=0 we have f(a)\geq f(x). This does not say that there are no points near a where f takes on larger values, but it does say that to reach any such point we must leave the region described by g(x)=0.

Now, let’s think about this sort of heuristically. As we look in various directions v from a, some of them are tangent to the region described by g(x)=0. These are the directions satisfying [D_vg](a)=dg(a;v)=0 — where to first order the value of g is not changing in the direction of v. I say that in none of these directions can f change (again, to first order) either. For suppose it did: f would have to either increase or decrease to first order in that direction. If it increased, then we could find a path in the region where g(x)=0 along which f was increasing, contradicting our assertion that we’d have to leave the region for this to happen. And if f decreased to first order, then it would increase to first order in the opposite direction -v, and we’d have the same problem. That is, we must have df(a;v)=0 whenever dg(a;v)=0. And so we find that

\mathrm{Ker}(dg(a))=\bigcap\limits_{i=1}^m\mathrm{Ker}(dg^i(a))\subseteq\mathrm{Ker}(df(a))
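In the sphere example, where m=1, this containment is easy to visualize: we compute dg(a;v)=2a\cdot v, so the kernel of dg(a) is the plane tangent to the sphere at a, and the assertion is that

\displaystyle a\cdot v=0\implies df(a;v)=0

for every displacement v tangent to the sphere at a.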

The kernel of df(a) consists of all vectors orthogonal to the gradient vector \nabla f(a), and the line it spans is the orthogonal complement to the kernel. Similarly, the kernel of dg(a) consists of all vectors orthogonal to each of the gradient vectors \nabla g^i(a), and is thus the orthogonal complement to the entire subspace they span. The kernel of dg(a) is contained in the kernel of df(a), and orthogonal complements are order-reversing, which means that \nabla f(a) must lie within the span of the \nabla g^i(a). That is, there must be real numbers \lambda_i so that

\displaystyle\nabla f(a)=\sum\limits_{i=1}^m\lambda_i\nabla g^i(a)

or, passing back to differentials

\displaystyle df(a)=\sum\limits_{i=1}^m\lambda_idg^i(a)

So in the presence of constraints we replace the condition df(a)=0 by this one. We call the \lambda_i “Lagrange multipliers”; for each of these new variables we add to the system of equations, we also add the corresponding constraint equation g^i(a)=0, so we should still expect to find an isolated collection of points.
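Let’s check this against the sphere, again with the sample height function f(x,y,z)=z from before. The condition \nabla f(a)=\lambda\nabla g(a) reads

\displaystyle(0,0,1)=\lambda(2x,2y,2z)

which forces x=y=0, and then the constraint x^2+y^2+z^2=1 gives z=\pm1 with \lambda=\pm\frac{1}{2}. We recover the two poles with far less work than the parameterization took.

For messier functions we can hand the whole system to a computer algebra package. Here is a minimal sketch using SymPy; both the tool and the sample objective f=x+2y+3z are my own choices for illustration:

import sympy as sp

x, y, z, lam = sp.symbols('x y z lambda', real=True)

f = x + 2*y + 3*z              # sample objective, chosen for illustration
g = x**2 + y**2 + z**2 - 1     # the sphere constraint, g = 0

# One equation per coordinate from df = lambda*dg,
# plus the constraint equation g = 0 itself.
eqs = [sp.diff(f, v) - lam*sp.diff(g, v) for v in (x, y, z)] + [g]

for sol in sp.solve(eqs, [x, y, z, lam], dict=True):
    print(sol, ' f =', sp.simplify(f.subs(sol)))

This should report the two antipodal points \pm\frac{1}{\sqrt{14}}(1,2,3), one the constrained maximum and the other the constrained minimum.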

Now, we reached this conclusion by a rather handwavy argument about being able to find increasing directions and so on within the region g(x)=0. This line of reasoning could be firmed up, but next time we’ll find our proof by a slightly different approach.

November 25, 2009 | Analysis, Calculus
