The Unapologetic Mathematician

Mathematics for the interested outsider

Local Extrema in Multiple Variables

Just like in one variable, we’re interested in local maxima and minima of a function f:X\rightarrow\mathbb{R}, where X is an open region in \mathbb{R}^n. Again, we say that f has a local minimum at a point a\in X if there is some neighborhood N of a so that f(a)\leq f(x) for all x\in N. A maximum is similarly defined, except that we require f(a)\geq f(x) in the neighborhood. As I alluded to recently, we can bring Fermat’s theorem to bear to determine a necessary condition.

Specifically, if we have coordinates on \mathbb{R}^n given by a basis \{e_i\}_{i=1}^n, we can regard f as a function of the n variables x^i. We can fix n-1 of these variables x^i=a^i for i\neq k and let x^k vary in a neighborhood of a^k. If f has a local extremum at x=a, then in particular it has a local extremum along this coordinate line at x^k=a^k. And so we can use Fermat’s theorem to draw conclusions about the derivative of this restricted function at x^k=a^k, which of course is the partial derivative \frac{\partial f}{\partial x^k}\big\vert_{x=a}.

So what can we say? For each variable x^k, the partial derivative \frac{\partial f}{\partial x^k} either does not exist or is equal to zero at x=a. And because the differential subsumes the partial derivatives, if any of them fail to exist the differential must fail to exist as well. On the other hand, if they all exist they’re all zero, and so df(a)=0 as well. Incidentally, we can again make the connection to the usual coverage in a multivariable calculus course by remembering that the gradient \nabla f(a) is the vector that corresponds to the linear functional of the differential df(a). So at a local extremum we must have \nabla f(a)=0.

As was the case with Fermat’s theorem, this provides a necessary, but not a sufficient condition to have a local extremum. Anything that can go wrong in one dimension can be copied here. For instance, we could define f(x,y)=x^2+y^3. Then we find df=2x\,dx+3y^2\,dy, which is zero at (0,0). But any neighborhood of this point will contain points (0,t) and (0,-t) for small enough t>0, and we see that f(0,t)>f(0,0)>f(0,-t), so the origin cannot be a local extremum.

But weirder things can happen. We might ask that f have a local minimum at a along any line, like we tried with directional derivatives. But even this can go wrong. If we define

\displaystyle f(x,y)=(y-x^2)(y-3x^2)=y^2-4x^2y+3x^4

we can calculate

\displaystyle df=\left(-8xy+12x^3\right)dx+\left(2y-4x^2\right)dy

which again is zero at (0,0). Along any slanted line through the origin y=kx we find

\displaystyle\begin{aligned}f(t,kt)&=3t^4-4kt^3+k^2t^2\\\frac{d}{dt}f(t,kt)&=12t^3-12kt^2+2k^2t\\\frac{d^2}{dt^2}f(t,kt)&=36t^2-24kt+2k^2\end{aligned}

and so the second derivative is always positive at the origin, except along the x-axis. For the vertical line, we find

\displaystyle\begin{aligned}f(0,t)&=t^2\\\frac{d}{dt}f(0,t)&=2t\\\frac{d^2}{dt^2}f(0,t)&=2\end{aligned}

so along all of these lines we have a local minimum at the origin by the second derivative test. And along the x-axis, we have f(x,0)=3x^4, which has the origin as a local minimum.

Unfortunately, it’s still not a local minimum in the plane, since any neighborhood of the origin must contain points of the form (t,2t^2) for small enough nonzero t. For these points we find

\displaystyle f(t,2t^2)=-t^4<0=f(0,0)

and so f cannot have a local minimum at the origin.
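As a sanity check on this example, here is a minimal Python sketch that uses exact rational arithmetic. The helper names `positive_on_line` and `on_parabola` are just made up for illustration, and the sampling bound |k|/3 is an assumption that comes from factoring f(t,kt)=t^2(k-t)(k-3t); it only applies to slanted lines with k\neq0.

```python
from fractions import Fraction


def f(x, y):
    # The example from the text: f(x, y) = (y - x^2)(y - 3x^2).
    return (y - x**2) * (y - 3 * x**2)


def positive_on_line(k, samples=50):
    # Sample nonzero t with |t| < |k|/3 on the line y = k*x (k must be nonzero)
    # and check that f is strictly positive at every sampled point.
    bound = abs(Fraction(k)) / 3
    ts = [bound * Fraction(i, samples + 1) for i in range(1, samples + 1)]
    return all(f(s * t, k * s * t) > 0 for t in ts for s in (1, -1))


def on_parabola(t):
    # Value of f at the point (t, 2t^2), which the text computes to be -t^4.
    return f(t, 2 * t**2)
```

Notice that the interval on which the origin is a minimum shrinks as the slope k goes to zero, and that is exactly the room the parabola y=2x^2 uses to sneak in negative values.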

What we’ll do is content ourselves with this analogue and extension of Fermat’s theorem as a necessary condition, and then develop tools that can distinguish the common behaviors near such critical points, analogous to the second derivative test.

November 23, 2009 Posted by John Armstrong | Analysis, Calculus | | 1 Comment

Sunday Samples 148

When I talked about David Bowie last week, I of course had to mention one of his recurring characters, Major Tom. Interestingly enough, Bowie was not the only one to use this character. In 1983, German singer Peter Schilling wrote “Major Tom (Coming Home)” for his first English-language album, and it shot to the top of the charts in many places around the world (though it only peaked at 14 in the United States).

Some months back, a band called Shiny Toy Guns made a cover of “Major Tom (Coming Home)”, very much in keeping with the spirit of the original, but swapping out all the guitar, bass, and drum parts for synthesized lines, and replacing Schilling’s vocals by those of a female lead with a much cleaner, more antiseptic tone to match the theme. Unfortunately, nothing much seems to have been done with it beyond using it in a car commercial.

November 22, 2009 Posted by John Armstrong | Sunday Samples | | No Comments Yet

The Implicit Function Theorem II

Okay, today we’re going to prove the implicit function theorem. We’re going to think of our function f as taking an n-dimensional vector x and an m-dimensional vector t and giving back an n-dimensional vector f(x;t). In essence, what we want to do is see how this output vector must change as we change t, and then undo that by making a corresponding change in x. And to do that, we need to know how changing the output changes x, at least in a neighborhood of f(x;t)=0. That is, we’ve got to invert a function, and we’ll need to use the inverse function theorem.

But we’re not going to apply it directly as the above heuristic suggests. Instead, we’re going to “puff up” the function f:S\rightarrow\mathbb{R}^n into a bigger function F:S\rightarrow\mathbb{R}^{n+m} that will give us some room to maneuver. For 1\leq i\leq n we define

\displaystyle F^i(x;t)=f^i(x;t)

just copying over our original function. Then we continue by defining for 1\leq j\leq m

\displaystyle F^{n+j}(x;t)=t^j

That is, the new m component functions are just the coordinate functions t^j. We can easily calculate the Jacobian matrix

\displaystyle dF=\begin{pmatrix}\frac{\partial f^i}{\partial x^j}&\frac{\partial f^i}{\partial t^j}\\{0}&I_m\end{pmatrix}

where {0} is the m\times n zero matrix and I_m is the m\times m identity matrix. From here it’s straightforward to find the Jacobian determinant

\displaystyle J_F(x;t)=\det\left(dF\right)=\det\left(\frac{\partial f^i}{\partial x^j}\right)

which is exactly the determinant we assert to be nonzero at (a;b). We also easily see that F(a;b)=(0;b).
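To see the block structure at work, here is a small sketch in Python. The function name `puffed_up_jacobian` and the sample blocks are hypothetical, chosen only to illustrate the case n=2, m=2, and the determinant is computed by a naive Laplace expansion that is only sensible for tiny matrices.

```python
def det(m):
    # Laplace expansion along the first row; fine for small matrices.
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def puffed_up_jacobian(df_dx, df_dt):
    # Build the (n+m) x (n+m) matrix with rows [df/dx | df/dt] on top
    # and [0 | I_m] underneath, matching the block form in the text.
    n, m = len(df_dx), len(df_dt[0])
    top = [df_dx[i] + df_dt[i] for i in range(n)]
    bottom = [[0] * n + [1 if c == r else 0 for c in range(m)] for r in range(m)]
    return top + bottom


# Hypothetical sample blocks with n = 2 and m = 2.
df_dx = [[1, 2], [3, 4]]
df_dt = [[9, 8], [7, 6]]
dF = puffed_up_jacobian(df_dx, df_dt)
```

Whatever we put in the upper-right block, `det(dF)` should agree with `det(df_dx)`, since the lower rows only contribute the identity.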

And so the inverse function theorem tells us that there are neighborhoods X of (a;b) and Y of (0;b) so that F is injective on X and Y=F(X), and that there is a continuously differentiable inverse function G:Y\rightarrow X so that G(F(x;t))=(x;t) for all (x;t)\in X. We want to study this inverse function to recover our implicit function from it.

First off, we can write G(y;s)=(v(y;s);w(y;s)) for two functions: v which takes n-dimensional vector values, and w which takes m-dimensional vector values. Our inverse relation tells us that

\displaystyle\begin{aligned}v(F(x;t))&=x\\w(F(x;t))&=t\end{aligned}

But since F is injective from X onto Y, we can write any point (y;s)\in Y as (y;s)=F(x;t), and in this case we must have s=t by the definition of F. That is, we have

\displaystyle\begin{aligned}v(y;t)&=v(F(x;t))=x\\w(y;t)&=w(F(x;t))=t\end{aligned}

And so we see that G(y;t)=(x;t), where x is the n-dimensional vector so that y=f(x;t). We thus have f(v(y;t);t)=y for every (y;t)\in Y.

Now define T\subseteq\mathbb{R}^m to be the collection of vectors t so that (0;t)\in Y, and for each such t\in T define g(t)=v(0;t), so f(g(t);t)=0. As a slice of the open set Y in the product topology on \mathbb{R}^n\times\mathbb{R}^m, the set T is open in \mathbb{R}^m. Further, g is continuously differentiable on T since G is continuously differentiable on Y, and the components of g are taken directly from those of G. Finally, b is in T since (a;b)\in X, and F(a;b)=(0;b)\in Y by assumption. This also shows that g(b)=a.

The only thing left is to show that g is uniquely defined. But there can only be one such function, by the injectivity of F on X. If there were another such function h, with (h(t);t)\in X, then we’d have F(g(t);t)=(0;t)=F(h(t);t), and thus (g(t);t)=(h(t);t), or g(t)=h(t) for every t\in T.

November 20, 2009 Posted by John Armstrong | Analysis, Calculus | | No Comments Yet

The Implicit Function Theorem I

Let’s consider the function F(x,y)=x^2+y^2-1. The collection of points (x,y) so that F(x,y)=0 defines a curve in the plane: the unit circle. Unfortunately, this relation is not a function. Neither is y defined as a function of x, nor is x defined as a function of y by this curve. However, if we consider a point (a,b) on the curve (that is, with F(a,b)=0), then near this point we usually do have a graph of x as a function of y (except for a few isolated points). That is, as we move y near the value b then we have to adjust x to maintain the relation F(x,y)=0. There is some function f(y) defined “implicitly” in a neighborhood of b satisfying the relation F(f(y),y)=0.
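For the circle we can actually watch this adjustment happen. The following Python sketch holds y fixed and runs Newton's method in x, starting from a guess near a known point on the curve. The name `implicit_x` and the iteration count are assumptions made for illustration, and the method needs \frac{\partial F}{\partial x}=2x to stay away from zero, so it breaks down at the isolated points (0,\pm1).

```python
import math


def F(x, y):
    # The relation defining the unit circle.
    return x**2 + y**2 - 1


def implicit_x(y, x_guess, iterations=20):
    # Newton's method in x with y held fixed. The step divides by
    # dF/dx = 2x, so the guess must stay away from x = 0.
    x = x_guess
    for _ in range(iterations):
        x = x - F(x, y) / (2 * x)
    return x
```

Starting near the point (0.6,0.8), a call like `implicit_x(0.81, 0.6)` should land on the right-hand branch \sqrt{1-y^2}, while a negative guess picks out the left-hand branch instead.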

We want to generalize this situation. Given a system of n functions of n+m variables

\displaystyle f^i(x;t)=f^i(x^1,\dots,x^n;t^1,\dots,t^m)

we consider the collection of points (x;t) in n+m-dimensional space satisfying f(x;t)=0.

If this were a linear system, the rank-nullity theorem would tell us that our solution space is (generically) m dimensional. Indeed, we could use Gauss-Jordan elimination to put the system into reduced row echelon form, and (usually) find the resulting matrix starting with an n\times n identity matrix, like

\displaystyle\begin{pmatrix}1&0&0&2&1\\{0}&1&0&3&0\\{0}&0&1&-1&1\end{pmatrix}

This makes finding solutions to the system easy. We put our n+m variables into a column vector and write

\displaystyle\begin{pmatrix}1&0&0&2&1\\{0}&1&0&3&0\\{0}&0&1&-1&1\end{pmatrix}\begin{pmatrix}x^1\\x^2\\x^3\\t^1\\t^2\end{pmatrix}=\begin{pmatrix}x^1+2t^1+t^2\\x^2+3t^1\\x^3-t^1+t^2\end{pmatrix}=\begin{pmatrix}0\\{0}\\{0}\end{pmatrix}

and from this we find

\displaystyle\begin{aligned}x^1&=-2t^1-t^2\\x^2&=-3t^1\\x^3&=t^1-t^2\end{aligned}

Thus we can use the m variables t^j as parameters on the space of solutions, and define each of the x^i as a function of the t^j.
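We can check this parametrization mechanically. In this short Python sketch the names `solve_for_x` and `residual` are made up for the occasion; the matrix is the one from the example above.

```python
# Reduced row echelon matrix from the text: three equations in the
# five variables x^1, x^2, x^3, t^1, t^2.
A = [
    [1, 0, 0, 2, 1],
    [0, 1, 0, 3, 0],
    [0, 0, 1, -1, 1],
]


def solve_for_x(t1, t2):
    # The parametrization read off from the system.
    return [-2 * t1 - t2, -3 * t1, t1 - t2]


def residual(t1, t2):
    # Apply A to the column vector (x^1, x^2, x^3, t^1, t^2).
    v = solve_for_x(t1, t2) + [t1, t2]
    return [sum(a * c for a, c in zip(row, v)) for row in A]
```

For any choice of parameters, `residual(t1, t2)` should come back as the zero vector.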

But in general we don’t have a linear system. Still, we want to know some circumstances under which we can do something similar and write each of the x^i as a function of the other variables t^j, at least near some known point (a;b).

The key observation is that we can perform the Gauss-Jordan elimination above and get a matrix with rank n if and only if the leading n\times n matrix is invertible. And this is generalized to asking that some Jacobian determinant of our system of functions is nonzero.

Specifically, let’s assume that all of the f^i are continuously differentiable on some region S in n+m-dimensional space, and that (a;b) is some point in S where f(a;b)=0, and at which the determinant

\displaystyle\det\left(\frac{\partial f^i}{\partial x^j}\bigg\vert_{(a;b)}\right)\neq0

where both indices i and j run from 1 to n to make a square matrix. Then I assert that there is some m-dimensional neighborhood T of b and a uniquely defined, continuously differentiable, vector-valued function g:T\rightarrow\mathbb{R}^n so that g(b)=a and f(g(t);t)=0.

That is, near (a;b) we can use the variables t^j as parameters on the space of solutions to our system of equations. Near this point, the solution set looks like the graph of the function x=g(t), which is implicitly defined by the need to stay on the solution set as we vary t. This is the implicit function theorem, and we will prove it next time.

November 19, 2009 Posted by John Armstrong | Analysis, Calculus | | 1 Comment

The Inverse Function Theorem

At last we come to the theorem that I promised. Let f:S\rightarrow\mathbb{R}^n be continuously differentiable on an open region S\subseteq\mathbb{R}^n, and T=f(S). If the Jacobian determinant J_f(a)\neq0 at some point a\in S, then there is a uniquely determined function g and two open sets X\subseteq S and Y\subseteq T so that

  • a\in X, and f(a)\in Y
  • Y=f(X)
  • f is injective on X
  • g is defined on Y, g(Y)=X, and g(f(x))=x for all x\in X
  • g is continuously differentiable on Y
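To get a feel for how local these conclusions are, consider a standard example that is not from this post: f(x,y)=(e^x\cos y,e^x\sin y). Its Jacobian determinant is e^{2x}, which never vanishes, and yet f is not injective on the whole plane. The Python sketch below uses a hypothetical local inverse `g`, built from the logarithm and `math.atan2`, which is only valid near f(0,0)=(1,0).

```python
import math


def f(x, y):
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))


def jacobian_det(x, y):
    # det [[e^x cos y, -e^x sin y], [e^x sin y, e^x cos y]] = e^(2x)
    return math.exp(2 * x)


def g(u, v):
    # A local inverse near f(0, 0) = (1, 0); it returns y in (-pi, pi].
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))
```

Near the origin `g(*f(x, y))` should return (x,y) up to rounding, but shifting y by 2\pi gives the same value of f, so the open set X really does have to be chosen small.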

The Jacobian determinant J_f(x) is continuous as a function of x, so there is some neighborhood N_1 of a so that the Jacobian is nonzero within N_1. Our second lemma tells us that there is a smaller neighborhood N\subseteq N_1 on which f is injective. We pick some closed ball \overline{K}\subseteq N centered at a, and use our first lemma to find that f(K) must contain an open neighborhood Y of f(a). Then we define X=f^{-1}(Y)\cap K, which is open since both K and f^{-1}(Y) are (the latter by the continuity of f). Since f is injective on the compact set \overline{K}\subseteq N, it has a uniquely-defined continuous inverse g on Y\subseteq f(\overline{K}). This establishes the first four of the conditions of the theorem.

Now the hard part is showing that g is continuously differentiable on Y. To this end, like we did in our second lemma, we define the function

\displaystyle h(z_1,\dots,z_n)=\det\left(\frac{\partial f^i}{\partial x^j}\bigg\vert_{x=z_i}\right)

along with a neighborhood N_2 of a so that as long as all the z_i are within N_2 this function is nonzero. Without loss of generality we can go back and choose our earlier neighborhood N so that N\subseteq N_2, and thus that \overline{K}\subseteq N_2.

To show that the partial derivative \frac{\partial g^i}{\partial y^j} exists at a point y\in Y, we consider the difference quotient

\displaystyle\frac{g^i(y+\lambda e_j)-g^i(y)}{\lambda}

with y+\lambda e_j also in Y for sufficiently small \lvert\lambda\rvert. Then writing x_1=g(y) and x_2=g(y+\lambda e_j) we find f(x_2)-f(x_1)=\lambda e_j. The mean value theorem then tells us that

\displaystyle\begin{aligned}\delta_j^k&=\frac{f^k(x_2)-f^k(x_1)}{\lambda}\\&=df^k(\xi_k)\left(\frac{1}{\lambda}(x_2-x_1)\right)\\&=\frac{\partial f^k}{\partial x^i}\bigg\vert_{x=\xi_k}\frac{x_2^i-x_1^i}{\lambda}\\&=\frac{\partial f^k}{\partial x^i}\bigg\vert_{x=\xi_k}\frac{g^i(y+\lambda e_j)-g^i(y)}{\lambda}\end{aligned}

for some \xi_k\in[x_1,x_2]\subseteq K (no summation on k). As usual, \delta_j^k is the Kronecker delta.

This is a linear system of equations, which has a unique solution since the determinant of its matrix is h(\xi_1,\dots,\xi_n)\neq0. We use Cramer’s rule to solve it, and get an expression for our difference quotient as a quotient of two determinants. This is why we want the form of the solution given by Cramer’s rule, and not by a more computationally-efficient method like Gaussian elimination.

As \lambda approaches zero, continuity of g tells us that x_2 approaches x_1, and thus so do all of the \xi_k. Therefore the determinant in the denominator of Cramer’s rule is in the limit h(x_1,\dots,x_1)=J_f(x_1)\neq0, and thus limits of the solutions given by Cramer’s rule actually do exist.

This establishes that the partial derivative \frac{\partial g^i}{\partial y^j} exists at each y\in Y. Further, since we found the limit of the difference quotient by Cramer’s rule, we have an expression given by the quotient of two determinants, each of which only involves the partial derivatives of f, which are themselves all continuous. Therefore the partial derivatives of g not only exist but are in fact continuous.

November 18, 2009 Posted by John Armstrong | Analysis, Calculus | | 1 Comment

Cramer’s Rule

We’re trying to invert a function f:X\rightarrow\mathbb{R}^n which is continuously differentiable on some region X\subseteq\mathbb{R}^n. That is, we know that if a is a point where J_f(a)\neq0, then there is a ball N around a where f is one-to-one onto some neighborhood f(N) around f(a). Then if y is a point in f(N), we’ve got a system of equations

\displaystyle f^j(x^1,\dots,x^n)=y^j

that we want to solve for all the x^i.

We know how to handle this if f is defined by a linear transformation, represented by a matrix A=\left(a_i^j\right):

\displaystyle\begin{aligned}f^j(x^1,\dots,x^n)=a_i^jx^i&=y^j\\Ax&=y\end{aligned}

In this case, the Jacobian transformation is just the function f itself, and so the Jacobian determinant \det\left(a_i^j\right) is nonzero if and only if the matrix A is invertible. And so our solution depends on finding the inverse A^{-1} and solving

\displaystyle\begin{aligned}Ax&=y\\A^{-1}Ax&=A^{-1}y\\x&=A^{-1}y\end{aligned}

This is the approach we’d like to generalize. But to do so, we need a more specific method of finding the inverse.

This is where Cramer’s rule comes in, and it starts by analyzing the way we calculate the determinant of a matrix A. This formula

\displaystyle\sum\limits_{\pi\in S_n}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots a_n^{\pi(n)}

involves a sum over all the permutations \pi\in S_n, and we want to consider the order in which we add up these terms. If we fix an index i, we can factor out each matrix entry in the ith column:

\displaystyle\sum\limits_{j=1}^na_i^j\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots\widehat{a_i^j}\dots a_n^{\pi(n)}

where the hat indicates that we omit the ith term in the product. For a given value of j, we can consider the restricted sum

\displaystyle A_j^i=\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots\widehat{a_i^j}\dots a_n^{\pi(n)}

which is (-1)^{i+j} times the determinant of the i-j “minor” of the matrix A. That is, if we strike out the row and column of A which contain a_i^j and take the determinant of the remaining (n-1)\times(n-1) matrix, we multiply this by (-1)^{i+j} to get A_j^i. These are the entries in the “adjugate” matrix \mathrm{adj}(A).

What we’ve shown is that

\displaystyle A_j^ia_i^j=\det(A)

(no summation on i). It’s not hard to show, however, that if we use a different row from the adjugate matrix we find

\displaystyle\sum\limits_{j=1}^nA_j^ka_i^j=\det(A)\delta_i^k

That is, the adjugate times the original matrix is the determinant of A times the identity matrix. And so if \det(A)\neq0 we find

\displaystyle A^{-1}=\frac{1}{\det(A)}\mathrm{adj}(A)
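Here is a minimal Python sketch of the adjugate, written with plain nested lists and a naive Laplace expansion for the determinant. The helper names `adjugate` and `matmul` are assumptions for illustration, and this is only meant for small matrices.

```python
def det(m):
    # Laplace expansion along the first row; the empty matrix has determinant 1.
    if not m:
        return 1
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def adjugate(a):
    n = len(a)

    def minor(r, c):
        # Delete row r and column c of a.
        return [row[:c] + row[c + 1:] for k, row in enumerate(a) if k != r]

    # Entry (i, j) is (-1)^(i+j) times the determinant left after
    # striking out row j and column i.
    return [[(-1) ** (i + j) * det(minor(j, i)) for j in range(n)] for i in range(n)]


def matmul(p, q):
    # Ordinary matrix product of nested lists.
    return [[sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
            for i in range(len(p))]
```

Multiplying `adjugate(a)` by `a` should give `det(a)` down the diagonal and zeroes elsewhere, which is the identity we divide through by the determinant to get the inverse.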

So what does this mean for our system of equations? We can write

\displaystyle\begin{aligned}x&=\frac{1}{\det(A)}\mathrm{adj}(A)y\\x^i&=\frac{1}{\det(A)}A_j^iy^j\end{aligned}

But how does this sum A_j^iy^j differ from the one A_j^ia_i^j we used before (without summing on i) to calculate the determinant of A? We’ve replaced the ith column of A by the column vector y, and so this is just another determinant, taken after performing this replacement!

Here’s an example. Let’s say we’ve got a system written in matrix form

\displaystyle\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}u\\v\end{pmatrix}

The entry in the ith row and jth column of the adjugate matrix is calculated by striking out the ith column and jth row of our original matrix, taking the determinant of the remaining matrix, and multiplying by (-1)^{i+j}. We get

\displaystyle\begin{pmatrix}d&-b\\-c&a\end{pmatrix}

and thus we find

\displaystyle\begin{pmatrix}x\\y\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}\begin{pmatrix}u\\v\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}ud-bv\\av-uc\end{pmatrix}

where we note that

\displaystyle\begin{aligned}ud-bv&=\det\begin{pmatrix}u&b\\v&d\end{pmatrix}\\av-uc&=\det\begin{pmatrix}a&u\\c&v\end{pmatrix}\end{aligned}

In other words, our solution is given by ratios of determinants:

\displaystyle\begin{aligned}x&=\frac{\det\begin{pmatrix}u&b\\v&d\end{pmatrix}}{\det\begin{pmatrix}a&b\\c&d\end{pmatrix}}\\y&=\frac{\det\begin{pmatrix}a&u\\c&v\end{pmatrix}}{\det\begin{pmatrix}a&b\\c&d\end{pmatrix}}\end{aligned}

and similar formulae hold for larger systems of equations.
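For larger systems the recipe can be sketched in a few lines of Python, using exact fractions so the ratios of determinants come out cleanly. The name `cramer_solve` is made up here, and the naive determinant is an assumption of convenience: as a computational method this is far slower than elimination, but it has exactly the form described above.

```python
from fractions import Fraction


def det(m):
    # Laplace expansion along the first row; fine for small matrices.
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * entry * det(minor)
    return total


def cramer_solve(a, y):
    # Solve a x = y: the ith unknown is the determinant of a with its
    # ith column replaced by y, divided by the determinant of a.
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular; Cramer's rule does not apply")
    solution = []
    for i in range(len(a)):
        replaced = [row[:i] + [y[r]] + row[i + 1:] for r, row in enumerate(a)]
        solution.append(Fraction(det(replaced), d))
    return solution
```

For instance, `cramer_solve([[2, 1], [1, 3]], [3, 5])` divides the two replaced determinants 4 and 7 by the determinant 5 of the original matrix.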

November 17, 2009 Posted by John Armstrong | Algebra, Linear Algebra | | 7 Comments

Another Lemma on Nonzero Jacobians

Sorry for the late post. I didn’t get a chance to get it up this morning before my flight.

Brace yourself. Just like last time we’ve got a messy technical lemma about what happens when the Jacobian determinant of a function is nonzero.

This time we’ll assume that f:X\rightarrow\mathbb{R}^n is not only continuous, but continuously differentiable on a region X\subseteq\mathbb{R}^n. We also assume that the Jacobian J_f(a)\neq0 at some point a\in X. Then I say that there is some neighborhood N of a so that f is injective on N.

First, we take n points \{z_i\}_{i=1}^n in X and make a function of them

\displaystyle h(z_1,\dots,z_n)=\det\left(\frac{\partial f^i}{\partial x^j}\bigg\vert_{x=z_i}\right)

That is, we take the jth partial derivative of the ith component function and evaluate it at the ith sample point to make a matrix \left(a_{ij}\right), and then we take the determinant of this matrix. As a particular value, we have

\displaystyle h(a,\dots,a)=J_f(a)\neq0

Since each partial derivative is continuous, and the determinant is a polynomial in its entries, this function is continuous where it’s defined. And so there’s some ball N around a so that if all the z_i are in N we have h(z_1,\dots,z_n)\neq0. We want to show that f is injective on N.

So, let’s take two points x and y in N so that f(x)=f(y). Since the ball is convex, the line segment [x,y] is completely contained within N\subseteq X, and so we can bring the mean value theorem to bear. For each component function we can write

\displaystyle0=f^i(y)-f^i(x)=df^i(\xi_i)(y-x)=\frac{\partial f^i}{\partial x^j}\bigg\vert_{\xi_i}(y^j-x^j)

for some \xi_i in [x,y]\subseteq N (no summation here on i). But like last time we now have a linear system of equations described by an invertible matrix. Here the matrix has determinant

\displaystyle\det\left(\frac{\partial f^i}{\partial x^j}\bigg\vert_{\xi_i}\right)=h(\xi_1,\dots,\xi_n)\neq0

which is nonzero because all the \xi_i are inside the ball N. Thus the only possible solution to the system of equations is x^i=y^i. And so if f(x)=f(y) for points within the ball N, we must have x=y, and thus f is injective.

November 17, 2009 Posted by John Armstrong | Analysis, Calculus | | 2 Comments

Sunday Samples 147

Okay, everybody knows David Bowie. The glitz and glam and all the characters that are no more escapable as an influence than anything the Beatles or the Rolling Stones ever did. And we all know that Ziggy played guitar.

But before Ziggy Stardust, Bowie came up through the same blues and Merseybeat beginnings as a whole generation of British artists. It’s how he departs from that background where things get interesting. The late-’60s psychedelia influenced his still largely-acoustic sound in “Space Oddity”, released to coincide with the moon landing in 1969 and introducing the character of Major Tom, to whom he (and other artists) would return many times. On 1970’s The Man Who Sold the World he started working with the backing band who would eventually become the Spiders, and they pushed into a sound more like then-current British heavy metal artists Led Zeppelin and Black Sabbath.

While that style didn’t stick, the album also saw Bowie incorporating more avant garde artistic influences and references. These did resonate in the developing work, and 1971’s Hunky Dory saw Bowie plunging headlong towards the ignition of his glam rock career in the following year’s The Rise and Fall of Ziggy Stardust and the Spiders from Mars. Hunky Dory brought back the pop-singer sound from Space Oddity and blended it with everything from Brecht to Dylan to Warhol, and threw in Buddhism, Lovecraftian themes, references to Nietzsche-infused esoteric and hermetic philosophies, and even some outright, self-proclaimed nonsense.

Smack in the middle of all of this comes “Quicksand”, which survives to this day as a concert staple.

November 15, 2009 Posted by John Armstrong | Sunday Samples | | 2 Comments

A Lemma on Nonzero Jacobians

Okay, let’s dive right in with a first step towards proving the inverse function theorem we talked about at the end of yesterday’s post. This is going to get messy.

We start with a function f and first ask that it be continuous and injective on the closed ball \overline{K} of radius r around the point a. Then we ask that all the partial derivatives of f exist within the open interior K — note that this is weaker than our existence condition for the differential of f — and that the Jacobian determinant J_f(x)\neq0 on K. Then I say that the image f(K) actually contains a neighborhood of f(a). That is, the image doesn’t “flatten out” near a.

The boundary \partial K of the ball K is the sphere of radius r:

\displaystyle\partial K=\left\{x\in\mathbb{R}^n\vert\lVert x-a\rVert=r\right\}

Now the Heine-Borel theorem says that this sphere, being both closed and bounded, is a compact subset of \mathbb{R}^n. We’ll define a function on this sphere by

\displaystyle g(x)=\lVert f(x)-f(a)\rVert

which must be continuous and strictly positive, since if \lVert f(x)-f(a)\rVert=0 then f(x)=f(a), but we assumed that f is injective on \overline{K}. But we also know that the image of a continuous real-valued function on a compact, connected space must be a closed interval. That is, g(\partial K)=[m,M], and there exists some point x on the sphere where this minimum is actually attained: g(x)=m>0.

Now we’re going to let T be the ball of radius \frac{m}{2} centered at f(a). We will show that T\subseteq f(K), and is thus a neighborhood of f(a) contained within f(K). To this end, we’ll pick y\in T and show that y\in f(K).

So, given such a point y\in T, we define a new function on the closed ball \overline{K} by

\displaystyle h(x)=\lVert f(x)-y\rVert

This function is continuous on the compact ball \overline{K}, so it again has an absolute minimum. I say that it happens somewhere in the interior K.

At the center of the ball, we have h(a)=\lVert f(a)-y\rVert<\frac{m}{2} (since y\in T), so the minimum must be even less. But on the boundary \partial K, we find

\displaystyle\begin{aligned}h(x)&=\lVert f(x)-y\rVert\\&=\lVert f(x)-f(a)-(y-f(a))\rVert\\&\geq\lVert f(x)-f(a)\rVert-\lVert f(a)-y\rVert\\&>g(x)-\frac{m}{2}\geq\frac{m}{2}\end{aligned}

so the minimum can’t happen on the boundary. So this minimum of h happens at some point b in the open ball K, and so does the minimum of the square of h:

\displaystyle h(x)^2=\lVert f(x)-y\rVert^2=\sum\limits_{i=1}^n\left(f^i(x)-y^i\right)^2

Now we can vary each component x^i of x separately, and use Fermat’s theorem to tell us that the derivative in terms of x^i must be zero at the minimum value b^i. That is, each of the partial derivatives of h^2 must be zero (we’ll come back to this more generally later):

\displaystyle\frac{\partial}{\partial x^k}\left[\sum\limits_{i=1}^n\left(f^i(x)-y^i\right)^2\right]\Bigg\vert_{x=b}=\sum\limits_{i=1}^n2\left(f^i(b)-y^i\right)\frac{\partial f^i}{\partial x^k}\bigg\vert_{x=b}=0

This is the product of the vector 2(f(b)-y) by the matrix \left(\frac{\partial f^i}{\partial x^k}\right). And the determinant of this matrix is J_f(b): the Jacobian determinant at b\in K, which we assumed to be nonzero way back at the beginning! Thus the matrix must be invertible, and the only possible solution to this system of equations is for f(b)-y=0, and so y=f(b)\in f(K).

November 13, 2009 Posted by John Armstrong | Analysis, Calculus | | 3 Comments

The Jacobian of a Composition

Let’s start today by introducing some notation for the Jacobian determinant which we introduced yesterday. We’ll write the Jacobian determinant of a differentiable function f at a point x as J_f(x)=\det(df(x)). Or, in more of a Leibnizean style:

\displaystyle\frac{\partial(f^1,\dots,f^n)}{\partial(x^1,\dots,x^n)}=\det\left(\frac{\partial f^i}{\partial x^j}\right)

We’re interested in determining the Jacobian of the composite of two differentiable functions. To which end, suppose g:X\rightarrow\mathbb{R}^n and f:Y\rightarrow\mathbb{R}^n are differentiable functions on two open regions X and Y in \mathbb{R}^n, with g(X)\subseteq Y, and let h=f\circ g:X\rightarrow\mathbb{R}^n be their composite. Then the chain rule tells us that

\displaystyle dh(x)=df(g(x))dg(x)

where each differential is an n\times n matrix, and the right-hand side is a matrix multiplication.

But these matrices are exactly the Jacobian matrices of the functions! And the determinant of the product of two matrices is the product of their determinants. That is, we find the equation

\displaystyle J_h(x)=J_f(g(x))J_g(x)
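To make this concrete, here is a small Python sketch with two polynomial maps of the plane. The particular maps g(x,y)=(x+y^2,xy) and f(u,v)=(uv,u-v) are hypothetical choices made purely for illustration, with their Jacobian matrices worked out by hand; `dh_direct` differentiates the composite h(x,y)=(x^2y+xy^3,x+y^2-xy) directly for comparison.

```python
def det2(m):
    # Determinant of a 2 x 2 matrix.
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def g(x, y):
    return (x + y**2, x * y)


def dg(x, y):
    # Jacobian matrix of g.
    return [[1, 2 * y], [y, x]]


def f(u, v):
    return (u * v, u - v)


def df(u, v):
    # Jacobian matrix of f.
    return [[v, u], [1, -1]]


def dh(x, y):
    # Chain rule: dh(x) = df(g(x)) dg(x), as a matrix product.
    a, b = df(*g(x, y)), dg(x, y)
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dh_direct(x, y):
    # Jacobian matrix of h = f o g, differentiated by hand.
    return [[2 * x * y + y**3, x**2 + 3 * x * y**2], [1 - y, 2 * y - x]]
```

At any sample point, `det2(dh(x, y))` should equal `det2(df(*g(x, y))) * det2(dg(x, y))`, and `dh` should agree with `dh_direct`.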

Or, we could define y^i=g^i(x) and use the Leibniz notation to write

\displaystyle\frac{\partial(h^1,\dots,h^n)}{\partial(x^1,\dots,x^n)}=\frac{\partial(h^1,\dots,h^n)}{\partial(y^1,\dots,y^n)}\frac{\partial(y^1,\dots,y^n)}{\partial(x^1,\dots,x^n)}

As a special case, let’s assume that the differentiable function f:X\rightarrow\mathbb{R}^n is injective in some open neighborhood A of a point a. That is, every x\in A is sent to a distinct point by f, making up the whole image f(A). Further, let’s suppose that the function f^{-1} which sends each point y\in f(A) back to the point in A from which it came — f^{-1}(y)=x if and only if y=f(x) — is also differentiable. Then we have the composition f^{-1}(f(x))=x, and thus we find

\displaystyle J_{f^{-1}}(f(a))J_f(a)=1

or

\displaystyle\frac{\partial(y^1,\dots,y^n)}{\partial(x^1,\dots,x^n)}\frac{\partial(x^1,\dots,x^n)}{\partial(y^1,\dots,y^n)}=1

Thus, if a differentiable function f has a differentiable inverse function defined in some neighborhood of a point a, then the Jacobian determinant of the function must be nonzero at that point. A fair bit of work will now be put to turning this statement around. That is, we seek to show that if the Jacobian determinant J_f(a)\neq0, then f has a differentiable inverse in some neighborhood of a.

November 12, 2009 Posted by John Armstrong | Analysis, Calculus | | 2 Comments