The Jacobian of a Composition
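Let’s start today by introducing some notation for the Jacobian determinant which we introduced yesterday. We’ll write the Jacobian determinant of a differentiable function $f$ at a point $x$ as $J_f(x)$. Or, in more of a Leibnizean style:
$$\frac{\partial(f^1,\dots,f^n)}{\partial(x^1,\dots,x^n)}$$
We’re interested in determining the Jacobian of the composite of two differentiable functions. To which end, suppose $g:X\to\mathbb{R}^n$ and $f:Y\to\mathbb{R}^n$ are differentiable functions on two open regions $X$ and $Y$ in $\mathbb{R}^n$, with $g(X)\subseteq Y$, and let $h=f\circ g:X\to\mathbb{R}^n$ be their composite. Then the chain rule tells us that
$$dh(x)=df(g(x))\,dg(x)$$
where each differential is an $n\times n$ matrix, and the right-hand side is a matrix multiplication.
But these matrices are exactly the Jacobian matrices of the functions! And the determinant of the product of two matrices is the product of their determinants. That is, we find the equation
$$J_h(x)=J_f(g(x))\,J_g(x)$$
Or, we could define $y^i=g^i(x^1,\dots,x^n)$ and $z^i=f^i(y^1,\dots,y^n)$ and use the Leibniz notation to write
$$\frac{\partial(z^1,\dots,z^n)}{\partial(x^1,\dots,x^n)}=\frac{\partial(z^1,\dots,z^n)}{\partial(y^1,\dots,y^n)}\,\frac{\partial(y^1,\dots,y^n)}{\partial(x^1,\dots,x^n)}$$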
As a special case, let’s assume that the differentiable function $f:X\to\mathbb{R}^n$ is injective in some open neighborhood $A$ of a point $a$. That is, every $x\in A$ is sent to a distinct point by $f$, making up the whole image $f(A)$. Further, let’s suppose that the function $g:f(A)\to A$ which sends each point $y\in f(A)$ back to the point in $A$ from which it came ($g(y)=x$ if and only if $y=f(x)$) is also differentiable. Then we have the composition $g\circ f=1_A$, whose Jacobian determinant is $1$ everywhere, and thus we find
$$J_g(f(a))\,J_f(a)=1$$
or, writing $y^i=f^i(x^1,\dots,x^n)$,
$$\frac{\partial(x^1,\dots,x^n)}{\partial(y^1,\dots,y^n)}\,\frac{\partial(y^1,\dots,y^n)}{\partial(x^1,\dots,x^n)}=1$$
Thus, if a differentiable function $f$ has a differentiable inverse function defined in some neighborhood of a point $a$, then the Jacobian determinant of the function must be nonzero at that point. A fair bit of work will now be put to turning this statement around. That is, we seek to show that if the Jacobian determinant $J_f(a)\neq0$, then $f$ has a differentiable inverse in some neighborhood of $a$.
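As a quick numerical sanity check of the product rule for Jacobian determinants and of the inverse relation, here is a minimal Python sketch. The maps `g`, `f`, and `f_inverse` are hypothetical examples chosen for illustration (they do not come from the discussion above), and the determinants are estimated by central differences rather than computed exactly.

```python
import math

def g(x, y):
    # Hypothetical inner map g: R^2 -> R^2 (an assumption, not from the text).
    return (x * x - y, x + y * y)

def f(u, v):
    # Hypothetical outer map f: R^2 -> R^2, injective near the points used below.
    return (math.exp(u) * math.cos(v), math.exp(u) * math.sin(v))

def f_inverse(a, b):
    # Local inverse of f, valid while the angle stays inside (-pi, pi).
    return (0.5 * math.log(a * a + b * b), math.atan2(b, a))

def h(x, y):
    # The composite h = f o g.
    return f(*g(x, y))

def jacobian_det(func, x, y, eps=1e-6):
    """Estimate the Jacobian determinant of func: R^2 -> R^2 at (x, y)
    using central differences for the four partial derivatives."""
    fx_plus, fx_minus = func(x + eps, y), func(x - eps, y)
    fy_plus, fy_minus = func(x, y + eps), func(x, y - eps)
    d11 = (fx_plus[0] - fx_minus[0]) / (2 * eps)  # d f^1 / d x
    d21 = (fx_plus[1] - fx_minus[1]) / (2 * eps)  # d f^2 / d x
    d12 = (fy_plus[0] - fy_minus[0]) / (2 * eps)  # d f^1 / d y
    d22 = (fy_plus[1] - fy_minus[1]) / (2 * eps)  # d f^2 / d y
    return d11 * d22 - d12 * d21

# Product rule: J_h(x) should match J_f(g(x)) * J_g(x).
x, y = 0.3, -0.2
lhs = jacobian_det(h, x, y)
rhs = jacobian_det(f, *g(x, y)) * jacobian_det(g, x, y)
print(lhs, rhs)

# Inverse relation: J_{f^{-1}}(f(a)) * J_f(a) should be close to 1.
a = (0.29, 0.34)
inverse_product = jacobian_det(f_inverse, *f(*a)) * jacobian_det(f, *a)
print(inverse_product)
```

The two printed values on the first line should agree up to finite-difference error, and the last value should be close to 1.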
The Jacobian
Now that we’ve used exterior algebras to come to terms with parallelepipeds and their transformations, let’s come back to apply these ideas to the calculus.
We’ll focus on a differentiable function $f:X\to\mathbb{R}^n$, where $X$ is itself some open region in $\mathbb{R}^n$. That is, if we pick a basis $\{e_i\}_{i=1}^n$ and coordinates of $\mathbb{R}^n$, then the function $f$ is a vector-valued function of $n$ real variables $x^1,\dots,x^n$ with components $f^1,\dots,f^n$. The differential, then, is itself a vector-valued function whose components are the differentials of the component functions: $df=df^ie_i$. We can write these differentials out in terms of partial derivatives:
$$df^i=\frac{\partial f^i}{\partial x^j}\,dx^j$$
Just like we said when discussing the chain rule, the differential at the point $x$ defines a linear transformation from the $n$-dimensional space of displacement vectors at $x$ to the $n$-dimensional space of displacement vectors at $f(x)$, and the matrix entries with respect to the given basis are given by the partial derivatives.
It is this transformation that we will refer to as the Jacobian, or the Jacobian transformation. Alternately, sometimes the representing matrix is referred to as the Jacobian, or the Jacobian matrix. Since this matrix is square, we can calculate its determinant, which is also referred to as the Jacobian, or the Jacobian determinant. I’ll try to be clear which I mean, but often the specific referent of “Jacobian” must be sussed out from context.
So, in light of our recent discussion, what does the Jacobian determinant mean? Well, imagine starting with an $n$-dimensional parallelepiped at the point $x$, with one side in each of the basis directions, and positively oriented. That is, it consists of the points $x+t^ie_i$ with $t^i$ in the interval $[0,\Delta x^i]$ for some fixed $\Delta x^i$. We’ll assume for the moment that this whole region lands within the region $X$. It should be clear that this parallelepiped is represented by the wedge
$$(\Delta x^1e_1)\wedge\dots\wedge(\Delta x^ne_n)=\Delta x^1\cdots\Delta x^n\,e_1\wedge\dots\wedge e_n$$
which clearly has volume given by the product of all the $\Delta x^i$.
Now the function $f$ sends this cube to a sort of curvy parallelepiped, consisting of the points $f(x+t^ie_i)$, with each $t^i$ in the interval $[0,\Delta x^i]$, and this image will have some volume. Unfortunately, we have no idea as yet how to measure such a volume. But we might be able to approximate it. Instead of using the actual curvy parallelepiped, we’ll build a new one. And if the $\Delta x^i$ are small enough, it will be more or less the same set of points, with the same volume. Or at least close enough for our purposes. We’ll replace the curved path defined by $f(x+te_i)$ by the displacement vector between the two endpoints:
$$\Delta f_i=f(x+\Delta x^ie_i)-f(x)$$
and use these new vectors to build a new parallelepiped
$$\Delta f_1\wedge\dots\wedge\Delta f_n$$
But this is still an awkward volume to work with. However, we can use the differential to approximate each of these differences
$$\Delta f_i\approx df(x)(\Delta x^ie_i)=\Delta x^i\frac{\partial f^j}{\partial x^i}e_j$$
with no summation here on the index $i$.
Now we can easily calculate the volume of this parallelepiped, represented by the wedge
$$(\Delta x^1\,df(x)e_1)\wedge\dots\wedge(\Delta x^n\,df(x)e_n)$$
which can be rewritten as
$$\Delta x^1\cdots\Delta x^n\,\det\left(\frac{\partial f^j}{\partial x^i}\right)e_1\wedge\dots\wedge e_n$$
which clearly has a volume of $\Delta x^1\cdots\Delta x^n$ (the volume of the original parallelepiped) times the Jacobian determinant. That is, the Jacobian determinant at $x$ estimates the factor by which the function $f$ expands small volumes near that point. Or it tells us that locally $f$ reverses the orientation of small regions near the point if the Jacobian determinant is negative.
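To see this approximation at work, here is a small Python sketch in two dimensions. The polar-coordinate map is a hypothetical example chosen for illustration (it is not discussed above); its Jacobian determinant at $(r,\theta)$ is $r$. The code builds the parallelogram from the endpoint displacements and divides its signed area by the area of the original square.

```python
import math

def polar(r, theta):
    # Hypothetical example map f(r, theta) = (r cos theta, r sin theta).
    return (r * math.cos(theta), r * math.sin(theta))

def polar_swapped(r, theta):
    # Same map with the output components exchanged, which reverses orientation.
    return (r * math.sin(theta), r * math.cos(theta))

def image_area(func, point, steps):
    """Signed area of the parallelogram spanned by the endpoint displacements
    of func at `point`, for a small rectangle with side lengths `steps`."""
    x1, x2 = point
    h1, h2 = steps
    base = func(x1, x2)
    moved1 = func(x1 + h1, x2)
    moved2 = func(x1, x2 + h2)
    d1 = (moved1[0] - base[0], moved1[1] - base[1])
    d2 = (moved2[0] - base[0], moved2[1] - base[1])
    return d1[0] * d2[1] - d1[1] * d2[0]

def expansion_factor(func, point, step):
    # Ratio of the approximate image area to the area of the original square.
    return image_area(func, point, (step, step)) / (step * step)

point = (2.0, 0.5)
for step in (1e-1, 1e-2, 1e-3):
    print(step, expansion_factor(polar, point, step), expansion_factor(polar_swapped, point, step))
```

As the step shrinks, the first ratio should approach the Jacobian determinant $r=2$, while the swapped map should give a ratio approaching $-2$, reflecting the reversed orientation.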
The Cross Product and Pseudovectors
Finally we can get to something that is presented to students in multivariable calculus and physics classes as if it were a basic operation: the cross product of three-dimensional vectors. This only works out because the Hodge star defines an isomorphism from $A^2(V)$ to $V$ when $\dim(V)=3$. We define
$$u\times v=*(u\wedge v)$$
All the usual properties of the cross product are really properties of the wedge product combined with the Hodge star. Geometrically, $u\times v$ is defined as a vector perpendicular to the plane spanned by $u$ and $v$, which is exactly what the Hodge star produces. We choose which perpendicular direction by the “right-hand rule”, but this is only because we choose the basis vectors $e_1$, $e_2$, and $e_3$ (or as these classes often call them: $\hat{\imath}$, $\hat{\jmath}$, and $\hat{k}$) by the same convention, and this defines an orientation we have to stick with when we define the Hodge star. The length of the cross product is the area of the parallelogram spanned by $u$ and $v$, again as expected from the Hodge star. Algebraically, the cross product is anticommutative and linear in each variable. These are properties of the wedge product, and the Hodge star, being linear, preserves them.
The biggest fib we tell students is that the value of the cross product is a vector. It certainly looks like a vector on the surface, but the problem is that it doesn’t transform like a vector. Before the advent of thinking of all these things geometrically, people thought of a vector quantity as a triple of real numbers that transform in a certain way when we change to a different orthonormal basis. This is inspired by the physical world, where there’s no magic orthonormal basis floating out somewhere to pick out coordinates. We should be able to turn our heads and translate the laws of physics to compensate exactly. These rotations form the special orthogonal group of orientation- and inner product-preserving transformations, but we can also throw in reflections to get the whole orthogonal group, of all transformations from one orthonormal basis to another.
So let’s imagine what happens to a cross product when we reflect the world. In fact, stand by a mirror and hold out your right hand in the familiar way, with your index finger along one imagined vector $u$, your middle finger along another vector $v$, and your thumb pointing in the direction of the cross product $u\times v$. Now look in the mirror.
The orientation has been reversed, and mirror-you is holding out its left hand! If mirror-you tried to use its version of the cross product, it would find that the cross product should go in the other direction. The cross product doesn’t behave like all the other vectors in the world, because it doesn’t reflect the same way.
Physicists to this day use the old language describing a triple of real numbers that transform like a vector under rotations, but point the wrong way under reflections. They call such a quantity a “pseudovector”. And they also have a word for a single real number that somehow mysteriously flips its sign when we apply a reflection: a “pseudoscalar”. Whenever we read about scalar, vector, pseudovector, and pseudoscalar quantities, they just mean real numbers (or triples of them) and specify how they change under certain orthogonal transformations.
But geometrically we can see exactly what’s going on. These are just the spaces $A^0(V)$, $A^1(V)$, $A^2(V)$, and $A^3(V)$, along with their representations of the orthogonal group $\mathrm{O}(V)$. And the “pseudo” means we’ve used the Hodge star (which depends essentially on a choice of orientation) to pretend that bivectors in $A^2(V)$ and trivectors in $A^3(V)$ are just like vectors in $A^1(V)$ and scalars in $A^0(V)$, respectively. And we can get away with it for a long time, until a mirror shows up.
The only essential tool from multivariable calculus or introductory physics built from the cross product that we might have need of is the “triple scalar product”, which takes three vectors $u$, $v$, and $w$. It calculates the cross product $v\times w$ of two of them, and then the inner product $\langle u,v\times w\rangle$ with the third to get a scalar. But this is the coefficient of our unit cube $\omega$ in the definition of the Hodge star:
$$\langle u,v\times w\rangle\,\omega=\langle u,*(v\wedge w)\rangle\,\omega=u\wedge{*}{*}(v\wedge w)=u\wedge v\wedge w$$
since $**=(-1)^{k(n-k)}=1$ when $n=3$. That is, the triple scalar product gives the (oriented) volume of the parallelepiped spanned by $u$, $v$, and $w$, just as we remember from those classes. We really don’t need the cross product as a primitive operation at all, and in the long run it only leads to confusion as it identifies vectors and pseudovectors without the explicit use of the orientation-dependent Hodge star to keep us straight.
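The mirror argument and the triple scalar product can both be checked with a few lines of Python. This is only a sketch in coordinates with respect to a right-handed orthonormal basis; the particular vectors and the reflection across the $y$–$z$ plane are arbitrary choices made for illustration.

```python
def cross(u, v):
    # Components of u x v in a right-handed orthonormal basis.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def det3(u, v, w):
    # Determinant of the 3x3 matrix with rows u, v, w: the oriented volume.
    return (u[0] * (v[1] * w[2] - v[2] * w[1])
            - u[1] * (v[0] * w[2] - v[2] * w[0])
            + u[2] * (v[0] * w[1] - v[1] * w[0]))

def reflect(v):
    # Reflection across the y-z plane, an orthogonal map with determinant -1.
    return (-v[0], v[1], v[2])

u, v, w = (1, 2, 3), (0, 1, 4), (5, 6, 0)

# Triple scalar product versus oriented volume.
print(dot(u, cross(v, w)), det3(u, v, w))

# A true vector would satisfy cross(Ru, Rv) == R(cross(u, v));
# the cross product picks up an extra sign under the reflection instead.
print(cross(reflect(u), reflect(v)), reflect(cross(u, v)))
```

The first line prints the same number twice. On the second line the two triples are negatives of each other, which is exactly the pseudovector behavior described above.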
The Hodge Star
Sorry for the delay from last Friday to today, but I was chasing down a good lead.
Anyway, last week I said that I’d talk about a linear map that extends the notion of the correspondence between parallelograms in space and perpendicular vectors.
First of all, we should see why there may be such a correspondence. We’ve identified $k$-dimensional parallelepipeds in an $n$-dimensional vector space $V$ with antisymmetric tensors of degree $k$: $A^k(V)$. Of course, not every such tensor will correspond to a parallelepiped (some will be linear combinations that can’t be written as a single wedge of $k$ vectors), but we’ll just keep going and let our methods apply to such more general tensors. Anyhow, we also know how to count the dimension of the space of such tensors:
$$\dim\left(A^k(V)\right)=\binom{n}{k}=\frac{n!}{k!(n-k)!}$$
This formula tells us that $A^k(V)$ and $A^{n-k}(V)$ will have the exact same dimension, and so it makes sense that there might be an isomorphism between them. And we’re going to look for one which defines the “perpendicular” $(n-k)$-dimensional parallelepiped with the same size.
So what do we mean by “perpendicular”? It’s not just in terms of the “angle” defined by the inner product. Indeed, in that sense the parallelograms $e_1\wedge e_2$ and $e_1\wedge e_3$ are perpendicular. No, we want any vector in the subspace defined by our parallelepiped to be perpendicular to any vector in the subspace defined by the new one. That is, we want the new parallelepiped to span the orthogonal complement to the subspace we start with.
Our definition will also need to take into account the orientation on $V$. Indeed, considering the parallelogram $e_1\wedge e_2$ in three-dimensional space, the perpendicular must be $ce_3$ for some nonzero constant $c$, or otherwise it won’t be perpendicular to the whole $x$–$y$ plane. And $|c|$ has to be $1$ in order to get the right size. But will it be $+e_3$ or $-e_3$? The difference is entirely in the orientation.
Okay, so let’s pick an orientation on $V$, which gives us a particular top-degree tensor $\omega$ so that $\mathrm{vol}(\omega)=1$. Now, given some $\eta\in A^k(V)$, we define the Hodge dual $*\eta\in A^{n-k}(V)$ to be the unique antisymmetric tensor of degree $n-k$ satisfying
$$\zeta\wedge*\eta=\langle\zeta,\eta\rangle\,\omega$$
for all $\zeta\in A^k(V)$. Notice here that if $\eta$ and $\zeta$ describe parallelepipeds, and any side of $\zeta$ is perpendicular to all the sides of $\eta$, then the projection of $\zeta$ onto the subspace spanned by $\eta$ will have zero volume, and thus $\langle\zeta,\eta\rangle=0$. This is what we expect, for then this side of $\zeta$ must lie within the perpendicular subspace spanned by $*\eta$, and so the wedge $\zeta\wedge*\eta$ should also be zero.
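As a particular example, say we have an orthonormal basis $\{e_i\}_{i=1}^n$ of $V$ so that $\omega=e_1\wedge\dots\wedge e_n$. Then given a multi-index $I=(i_1,\dots,i_k)$ the basic wedge $e_I=e_{i_1}\wedge\dots\wedge e_{i_k}$ gives us the subspace spanned by the vectors $\{e_{i_1},\dots,e_{i_k}\}$. The orthogonal complement is clearly spanned by the remaining basis vectors $\{e_{j_1},\dots,e_{j_{n-k}}\}$, and so $*e_I=\pm e_J$, with the sign depending on whether the list $(i_1,\dots,i_k,j_1,\dots,j_{n-k})$ is an even or an odd permutation of $(1,\dots,n)$.
To be even more explicit, let’s work these out for the cases of dimensions three and four. First off, we have a basis $\{e_1,e_2,e_3\}$. We work out all the duals of basic wedges as follows:
$$\begin{aligned}
*1&=e_1\wedge e_2\wedge e_3\\
*e_1&=e_2\wedge e_3\\
*e_2&=-e_1\wedge e_3=e_3\wedge e_1\\
*e_3&=e_1\wedge e_2\\
*(e_1\wedge e_2)&=e_3\\
*(e_1\wedge e_3)&=-e_2\\
*(e_2\wedge e_3)&=e_1\\
*(e_1\wedge e_2\wedge e_3)&=1
\end{aligned}$$
This reconstructs the correspondence we had last week between basic parallelograms and perpendicular basis vectors. In the four-dimensional case, the basis $\{e_1,e_2,e_3,e_4\}$ leads to the duals
$$\begin{aligned}
*1&=e_1\wedge e_2\wedge e_3\wedge e_4\\
*e_1&=e_2\wedge e_3\wedge e_4\\
*e_2&=-e_1\wedge e_3\wedge e_4\\
*e_3&=e_1\wedge e_2\wedge e_4\\
*e_4&=-e_1\wedge e_2\wedge e_3\\
*(e_1\wedge e_2)&=e_3\wedge e_4\\
*(e_1\wedge e_3)&=-e_2\wedge e_4\\
*(e_1\wedge e_4)&=e_2\wedge e_3\\
*(e_2\wedge e_3)&=e_1\wedge e_4\\
*(e_2\wedge e_4)&=-e_1\wedge e_3\\
*(e_3\wedge e_4)&=e_1\wedge e_2\\
*(e_1\wedge e_2\wedge e_3)&=e_4\\
*(e_1\wedge e_2\wedge e_4)&=-e_3\\
*(e_1\wedge e_3\wedge e_4)&=e_2\\
*(e_2\wedge e_3\wedge e_4)&=-e_1\\
*(e_1\wedge e_2\wedge e_3\wedge e_4)&=1
\end{aligned}$$
It’s not a difficult exercise to work out the relation $**\eta=(-1)^{k(n-k)}\eta$ for a degree $k$ tensor in an $n$-dimensional space.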
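The sign rule for basic wedges is mechanical enough to hand to a computer. The following Python sketch assumes an orthonormal basis with the standard orientation $\omega=e_1\wedge\dots\wedge e_n$; the function names are made up for this illustration. It computes the dual of a basic wedge as a sign together with the complementary multi-index.

```python
from itertools import combinations

def permutation_sign(perm):
    """Sign of a permutation given as a sequence of distinct integers,
    computed by counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def hodge_basic(index, n):
    """For an increasing multi-index I drawn from 1..n, return (sign, J)
    such that *e_I = sign * e_J, where J is the complementary multi-index."""
    complement = tuple(j for j in range(1, n + 1) if j not in index)
    return permutation_sign(tuple(index) + complement), complement

# List the duals of all basic wedges in dimension three.
for k in range(0, 4):
    for index in combinations(range(1, 4), k):
        sign, complement = hodge_basic(index, 3)
        print(index, "->", "+" if sign > 0 else "-", complement)
```

Applying `hodge_basic` twice and multiplying the two signs gives a way to check the double-dual relation for small $n$ by brute force.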
An Example of a Parallelogram
Today I want to run through an example of how we use our new tools to read geometric information out of a parallelogram.
I’ll work within $\mathbb{R}^3$ with an orthonormal basis $\{e_1,e_2,e_3\}$ and an identified origin $O$ to give us a system of coordinates. That is, given the point $P$, we set up a vector $\overrightarrow{OP}$ pointing from $O$ to $P$ (which we can do in a Euclidean space). Then this vector has components in terms of the basis:
$$\overrightarrow{OP}=xe_1+ye_2+ze_3$$
and we’ll write the point $P$ as $(x,y,z)$.
So let’s pick four points: $(0,0,0)$, $(1,1,0)$, $(2,1,1)$, and $(1,0,1)$. These four points do, indeed, give the vertices of a parallelogram, since both displacements from $(0,0,0)$ to $(1,1,0)$ and from $(1,0,1)$ to $(2,1,1)$ are $e_1+e_2$, and similarly the displacements from $(0,0,0)$ to $(1,0,1)$ and from $(1,1,0)$ to $(2,1,1)$ are both $e_1+e_3$. Alternatively, all four points lie within the plane described by $x-y-z=0$, and the region in this plane contained between the vertices consists of points $P$ so that
$$\overrightarrow{OP}=u(e_1+e_2)+v(e_1+e_3)$$
for some $u$ and $v$ both in the interval $[0,1]$. So this is a parallelogram contained between $e_1+e_2$ and $e_1+e_3$. Incidentally, note that the fact that all these points lie within a plane means that any displacement vector between two of them is in the kernel of some linear transformation. In this case, it’s the linear functional $w\mapsto\langle e_1-e_2-e_3,w\rangle$, and the vector $e_1-e_2-e_3$ is perpendicular to any displacement in this plane, which will come in handy later.
Now in a more familiar approach, we might say that the area of this parallelogram is its base times its height. Let’s work that out to check our answer against later. For the base, we take the length of one vector, say $e_1+e_2$. We use the inner product to calculate its length as $\sqrt{\langle e_1+e_2,e_1+e_2\rangle}=\sqrt{2}$. For the height we can’t just take the length of the other vector. Some basic trigonometry shows that we need the length of the other vector (which is again $\sqrt{2}$) times the sine of the angle between the two vectors. To calculate this angle we again use the inner product to find that its cosine is $\frac{\langle e_1+e_2,e_1+e_3\rangle}{\sqrt{2}\sqrt{2}}=\frac{1}{2}$, and so its sine is $\frac{\sqrt{3}}{2}$. Multiplying these all together we find a height of $\sqrt{\frac{3}{2}}$, and thus an area of $\sqrt{3}$.
On the other hand, let’s use our new tools. We represent the parallelogram as the wedge $(e_1+e_2)\wedge(e_1+e_3)$ (incidentally choosing an orientation of the parallelogram and the entire plane containing it) and calculate its length using the inner product on the exterior algebra:
$$2!\,\langle(e_1+e_2)\wedge(e_1+e_3),(e_1+e_2)\wedge(e_1+e_3)\rangle=\det\begin{pmatrix}\langle e_1+e_2,e_1+e_2\rangle&\langle e_1+e_2,e_1+e_3\rangle\\\langle e_1+e_3,e_1+e_2\rangle&\langle e_1+e_3,e_1+e_3\rangle\end{pmatrix}=\det\begin{pmatrix}2&1\\1&2\end{pmatrix}=3$$
so the area is again $\sqrt{3}$. Alternately, we could calculate it by expanding in terms of basic wedges. That is, we can write
$$(e_1+e_2)\wedge(e_1+e_3)=e_1\wedge e_3+e_2\wedge e_1+e_2\wedge e_3=e_2\wedge e_3-e_3\wedge e_1-e_1\wedge e_2$$
This tells us that if we take our parallelogram and project it onto the $y$–$z$ plane (which has an orthonormal basis $\{e_2,e_3\}$) we get an area of $1$. Similarly, projecting our parallelogram onto the $x$–$y$ plane (with orthonormal basis $\{e_1,e_2\}$) we get an area of $-1$. That is, the area is $1$ and the orientation of the projected parallelogram disagrees with that of the plane. Anyhow, now the squared area of the parallelogram is the sum of the squares of these projected areas: $1^2+(-1)^2+(-1)^2=3$.
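The three computations of the area can be compared directly in Python. This sketch assumes the parallelogram has sides $e_1+e_2$ and $e_1+e_3$, written as coordinate triples; that specific choice is an assumption made for the illustration, and the helper names are invented here.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def area_base_height(u, v):
    # Base times height, with the height found from the angle between u and v.
    base = math.sqrt(dot(u, u))
    cosine = dot(u, v) / (base * math.sqrt(dot(v, v)))
    height = math.sqrt(dot(v, v)) * math.sqrt(1 - cosine ** 2)
    return base * height

def area_gram(u, v):
    # Square root of the Gram determinant <u,u><v,v> - <u,v>^2.
    return math.sqrt(dot(u, u) * dot(v, v) - dot(u, v) ** 2)

def wedge_components(u, v):
    """Coefficients of u ^ v on e1^e2, e1^e3, e2^e3, which are the signed
    areas of the projections onto the coordinate planes."""
    return {
        (1, 2): u[0] * v[1] - u[1] * v[0],
        (1, 3): u[0] * v[2] - u[2] * v[0],
        (2, 3): u[1] * v[2] - u[2] * v[1],
    }

def area_projections(u, v):
    # Square root of the sum of the squared projected areas.
    return math.sqrt(sum(c * c for c in wedge_components(u, v).values()))

u = (1, 1, 0)  # assumed side e1 + e2
v = (1, 0, 1)  # assumed side e1 + e3
print(wedge_components(u, v))
print(area_base_height(u, v), area_gram(u, v), area_projections(u, v))
```

All three areas should agree up to rounding. Note that the component on $e_1\wedge e_3$ is the negative of the component on $e_3\wedge e_1$, so the signs depend on which ordering of each basic wedge is used.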
Notice, now, the similarity between this expression and the perpendicular vector we found before: $e_1-e_2-e_3$. Each one is the sum of three terms with the same choices of signs. The terms themselves seem to have something to do with each other as well; the wedge $e_1\wedge e_2$ describes an area in the $x$–$y$ plane, while $e_3$ describes a length in the perpendicular $z$-axis. Similarly, $e_2\wedge e_3$ describes an area in the $y$–$z$ plane, while $e_1$ describes a length in the perpendicular $x$-axis. And, magically, the sum of these three perpendicular vectors to these three parallelograms gives the perpendicular vector to their sum!
There is, indeed, a linear correspondence between parallelograms and vectors that extends this idea, which we will explore tomorrow. The seemingly-odd choice of $e_3\wedge e_1$ to correspond to $e_2$, though, should be a tip-off that this correspondence is closely bound up with the notion of orientation.
Parallelepipeds and Volumes III
So, why bother with this orientation stuff, anyway? We’ve got an inner product on spaces of antisymmetric tensors, and that should give us a concept of length. Why can’t we just calculate the size of a parallelepiped by sticking it into this bilinear form twice?
Well, let’s see what happens. Given a $k$-dimensional parallelepiped with sides $v_1$ through $v_k$, we represent the parallelepiped by the wedge $\omega=v_1\wedge\dots\wedge v_k$. Then we might try defining the volume by using the renormalized inner product
$$\mathrm{vol}(\omega)^2=k!\,\langle\omega,\omega\rangle$$
Let’s expand one copy of the wedge out in terms of our basis of wedges of basis vectors
$$k!\,\langle\omega,\omega\rangle=k!\,\langle\omega,\omega^Ie_I\rangle=k!\,\langle\omega,e_I\rangle\,\omega^I$$
where the multi-index $I$ runs over all increasing $k$-tuples of indices $1\leq i_1<\dots<i_k\leq n$. But we already know that $\omega^I=k!\,\langle\omega,e_I\rangle$, and so this squared volume is the sum of the squares of these components, just like we’re familiar with. Then we can define the $k$-volume of the parallelepiped as the square root of this sum.
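Here is a short Python sketch of this definition for $k$ vectors in $\mathbb{R}^n$, written in coordinates with respect to an orthonormal basis. It assumes the component of the wedge on a basic wedge $e_I$ is the $k\times k$ minor built from the coordinates indexed by $I$, and it compares the sum of squared components with the determinant of the matrix of inner products of the sides. The helper names are made up for this illustration.

```python
from itertools import combinations
import math

def det(matrix):
    # Determinant by Laplace expansion along the first row (fine for small sizes).
    if len(matrix) == 0:
        return 1
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def wedge_components(vectors):
    """Components of v_1 ^ ... ^ v_k on the basic wedges e_I, one k x k minor
    for each increasing multi-index I (0-based indices here)."""
    n, k = len(vectors[0]), len(vectors)
    return {index: det([[v[i] for i in index] for v in vectors])
            for index in combinations(range(n), k)}

def k_volume(vectors):
    # Square root of the sum of the squares of the components.
    return math.sqrt(sum(c * c for c in wedge_components(vectors).values()))

def gram_det(vectors):
    # Determinant of the matrix of pairwise inner products of the sides.
    return det([[dot(u, v) for v in vectors] for u in vectors])

sides = [(1, 0, 2, 0), (0, 1, 1, 3)]  # a parallelogram in R^4, chosen arbitrarily
print(wedge_components(sides))
print(k_volume(sides) ** 2, gram_det(sides))
```

The two numbers on the last line should agree up to rounding, so either formula can serve as the squared $k$-volume.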
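Let’s look specifically at what happens for top-dimensional parallelepipeds, where $k=n$. Then we only have one possible multi-index $I=(1,\dots,n)$, with coefficient
$$\omega^{1\dots n}=n!\,\langle e_1\wedge\dots\wedge e_n,v_1\wedge\dots\wedge v_n\rangle=\det\left(v_j^i\right)$$
and so our formula reads
$$\mathrm{vol}(\omega)=\sqrt{\left(\det\left(v_j^i\right)\right)^2}=\left|\det\left(v_j^i\right)\right|$$
So we get the magnitude of the volume without having to worry about choosing an orientation. Why even bother?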
Because we already do care about orientation. Let’s go all the way back to one-dimensional parallelepipeds, which are just described by vectors. A vector doesn’t just describe a certain length, it describes a length along a certain line in space. And it doesn’t just describe a length along that line, it describes a length in a certain direction along that line. A vector picks out three things:
- A one-dimensional subspace $L$ of the ambient space $V$.
- An orientation of the subspace $L$.
- A volume (length) of this oriented subspace.
And just like vectors, nondegenerate $k$-dimensional parallelepipeds pick out three things:
- A $k$-dimensional subspace $L$ of the ambient space $V$.
- An orientation of the subspace $L$.
- A $k$-dimensional volume of this oriented subspace.
The difference is that when we get up to the top dimension the space itself can have its own orientation, which may or may not agree with the orientation induced by the parallelepiped. We don’t always care about this disagreement, and we can just take the absolute value to get rid of a sign if we don’t care, but it might come in handy.
Parallelepipeds and Volumes II
Yesterday we established that the $k$-dimensional volume of a parallelepiped with $k$ sides should be an alternating multilinear functional of those $k$ sides. But now we want to investigate which one.
The universal property of spaces of antisymmetric tensors says that any such functional corresponds to a unique linear functional $A^k(V)\to\mathbb{R}$. That is, we take the parallelepiped with sides $v_1$ through $v_k$ and represent it by the antisymmetric tensor $v_1\wedge\dots\wedge v_k$. Notice, in particular, that if the parallelepiped is degenerate then this tensor is $0$, as we hoped. Then volume is some linear functional that takes in such an antisymmetric tensor and spits out a real number. But which linear functional?
I’ll start by answering this question for $n$-dimensional parallelepipeds in $n$-dimensional space. Such a parallelepiped is represented by an antisymmetric tensor with the $n$ sides as its tensorands. But we’ve calculated the dimension of the space of such tensors: $\dim\left(A^n(V)\right)=\binom{n}{n}=1$. That is, once we represent these parallelepipeds by antisymmetric tensors there’s only one parameter left to distinguish them: their volume. So if we specify the volume of one parallelepiped linearity will take care of all the others.
There’s one parallelepiped whose volume we know already. The unit $n$-cube must have unit volume. So, to this end, pick an orthonormal basis $\{e_i\}_{i=1}^n$. A parallelepiped with these sides corresponds to the antisymmetric tensor $e_1\wedge\dots\wedge e_n$, and the volume functional must send this to $1$. But be careful! The volume doesn’t depend just on the choice of basis, but on the order of the basis elements. Swap two of the basis elements and we should swap the sign of the volume. So we’ve got two different choices of volume functional here, which differ exactly by a sign. We call these two choices “orientations” on our vector space.
This is actually not as esoteric as it may seem. Almost all introductions to vectors, from multivariable calculus to vector-based physics, talk about “left-handed” and “right-handed” coordinate systems. These differ by a reflection, which would change the signs of all parallelepipeds. So we must choose one or the other, and choose which unit cube will have volume $1$ and which will have volume $-1$. The isomorphism from $A^n(V)$ to $\mathbb{R}$ then gives us a “volume form” $\mathrm{vol}$, which will give us the volume of a parallelepiped represented by a given top-degree wedge.
Once we’ve made that choice, what about general parallelepipeds? If we have sides $\{v_i\}_{i=1}^n$ (written in components as $v_i=v_i^je_j$) we represent the parallelepiped by the wedge $v_1\wedge\dots\wedge v_n$. This is the image of our unit cube under the transformation sending $e_i$ to $v_i$, and so we find
$$\mathrm{vol}(v_1\wedge\dots\wedge v_n)=\det\left(v_i^j\right)\mathrm{vol}(e_1\wedge\dots\wedge e_n)=\det\left(v_i^j\right)$$
The volume of the parallelepiped is the determinant of this transformation.
Incidentally, this gives a geometric meaning to the special orthogonal group $\mathrm{SO}(n)$. Orthogonal transformations send orthonormal bases to other orthonormal bases, which will send unit cubes to other unit cubes. But the determinant of an orthogonal transformation may be either $+1$ or $-1$. Transformations of the first kind make up the special orthogonal group, while transformations of the second kind send “positive” unit cubes to “negative” ones, and vice-versa. That is, they involve some sort of reflection, swapping the choice of orientation we made above. Special orthogonal transformations are those which preserve not only lengths and angles, but the orientation of the space. More generally, there is a homomorphism $\mathrm{GL}(V)\to\{+1,-1\}$ sending a transformation to the sign of its determinant. Transformations with positive determinant are said to be “orientation-preserving”, while those with negative determinant are said to be “orientation-reversing”.
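To make the signed volume and the sign homomorphism concrete, here is a minimal Python sketch. It assumes sides are given by their components in an orthonormal basis with the chosen orientation, and the specific matrices are arbitrary examples picked for illustration.

```python
def det(matrix):
    # Determinant by Laplace expansion along the first row (fine for small sizes).
    if len(matrix) == 1:
        return matrix[0][0]
    total = 0
    for col, entry in enumerate(matrix[0]):
        minor = [row[:col] + row[col + 1:] for row in matrix[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def matmul(a, b):
    # Product of two square matrices given as lists of rows.
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def sign(x):
    return (x > 0) - (x < 0)

def signed_volume(sides):
    # Signed volume of the parallelepiped whose sides are the rows of `sides`.
    return det(sides)

sides = [[2, 0, 0], [1, 3, 0], [0, 1, 1]]
swapped = [sides[1], sides[0], sides[2]]      # exchange the first two sides
print(signed_volume(sides), signed_volume(swapped))

rotation = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # quarter turn about the z-axis
reflection = [[1, 0, 0], [0, 1, 0], [0, 0, -1]]  # mirror in the x-y plane
print(det(rotation), det(reflection))

# The sign of the determinant is multiplicative.
product = matmul(rotation, reflection)
print(sign(det(product)), sign(det(rotation)) * sign(det(reflection)))
```

Swapping two sides flips the sign of the volume, the rotation has determinant $+1$ and the reflection $-1$, and the sign of a product is the product of the signs.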
Parallelepipeds and Volumes I
And we’re back with more of what Mr. Martinez of Harvard’s Medical School assures me is onanism of the highest caliber. I’m sure he, too, blames me for not curing cancer.
Coming up in our study of calculus in higher dimensions we’ll need to understand parallelepipeds, and in particular their volumes. First of all, what is a parallelepiped? Or, more specifically, what is a $k$-dimensional parallelepiped in $n$-dimensional space? It’s a collection of points in space that we can describe as follows. Take a point $p$ and $k$ vectors $\{v_i\}_{i=1}^k$ in $\mathbb{R}^n$. The parallelepiped is the collection of points reachable by moving from $p$ by some fraction of each of the vectors $v_i$. That is, we pick $k$ values $t^i$, each in the interval $[0,1]$, and use them to specify the point $p+t^iv_i$. The collection of all such points is the parallelepiped with corner $p$ and sides $v_i$.
One possible objection is that these sides may not be linearly independent. If the sides are linearly independent, then they span a $k$-dimensional subspace of the ambient space, justifying our calling it $k$-dimensional. But if they’re not, then the subspace they span has a lower dimension. We’ll deal with this by calling such a parallelepiped “degenerate”, and the nice ones with linearly independent sides “nondegenerate”. Trust me, things will be more elegant in the long run if we just deal with them both on the same footing.
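This description translates directly into code. The following Python sketch (with made-up function names and an arbitrary example) computes the point of a parallelepiped picked out by a list of parameters in $[0,1]$, and lists the vertices by taking every parameter to be either 0 or 1.

```python
from itertools import product

def parallelepiped_point(corner, sides, params):
    """The point corner + sum_i params[i] * sides[i], with each parameter
    expected to lie in the interval [0, 1]."""
    return tuple(c + sum(t * side[axis] for t, side in zip(params, sides))
                 for axis, c in enumerate(corner))

def vertices(corner, sides):
    # The vertices are the points where every parameter is 0 or 1.
    return [parallelepiped_point(corner, sides, params)
            for params in product((0, 1), repeat=len(sides))]

# A 2-dimensional parallelepiped (a parallelogram) sitting in 3-dimensional space.
corner = (1, 1, 1)
sides = [(1, 0, 0), (0, 2, 0)]
print(vertices(corner, sides))
print(parallelepiped_point(corner, sides, (0.5, 0.25)))
```

A $k$-dimensional parallelepiped has $2^k$ vertices in this listing, though some may coincide when the parallelepiped is degenerate.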
Now we want to consider the volume of a parallelepiped. The first observation is that the volume doesn’t depend on the corner point $p$. Indeed, we should be able to slide the corner around to any point in space as long as we bring the same displacement vectors along with us. So the volume should be a function only of the sides.
The second observation is that as a function of the sides, the volume function should commute with scalar multiplication in each variable separately. That is, if we multiply $v_i$ by a non-negative factor of $\lambda$, then we multiply the whole volume of the parallelepiped by $\lambda$ as well. But what about negative scaling factors? What if we reflect the side (and thus the whole parallelepiped) to point the other way? One answer might be that we get the same volume, but it’s going to be easier (and again more elegant) if we say that the new parallelepiped has the negative of the original one’s volume.
Negative volume? What could that mean? Well, we’re going to move away from the usual notion of volume just a little. Instead, we’re going to think of “signed” volume, which includes the possibility of being positive or negative. By itself, this sign will be less than clear at first, but we’ll get a better understanding as we go. As a first step we’ll say that two parallelepipeds related by a reflection have opposite signs. This won’t only cover the above behavior under scaling sides, but also what happens when we exchange the order of two sides. For example, the parallelogram with sides $v_1=a$ and $v_2=b$ and the parallelogram with sides $v_1=b$ and $v_2=a$ have the same areas with opposite signs. Similarly, swapping the order of two sides in a given parallelepiped will flip its sign.
The third observation is that the volume function should be additive in each variable. One way to see this is that the $k$-dimensional volume of the parallelepiped with sides $v_1$ through $v_k$ should be the product of the $(k-1)$-dimensional volume of the parallelepiped with sides $v_1$ through $v_{k-1}$ and the length of the component of $v_k$ perpendicular to all the other sides, and this length is a linear function of $v_k$. Since there’s nothing special here about the last side, we could repeat the argument with the other sides.
The other way to see this fact is to consider the following diagram, helpfully supplied by Kate from over at f(t):
The side of one parallelogram is the (vector) sum of the sides of the other two, and we can see that the area of the one parallelogram is the sum of the areas of the other two. This justifies the assertion that for parallelograms in the plane, the area is additive as a function of one side (and, similarly, of the other). Similar diagrams should be apparent to justify the assertion for higher-dimensional parallelepipeds in higher-dimensional spaces.
Putting all these together, we find that the $k$-dimensional volume of a parallelepiped with $k$ sides is an alternating multilinear functional, with the $k$ sides as variables, and so it lives somewhere in the exterior algebra $\Lambda(V^*)$. We’ll have to work out which particular functional gives us a good notion of volume as we continue.
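As a concrete illustration of the three observations, here is a small Python sketch for parallelograms in the plane. It takes the $2\times2$ determinant as a candidate signed area; that choice is an assumption made for the example, since which functional is the right one has not yet been worked out at this point.

```python
def signed_area(u, v):
    # Candidate signed area of the parallelogram with sides u and v (an assumption).
    return u[0] * v[1] - u[1] * v[0]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def scale(factor, u):
    return (factor * u[0], factor * u[1])

a, b, c = (3, 1), (1, 2), (-2, 5)

# Scaling one side scales the area, including by negative factors.
print(signed_area(scale(4, a), b), 4 * signed_area(a, b))
print(signed_area(scale(-1, a), b), -signed_area(a, b))

# Exchanging the two sides flips the sign.
print(signed_area(b, a), -signed_area(a, b))

# The area is additive in each side separately.
print(signed_area(add(a, c), b), signed_area(a, b) + signed_area(c, b))

# A degenerate parallelogram (parallel sides) has zero area.
print(signed_area(a, scale(2, a)))
```

Each of the first four lines prints a pair of equal numbers, and the last prints zero.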