The Unapologetic Mathematician

Mathematics for the interested outsider

A Trace Criterion for Nilpotence

We’re going to need another way of identifying nilpotent endomorphisms. Let A\subseteq B\subseteq\mathfrak{gl}(V) be two subspaces of endomorphisms on a finite-dimensional space V, and let M be the collection of x\in\mathfrak{gl}(V) such that \mathrm{ad}(x) sends B into A. If x\in M satisfies \mathrm{Tr}(xy)=0 for all y\in M then x is nilpotent.

The first thing we do is take the Jordan-Chevalley decomposition of x, writing x=s+n, and fix a basis that diagonalizes s with eigenvalues a_i. We define E to be the \mathbb{Q}-subspace of \mathbb{F} spanned by the eigenvalues. If we can prove that this space is trivial, then all the eigenvalues of s must be zero, and thus s itself must be zero.

We proceed by showing that any linear functional f:E\to\mathbb{Q} must be zero. Taking one, we define y\in\mathfrak{gl}(V) to be the endomorphism whose matrix with respect to our fixed basis is diagonal: f(a_i)\delta_{ij}. If \{e_{ij}\} is the corresponding basis of \mathfrak{gl}(V) we can calculate that

\displaystyle\begin{aligned}\left[\mathrm{ad}(s)\right](e_{ij})&=(a_i-a_j)e_{ij}\\\left[\mathrm{ad}(y)\right](e_{ij})&=(f(a_i)-f(a_j))e_{ij}\end{aligned}

Now we can find some polynomial r(T), by Lagrange interpolation, such that r(a_i-a_j)=f(a_i)-f(a_j); there is no ambiguity in these conditions since if a_i-a_j=a_k-a_l then the linearity of f implies that

\displaystyle\begin{aligned}f(a_i)-f(a_j)&=f(a_i-a_j)\\&=f(a_k-a_l)\\&=f(a_k)-f(a_l)\end{aligned}

Further, picking i=j we can see that r(0)=0, so r has no constant term. It should be apparent that \mathrm{ad}(y)=r\left(\mathrm{ad}(s)\right).

Now, we know that \mathrm{ad}(s) is the semisimple part of \mathrm{ad}(x), so the Jordan-Chevalley decomposition lets us write it as a polynomial in \mathrm{ad}(x) with no constant term: \mathrm{ad}(s)=p\left(\mathrm{ad}(x)\right). But then we can write \mathrm{ad}(y)=r\left(p\left(\mathrm{ad}(x)\right)\right), and the composite polynomial still has no constant term. Since \mathrm{ad}(x) maps B into A, so does \mathrm{ad}(y), which means y\in M, and our hypothesis tells us that

\displaystyle\mathrm{Tr}(xy)=\sum\limits_{i=1}^{\dim V}a_if(a_i)=0

Hitting this sum with f, and using the facts that f is \mathbb{Q}-linear and that each f(a_i) is a rational number, we find that the sum of the squares of the f(a_i) is also zero; but a sum of squares of rational numbers can only be zero if every one of them is zero.

Thus the only possible \mathbb{Q}-linear functional on E is zero, meaning that E is trivial, all the eigenvalues of s are zero, and x=n is nilpotent, as asserted.
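Not part of the proof, but a tiny numerical illustration may help fix ideas. In the easiest case A=B=\mathfrak{gl}(V) we have M=\mathfrak{gl}(V), and when the eigenvalues are already rational the functional f can simply be the identity, so the witness y built in the proof is just the diagonal matrix of eigenvalues. Here's a toy numpy sketch of that situation (my own example, not from the original post):

```python
import numpy as np

# Toy case A = B = gl(V), so that M = gl(V); x is semisimple with rational
# eigenvalues 1, -1, 0, hence not nilpotent.
x = np.diag([1.0, -1.0, 0.0])

# The proof's witness y has f(a_i) on the diagonal; with rational eigenvalues
# we may take f to be the identity, so y is just x's own diagonal.
y = np.diag(np.diag(x))

print(np.trace(x @ y))   # 2.0 = sum of a_i * f(a_i), nonzero as the proof predicts

# A genuinely nilpotent x, by contrast, pairs to zero against itself.
n = np.array([[0.0, 1.0],
              [0.0, 0.0]])
print(np.trace(n @ n))   # 0.0
```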

August 31, 2012 Posted by | Algebra, Lie Algebras, Linear Algebra | 1 Comment

Uses of the Jordan-Chevalley Decomposition

Now that we’ve given the proof, we want to mention a few uses of the Jordan-Chevalley decomposition.

First, we let A be any finite-dimensional \mathbb{F}-algebra — associative, Lie, whatever — and remember that \mathrm{End}_\mathbb{F}(A) contains the Lie algebra of derivations \mathrm{Der}(A). I say that if \delta\in\mathrm{Der}(A) then so are its semisimple part \sigma and its nilpotent part \nu; it’s enough to show that \sigma is.

Just like we decomposed V in the proof of the Jordan-Chevalley decomposition, we can break A down into the eigenspaces of \delta — or, equivalently, of \sigma. But this time we will index them by the eigenvalue: A_a consists of those x\in A such that \left[\delta-aI\right]^k(x)=0 for sufficiently large k.

Now we have the identity:

\displaystyle\left[\delta-(a+b)I\right]^n(xy)=\sum\limits_{i=0}^n\binom{n}{i}\left[\delta-aI\right]^{n-i}(x)\left[\delta-bI\right]^i(y)

which is easily verified. If a sufficiently large power of \delta-aI applied to x and a sufficiently large power of \delta-bI applied to y are both zero, then for sufficiently large n one or the other factor in each term will be zero, and so the entire sum is zero. Thus we verify that A_aA_b\subseteq A_{a+b}.
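If you'd like to see the identity in action, here's a small illustrative numpy check (my own toy example) for the derivation \delta=\mathrm{ad}(z) of the associative algebra of 3\times3 matrices, with random x, y, a, b, and a modest exponent n:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(0)
z = rng.standard_normal((3, 3))

def delta(m):
    """The inner derivation ad(z) of the algebra of 3x3 matrices."""
    return z @ m - m @ z

def iterate(op, m, times):
    """Apply the operator op to the matrix m the given number of times."""
    for _ in range(times):
        m = op(m)
    return m

x, y = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
a, b, n = 0.7, -1.3, 4

# Left-hand side: [delta - (a+b)I]^n applied to the product xy.
lhs = iterate(lambda m: delta(m) - (a + b) * m, x @ y, n)

# Right-hand side: the binomial-style sum of [delta - aI]^{n-i}(x) [delta - bI]^i(y).
rhs = sum(comb(n, i)
          * iterate(lambda m: delta(m) - a * m, x, n - i)
          @ iterate(lambda m: delta(m) - b * m, y, i)
          for i in range(n + 1))

print(np.allclose(lhs, rhs))   # True
```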

If we take x\in A_a and y\in A_b then xy\in A_{a+b}, and thus \sigma(xy)=(a+b)xy. On the other hand,

\displaystyle\begin{aligned}\sigma(x)y+x\sigma(y)&=axy+bxy\\&=(a+b)xy\end{aligned}

And thus \sigma satisfies the derivation property

\displaystyle\sigma(xy)=\sigma(x)y+x\sigma(y)

on products of elements drawn from the eigenspaces; since these eigenspaces span A, bilinearity extends the property to all of A, so \sigma is a derivation, and then \nu=\delta-\sigma is a difference of derivations and hence a derivation. Thus \sigma and \nu are both in \mathrm{Der}(A).

For the other side we note that, just as the adjoint of a nilpotent endomorphism is nilpotent, the adjoint of a semisimple endomorphism is semisimple. Indeed, if \{v_i\}_{i=1}^n is a basis of V such that the matrix of x is diagonal with eigenvalues \{a_i\}, then we let e_{ij} be the standard basis element of \mathfrak{gl}(n,\mathbb{F}), which is isomorphic to \mathfrak{gl}(V) using the basis \{v_i\}. It’s a straightforward calculation to verify that

\displaystyle\left[\mathrm{ad}(x)\right](e_{ij})=(a_i-a_j)e_{ij}

and thus \mathrm{ad}(x) is diagonal with respect to this basis.
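For the skeptical, here's a tiny illustrative numpy check (my own, not part of the argument) that a diagonal x acts on each standard basis matrix e_{ij} by the scalar a_i-a_j:

```python
import numpy as np

a = np.array([2.0, -1.0, 5.0])
x = np.diag(a)
n = len(a)

for i in range(n):
    for j in range(n):
        e_ij = np.zeros((n, n))
        e_ij[i, j] = 1.0
        ad_x = x @ e_ij - e_ij @ x   # [ad(x)](e_ij)
        assert np.allclose(ad_x, (a[i] - a[j]) * e_ij)

print("ad(x) is diagonal in the e_ij basis with eigenvalues a_i - a_j")
```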

So now if x=x_s+x_n is the Jordan-Chevalley decomposition of x, then \mathrm{ad}(x_s) is semisimple and \mathrm{ad}(x_n) is nilpotent. They commute, since

\displaystyle\begin{aligned}\left[\mathrm{ad}(x_s),\mathrm{ad}(x_n)\right]&=\mathrm{ad}\left([x_s,x_n]\right)\\&=\mathrm{ad}(0)=0\end{aligned}

Since \mathrm{ad}(x)=\mathrm{ad}(x_s)+\mathrm{ad}(x_n) is the decomposition of \mathrm{ad}(x) into a semisimple and a nilpotent part which commute with each other, it is the Jordan-Chevalley decomposition of \mathrm{ad}(x).

August 30, 2012 Posted by | Algebra, Lie Algebras, Linear Algebra | 3 Comments

The Jordan-Chevalley Decomposition (proof)

We now give the proof of the Jordan-Chevalley decomposition. We let x have distinct eigenvalues \{a_i\}_{i=1}^k with multiplicities \{m_i\}_{i=1}^k, so the characteristic polynomial of x is

\displaystyle\prod\limits_{i=1}^k(T-a_i)^{m_i}

We set V_i=\mathrm{Ker}\left((x-a_iI)^{m_i}\right) so that V is the direct sum of these subspaces, each of which is fixed by x.

On the subspace V_i, x has the characteristic polynomial (T-a_i)^{m_i}. What we want is a single polynomial p(T) such that

\displaystyle\begin{aligned}p(T)&\equiv a_i\mod (T-a_i)^{m_i}\\p(T)&\equiv0\mod T\end{aligned}

That is, p(T) has no constant term, and for each i there is some k_i(T) such that

\displaystyle p(T)=(T-a_i)^{m_i}k_i(T)+a_i

Thus, if we evaluate p(x) on the block V_i we get a_i times the identity, since (x-a_iI)^{m_i} kills V_i.

To do this, we will make use of a result that usually comes up in number theory, called the Chinese remainder theorem. Unfortunately, I didn’t have the foresight to cover number theory before Lie algebras, so I’ll just give the statement: any system of congruences like the one above, in which the moduli are pairwise relatively prime, has a common solution, which is unique modulo the product of the separate moduli. The moduli above are relatively prime unless 0 is an eigenvalue, in which case we simply leave out the last congruence, since it is already implied by the congruence modulo T^{m_i}. For example, the system

\displaystyle\begin{aligned}x&\equiv2\mod3\\x&\equiv3\mod4\\x&\equiv1\mod5\end{aligned}

has the solution 11, which is unique modulo 3\cdot4\cdot5=60. This is pretty straightforward to understand for integers, but it works as stated over any principal ideal domain — like \mathbb{F}[T] — and, suitably generalized, over any commutative ring.
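If you want to check the example with a computer algebra system, sympy's integer version of the Chinese remainder theorem confirms it (purely illustrative):

```python
from sympy.ntheory.modular import crt

# Solve x = 2 (mod 3), x = 3 (mod 4), x = 1 (mod 5).
solution, modulus = crt([3, 4, 5], [2, 3, 1])
print(solution, modulus)   # 11 60
```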

So anyway, such a p exists, and it’s the p we need to get the semisimple part of x. Indeed, on each block V_i the endomorphism x_s=p(x) acts as the scalar a_i, so it differs from x only by stripping off the off-diagonal part of each block. Then we can just set q(T)=T-p(T) and find x_n=q(x). Any two polynomials in x must commute; indeed we can simply calculate

\displaystyle\begin{aligned}x_sx_n&=p(x)q(x)\\&=q(x)p(x)\\&=x_nx_s\end{aligned}

Finally, since p and q both have no constant term, if x:B\to A then the same is true of p(x) and q(x): every power x^k with k\geq1 sends B into A\subseteq B. So the last assertion of the decomposition holds.

The only thing left is the uniqueness of the decomposition. Let’s say that x=s+n is a different decomposition into a semisimple and a nilpotent part which commute with each other. Then we have x_s-s=n-x_n, and all four of these endomorphisms commute with each other: s and n commute with x=s+n, while x_s and x_n are polynomials in x. The left-hand side is semisimple, since a difference of commuting diagonalizable endomorphisms is diagonalizable, while the right-hand side is nilpotent, so its only possible eigenvalue is zero. But an endomorphism which is both diagonalizable and nilpotent must be zero, and thus s=x_s and n=x_n.
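Here's an illustrative sympy sketch of the whole statement for one concrete matrix (my own example; it reads the semisimple part off the Jordan form rather than constructing p by the Chinese remainder theorem). It checks that the two parts commute, that x_n is nilpotent, that x_s is diagonalizable, and that x_s really is a polynomial in x with no constant term:

```python
import sympy as sp

# A non-diagonalizable example: eigenvalue 2 with a 2x2 Jordan block, plus eigenvalue 5.
x = sp.Matrix([[2, 1, 1],
               [0, 2, 3],
               [0, 0, 5]])

# Read the semisimple part off the Jordan form x = P J P^{-1}.
P, J = x.jordan_form()
x_s = P * sp.diag(*[J[i, i] for i in range(J.rows)]) * P.inv()
x_n = x - x_s

print(sp.simplify(x_s * x_n - x_n * x_s))   # zero matrix: the two parts commute
print(sp.simplify(x_n**3))                  # zero matrix: x_n is nilpotent
print(x_s.is_diagonalizable())              # True: x_s is semisimple

# Check that x_s = c1*x + c2*x^2 + c3*x^3 for some scalars, a polynomial with no constant term.
c1, c2, c3 = sp.symbols('c1 c2 c3')
residual = c1 * x + c2 * x**2 + c3 * x**3 - x_s
print(sp.solve(list(residual), [c1, c2, c3]))   # a consistent solution for the coefficients
```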

August 28, 2012 Posted by | Algebra, Linear Algebra | 1 Comment

The Jordan-Chevalley Decomposition

We recall that any linear endomorphism of a finite-dimensional vector space over an algebraically closed field can be put into Jordan normal form: we can find a basis such that its matrix is the sum of blocks that look like

\displaystyle\begin{pmatrix}\lambda&1&&&{0}\\&\lambda&1&&\\&&\ddots&\ddots&\\&&&\lambda&1\\{0}&&&&\lambda\end{pmatrix}

where \lambda is some eigenvalue of the transformation. We want a slightly more abstract version of this, and it hinges on the idea that matrices in Jordan normal form have an obvious diagonal part, and a bunch of entries just above the diagonal. This off-diagonal part is all in the upper-triangle, so it is nilpotent; the diagonalizable part we call “semisimple”. And what makes this particular decomposition special is that the two parts commute. Indeed, the block-diagonal form means we can carry out the multiplication block-by-block, and in each block one factor is a constant multiple of the identity, which clearly commutes with everything.
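Here's a quick numerical illustration of that commutation claim (a toy example of my own), splitting a matrix already in Jordan normal form into its diagonal and off-diagonal parts:

```python
import numpy as np
from scipy.linalg import block_diag

def jordan_block(lam, k):
    """A k-by-k Jordan block with eigenvalue lam."""
    return lam * np.eye(k) + np.eye(k, k=1)

# Blocks of sizes 3 and 2 for eigenvalue 2, and a 1x1 block for eigenvalue 7.
J = block_diag(jordan_block(2, 3), jordan_block(2, 2), jordan_block(7, 1))

D = np.diag(np.diag(J))   # the semisimple (diagonal) part
N = J - D                 # the nilpotent (strictly upper-triangular) part

print(np.allclose(D @ N, N @ D))                      # True: the two parts commute
print(np.allclose(np.linalg.matrix_power(N, 3), 0))   # True: N^3 = 0
```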

More generally, we will have the Jordan-Chevalley decomposition of an endomorphism: any x\in\mathrm{End}(V) can be written uniquely as the sum x=x_s+x_n, where x_s is semisimple — diagonalizable — and x_n is nilpotent, and where x_s and x_n commute with each other.

Further, we will find that there are polynomials p(T) and q(T) — each with no constant term — such that p(x)=x_s and q(x)=x_n. And thus we will find that any endomorphism that commutes with x will also commute with both x_s and x_n.

Finally, if A\subseteq B\subseteq V is any pair of subspaces such that x:B\to A then the same is true of both x_s and x_n.

We will prove these next time, but let’s see that this is actually true of the Jordan normal form. The first part we’ve covered.

For the second, set aside the assertion about p and q; any endomorphism commuting with x either multiplies each block by a constant or shuffles similar blocks, and both of these operations commute with both x_s and x_n.

For the last part, we may as well assume that B=V, since otherwise we can just restrict to x\vert_B\in\mathrm{End}(B). If \mathrm{Im}(x)\subseteq A then the Jordan normal form shows us that any complementary subspace to A must be spanned by blocks with eigenvalue 0. In particular, it can only touch the last row of any such block. But none of these rows are in the range of either the diagonal or off-diagonal portions of the matrix.

August 28, 2012 Posted by | Algebra, Linear Algebra | 3 Comments

Invariant Forms

A very useful structure to have on a complex vector space V carrying a representation \rho of a group G is an “invariant form”. To start with, this is a complex inner product (v,w)\mapsto\langle v,w\rangle, which we recall means that it is

  • linear in the second slot — \langle u,av+bw\rangle=a\langle u,v\rangle+b\langle u,w\rangle
  • conjugate symmetric — \langle v,w\rangle=\overline{\langle w,v\rangle}
  • positive definite — \langle v,v\rangle>0 for all v\neq0

Again as usual these imply conjugate linearity in the first slot, so the form isn’t quite bilinear. Still, people are often sloppy and say “invariant bilinear form”.

Anyhow, now we add a new condition to the form. We demand that it be

  • invariant under the action of G — \langle gv,gw\rangle=\langle v,w\rangle

Here I have started to write gv as shorthand for \rho(g)v. We will only do this when the representation in question is clear from the context.

The inner product gives us a notion of length and angle. Invariance now tells us that these notions are unaffected by the action of G. That is, the vectors v and gv have the same length for all v\in V and g\in G. Similarly, the angle between vectors v and w is exactly the same as the angle between gv and gw. Another way to say this is that if the form B is invariant for the representation \rho:G\to GL(V), then the image of \rho is actually contained in the orthogonal group [commenter Eric Finster, below, reminds me that since we’ve got a complex inner product we’re using the group of unitary transformations with respect to the inner product B: \rho:G\to U(V,B)].

More important than any particular invariant form is this: if we have an invariant form on our space V, then any reducible representation is decomposable. That is, if W\subseteq V is a submodule, we can find another submodule U\subseteq V so that V=U\oplus W as G-modules.

If we just consider them as vector spaces, we already know this: the orthogonal complement W^\perp=\left\{v\in V\vert\forall w\in W,\langle v,w\rangle=0\right\} is exactly the subspace we need, for V=W\oplus W^\perp. I say that if W is a G-invariant subspace of V, then W^\perp is as well, and so they are both submodules. Indeed, if v\in W^\perp, then we check that gv is as well:

\displaystyle\begin{aligned}\langle gv,w\rangle&=\langle g^{-1}gv,g^{-1}w\rangle\\&=\langle v,g^{-1}w\rangle\\&=0\end{aligned}

where the first equality follows from the G-invariance of our form; the second from the representation property; and the third from the fact that W is an invariant subspace, so g^{-1}w\in W.
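Here's a small illustrative numpy check of this argument (a toy example of my own): the cyclic permutation representation of \mathbb{Z}/3 on \mathbb{C}^3 is unitary for the standard inner product, the line W spanned by (1,1,1) is an invariant subspace, and acting on a vector in W^\perp lands us back in W^\perp.

```python
import numpy as np

# Generator of the permutation representation of Z/3 on C^3.
g = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]], dtype=complex)

# g is unitary, so the standard inner product is an invariant form.
print(np.allclose(g.conj().T @ g, np.eye(3)))   # True

w = np.array([1, 1, 1], dtype=complex)          # spans the invariant subspace W

# A vector in the orthogonal complement of W...
v = np.array([1, -1, 0], dtype=complex)
print(np.isclose(np.vdot(w, v), 0))             # True: v is orthogonal to W

# ...stays in the orthogonal complement after acting by g, as the argument predicts.
print(np.isclose(np.vdot(w, g @ v), 0))         # True
```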

So in the presence of an invariant form, all finite-dimensional representations are “completely reducible”. That is, they can be decomposed as the direct sum of a number of irreducible submodules. If the representation V is irreducible to begin with, we’re done. If not, it must have some submodule W. Then the orthogonal complement W^\perp is also a submodule, and we can write V=W\oplus W^\perp. Then we can treat both W and W^\perp the same way. The process must eventually bottom out, since each of W and W^\perp have dimension smaller than that of V, which was finite to begin with. Each step brings the dimension down further and further, and it must stop by the time it reaches 1.

This tells us, for instance, that there can be no inner product on \mathbb{C}^2 that is invariant under the representation of the group of integers \mathbb{Z} we laid out at the end of last time. Indeed, that was an example of a reducible representation that is not decomposable, but if there were an invariant form it would have to decompose.

September 27, 2010 Posted by | Algebra, Group theory, Linear Algebra, Representation Theory | 7 Comments

Topological Vector Spaces, Normed Vector Spaces, and Banach Spaces

Before we move on, we want to define some structures that blend algebraic and topological notions. These are all based on vector spaces. And, particularly, we care about infinite-dimensional vector spaces. Finite-dimensional vector spaces are actually pretty simple, topologically. For pretty much all purposes you have a topology on your base field \mathbb{F}, and the vector space (which is isomorphic to \mathbb{F}^n for some n) will get the product topology.

But for infinite-dimensional spaces the product topology is often not going to be particularly useful. For example, the space of functions f:X\to\mathbb{R} is a product; we write f\in\mathbb{R}^X to mean the product of one copy of \mathbb{R} for each point in X. Limits in this topology are “pointwise” limits of functions, but this isn’t always the most useful way to think about limits of functions. The sequence

\displaystyle f_n=n\chi_{\left[0,\frac{1}{n}\right]}

converges pointwise to a function f with f(x)=0 for x\neq0 and f(0)=\infty. But we will find it useful to be able to ignore this behavior at the one isolated point and say that f_n\to0. It’s this connection with spaces of functions that brings such infinite-dimensional topological vector spaces into the realm of “functional analysis”.

Okay, so to get a topological vector space, we take a vector space and put a (surprise!) topology on it. But not just any topology will do: Remember that every point in a vector space looks pretty much like every other one. The transformation u\mapsto u+v has an inverse u\mapsto u-v, and it only makes sense that these be homeomorphisms. And to capture this, we put a uniform structure on our space. That is, we specify what the neighborhoods are of 0, and just translate them around to all the other points.

Now, a common way to come up with such a uniform structure is to define a norm on our vector space. That is, to define a function v\mapsto\lVert v\rVert satisfying the three axioms

  • For all vectors v and scalars c, we have \lVert cv\rVert=\lvert c\rvert\lVert v\rVert.
  • For all vectors v and w, we have \lVert v+w\rVert\leq\lVert v\rVert+\lVert w\rVert.
  • The norm \lVert v\rVert is zero if and only if the vector v is the zero vector.

Notice that we need to be working over a field in which we have a notion of absolute value, so we can measure the size of scalars. We might also want to do away with the last condition and use a “seminorm”. In any event, it’s important to note that though our earlier examples of norms all came from inner products we do not need an inner product to have a norm. In fact, there exist norms that come from no inner product at all.
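To back up that last claim with a concrete example (my own, not from the post): the sup norm on \mathbb{R}^2 comes from no inner product, since any norm induced by an inner product must satisfy the parallelogram law \lVert u+v\rVert^2+\lVert u-v\rVert^2=2\lVert u\rVert^2+2\lVert v\rVert^2, and the sup norm violates it.

```python
import numpy as np

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])
sup = lambda x: np.max(np.abs(x))   # the sup (infinity) norm

lhs = sup(u + v)**2 + sup(u - v)**2   # 1 + 1 = 2
rhs = 2 * sup(u)**2 + 2 * sup(v)**2   # 2 + 2 = 4
print(lhs, rhs)   # the parallelogram law fails, so no inner product induces this norm
```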

So if we define a norm we get a “normed vector space”. This is a metric space, with a metric function defined by d(u,v)=\lVert u-v\rVert. This is nice because metric spaces are first-countable, and thus sequential. That is, we can define the topology of a (semi-)normed vector space by defining exactly what it means for a sequence of vectors to converge, and in particular what it means for them to converge to zero.

Finally, if we’ve got a normed vector space, it’s a natural question to ask whether or not this vector space is complete. That is, we have all the pieces in place to define Cauchy sequences in our vector space, and we would like for all of these sequences to converge under our uniform structure. If this happens — if we have a complete normed vector space — we call our structure a “Banach space”. Most of the spaces we’re concerned with in functional analysis are Banach spaces.

Again, for finite-dimensional vector spaces (at least over \mathbb{R} or \mathbb{C}) this is all pretty easy; we can always define an inner product, and this gives us a norm. If our underlying topological field is complete, then the vector space will be as well. Even without considering a norm, convergence of sequences is just given component-by-component. But infinite-dimensional vector spaces get hairier. Since our algebraic operations only give us finite sums, we have to take some sorts of limits to even talk about most vectors in the space in the first place, and taking limits of such vectors could just complicate things further. Studying these interesting topologies and seeing how linear algebra — the study of vector spaces and linear transformations — behaves in the infinite-dimensional context is the taproot of functional analysis.

May 12, 2010 Posted by | Algebra, Analysis, Functional Analysis, Linear Algebra, Measure Theory, Topology | 9 Comments

A Lemma on Reflections

Here’s a fact we’ll find useful soon enough as we talk about reflections. Hopefully it will also help get back into thinking about linear transformations and inner product spaces. However, if the linear algebra gets a little hairy (or if you’re just joining us) you can just take this fact as given. Remember that we’re looking at a real vector space V equipped with an inner product \langle\underline{\hphantom{X}},\underline{\hphantom{X}}\rangle.

Now, let’s say \Phi is some finite collection of vectors which span V (it doesn’t matter if they’re linearly independent or not). Let \sigma be a linear transformation which leaves \Phi invariant. That is, if we pick any vector \phi\in\Phi then the image \sigma(\phi) will be another vector in \Phi. Let’s also assume that there is some n-1-dimensional subspace P which \sigma leaves completely untouched. That is, \sigma(v)=v for every v\in P. Finally, say that there’s some \alpha\in\Phi so that \sigma(\alpha)=-\alpha (clearly \alpha\notin P) and also that \Phi is invariant under \sigma_\alpha. Then I say that \sigma=\sigma_\alpha and P=P_\alpha.

We’ll proceed by actually considering the transformation \tau=\sigma\sigma_\alpha, and showing that this is the identity. First off, \tau definitely fixes \alpha, since

\displaystyle\tau(\alpha)=\sigma\left(\sigma_\alpha(\alpha)\right)=\sigma(-\alpha)=-(-\alpha)=\alpha

so \tau acts as the identity on the line \mathbb{R}\alpha. In fact, I assert that \tau also acts as the identity on the quotient space V/\mathbb{R}\alpha. Indeed, \sigma_\alpha acts trivially on P_\alpha, and every vector in V/\mathbb{R}\alpha has a unique representative in P_\alpha. And then \sigma acts trivially on P, and every vector in V/\mathbb{R}\alpha has a unique representative in P.

This does not, however, mean that \tau acts trivially on any given complement of \mathbb{R}\alpha. All we really know at this point is that for every v\in V the difference between v and \tau(v) is some scalar multiple of \alpha. On the other hand, remember how we found upper-triangular matrices before. This time we peeled off one vector and the remaining transformation was the identity on the remaining n-1-dimensional space. This tells us that all of our eigenvalues are 1, and the characteristic polynomial is (T-1)^n, where n=\dim(V). Evaluating this on the transformation \tau, by the Cayley-Hamilton theorem, we find that (\tau-1)^n=0.

Now let’s try to use the collection of vectors \Phi. We assumed that both \sigma and \sigma_\alpha send vectors in \Phi back to other vectors in \Phi, and so the same must be true of \tau. But there are only finitely many vectors (say k of them) in \Phi to begin with, so \tau must act as some sort of permutation of the k vectors in \Phi. But every permutation in S_k has an order that divides k!. That is, applying \tau k! times must send every vector in \Phi back to itself. But since \Phi is a spanning set for V, this means that \tau^{k!}=1, or that \tau^{k!}-1=0.

So we have two polynomial relations satisfied by \tau, and \tau will clearly satisfy any linear combination of these relations. But Euclid’s algorithm shows us that we can write the greatest common divisor of these relations as a linear combination, and so \tau must satisfy the greatest common divisor of T^{k!}-1 and (T-1)^n. It’s not hard to show that this greatest common divisor is T-1, which means that we must have \tau-1=0 or \tau=1.
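That last greatest-common-divisor computation is easy to confirm with a computer algebra system; here's an illustrative sympy check with the exponents k! = 6 and n = 3:

```python
import sympy as sp

T = sp.symbols('T')
print(sp.gcd(T**6 - 1, (T - 1)**3))   # T - 1
```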

It’s sort of convoluted, but there are some neat tricks along the way, and we’ll be able to put this result to good use soon.

January 19, 2010 Posted by | Algebra, Geometry, Linear Algebra | 2 Comments

Reflections

Before introducing my main question for the next series of posts, I’d like to talk a bit about reflections in a real vector space V equipped with an inner product \langle\underline{\hphantom{X}},\underline{\hphantom{X}}\rangle. If you want a specific example you can think of the space \mathbb{R}^n consisting of n-tuples of real numbers v=(v^1,\dots,v^n). Remember that we’re writing our indices as superscripts, so we shouldn’t think of these as powers of some number v, but as the components of a vector. For the inner product, \langle u,v\rangle you can think of the regular “dot product” \langle u,v\rangle=u^1v^1+\dots+u^nv^n.

Everybody with me? Good. Now that we’ve got our playing field down, we need to define a reflection. This will be an orthogonal transformation, which is just a fancy way of saying “preserves lengths and angles”. What makes it a reflection is that there’s some n-1-dimensional “hyperplane” P that acts like a mirror. Every vector in P itself is just left where it is, and a vector on the line that points perpendicularly to P will be sent to its negative — “reflecting” through the “mirror” of P.

Any nonzero vector \alpha spans a line \mathbb{R}\alpha, and the orthogonal complement — all the vectors perpendicular to \alpha — forms an n-1-dimensional subspace P_\alpha, which we can use to make just such a reflection. We’ll write \sigma_\alpha for the reflection determined in this way by \alpha. We can easily write down a formula for this reflection:

\displaystyle\sigma_\alpha(\beta)=\beta-\frac{2\langle\beta,\alpha\rangle}{\langle\alpha,\alpha\rangle}\alpha

It’s easy to check that if \beta=c\alpha then \sigma_\alpha(\beta)=-\beta, while if \beta is perpendicular to \alpha — if \langle\beta,\alpha\rangle=0 — then \sigma_\alpha(\beta)=\beta, leaving the vector fixed. Thus this formula does satisfy the definition of a reflection through P_\alpha.

The amount that reflection moves \beta in the above formula will come up a lot in the near future; enough so we’ll want to give it the notation \beta\rtimes\alpha. That is, we define:

\displaystyle\beta\rtimes\alpha=\frac{2\langle\beta,\alpha\rangle}{\langle\alpha,\alpha\rangle}

Notice that this is only linear in \beta, not in \alpha. You might also notice that this is exactly twice the length of the projection of the vector \beta onto the vector \alpha. This notation isn’t standard, but the more common notation conflicts with other notational choices we’ve made on this weblog, so I’ve made an executive decision to try it this way.
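Here's a small numpy sketch of both formulas (the helper names are my own): the reflection \sigma_\alpha and the pairing \beta\rtimes\alpha, checked against the defining properties of a reflection.

```python
import numpy as np

def pairing(beta, alpha):
    """The pairing beta >< alpha = 2<beta, alpha> / <alpha, alpha>."""
    return 2 * np.dot(beta, alpha) / np.dot(alpha, alpha)

def reflect(beta, alpha):
    """The reflection sigma_alpha(beta) = beta - (beta >< alpha) alpha."""
    return beta - pairing(beta, alpha) * alpha

alpha = np.array([1.0, 2.0, 2.0])

print(reflect(alpha, alpha))        # -alpha: the mirror negates alpha itself
perp = np.array([2.0, -1.0, 0.0])   # perpendicular to alpha
print(reflect(perp, alpha))         # unchanged: vectors in P_alpha are fixed

beta = np.array([0.5, -3.0, 1.0])
print(np.isclose(np.linalg.norm(reflect(beta, alpha)),
                 np.linalg.norm(beta)))   # True: lengths are preserved
```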

January 18, 2010 Posted by | Algebra, Geometry, Linear Algebra | 6 Comments

Cramer’s Rule

We’re trying to invert a function f:X\rightarrow\mathbb{R}^n which is continuously differentiable on some region X\subseteq\mathbb{R}^n. That is, we know that if a is a point where J_f(a)\neq0, then there is a ball N around a where f is one-to-one onto some neighborhood f(N) around f(a). Then if y is a point in f(N), we’ve got a system of equations

\displaystyle f^j(x^1,\dots,x^n)=y^j

that we want to solve for all the x^i.

We know how to handle this if f is defined by a linear transformation, represented by a matrix A=\left(a_i^j\right):

\displaystyle\begin{aligned}f^j(x^1,\dots,x^n)=a_i^jx^i&=y^j\\Ax&=y\end{aligned}

In this case, the Jacobian transformation is just the function f itself, and so the Jacobian determinant \det\left(a_i^j\right) is nonzero if and only if the matrix A is invertible. And so our solution depends on finding the inverse A^{-1} and solving

\displaystyle\begin{aligned}Ax&=y\\A^{-1}Ax&=A^{-1}y\\x&=A^{-1}y\end{aligned}

This is the approach we’d like to generalize. But to do so, we need a more specific method of finding the inverse.

This is where Cramer’s rule comes in, and it starts by analyzing the way we calculate the determinant of a matrix A. This formula

\displaystyle\sum\limits_{\pi\in S_n}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots a_n^{\pi(n)}

involves a sum over all the permutations \pi\in S_n, and we want to consider the order in which we add up these terms. If we fix an index i, we can factor out each matrix entry in the ith column:

\displaystyle\sum\limits_{j=1}^na_i^j\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots\widehat{a_i^j}\dots a_n^{\pi(n)}

where the hat indicates that we omit the ith term in the product. For a given value of j, we can consider the restricted sum

\displaystyle A_j^i=\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots\widehat{a_i^j}\dots a_n^{\pi(n)}

which is (-1)^{i+j} times the determinant of the ij “minor” of the matrix A. That is, if we strike out the row and column of A which contain a_i^j and take the determinant of the remaining (n-1)\times(n-1) matrix, we multiply this by (-1)^{i+j} to get A_j^i. These are the entries in the “adjugate” matrix \mathrm{adj}(A).

What we’ve shown is that

\displaystyle A_j^ia_i^j=\det(A)

(no summation on i). It’s not hard to show, however, that if we use a different row from the adjugate matrix we find

\displaystyle\sum\limits_{j=1}^nA_j^ka_i^j=\det(A)\delta_i^k

That is, the adjugate times the original matrix is the determinant of A times the identity matrix. And so if \det(A)\neq0 we find

\displaystyle A^{-1}=\frac{1}{\det(A)}\mathrm{adj}(A)

So what does this mean for our system of equations? We can write

\displaystyle\begin{aligned}x&=\frac{1}{\det(A)}\mathrm{adj}(A)y\\x^i&=\frac{1}{\det(A)}A_j^iy^j\end{aligned}

But how does this sum A_j^iy^j differ from the one A_j^ia_i^j we used before (without summing on i) to calculate the determinant of A? We’ve replaced the ith column of A by the column vector y, and so this is just another determinant, taken after performing this replacement!

Here’s an example. Let’s say we’ve got a system written in matrix form

\displaystyle\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}u\\v\end{pmatrix}

The entry in the ith row and jth column of the adjugate matrix is calculated by striking out the ith column and jth row of our original matrix, taking the determinant of the remaining matrix, and multiplying by (-1)^{i+j}. We get

\displaystyle\begin{pmatrix}d&-b\\-c&a\end{pmatrix}

and thus we find

\displaystyle\begin{pmatrix}x\\y\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}\begin{pmatrix}u\\v\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}ud-bv\\av-uc\end{pmatrix}

where we note that

\displaystyle\begin{aligned}ud-bv&=\det\begin{pmatrix}u&b\\v&d\end{pmatrix}\\av-uc&=\det\begin{pmatrix}a&u\\c&v\end{pmatrix}\end{aligned}

In other words, our solution is given by ratios of determinants:

\displaystyle\begin{aligned}x&=\frac{\det\begin{pmatrix}u&b\\v&d\end{pmatrix}}{\det\begin{pmatrix}a&b\\c&d\end{pmatrix}}\\y&=\frac{\det\begin{pmatrix}a&u\\c&v\end{pmatrix}}{\det\begin{pmatrix}a&b\\c&d\end{pmatrix}}\end{aligned}

and similar formulae hold for larger systems of equations.
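Here's an illustrative numpy implementation of the general rule (my own helper, not from the post): each unknown x^i is the determinant of A with its ith column replaced by y, divided by the determinant of A.

```python
import numpy as np

def cramer_solve(A, y):
    """Solve Ax = y by Cramer's rule: x_i = det(A with column i replaced by y) / det(A)."""
    det_A = np.linalg.det(A)
    x = np.empty(len(y))
    for i in range(len(y)):
        A_i = A.copy()
        A_i[:, i] = y   # replace the ith column by y
        x[i] = np.linalg.det(A_i) / det_A
    return x

A = np.array([[2.0, 1.0, -1.0],
              [1.0, 3.0,  2.0],
              [1.0, 0.0,  1.0]])
y = np.array([3.0, 13.0, 4.0])

print(cramer_solve(A, y))
print(np.linalg.solve(A, y))   # agrees with the direct solver
```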

November 17, 2009 Posted by | Algebra, Linear Algebra | 8 Comments

The Hodge Star

Sorry for the delay from last Friday to today, but I was chasing down a good lead.

Anyway, last week I said that I’d talk about a linear map that extends the notion of the correspondence between parallelograms in space and perpendicular vectors.

First of all, we should see why there may be such a correspondence. We’ve identified k-dimensional parallelepipeds in an n-dimensional vector space V with antisymmetric tensors of degree k: A^k(V). Of course, not every such tensor will correspond to a parallelepiped (some will be linear combinations that can’t be written as a single wedge of k vectors), but we’ll just keep going and let our methods apply to such more general tensors. Anyhow, we also know how to count the dimension of the space of such tensors:

\displaystyle\dim\left(A^k(V)\right)=\binom{n}{k}=\frac{n!}{k!(n-k)!}

This formula tells us that A^k(V) and A^{n-k}(V) will have the exact same dimension, and so it makes sense that there might be an isomorphism between them. And we’re going to look for one which defines the “perpendicular” n-k-dimensional parallelepiped with the same size.

So what do we mean by “perpendicular”? It’s not just in terms of the “angle” defined by the inner product. Indeed, in that sense the parallelograms e_1\wedge e_2 and e_1\wedge e_3 are perpendicular. No, we want any vector in the subspace defined by our parallelepiped to be perpendicular to any vector in the subspace defined by the new one. That is, we want the new parallelepiped to span the orthogonal complement to the subspace we start with.

Our definition will also need to take into account the orientation on V. Indeed, considering the parallelogram e_1\wedge e_2 in three-dimensional space, the perpendicular must be ce_3 for some nonzero constant c, or otherwise it won’t be perpendicular to the whole xy plane. And \vert c\vert has to be 1 in order to get the right size. But will it be +e_3 or -e_3? The difference is entirely in the orientation.

Okay, so let’s pick an orientation on V, which gives us a particular top-degree tensor \omega so that \mathrm{vol}(\omega)=1. Now, given some \eta\in A^k(V), we define the Hodge dual *\eta\in A^{n-k}(V) to be the unique antisymmetric tensor of degree n-k satisfying

\displaystyle\zeta\wedge*\eta=\langle\zeta,\eta\rangle\omega

for all \zeta\in A^k(V). Notice here that if \eta and \zeta describe parallelepipeds, and any side of \zeta is perpendicular to all the sides of \eta, then the projection of \zeta onto the subspace spanned by \eta will have zero volume, and thus \langle\zeta,\eta\rangle=0. This is what we expect, for then this side of \zeta must lie within the perpendicular subspace spanned by *\eta, and so the wedge \zeta\wedge*\eta should also be zero.

As a particular example, say we have an orthonormal basis \{e_i\}_{i=1}^n of V so that \omega=e_1\wedge\dots\wedge e_n. Then given a multi-index I=(i_1,\dots,i_k) the basic wedge e_I gives us the subspace spanned by the vectors \{e_{i_1},\dots,e_{i_k}\}. The orthogonal complement is clearly spanned by the remaining basis vectors \{e_{j_1},\dots,e_{j_{n-k}}\}, and so *e_I=\pm e_J, with the sign depending on whether the list (i_1,\dots,i_k,j_1,\dots,j_{n-k}) is an even or an odd permutation of (1,\dots,n).

To be even more explicit, let’s work these out for the cases of dimensions three and four. First off, we have a basis \{e_1,e_2,e_3\}. We work out all the duals of basic wedges as follows:

\displaystyle\begin{aligned}*1&=e_1\wedge e_2\wedge e_3\\ *e_1&=e_2\wedge e_3\\ *e_2&=-e_1\wedge e_3=e_3\wedge e_1\\ *e_3&=e_1\wedge e_2\\ *(e_1\wedge e_2)&=e_3\\ *(e_1\wedge e_3)&=-e_2\\ *(e_2\wedge e_3)&=e_1\\ *(e_1\wedge e_2\wedge e_3)&=1\end{aligned}

This reconstructs the correspondence we had last week between basic parallelograms and perpendicular basis vectors. In the four-dimensional case, the basis \{e_1,e_2,e_3,e_4\} leads to the duals

\displaystyle\begin{aligned}*1&=e_1\wedge e_2\wedge e_3\wedge e_4\\ *e_1&=e_2\wedge e_3\wedge e_4\\ *e_2&=-e_1\wedge e_3\wedge e_4\\ *e_3&=e_1\wedge e_2\wedge e_4\\ *e_4&=-e_1\wedge e_2\wedge e_3\\ *(e_1\wedge e_2)&=e_3\wedge e_4\\ *(e_1\wedge e_3)&=-e_2\wedge e_4\\ *(e_1\wedge e_4)&=e_2\wedge e_3\\ *(e_2\wedge e_3)&=e_1\wedge e_4\\ *(e_2\wedge e_4)&=-e_1\wedge e_3\\ *(e_3\wedge e_4)&=e_1\wedge e_2\\ *(e_1\wedge e_2\wedge e_3)&=e_4\\ *(e_1\wedge e_2\wedge e_4)&=-e_3\\ *(e_1\wedge e_3\wedge e_4)&=e_2\\ *(e_2\wedge e_3\wedge e_4)&=-e_1\\ *(e_1\wedge e_2\wedge e_3\wedge e_4)&=1\end{aligned}

It’s not a difficult exercise to work out the relation **\eta=(-1)^{k(n-k)}\eta for a degree k tensor in an n-dimensional space.
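Here's a short illustrative script (the helper names are mine) that computes the dual of a basic wedge by the sign-of-the-permutation rule described above, reproduces a couple of entries from the tables, and verifies that final relation on every basic wedge in dimensions three and four:

```python
import itertools

def perm_sign(seq):
    """Sign of a permutation, computed by counting inversions."""
    inversions = sum(1 for i, j in itertools.combinations(range(len(seq)), 2)
                     if seq[i] > seq[j])
    return (-1) ** inversions

def hodge_star(I, n):
    """Hodge dual of the basic wedge e_I for an oriented orthonormal basis of dimension n.

    Returns (sign, J) with *e_I = sign * e_J, where J is the complementary multi-index.
    """
    J = tuple(i for i in range(1, n + 1) if i not in I)
    return perm_sign(I + J), J

# Reproduce a couple of the three-dimensional duals from the table above.
print(hodge_star((2,), 3))     # (-1, (1, 3)):  *e_2 = -e_1^e_3 = e_3^e_1
print(hodge_star((1, 3), 3))   # (-1, (2,)):    *(e_1^e_3) = -e_2

# Verify **eta = (-1)^{k(n-k)} eta on every basic wedge in dimensions three and four.
for n in (3, 4):
    for k in range(n + 1):
        for I in itertools.combinations(range(1, n + 1), k):
            s1, J = hodge_star(I, n)
            s2, I_back = hodge_star(J, n)
            assert I_back == I and s1 * s2 == (-1) ** (k * (n - k))
print("double dual relation verified on basic wedges")
```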

November 9, 2009 Posted by | Algebra, Analytic Geometry, Geometry, Linear Algebra | 6 Comments