Spanning sets
Today I just want to point out a dual proposition to the one I refined last week. At that time we stated that any linearly independent set can be expanded to a basis. This followed from the fact that a basis is a maximal linearly independent set. But a basis is also a minimal spanning set, and that will lead to our flipped result.
First, a basis must be a spanning set, and if we remove any vector it can no longer be a spanning set. For if the remaining vectors spanned, then the removed vector could be written as a finite linear combination of them, and this would contradict the linear independence of the basis.
On the other hand, if a spanning set is minimal it must be linearly independent. For if it were not, we could write out a nontrivial finite linear combination

$$0 = \sum_i c^i v_i$$

For some index — say $i_0$ — we have $c^{i_0} \neq 0$. Then we can rearrange this to write

$$v_{i_0} = -\frac{1}{c^{i_0}} \sum_{i \neq i_0} c^i v_i$$

and so we may omit the vector $v_{i_0}$ from the set and the remaining vectors still span. This contradicts the assumed minimality of the original spanning set.
So a basis is a minimal spanning set. This leads us to the proposition that any spanning set may be narrowed to a basis. And the proof is again the same technique we used to show that every vector space has a basis. It’s just that this time we flip the whole thing over. Now we consider the set of subsets of our spanning set which span the vector space, and we use Zorn’s lemma to show that this must contain a minimal spanning set. This will then be a basis contained in our original spanning set.
Of course, as usual, in the finite-dimensional case we don’t need Zorn’s lemma, so the squeamish can relax in that case.
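In the finite-dimensional case we can even watch the narrowing happen. Here is a minimal Python sketch (my own illustration, not part of the argument above) that walks through a spanning set of $\mathbb{R}^3$ and keeps only the vectors that are independent of those already kept; what remains is a basis contained in the original spanning set.

```python
# A minimal sketch: narrowing a spanning set of R^3 down to a basis by
# greedily discarding vectors that depend on the ones already kept.
import numpy as np

spanning = [np.array(v, dtype=float) for v in
            [[1, 0, 0], [2, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 3]]]

basis = []
for v in spanning:
    candidate = basis + [v]
    # keep v only if it increases the rank of the kept collection
    if np.linalg.matrix_rank(np.column_stack(candidate)) > len(basis):
        basis.append(v)

print(len(basis))   # 3: a basis of R^3 sitting inside the spanning set
```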
The Rank-Nullity Theorem
Today we start considering how a given linear transformation acts on a vector space. And we start with the “rank-nullity” theorem. This sounds really fancy, but it’s actually almost trivial.
We already said that $\mathbf{Vect}(\mathbb{F})$ (also $\mathbf{FinVect}(\mathbb{F})$) is an abelian category. Now in any abelian category we have the first isomorphism theorem.
So let’s take this and consider a linear transformation $T: V \to W$. The first isomorphism theorem says we can factor $T$ as a surjection followed by an injection. We’ll just regard the latter as the inclusion of the image $\mathrm{im}(T)$ as a subspace of $W$. As for the surjection, it must be the linear map $V \to V/\ker(T)$, just as in any abelian category. Then we can set up the short exact sequence

$$0 \to \ker(T) \to V \to V/\ker(T) \to 0$$

and the isomorphism theorem allows us to replace the last term:

$$0 \to \ker(T) \to V \to \mathrm{im}(T) \to 0$$

Now since every short exact sequence of vector spaces splits we have an isomorphism $V \cong \ker(T) \oplus \mathrm{im}(T)$. This is the content of the rank-nullity theorem.
So where do “rank” and “nullity” come in? Well, these are just jargon terms. The “rank” of a linear transformation $T: V \to W$ is the dimension of its image — not the target vector space, mind you, but the subspace of vectors of the form $T(v)$. That is, it’s the size of the largest set of linearly independent vectors in the image of the transformation. The “nullity” is the dimension of the kernel — the largest number of linearly independent vectors that $T$ sends to the zero vector in $W$.
So what does the direct sum decomposition above mean? It tells us that there is a basis of $V$ which is in bijection with the disjoint union of a basis for $\ker(T)$ and a basis for $\mathrm{im}(T)$. In the finite-dimensional case we can take cardinalities and say that the dimension of $V$ is the sum of the dimensions of $\ker(T)$ and $\mathrm{im}(T)$. Or, to use our new words, the dimension of $V$ is the sum of the rank and the nullity of $T$. Thus: the rank-nullity theorem.
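If you like, you can check the bookkeeping numerically. Here is a small Python sketch (my own illustration; the particular matrix is arbitrary) that computes the rank of a map $\mathbb{R}^5 \to \mathbb{R}^3$, finds a basis of its kernel, and confirms that rank plus nullity gives the dimension of the source.

```python
# A minimal sketch: rank + nullity = dimension of the source space.
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((3, 5))        # a linear map R^5 -> R^3

rank = np.linalg.matrix_rank(T)        # dimension of im(T)

# A basis of ker(T): right-singular vectors with (numerically) zero singular value.
_, s, Vh = np.linalg.svd(T, full_matrices=True)
kernel_basis = Vh[rank:]               # the last 5 - rank rows of Vh span ker(T)
nullity = kernel_basis.shape[0]

assert np.allclose(T @ kernel_basis.T, 0)        # these vectors really are killed by T
print(rank, nullity, rank + nullity == 5)        # 3 2 True
```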
Exact sequences split
Now that we know the splitting lemma, we can show that every short exact sequence of vector spaces splits!
To see this, we’ll need to refine an earlier result. Remember how we showed that every vector space has a basis. We looked for maximal linearly independent sets and used Zorn’s lemma to assert that they existed.
Here’s how we’re going to refine this result: start with a collection of linearly independent vectors. Then we don’t just look for any maximal linearly independent collection, but specifically for a maximal one containing the collection we started with. Clearly if we have a linearly independent set containing our original collection which is maximal among such sets, it is also maximal among all linearly independent sets — it is a basis. On the other hand, the previous argument (with Zorn’s lemma) says that such a maximal linearly independent set must exist.

What does this mean? It says that any linearly independent set can be completed to a basis. If we start with the empty set (which is trivially linearly independent) then we just get a basis, and we recover the same old result as before.
But look what we can do now! Take a short exact sequence

$$0 \to U \xrightarrow{S} V \xrightarrow{T} W \to 0$$

and pick any basis $\{e_i\}_{i \in I}$ of $U$ (notice that we’re using a generic, possibly infinite, index set). Now hit this basis with $S$ to get the vectors $f_i = S(e_i)$ in $V$. I say that this is a linearly independent set in $V$.
Why is this? Well, let’s say that there’s a linear combination $0 = \sum_i c^i f_i$ (only finitely many of the $c^i$ can be nonzero here). This linear combination is in the image of $S$, since we can write it as $S\left(\sum_i c^i e_i\right)$. But exactness tells us that $S$ is injective, and so we have $\sum_i c^i e_i = 0$. But then all the $c^i$ have to vanish, since the $e_i$ form a basis!
So we’ve got a linearly independent set $\{f_i\}_{i \in I}$ in $V$. We now complete this to a maximal linearly independent set $\{f_i\}_{i \in I} \cup \{g_j\}_{j \in J}$ with $J$ some other index set. This is now a basis for $V$, which contains the image of all the basis elements from $U$.
Now turn this around and define a linear transformation $\sigma: V \to U$ by specifying its values on this basis. Set $\sigma(f_i) = e_i$ for $i \in I$ and $\sigma(g_j) = 0$ for $j \in J$. Then the composition $\sigma \circ S$ is the identity on $U$ (check it on our basis), and so the sequence splits, as we said.
Notice here that all the Zorniness only matters for infinite-dimensional vector spaces. Everything we’ve done here works in $\mathbf{FinVect}(\mathbb{F})$ without ever having to worry about such set-theoretic problems. However, given my politics I have no problem with using the Axiom of Choice when push comes to shove. I’m just pointing this out in case you’re the squeamish sort.
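Here is a finite-dimensional Python sketch of the construction (my own illustration; the particular injection is arbitrary): we extend the image of a basis of $U$ to a basis of $V$, define $\sigma$ to send the $f_i$ back to the $e_i$ and the extra vectors to zero, and check that $\sigma \circ S$ is the identity on $U$.

```python
# A finite-dimensional sketch: building the splitting map sigma for an
# injection S: R^2 -> R^4.
import numpy as np

S = np.array([[1., 2.],
              [0., 1.],
              [3., 0.],
              [1., 1.]])               # injective: its columns f_1, f_2 are independent

# Complete {f_1, f_2} to a basis of R^4 by throwing in standard basis vectors g_j.
basis = [S[:, 0], S[:, 1]]
for v in np.eye(4):
    if np.linalg.matrix_rank(np.column_stack(basis + [v])) > len(basis):
        basis.append(v)
B = np.column_stack(basis)             # columns: f_1, f_2, g_1, g_2

# sigma sends f_i to e_i and g_j to 0; in coordinates that is [I | 0]
# composed with the change of basis B^{-1}.
sigma = np.hstack([np.eye(2), np.zeros((2, 2))]) @ np.linalg.inv(B)

print(np.allclose(sigma @ S, np.eye(2)))   # True: sigma composed with S is the identity on U
```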
The Splitting Lemma
Evidently I never did this one when I was talking about abelian categories. Looks like I have to go back and patch this now.
We start with a short exact sequence:

$$0 \to A \xrightarrow{f} B \xrightarrow{g} C \to 0$$

A large class of examples of such sequences is provided by the split-exact sequences:

$$0 \to A \xrightarrow{\iota_A} A \oplus C \xrightarrow{\pi_C} C \to 0$$

where these arrows are those from the definition of the biproduct. But in this case we’ve also got other arrows: $\pi_A: A \oplus C \to A$ and $\iota_C: C \to A \oplus C$ that satisfy certain relations.
The lemma says that we can go the other direction too. If we have one arrow $p: B \to A$ so that $p \circ f = 1_A$, then everything else falls into place, and $B \cong A \oplus C$. Similarly, a single arrow $q: C \to B$ so that $g \circ q = 1_C$ will “split” the sequence. We’ll just prove the first one, since the second goes more or less the same way.
Just like with diagram chases, we’re going to talk about “elements” of objects as if the objects are abelian groups. Of course, we don’t really mean “elements”, but the exact same semantic switch works here.
So let’s consider an element $b \in B$ and write it as $b = (b - f(p(b))) + f(p(b))$. Clearly $f(p(b))$ lands in $\mathrm{im}(f)$. We can also check

$$p(b - f(p(b))) = p(b) - p(f(p(b))) = p(b) - p(b) = 0$$

so $b - f(p(b)) \in \ker(p)$. That is, any element of $B$ can be written as the sum of an element of $\ker(p)$ and an element of $\mathrm{im}(f)$. But these two intersect trivially. That is, if $f(a) \in \mathrm{im}(f)$ and $f(a) \in \ker(p)$ then $0 = p(f(a)) = a$, and so $f(a) = 0$. This shows that $\ker(p) \cap \mathrm{im}(f) = \mathbf{0}$. Thus we can write every $b \in B$ uniquely as $b = k + f(a)$ with $k \in \ker(p)$ and $a \in A$.
Now consider an element $c \in C$. By exactness, there must be some $b \in B$ so that $g(b) = c$. That is, we have a unique $k \in \ker(p)$ with $g(k) = c$: writing $b = k + f(a)$ as above we have $g(b) = g(k)$, since $g \circ f = 0$, and $g$ is injective on $\ker(p)$ because $\ker(g) \cap \ker(p) = \mathrm{im}(f) \cap \ker(p) = \mathbf{0}$. This shows that $\ker(p) \cong C$. It’s straightforward to show that also $\mathrm{im}(f) \cong A$. Thus we have split the sequence:

$$B \cong A \oplus C$$
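In coordinates the argument is easy to test. Here is a minimal Python sketch (my own illustration; the matrices $f$ and $p$ are arbitrary choices with $p \circ f = 1_A$) that splits a vector $b$ into its $\ker(p)$ and $\mathrm{im}(f)$ pieces exactly as above.

```python
# A minimal sketch: with f: A -> B injective and p: B -> A a left inverse
# (p . f = id), every b splits as (b - f(p(b))) + f(p(b)), an element of
# ker(p) plus an element of im(f).
import numpy as np

f = np.array([[1., 0.],
              [2., 1.],
              [0., 3.]])                 # A = R^2 -> B = R^3, injective
p = np.linalg.pinv(f)                    # one choice of left inverse: p @ f = I_2
assert np.allclose(p @ f, np.eye(2))

rng = np.random.default_rng(1)
b = rng.standard_normal(3)

image_part = f @ (p @ b)                 # lies in im(f)
kernel_part = b - image_part             # lies in ker(p)

print(np.allclose(p @ kernel_part, 0))             # True
print(np.allclose(kernel_part + image_part, b))    # True
```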
The Category of Matrices IV
Finally, all the pieces are in place to state the theorem I’ve been driving at for a while now:
The functor that we described from $\mathbf{Mat}(\mathbb{F})$ to $\mathbf{FinVect}(\mathbb{F})$ is an equivalence.
To show this, we must show that it is full, faithful, and essentially surjective. The first two conditions say that given natural numbers $m$ and $n$ the linear transformations $\mathbb{F}^m \to \mathbb{F}^n$ and the $n \times m$ matrices over $\mathbb{F}$ are in bijection.

But this is just what we’ve been showing! The vector spaces of $n$-tuples come with their canonical bases, and given these bases every linear transformation gets a uniquely-defined matrix. Conversely, every matrix defines a unique linear transformation when we’ve got the bases to work with. So fullness and faithfulness are straightforward.
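Concretely, the bijection works by feeding a linear map the canonical basis vectors and recording the outputs as the columns of a matrix. A minimal Python sketch of the round trip (my own illustration; the particular map is made up):

```python
# A minimal sketch: with the canonical bases fixed, linear maps R^m -> R^n
# and n x m matrices determine each other.
import numpy as np

m, n = 4, 3

def as_matrix(linear_map):
    """Read off the matrix of a linear map by feeding it the canonical basis."""
    return np.column_stack([linear_map(e) for e in np.eye(m)])

# A linear map given "abstractly"...
T = lambda v: np.array([v[0] + v[3], 2 * v[1], v[2] - v[0]])

A = as_matrix(T)                     # ...and the n x m matrix it determines
v = np.array([1., 2., 3., 4.])
print(np.allclose(A @ v, T(v)))      # True: the matrix recovers the map
```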
Now for essential surjectivity. This says that given any finite-dimensional vector space $V$ we have some $n$ so that $V \cong \mathbb{F}^n$. But we know that every vector space has a basis, and for $V$ it must be finite; that’s what “finite-dimensional” means! Let’s say that we’ve got a basis $\{f_i\}$ consisting of $n$ vectors.
Now we just line up the canonical basis $\{e_i\}$ of $\mathbb{F}^n$ and define linear transformations $S: \mathbb{F}^n \to V$ and $T: V \to \mathbb{F}^n$ by $S(e_i) = f_i$ and $T(f_i) = e_i$. Remember that we can define a linear transformation by specifying its values on a basis (which can all be picked independently) and then extending by linearity. Thus we do have two well-defined linear transformations here. But just as clearly we see that for any $v \in V$ we have $S(T(v)) = v$, and a similar equation holds for every $n$-tuple in $\mathbb{F}^n$. Thus $S$ and $T$ are inverses of each other, and are the isomorphism we need.
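A quick Python sketch of this construction (my own illustration; the basis vectors are arbitrary): the matrix of $S$ has the chosen basis vectors as its columns, and $T$ is its inverse.

```python
# A minimal sketch: realizing a 3-dimensional space from a chosen
# (non-canonical) basis. S sends the canonical basis to the chosen basis;
# T sends it back.
import numpy as np

f1, f2, f3 = np.array([1., 1., 0.]), np.array([0., 1., 1.]), np.array([1., 0., 1.])

S = np.column_stack([f1, f2, f3])    # S(e_i) = f_i
T = np.linalg.inv(S)                 # T(f_i) = e_i

print(np.allclose(T @ S, np.eye(3)), np.allclose(S @ T, np.eye(3)))  # True True
```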
This tells us that the language of linear transformations between finite-dimensional vector spaces is entirely equivalent to that of matrices. But we gain some conceptual advantages by thinking in terms of finite-dimensional vector spaces. One I can point to right here is how we can tell the difference between a vector space and its dual. Sure, they’ve got the same dimension, and so there’s some isomorphism between them. Still, when we’re dealing with both at the same time they behave differently, and it’s valuable to keep our eye on that difference.
On the other hand, there are benefits to matrices. For one thing, we can actually write them down and calculate with them. A lot of people are — surprise! — interested in using mathematics to solve problems. And the problems that linear algebra is most directly applicable to are naturally stated in terms of matrices.
What the theorem tells us is that none of this matters. We can translate problems from the category of matrices to the category of vector spaces and back, and nothing is lost in the process.
The Category of Matrices III
At long last, let’s get back to linear algebra. We’d laid out the category $\mathbf{Mat}(\mathbb{F})$ of matrices, and we showed that it’s a monoidal category with duals.
Now here’s the really important thing: There’s a functor $\mathbf{Mat}(\mathbb{F}) \to \mathbf{FinVect}(\mathbb{F})$ that assigns the finite-dimensional vector space $\mathbb{F}^n$ of $n$-tuples of elements of $\mathbb{F}$ to each object $n$ of $\mathbf{Mat}(\mathbb{F})$. Such a vector space of $n$-tuples comes with the basis $\{e_i\}$, where the vector $e_i$ has a $1$ in the $i$th place and a $0$ elsewhere. In matrix notation:

$$e_1 = \begin{pmatrix}1\\0\\\vdots\\0\end{pmatrix}\qquad e_2 = \begin{pmatrix}0\\1\\\vdots\\0\end{pmatrix}$$

and so on. We can write $e_i = \delta_i^j e_j$ (remember the summation convention), so the vector components of the basis vectors are given by the Kronecker delta. We will think of other vectors as column vectors.
Given a matrix $\left(t_i^j\right) \in \hom(m, n)$ we clearly see a linear transformation from $\mathbb{F}^m$ to $\mathbb{F}^n$. Given a column vector with components $v^i$ (where the index $i$ satisfies $1 \leq i \leq m$), we construct the column vector with components $t_i^j v^i$ (here $1 \leq j \leq n$). But we’ve already established that matrix multiplication represents composition of linear transformations. Further, it’s straightforward to see that the linear transformation corresponding to a matrix $\left(\delta_i^j\right)$ is the identity on $\mathbb{F}^n$ (depending on the range of the indices on the Kronecker delta). This establishes that we really have defined a functor.
But wait, there’s more! The functor is linear over $\mathbb{F}$, so it’s a functor enriched over $\mathbf{Vect}(\mathbb{F})$. The Kronecker product of matrices corresponds to the monoidal product of linear transformations, so the functor is monoidal, too. Following the definitions, we can even find that our functor preserves duals.
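The monoidal statement is easy to test in coordinates. Here is a minimal Python sketch (my own illustration) checking the mixed-product property that makes it work: the Kronecker product of two matrices acts on a tensor product of column vectors factor by factor.

```python
# A minimal sketch: the Kronecker product of matrices acts on tensor products
# of column vectors the way the tensor product of linear maps should:
# (A (x) B)(v (x) w) = (A v) (x) (B w).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 2))      # a map F^2 -> F^3
B = rng.standard_normal((4, 5))      # a map F^5 -> F^4
v = rng.standard_normal(2)
w = rng.standard_normal(5)

lhs = np.kron(A, B) @ np.kron(v, w)  # act with the Kronecker product
rhs = np.kron(A @ v, B @ w)          # act factor by factor, then combine

print(np.allclose(lhs, rhs))         # True
```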
So we’ve got a functor from our category of matrices to the category of finite-dimensional vector spaces, and it preserves all of the relevant structure.
Screw you, George
George Carlin is dead at 71. Since any appropriate words would be inappropriate here, I’ll just say “good riddance”. I think he would have, too.
The Category of Matrices II
As we consider the category $\mathbf{Mat}(\mathbb{F})$ of matrices over the field $\mathbb{F}$, we find a monoidal structure.
We define the monoidal product on objects by multiplication — $m \otimes n = mn$ — and on morphisms by using the Kronecker product. That is, if we have an $n \times m$ matrix $\left(s_i^j\right)$ and an $n' \times m'$ matrix $\left(t_k^l\right)$, then we get the Kronecker product

$$\left(s_i^j t_k^l\right)$$

Here we have to be careful about what we’re saying. In accordance with our convention, the pair of indices $(i, k)$ (with $1 \leq i \leq m$ and $1 \leq k \leq m'$) should be considered as the single index $(i-1)m' + k$. It’s clear that this quantity then runs between $1$ and $mm'$. A similar interpretation goes for the index pairs $(j, l)$.
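If you want to see this index bookkeeping in action, here is a Python sketch (my own illustration; it leans on the fact that numpy’s `kron` flattens index pairs in the same row-major way as the convention above): the entry of the Kronecker product at the combined indices is the product of the corresponding entries.

```python
# A minimal sketch: the Kronecker product entry at the pair indices
# ((j-1)n' + l, (i-1)m' + k) is s[j, i] * t[l, k] (rows are upper indices,
# columns are lower indices, all 1-based in the comments).
import numpy as np

rng = np.random.default_rng(3)
m, n = 2, 3                           # s is a morphism m -> n, an n x m matrix
mprime, nprime = 4, 2                 # t is a morphism m' -> n', an n' x m' matrix
s = rng.standard_normal((n, m))
t = rng.standard_normal((nprime, mprime))

K = np.kron(s, t)                     # an (n n') x (m m') matrix

i, k, j, l = 2, 3, 1, 2               # 1-based, with 1<=i<=m, 1<=k<=m', 1<=j<=n, 1<=l<=n'
row = (j - 1) * nprime + l            # the pair (j, l) as a single row index
col = (i - 1) * mprime + k            # the pair (i, k) as a single column index

print(np.isclose(K[row - 1, col - 1], s[j - 1, i - 1] * t[l - 1, k - 1]))   # True
```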
Of course, we need some relations for this to be a monoidal structure. Strict associativity is straightforward:

$$(m \otimes n) \otimes p = (mn)p = m(np) = m \otimes (n \otimes p)$$

For our identity object, we naturally use $1$, with its identity morphism $\left(1\right)$. Note that the first of these is the object the natural number $1$, while the second is the $1 \times 1$ matrix whose single entry is the field element $1$. Then we can calculate the Kronecker product to find

$$\left(1\,s_i^j\right) = \left(s_i^j\right) = \left(s_i^j\,1\right)$$

and so the unit conditions hold strictly as well.
The category of matrices also has duals. In fact, each object is self-dual! That is, we set $n^* = n$. We then need our arrows $\eta_n: 1 \to n \otimes n$ and $\epsilon_n: n \otimes n \to 1$.

The morphism $\eta_n$ will be an $n^2 \times 1$ matrix. Specifically, we’ll use $\eta_n = \left(\delta^{jk}\right)$, with $j$ and $k$ both running between $1$ and $n$. Again, we interpret an index pair as described above. The symbol $\delta^{jk}$ is another form of the Kronecker delta, which takes the value $1$ when its indices agree and $0$ when they don’t.
Similarly, $\epsilon_n$ will be a $1 \times n^2$ matrix: $\epsilon_n = \left(\delta_{jk}\right)$, using yet another form of the Kronecker delta.
Now we have compatibility relations. Since the monoidal structure is strict, these are simpler than usual:

$$\left(\epsilon_n \otimes 1_n\right)\circ\left(1_n \otimes \eta_n\right) = 1_n \qquad \left(1_n \otimes \epsilon_n\right)\circ\left(\eta_n \otimes 1_n\right) = 1_n$$

But now all the basic matrices in sight are various Kronecker deltas! The first equation reads

$$\delta_{jk}\delta_l^m\,\delta_i^j\delta^{kl} = \delta_i^m$$

(with the sums over $j$, $k$, and $l$ implied), which is true. You should be able to verify the second one similarly.
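For the skeptical, here is a Python sketch (my own illustration) that checks both compatibility relations numerically for $n = 3$, with $\eta_n$ and $\epsilon_n$ realized as the identity matrix reshaped into a column and a row.

```python
# A minimal sketch: checking the duality ("zig-zag") equations for n = 3,
# with eta the n^2 x 1 matrix (delta^{jk}) and epsilon the 1 x n^2 matrix
# (delta_{jk}).
import numpy as np

n = 3
I = np.eye(n)
eta = np.eye(n).reshape(n * n, 1)    # entry 1 exactly when the index pair agrees
eps = np.eye(n).reshape(1, n * n)    # same entries, laid out as a row

first = np.kron(eps, I) @ np.kron(I, eta)    # (eps (x) 1) after (1 (x) eta)
second = np.kron(I, eps) @ np.kron(eta, I)   # (1 (x) eps) after (eta (x) 1)

print(np.allclose(first, I), np.allclose(second, I))   # True True
```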
The upshot is that we’ve got the structure of a monoidal category with duals on $\mathbf{Mat}(\mathbb{F})$.
The Category of Matrices I
What we’ve been building up to is actually the definition of a category. Given a field $\mathbb{F}$ we define the category $\mathbf{Mat}(\mathbb{F})$ of matrices over $\mathbb{F}$.
Most of our other categories have been named after their objects — groups are the objects of $\mathbf{Grp}$, commutative monoids are the objects of $\mathbf{CMon}$, and so on — but not here. In this case, matrices will be the morphisms, and the category of matrices illustrates in a clearer way than any we’ve seen yet how similar categories are to other algebraic structures that are usually seen as simpler and more concrete.
Down to business: the objects of $\mathbf{Mat}(\mathbb{F})$ will be the natural numbers, and the morphisms in $\hom(m, n)$ are the $n \times m$ matrices. That is, a morphism is a collection of field elements $t_i^j$, where $i$ runs from $1$ to $m$ and $j$ runs from $1$ to $n$.
We compose two morphisms by the process of matrix multiplication. If $\left(s_i^j\right)$ is an $n \times m$ matrix in $\hom(m, n)$ and $\left(t_j^k\right)$ is a $p \times n$ matrix in $\hom(n, p)$, then their product $\left(t_j^k s_i^j\right)$ is a $p \times m$ matrix in $\hom(m, p)$ (remember the summation convention).
The category of matrices is actually enriched over the category of vector spaces over $\mathbb{F}$. This means that each set of morphisms is actually a vector space over $\mathbb{F}$. Specifically, we add matrices of the same dimensions and multiply matrices by scalars component-by-component.
We have yet to speak very clearly about identities. The axioms of an enriched category state that for each object (natural number) $n$ there must be a linear function $\mathbb{F} \to \hom(n, n)$. Because of linearity, this function is completely determined by its value at $1 \in \mathbb{F}$: a matrix $1_n \in \hom(n, n)$. We must pick this matrix so that it acts as an identity for matrix multiplication, and we choose the Kronecker delta for this purpose: $1_n = \left(\delta_i^j\right)$. That is, we use an $n \times n$ matrix whose entries are $1$ if the indices are equal and $0$ otherwise. It’s straightforward to check that this is indeed an identity.
Other properties I’ve skipped over, but which aren’t hard to check, are that matrix multiplication is bilinear and associative. Both of these are straightforward once written out in terms of the summation convention; sometimes deceptively so. For example, for composable matrices $\left(s_i^j\right) \in \hom(m, n)$, $\left(t_j^k\right) \in \hom(n, p)$, and $\left(r_k^l\right) \in \hom(p, q)$, the associativity condition reads $r_k^l\left(t_j^k s_i^j\right) = \left(r_k^l t_j^k\right)s_i^j$. But remember that there are hidden summation signs in here, so it should really read:

$$\sum_{k=1}^p r_k^l\left(\sum_{j=1}^n t_j^k s_i^j\right) = \sum_{j=1}^n\left(\sum_{k=1}^p r_k^l t_j^k\right)s_i^j$$

so there’s an implicit change in the order of summation here. Since we’re just doing finite sums, this is no problem, but it’s still worth keeping an eye on.
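Numerically there is, of course, nothing to worry about. A minimal Python sketch (my own illustration) of composition, the Kronecker delta identity, and associativity in this category:

```python
# A minimal sketch: composition in the category of matrices is matrix
# multiplication, the Kronecker delta matrix is the identity, and
# associativity is just a reassociation of finite sums.
import numpy as np

rng = np.random.default_rng(4)
m, n, p, q = 2, 3, 4, 2
s = rng.standard_normal((n, m))      # a morphism m -> n
t = rng.standard_normal((p, n))      # a morphism n -> p
r = rng.standard_normal((q, p))      # a morphism p -> q

identity_n = np.eye(n)               # the Kronecker delta as an n x n matrix

print(np.allclose(t @ identity_n, t), np.allclose(identity_n @ s, s))  # True True
print(np.allclose((r @ t) @ s, r @ (t @ s)))                           # True
```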