The Unapologetic Mathematician

Spanning sets

Today I just want to point out a dual proposition to the one I refined last week. At that time we stated that any linearly independent set can be expanded to a basis. This followed from the fact that a basis is a maximal linearly independent set. But a basis is also a minimal spanning set, and that will lead to our flipped result.

First, a basis must be a spanning set, and if we remove any vector it can no longer be a spanning set. For if the remaining vectors spanned, then the removed vector could be written as a finite linear combination of them, and this would contradict the linear independence of the basis.

On the other hand, if a spanning set $\left\{e_i\right\}$ is minimal it must be linearly independent. For if it were not, we could write out a nontrivial finite linear combination

$0=\sum r^ie_i$

For some index — say $i_0$ — we have $r^{i_0}\neq0$. Then we can rearrange this to write

$e_{i_0}=\frac{1}{r^{i_0}}\sum\limits_{i\neq i_0}r^ie_i$

and so we may omit the vector $e_{i_0}$ from the set and the remaining vectors still span. This contradicts the assumed minimality of the original spanning set.

So a basis is a minimal spanning set. This leads us to the proposition that any spanning set may be narrowed to a basis. And the proof is again the same technique we used to show that every vector space has a basis. It’s just that this time we flip the whole thing over. Now we consider the set of subsets of our spanning set which span the vector space, and we use Zorn’s lemma to show that this must contain a minimal spanning set. This will then be a basis contained in our original spanning set.

Of course, as usual, in the finite-dimensional case we don’t need Zorn’s lemma, so the squeamish can relax in that case.

June 30, 2008 Posted by | Algebra, Linear Algebra | 5 Comments

The Rank-Nullity Theorem

Today we start considering how a given linear transformation acts on a vector space. And we start with the “rank-nullity” theorem. This sounds really fancy, but it’s actually almost trivial.

We already said that $\mathbf{Vect}(\mathbb{F})$ (also $\mathbf{FinVect}(\mathbb{F})$) is a abelian category. Now in any abelian category we have the first isomorphism theorem.

So let’s take this and consider a linear transformation $T:V\rightarrow W$. The first isomorphism theorem says we can factor $T$ as a surjection $E$ followed by an injection $M$. We’ll just regard the latter as the inclusion of the image of $T$ as a subspace of $W$. As for the surjection, it must be the linear map $V\rightarrow V/\mathrm{Ker}(T)$, just as in any abelian category. Then we can set up the short exact sequence

$\mathbf{0}\rightarrow\mathrm{Ker}(T)\rightarrow V\rightarrow V/\mathrm{Ker}(T)\rightarrow\mathbf{0}$

and the isomorphism theorem allows us to replace the last term

$\mathbf{0}\rightarrow\mathrm{Ker}(T)\rightarrow V\rightarrow \mathrm{Im}(T)\rightarrow\mathbf{0}$

Now since every short exact sequence splits we have an isomorphism $V\cong\mathrm{Ker}(T)\oplus\mathrm{Im}(T)$. This is the content of the rank-nullity theorem.

So where do “rank” and “nullity” come in? Well, these are just jargon terms. The “rank” of a linear transformation is the dimension of its image — not the target vector space, mind you, but the subspace of vectors of the form $T(v)$. That is, it’s the size of the largest set of linearly independent vectors in the image of the transformation. The “nullity” is the dimension of the kernel — the largest number of linearly independent vectors that $T$ sends to the zero vector in $W$.

So what does the direct sum decomposition above mean? It tells us that there is a basis of $V$ which is in bijection with the disjoint union of a basis for $\mathrm{Ker}(T)$ and a basis for $\mathrm{Im}(T)$. In the finite-dimensional case we can take cardinalities and say that the dimension of $V$ is the sum of the dimensions of $\mathrm{Ker}(T)$ and $\mathrm{Im}(T)$. Or, to use our new words, the dimension of $V$ is the sum of the rank and the nullity of $T$. Thus: the rank-nullity theorem.

June 27, 2008 Posted by | Algebra, Linear Algebra | 13 Comments

Exact sequences split

Now that we know the splitting lemma, we can show that every short exact sequence of vector spaces splits!

To see this, we’ll need to refine an earlier result. Remember how we showed that every vector space has a basis. We looked for maximal linearly independent sets and used Zorn’s lemma to assert that they existed.

Here’s how we’re going to refine this result: start with a collection $S$ of linearly independent vectors. Then we don’t just look for any maximal collection, but specifically for a maximal collection containing $S$. Clearly if we have a linearly independent set $B$ containing $S$ which is maximal among such sets, it is also maximal among all linearly independent sets — it is a basis. On the other hand, the previous argument (with Zorn’s lemma) says that such a maximal linearly independent set must exist.

What does this mean? It says that any linearly independent set can be completed to a basis. If we start with the empty set (which is trivially linearly independent) then we get a basis, just like before. So we recover the same old result as before.

But look what we can do now! Take a short exact sequence

$\mathbf{0}\rightarrow U\xrightarrow{S}V\xrightarrow{T}W\rightarrow\mathbf{0}$

and pick any basis $\left\{e_i\right\}_{i\in\mathcal{I}}$ of $U$ (notice that we’re using a generic, possibly infinite, index set). Now hit this basis with $S$ to get $\left\{f_i\right\}_{i\in\mathcal{I}}$ with $f_i=S(e_i)$. I say that this is a linearly independent set in $V$.

Why is this? Well, let’s say that there’s a linear combination $r^if_i=0$ (only finitely many of the $r^i$ can be nonzero here). This linear combination is in the image of $S$, since we can write it as $r^iS(e_i)=S(r^ie_i)$. But exactness tells us that $S$ is injective, and so we have $r^ie_i=0$. But then all the $r^i$ have to vanish, since the $e_i$ form a basis!

So we’ve got a linearly independent set $\left\{f_i\right\}_{i\in\mathcal{I}}$ in $V$. We now complete this to a maximal linearly independent set $\left\{f_i\right\}_{i\in\mathcal{I}'}$ with $\mathcal{I}\subseteq\mathcal{I}'$. This is now a basis for $V$, which contains the image of all the basis elements from $U$.

Now turn this around and define a linear transformation $S':V\rightarrow U$ by specifying its values on this basis. Set $S'(f_i)=e_i$ for $i\in\mathcal{I}$ and $S'(f_i)=0$ for $i\in\mathcal{I}'\setminus\mathcal{I}$. Then the composition $S'\circ S$ is the identity on $U$ (check it on our basis), and so the sequence splits, as we said.

Notice here that all the Zorniness only matters for infinite-dimensional vector spaces. Everything we’ve done here works in $\mathbf{FinVec}(\mathbb{F})$ without ever having to worry about such set-theoretic problems. However, given my politics I have no problem with using the Axiom of Choice when push comes to shove. I’m just pointing this out in case you’re the squeamish sort.

June 26, 2008 Posted by | Algebra, Linear Algebra | 11 Comments

The Splitting Lemma

Evidently I never did this one when I was talking about abelian categories. Looks like I have to go back and patch this now.

$\mathbf{0}\rightarrow A\xrightarrow{f}B\xrightarrow{g}C\rightarrow\mathbf{0}$

A large class of examples of such sequences are provided by the split-exact sequences:

$\mathbf{0}\rightarrow A\rightarrow A\oplus C\rightarrow C\rightarrow\mathbf{0}$

where these arrows are those from the definition of the biproduct. But in this case we’ve also got other arrows: $A\oplus C\rightarrow A$ and $C\rightarrow A\oplus C$ that satisfy certain relations.

The lemma says that we can go the other direction too. If we have one arrow $h:B\rightarrow A$ so that $h\circ f=1_A$ then everything else falls into place, and $B\cong A\oplus C$. Similarly, a single arrow $h:C\rightarrow B$ so that $g\circ h=1_C$ will “split” the sequence. We’ll just prove the first one, since the second goes more or less the same way.

Just like with diagram chases, we’re going to talk about “elements” of objects as if the objects are abelian groups. Of course, we don’t really mean “elements”, but the exact same semantic switch works here.

So let’s consider an element $b\in B$ and write it as $(b-f(h(b)))+f(h(b))$. Clearly $f(h(b))$ lands in $\mathrm{Im}(f)$. We can also check

$h(b-f(h(b)))=h(b)-h(f(h(b)))=h(b)-h(b)=0$

so $b-f(h(b))\in\mathrm{Ker}(h)$. That is, any element of $B$ can be written as the sum of an element of $\mathrm{Im}(f)$ and an element of $\mathrm{Ker}(h)$. But these two intersect trivially. That is, if $b=f(a)$ and $h(b)=0$ then $0=h(f(a))=a$, and so $b=0$. This shows that $B\cong\mathrm{Ker}(h)\oplus\mathrm{Im}(f)$. Thus we can write every $b$ uniquely as $b=f(a)+k$.

Now consider an element $c\in C$. By exactness, there must be some $b\in B$ so that $c=g(b)=g(f(a)+k)=g(f(a))+g(k)$. That is, we have a unique $k\in\mathrm{Ker}(h)$ with $g(k)=c$. This shows that $C\cong\mathrm{Ker}(h)$. It’s straightforward to show that also $A\cong\mathrm{Im}(f)$. Thus we have split the sequence: $B\cong A\oplus C$.

June 25, 2008 Posted by | Category theory | 6 Comments

The Category of Matrices IV

Finally, all the pieces are in place to state the theorem I’ve been driving at for a while now:

The functor that we described from $\mathbf{Mat}(\mathbb{F})$ to $\mathbf{FinVec}(\mathbb{F})$ is an equivalence.

To show this, we must show that it is full, faithful, and essentially surjective. The first two conditions say that given natural numbers $m$ and $n$ the linear transformations $\mathbb{F}^m\rightarrow\mathbb{F}^n$ and the $n\times m$ matrices over $\mathbb{F}$ are in bijection.

But this is just what we’ve been showing! The vector spaces of $n$-tuples come with their canonical bases, and given these bases every linear transformation gets a uniquely-defined matrix. Conversely, every matrix defines a unique linear transformation when we’ve got the bases to work with. So fullness and faithfulness are straightforward.

Now for essential surjectivity. This says that given any finite-dimensional vector space $V$ we have some $n$ so that $V\cong\mathbb{F}^n$. But we know that every vector space has a basis, and for $V$ it must be finite; that’s what “finite-dimensional” means! Let’s say that we’ve got a basis $\left\{f_i\right\}$ consisting of $n$ vectors.

Now we just line up the canonical basis $\left\{e_i\right\}$ of $\mathbb{F}^n$ and define linear transformations by $S(e_i)=f_i$ and $T(f_i)=e_i$. Remember that we can define a linear transformation by specifying its values on a basis (which can all be picked independently) and then extending by linearity. Thus we do have two well-defined linear transformations here. But just as clearly we see that for any $v\in V$ we have

$S(T(v))=S(T(v^ie_i))=v^iS(T(e_i))=v^iS(f_i)=v^ie_i=v$

and a similar equation holds for every $n$-tuple in $\mathbb{F}^n$. Thus $S$ and $T$ are inverses of each other, and are the isomorphism we need.

This tells us that the language of linear transformations between finite-dimensional vector spaces is entirely equivalent to that of matrices. But we gain some conceptual advantages by thinking in terms of finite-dimensional vector spaces. One I can point to right here is how we can tell the difference between a vector space and its dual. Sure, they’ve got the same dimension, and so there’s some isomorphism between them. Still, when we’re dealing with both at the same time they behave differently, and it’s valuable to keep our eye on that difference.

On the other hand, there are benefits to matrices. For one thing, we can actually write them down and calculate with them. A lot of people are — surprise! — interested in using mathematics to solve problems. And the problems that linear algebra is most directly applicable to are naturally stated in terms of matrices.

What the theorem tells us is that none of this matters. We can translate problems from the category of matrices to the category of vector spaces and back, and nothing is lost in the process.

June 24, 2008

The Category of Matrices III

At long last, let’s get back to linear algebra. We’d laid out the category of matrices $\mathbf{Mat}(\mathbb{F})$, and we showed that it’s a monoidal category with duals.

Now here’s the really important thing: There’s a functor $\mathbf{Mat}(\mathbb{F})\rightarrow\mathbf{FinVec}(\mathbb{F})$ that assigns the finite-dimensional vector space $\mathbb{F}^n$ of $n$-tuples of elements of $\mathbb{F}$ to each object $n$ of $\mathbf{Mat}(\mathbb{F})$. Such a vector space of $n$-tuples comes with the basis $\left\{e_i\right\}$, where the vector $e_i$ has a ${1}$ in the $i$th place and a ${0}$ elsewhere. In matrix notation:

$\displaystyle e_1=\begin{pmatrix}1\\{0}\\\vdots\\{0}\end{pmatrix}$
$\displaystyle e_2=\begin{pmatrix}{0}\\1\\\vdots\\{0}\end{pmatrix}$

and so on. We can write $e_i=\delta_i^je_j$ (remember the summation convention), so the vector components of the basis vectors are given by the Kronecker delta. We will think of other vectors as column vectors.

Given a matrix $\left(t_i^j\right)\in\hom(m,n)$ we clearly see a linear transformation from $\mathbb{F}^m$ to $\mathbb{F}^n$. Given a column vector with components $v^i$ (where the index satisfies $1\leq i\leq m$), we construct the column vector $t_i^jv^i$ (here $1\leq j\leq n$). But we’ve already established that matrix multiplication represents composition of linear transformations. Further, it’s straightforward to see that the linear transformation corresponding to a matrix $\left(\delta_i^j\right)$ is the identity on $\mathbb{F}^n$ (depending on the range of the indices on the Kronecker delta). This establishes that we really have defined a functor.

But wait, there’s more! The functor is linear over $\mathbb{F}$, so it’s a functor enriched over $\mathbb{F}$. The Kronecker product of matrices corresponds to the monoidal product of linear transformations, so the functor is monoidal, too. Following the definitions, we can even find that our functor preserves duals.

So we’ve got a functor from our category of matrices to the category of finite-dimensional vector spaces, and it preserves all of the relevant structure.

June 23, 2008

Screw you, George

George Carlin is dead at 71. Since any appropriate words would be inappropriate here, I’ll just say “good riddance”. I think he would have, too.

June 23, 2008 Posted by | Uncategorized | 2 Comments

The Category of Matrices II

As we consider the category $\mathbf{Mat}(\mathbb{F})$ of matrices over the field $\mathbb{F}$, we find a monoidal structure.

We define the monoidal product $\boxtimes$ on objects by multiplication — $m\boxtimes n=mn$ — and on morphisms by using the Kronecker product. That is, if we have an $m_1\times n_1$ matrix $\left(s_{i_1}^{j_1}\right)\in\hom(n_1,m_1)$ and an $m_2\times n_2$ matrix $\left(t_{i_2}^{j_2}\right)\in\hom(n_2,m_2)$, then we get the Kronecker product

$\left(s_{i_1}^{j_1}\right)\boxtimes\left(t_{i_2}^{j_2}\right)=\left(s_{i_1}^{j_1}t_{i_2}^{j_2}\right)$

Here we have to be careful about what we’re saying. In accordance with our convention, the pair of indices $(i_1,i_2)$ (with $1\leq i_1\leq m_1$ and $1\leq i_2\leq m_2$) should be considered as the single index $(i_1-1)m_2+i_2$. It’s clear that this quantity then runs between ${1}$ and $m_1m_2$. A similar interpretation goes for the index pairs $(j_1,j_2)$.

Of course, we need some relations for this to be a monoidal structure. Strict associativity is straightforward:

$\left(\left(r_{i_1}^{j_1}\right)\boxtimes\left(s_{i_2}^{j_2}\right)\right)\boxtimes\left(t_{i_3}^{j_3}\right)=\left((r_{i_1}^{j_1}s_{i_2}^{j_2})t_{i_3}^{j_3}\right)=\left(r_{i_1}^{j_1}(s_{i_2}^{j_2}t_{i_3}^{j_3})\right)=\left(r_{i_1}^{j_1}\right)\boxtimes\left(\left(s_{i_2}^{j_2}\right)\boxtimes\left(t_{i_3}^{j_3}\right)\right)$

For our identity object, we naturally use ${1}$, with its identity morphism $\left(1\right)$. Note that the first of these is the object the natural number ${1}$, while the second is the $1\times1$ matrix whose single entry is the field element ${1}$. Then we can calculate the Kronecker product to find

$\left(t_i^j\right)\boxtimes\left(1\right)=\left(t_i^j\right)=\left(1\right)\boxtimes\left(t_i^j\right)$

and so strict associativity holds as well.

The category of matrices also has duals. In fact, each object is self-dual! That is, we set $n^*=n$. We then need our arrows $\eta_n:1\rightarrow n\boxtimes n$ and $\epsilon_n:n\boxtimes n\rightarrow1$.

The morphism $\eta_n$ will be a $1\times n^2$ matrix. Specifically, we’ll use $\eta_n=\left(\delta^{i,j}\right)$, with $i$ and $j$ both running between ${1}$ and $n$. Again, we interpret an index pair as described above. The symbol $\delta^{i,j}$ is another form of the Kronecker delta, which takes the value ${1}$ when its indices agree and ${0}$ when they don’t.

Similarly, $\epsilon_n$ will be an $n^2\times1$ matrix: $\epsilon_n=\left(\delta_{i,j}\right)$, using yet another form of the Kronecker delta.

Now we have compatibility relations. Since the monoidal structure is strict, these are simpler than usual:

$(\epsilon_n\otimes1_n)\circ(1_n\otimes\eta_n)=1_n$
$(1_{n^*}\otimes\epsilon_n)\circ(\eta_n\otimes1_{n^*})=1_{n^*}$

But now all the basic matrices in sight are various Kronecker deltas! The first equation reads

$\left(\delta_a^b\delta^{c,d}\right)\left(\delta_{b,c}\delta_d^e\right)=\delta_a^e$

which is true. You should be able to verify the second one similarly.

The upshot is that we’ve got the structure of a monoidal category with duals on $\mathbf{Mat}(\mathbb{F})$.

June 3, 2008

The Category of Matrices I

What we’ve been building up to is actually the definition of a category. Given a field $\mathbb{F}$ we define the category $\mathbf{Mat}(\mathbb{F})$ of matrices over $\mathbb{F}$.

Most of our other categories have been named after their objects — groups are the objects of $\mathbf{Grp}$, commutative monoids are the objects of $\mathbf{CMon}$, and so on — but not here. In this case, matrices will be the morphisms, and the category of matrices illustrates in a clearer way than any we’ve seen yet how similar categories are to other algebraic structures that are usually seen as simpler and more concrete.

Down to business: the objects of $\mathbf{Mat}(\mathbb{F})$ will be the natural numbers $\mathbb{N}$, and the morphisms in $\hom(m,n)$ are the $n\times m$ matrices. That is, a morphism is a collection of field elements $\left(t_i^j\right)$ where $i$ runs from ${1}$ to $m$ and $j$ runs from ${1}$ to $n$.

We compose two morphisms by the process of matrix multiplication. If $\left(s_i^j\right)$ is an $n\times m$ matrix in $\hom(m,n)$ and $\left(t_j^k\right)$ is a $p\times n$ matrix in $\hom(n,p)$, then their product $\left(s_i^jt_j^k\right)$ is a $p\times m$ matrix in $\hom(m,p)$ (remember the summation convention).

The category of matrices is actually enriched over the category of vector spaces over $\mathbb{F}$. This means that each set of morphisms is actually a vector space over $\mathbb{F}$. Specifically, we add matrices of the same dimensions and multiply matrices by scalars component-by-component.

We have yet to speak very clearly about identities. The axioms of an enriched category state that for each object (natural number) $n$ there must be a linear function $I_n:\mathbb{F}\rightarrow\hom(n,n)$. Because of linearity, this function is completely determined by its value at $1\in\mathbb{F}$: $I_n(x)=xI_n(1)$. We must pick this matrix $I_n(1)$ so that it acts as an identity for matrix multiplication, and we choose the Kronecker delta for this purpose: $I_n(1)=\left(\delta_i^j\right)$. That is, we use an $n\times n$ matrix whose entries are ${1}$ if the indices are equal and ${0}$ otherwise. It’s straightforward to check that this is indeed an identity.

Other properties I’ve skipped over, but which aren’t hard to check, are that matrix multiplication is bilinear and associative. Both of these are straightforward once written out in terms of the summation convention; sometimes deceptively so. For example, the associativity condition reads $(r_i^js_j^k)t_k^l=r_i^j(s_j^kt_k^l)$. But remember that there are hidden summation signs in here, so it should really read:

$\displaystyle\sum\limits_k\left(\sum\limits_jr_i^js_j^k\right)t_k^l=\sum\limits_jr_i^j\left(\sum\limits_ks_j^kt_k^l\right)$

so there’s an implicit change in the order of summation here. Since we’re just doing finite sums, this is no problem, but it’s still worth keeping an eye on.

June 2, 2008