The Unapologetic Mathematician

Mathematics for the interested outsider

Matrix notation

I just spent all day on the road back to NOLA to handle some end-of-month business, clean out my office, and so on. This one will have to do for today and tomorrow.

It gets annoying to write out matrices using the embedded LaTeX here, but I suppose I really should, just for thoroughness’ sake.

In general, a matrix is a collection of field elements with an upper and a lower index. We can write out all these elements in a rectangular array. The upper index should list the rows of our array, while the lower index should list the columns. The matrix \left(t_i^j\right) with entries t_i^j for i running from {1} to m and j running from {1} to n is written out in full as

\displaystyle\begin{pmatrix}t_1^1&t_2^1&\cdots&t_m^1\\t_1^2&t_2^2&\cdots&t_m^2\\\vdots&\vdots&\ddots&\vdots\\t_1^n&t_2^n&\cdots&t_m^n\end{pmatrix}
We call this an n\times m matrix, because the array is n rows high and m columns wide.

There is a natural isomorphism V\cong\hom(\mathbb{F},V). This means that every vector in dimension m, written out in the components relative to a given basis, can be seen as an m\times1 “column vector”:

\displaystyle\begin{pmatrix}v^1\\v^2\\\vdots\\v^m\end{pmatrix}
Similarly, a linear functional on an n-dimensional space can be written as a 1\times n “row vector”:

\displaystyle\begin{pmatrix}\mu_1&\mu_2&\cdots&\mu_n\end{pmatrix}
Notice that evaluation of linear transformations is now just a special case of matrix multiplication! Let’s practice by writing out the composition of a linear functional \mu\in\left(\mathbb{F}^n\right)^*, a linear map T:\mathbb{F}^m\rightarrow\mathbb{F}^n, and a vector v\in\mathbb{F}^m.

\displaystyle\begin{pmatrix}\mu_1&\mu_2&\cdots&\mu_n\end{pmatrix}\begin{pmatrix}t_1^1&t_2^1&\cdots&t_m^1\\t_1^2&t_2^2&\cdots&t_m^2\\\vdots&\vdots&\ddots&\vdots\\t_1^n&t_2^n&\cdots&t_m^n\end{pmatrix}\begin{pmatrix}v^1\\v^2\\\vdots\\v^m\end{pmatrix}
A matrix product makes sense if and only if the number of columns in the left-hand matrix is the same as the number of rows in the right-hand matrix. That is, an m\times n and an n\times p can be multiplied. The result will be an m\times p matrix. We calculate it by taking a row from the left-hand matrix and a column from the right-hand matrix. Since these are the same length (by assumption) we can multiply corresponding elements and sum up.
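
This row-times-column rule is easy to check mechanically. Here is a minimal sketch with made-up entries: a 2\times3 matrix times a 3\times4 matrix, computing each entry of the product by multiplying corresponding elements of a row and a column and summing.

```python
import numpy as np

# Illustrative matrices: the inner dimensions (3) match,
# so the product is defined and is 2x4.
A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[1, 0, 2, 1],
              [0, 1, 1, 0],
              [2, 1, 0, 1]])

# Entry (i, k) of the product: take row i of A and column k of B,
# multiply corresponding elements, and sum.
C = np.array([[sum(A[i, j] * B[j, k] for j in range(A.shape[1]))
               for k in range(B.shape[1])]
              for i in range(A.shape[0])])

assert C.shape == (2, 4)                # an (2x3)(3x4) product is 2x4
assert np.array_equal(C, A @ B)         # matches numpy's built-in product
```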

In the example above, the n\times m matrix \left(t_i^j\right) and the m\times1 matrix \left(v^i\right) can be multiplied. There is only one column in the latter to pick, so we simply choose row j out of n on the left: \begin{pmatrix}t_1^j&t_2^j&\cdots&t_m^j\end{pmatrix}. Multiplying corresponding elements and summing gives the single field element t_i^jv^i (remember the summation convention). We get n of these elements — one for each row — and we arrange them in a new n\times1 matrix:

\displaystyle\begin{pmatrix}t_i^1v^i\\t_i^2v^i\\\vdots\\t_i^nv^i\end{pmatrix}
Then we can multiply the row vector \begin{pmatrix}\mu_1&\mu_2&\cdots&\mu_n\end{pmatrix} by this column vector to get the 1\times1 matrix:

\displaystyle\begin{pmatrix}\mu_jt_i^jv^i\end{pmatrix}
Just like we slip back and forth between vectors and m\times1 matrices, we will usually consider a field element and the 1\times1 matrix with that single entry as being pretty much the same thing.

The first multiplication here turned an m-dimensional (column) vector into an n-dimensional one, reflecting the source and target of the transformation T. Then we evaluated the linear functional \mu on the resulting vector. But by the associativity of matrix multiplication we could have first multiplied on the left:

\displaystyle\begin{pmatrix}\mu_jt_1^j&\mu_jt_2^j&\cdots&\mu_jt_m^j\end{pmatrix}
turning the linear functional on \mathbb{F}^n into one on \mathbb{F}^m. But this is just the dual transformation T^*:\left(\mathbb{F}^n\right)^*\rightarrow\left(\mathbb{F}^m\right)^*! Then we can evaluate this on the column vector to get the same result: \mu_jt_i^jv^i.
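
A quick numerical sketch of this associativity, with all entries made up for illustration: \mu as a 1\times3 row vector, T as a 3\times2 matrix, and v as a 2\times1 column vector.

```python
import numpy as np

# mu is a functional on F^3, T a map F^2 -> F^3, v a vector in F^2.
mu = np.array([[1, 2, 3]])          # 1x3 row vector
T  = np.array([[1, 0],
               [2, 1],
               [0, 3]])             # 3x2 matrix
v  = np.array([[1],
               [2]])                # 2x1 column vector

# Evaluate right-to-left: first T(v), then mu on the result...
right_first = mu @ (T @ v)
# ...or left-to-right: first mu.T (the dual transformation applied
# to mu, a row vector on F^2), then evaluate on v.
left_first = (mu @ T) @ v

assert np.array_equal(right_first, left_first)  # the 1x1 matrices agree
```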

There is one slightly touchy thing we need to be careful about: Kronecker products. When the upper index is a pair (i_1,i_2) with 1\leq i_1\leq n_1 and 1\leq i_2\leq n_2 we have to pick an order on the set of such pairs. We’ll always use the “lexicographic” order. That is, we start with (1,1), then (1,2), and so on until (1,n_2) before starting over with (2,1), (2,2), and so on. Let’s write out a couple examples just to be clear:

\displaystyle\begin{pmatrix}s_1^1&s_2^1\\s_1^2&s_2^2\end{pmatrix}\boxtimes\begin{pmatrix}t_1^1&t_2^1\\t_1^2&t_2^2\end{pmatrix}=\begin{pmatrix}s_1^1t_1^1&s_1^1t_2^1&s_2^1t_1^1&s_2^1t_2^1\\s_1^1t_1^2&s_1^1t_2^2&s_2^1t_1^2&s_2^1t_2^2\\s_1^2t_1^1&s_1^2t_2^1&s_2^2t_1^1&s_2^2t_2^1\\s_1^2t_1^2&s_1^2t_2^2&s_2^2t_1^2&s_2^2t_2^2\end{pmatrix}

\displaystyle\begin{pmatrix}t_1^1&t_2^1\\t_1^2&t_2^2\end{pmatrix}\boxtimes\begin{pmatrix}s_1^1&s_2^1\\s_1^2&s_2^2\end{pmatrix}=\begin{pmatrix}t_1^1s_1^1&t_1^1s_2^1&t_2^1s_1^1&t_2^1s_2^1\\t_1^1s_1^2&t_1^1s_2^2&t_2^1s_1^2&t_2^1s_2^2\\t_1^2s_1^1&t_1^2s_2^1&t_2^2s_1^1&t_2^2s_2^1\\t_1^2s_1^2&t_1^2s_2^2&t_2^2s_1^2&t_2^2s_2^2\end{pmatrix}

So the Kronecker product depends on the order of multiplication. But this dependence is somewhat illusory. The only real difference is reordering the bases we use for the tensor products of the vector spaces involved, and so a change of basis can turn one into the other. This is an example of how matrices can carry artifacts of our choice of bases.
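
We can check this basis-reordering claim numerically: the two orders of Kronecker product are conjugate by the "perfect shuffle" permutation that swaps the roles of the paired indices. The 2\times2 matrices below are made up for illustration.

```python
import numpy as np

# Two arbitrary 2x2 matrices.
S = np.array([[1, 2],
              [3, 4]])
T = np.array([[0, 5],
              [6, 7]])

K1 = np.kron(S, T)
K2 = np.kron(T, S)

# The permutation that swaps the two factors in the lexicographic
# ordering of index pairs (i1, i2), each index running over {0, 1}:
P = np.zeros((4, 4), dtype=int)
for i1 in range(2):
    for i2 in range(2):
        P[2 * i2 + i1, 2 * i1 + i2] = 1

# Reordering rows and columns turns one product into the other.
assert np.array_equal(P @ K1 @ P.T, K2)
```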

May 30, 2008 Posted by | Algebra, Linear Algebra | 7 Comments

Matrices IV

Like we saw with the tensor product of vector spaces, the dual space construction turns out to be a functor. In fact, it’s a contravariant functor. That is, if we have a linear transformation T:U\rightarrow V we get a linear transformation T^*:V^*\rightarrow U^*. As usual, we ask what this looks like for matrices.

First, how do we define the dual transformation? It turns out this is the contravariant functor represented by \mathbb{F}. That is, if \mu:V\rightarrow\mathbb{F} is a linear functional, we define T^*(\mu)=\mu\circ T:U\rightarrow\mathbb{F}. In terms of the action on vectors, \left[T^*(\mu)\right](v)=\mu(T(v)).

Now let’s assume that U and V are finite-dimensional, and pick bases \left\{e_i\right\} and \left\{f_k\right\} for U and V, respectively. Then the linear transformation T has matrix coefficients t_i^k. We also get the dual bases \left\{\epsilon^j\right\} of U^* and \left\{\phi^l\right\} of V^*.

Given a basic linear functional \phi^l on V, we want to write T^*(\phi^l) in terms of the \epsilon^j. So let’s evaluate it on a generic basis vector e_i and see what we get. The formula above shows us that

\displaystyle\left[T^*(\phi^l)\right](e_i)=\phi^l\left(T(e_i)\right)=\phi^l\left(t_i^kf_k\right)=t_i^k\phi^l(f_k)=t_i^k\delta_k^l=t_i^l
In other words, we can write T^*(\phi^l)=t_j^l\epsilon^j. The same matrix works, but we use its indices differently.

In general, given a linear functional \mu with coefficients \mu_l we find the coefficients of T^*(\mu) as t_j^l\mu_l. The value \left[T^*(\mu)\right](u)=\mu(T(u)) becomes \mu_lt_i^lu^i. Notice that the summation convention tells us this must be a scalar (as we expect) because there are no unpaired indices. Also notice that because we can use the same matrix for two different transformations we seem to have an ambiguity: is the lower index running over a basis for U or one for U^*? Luckily, since every basis gives rise to a dual basis, we don’t need to care. Both spaces have the same dimension anyhow.
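
Here is a small sketch of "the same matrix works for both": for column vectors we multiply on the left, while for row vectors of functional coefficients we multiply on the right. All entries are illustrative.

```python
import numpy as np

T  = np.array([[1, 2],
               [0, 1],
               [3, 0]])   # matrix of a map U -> V with dim(U)=2, dim(V)=3
u  = np.array([2, 1])     # coefficients u^i of a vector in U
mu = np.array([1, 0, 2])  # coefficients mu_l of a functional on V

# mu @ T gives the coefficients of the dual transformation applied to mu,
# a functional on U; evaluating it on u agrees with mu(T(u)).
assert mu @ (T @ u) == (mu @ T) @ u
```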

May 28, 2008 Posted by | Algebra, Linear Algebra | 2 Comments

Dual Spaces

Another thing vector spaces come with is duals. That is, given a vector space V we have the dual vector space V^*=\hom(V,\mathbb{F}) of “linear functionals” on V — linear functions from V to the base field \mathbb{F}. Again we ask how this looks in terms of bases.

So let’s take a finite-dimensional vector space V with basis \left\{e_i\right\}, and consider some linear functional \mu\in V^*. Like any linear function, we can write down matrix coefficients \mu_i=\mu(e_i). Notice that since our target space (the base field \mathbb{F}) is only one-dimensional, we don’t need another index to count its basis.

Now let’s consider a specially-crafted linear functional. We can define one however we like on the basis vectors e_i and then let linearity handle the rest. So let’s say our functional takes the value {1} on e_1 and the value {0} on every other basis element. We’ll call this linear functional \epsilon^1. Notice that on any vector we have

\displaystyle\epsilon^1(v)=\epsilon^1\left(v^ie_i\right)=v^i\epsilon^1(e_i)=v^1
so it returns the coefficient of e_1. There’s nothing special about e_1 here, though. We can define functionals \epsilon^j by setting \epsilon^j(e_i)=\delta_i^j. This is the “Kronecker delta”, and it has the value {1} when its two indices match, and {0} when they don’t.

Now given a linear functional \mu with matrix coefficients \mu_j, let’s write out a new linear functional \mu_j\epsilon^j. What does this do to basis elements?

\displaystyle\left[\mu_j\epsilon^j\right](e_i)=\mu_j\epsilon^j(e_i)=\mu_j\delta_i^j=\mu_i
so this new transformation has exactly the same matrix as \mu does. It must be the same transformation! So any linear functional can be written uniquely as a linear combination of the \epsilon^j, and thus they form a basis for the dual space. We call \left\{\epsilon^j\right\} the “dual basis” to \left\{e_i\right\}.

Now if we take a generic linear functional \mu and evaluate it on a generic vector v we find

\displaystyle\mu(v)=\mu_j\epsilon^j\left(v^ie_i\right)=\mu_jv^i\epsilon^j(e_i)=\mu_jv^i\delta_i^j=\mu_iv^i
Once we pick a basis for V we immediately get a basis for V^*, and evaluation of a linear functional on a vector looks neat in terms of these bases.
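
Concretely, in coordinates the dual basis functionals are just the rows of the identity matrix, and evaluation is a dot product. A minimal sketch with made-up numbers:

```python
import numpy as np

n = 3
eps = np.eye(n)                 # row j represents the functional epsilon^(j+1)
v = np.array([5.0, -2.0, 7.0])  # coefficients v^i of a vector

# epsilon^j picks off the j-th coefficient of v:
for j in range(n):
    assert eps[j] @ v == v[j]

# A general functional is the combination mu_j epsilon^j,
# and evaluating it on v is the dot product mu_i v^i:
mu = np.array([1.0, 0.0, 2.0])
assert np.array_equal(mu @ eps, mu)   # mu_j epsilon^j has coefficients mu_j
assert mu @ v == 19.0
```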

May 27, 2008 Posted by | Algebra, Linear Algebra | 30 Comments

Matrices III

Given two finite-dimensional vector spaces U and V, with bases \left\{e_i\right\} and \left\{f_j\right\} respectively, we know how to build a tensor product: use the basis \left\{e_i\otimes f_j\right\}.

But an important thing about the tensor product is that it’s a functor. That is, if we have linear transformations S:U\rightarrow U' and T:V\rightarrow V', then we get a linear transformation S\otimes T:U\otimes V\rightarrow U'\otimes V'. So what does this operation look like in terms of matrices?

First we have to remember exactly how we get the tensor product S\otimes T. Clearly we can consider the function S\times T:U\times V\rightarrow U'\times V'. Then we can compose with the bilinear function U'\times V'\rightarrow U'\otimes V' to get a bilinear function from U\times V to U'\otimes V'. By the universal property, this must factor uniquely through a linear function U\otimes V\rightarrow U'\otimes V'. It is this map we call S\otimes T.

We have to pick bases \left\{e_k'\right\} of U' and \left\{f_l'\right\} of V'. This gives us matrix coefficients s_i^k for S and t_j^l for T. To calculate the matrix for S\otimes T we have to evaluate it on the basis elements e_i\otimes f_j of U\otimes V. By definition we find:

\left[S\otimes T\right](e_i\otimes f_j)=S(e_i)\otimes T(f_j)=\left(s_i^ke_k'\right)\otimes\left(t_j^lf_l'\right)=s_i^kt_j^le_k'\otimes f_l'

that is, the matrix coefficient between the index pair (i,j) and the index pair (k,l) is s_i^kt_j^l.

It’s not often taught anymore, but there is a name for this operation: the Kronecker product. If we write the matrices (as opposed to just their coefficients) \left(s_i^k\right) and \left(t_j^l\right), then we write the Kronecker product \left(s_i^k\right)\boxtimes\left(t_j^l\right)=\left(s_i^kt_j^l\right).
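
For what it's worth, numpy's `kron` implements exactly this convention (with 0-based indices): the entry of `np.kron(S, T)` at the row pair (k,l) and column pair (i,j) is `S[k, i] * T[l, j]`. A sketch with arbitrary entries:

```python
import numpy as np

S = np.array([[1, 2],
              [3, 4]])        # s_i^k, with the upper index k labeling rows
T = np.array([[5, 6, 7],
              [8, 9, 10]])    # t_j^l

K = np.kron(S, T)
assert K.shape == (4, 6)      # a (2*2) x (2*3) matrix

# Check the coefficient formula entry by entry, pairs ordered
# lexicographically as in the text:
for k in range(2):
    for i in range(2):
        for l in range(2):
            for j in range(3):
                assert K[2 * k + l, 3 * i + j] == S[k, i] * T[l, j]
```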

May 26, 2008 Posted by | Algebra, Linear Algebra | 6 Comments

Tensor Products and Bases

Since we’re looking at vector spaces, which are special kinds of modules, we know that \mathbf{Vec}(\mathbb{F}) has a tensor product structure. Let’s see what this means when we pick bases.

First off, let’s remember what the tensor product of two vector spaces U and V is. It’s a new vector space U\otimes V and a bilinear (linear in each of two variables separately) function B:U\times V\rightarrow U\otimes V satisfying a certain universal property. Specifically, if F:U\times V\rightarrow W is any bilinear function it must factor uniquely through B as F=\bar{F}\circ B. The catch here is that when we say “linear” and “bilinear” we mean that the functions preserve both addition and scalar multiplication. As with any other universal property, such a tensor product will be uniquely defined up to isomorphism.

So let’s take finite-dimensional vector spaces U and V, and bases \left\{e_i\right\} of U and \left\{f_j\right\} of V. I say that the vector space with basis \left\{e_i\otimes f_j\right\}, and with the bilinear function B(e_i,f_j)=e_i\otimes f_j is a tensor product. Here the expression e_i\otimes f_j is just a name for a basis element of the new vector space. Such elements are indexed by the set of pairs (i,j), where i indexes a basis for U and j indexes a basis for V.

First off, what do I mean by the bilinear function B(e_i,f_j)=e_i\otimes f_j? Just as for linear functions, we can define bilinear functions by defining them on bases. That is, if we have u=u^ie_i\in U and v=v^jf_j\in V, we get the vector

B(u,v)=B(u^ie_i,v^jf_j)=u^iv^jB(e_i,f_j)=u^iv^je_i\otimes f_j

in our new vector space, with coefficients u^iv^j.

So let’s take a bilinear function F and define a linear function \bar{F} by setting

\bar{F}(e_i\otimes f_j)=F(e_i,f_j)

We can easily check that F does indeed factor as desired, since

\bar{F}(B(e_i,f_j))=\bar{F}(e_i\otimes f_j)=F(e_i,f_j)

so F=\bar{F}\circ B on basis elements. By linearity, they must agree for all pairs (u,v). It should also be clear that we can’t define \bar{F} any other way and hope to satisfy this equation, so the factorization is unique.

Thus if we have bases \left\{e_i\right\} of U and \left\{f_j\right\} of V, we immediately get a basis \left\{e_i\otimes f_j\right\} of U\otimes V. As a side note, we immediately see that the dimension of the tensor product of two vector spaces is the product of their dimensions.
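
The indexing of the new basis by pairs can be sketched in a couple of lines; the basis names here are purely illustrative.

```python
from itertools import product

# A basis vector e_i ⊗ f_j for each pair (i, j), with dim(U) = 3, dim(V) = 2.
U_basis = ['e1', 'e2', 'e3']
V_basis = ['f1', 'f2']

tensor_basis = [f'{e}⊗{f}' for e, f in product(U_basis, V_basis)]

# The dimension of the tensor product is the product of the dimensions.
assert len(tensor_basis) == len(U_basis) * len(V_basis)
```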

May 23, 2008 Posted by | Algebra, Linear Algebra | 5 Comments

Matrices II

With the summation convention firmly in hand, we continue our discussion of matrices.

We’ve said before that the category of vector spaces is enriched over itself. That is, if we have vector spaces U and V over the field \mathbb{F}, the set of linear transformations \hom(U,V) is itself a vector space over \mathbb{F}. In fact, it inherits this structure from the one on V. We define the sum and the scalar product

\displaystyle\left[S+T\right](u)=S(u)+T(u)

\displaystyle\left[cT\right](u)=cT(u)
for linear transformations S and T from U to V, and for a constant c\in\mathbb{F}. Verifying that these are also linear transformations is straightforward.

So what do these structures look like in the language of matrices? If U and V are finite-dimensional, let’s pick bases \left\{e_i\right\} of U and \left\{f_j\right\} of V. Now we get matrix coefficients s_i^j and t_i^j, where i indexes the basis of U and j indexes the basis of V. Now we can calculate the matrices of the sum and scalar product above.

We do this, as usual, by calculating the value the transformations take at each basis element. First, the sum:

\displaystyle\left[S+T\right](e_i)=S(e_i)+T(e_i)=s_i^jf_j+t_i^jf_j=\left(s_i^j+t_i^j\right)f_j
and now the scalar product:

\displaystyle\left[cT\right](e_i)=cT(e_i)=ct_i^jf_j=\left(ct_i^j\right)f_j
so we calculate the matrix coefficients of the sum of two linear transformations by adding the corresponding matrix coefficients of each transformation, and the matrix coefficients of the scalar product by multiplying each coefficient by the same scalar.
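
In other words, the vector space structure on \hom(U,V) is just entrywise arithmetic on matrices. A quick check with illustrative matrices for two maps U\rightarrow V:

```python
import numpy as np

S = np.array([[1, 2],
              [3, 4]])
T = np.array([[0, 1],
              [1, 0]])
c = 5
u = np.array([1, 2])  # an arbitrary vector in U

# (S + T)(u) = S(u) + T(u): the matrix of the sum is the entrywise sum.
assert np.array_equal((S + T) @ u, S @ u + T @ u)
# (cT)(u) = c * T(u): the matrix of the scalar product scales every entry.
assert np.array_equal((c * T) @ u, c * (T @ u))
```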

May 22, 2008 Posted by | Algebra, Linear Algebra | 1 Comment

The Einstein Summation Convention

Look at the formulas we were using yesterday. There are a lot of summations in there, and a lot of big sigmas, and those get really tiring to write over and over. Back when Einstein was writing up his papers, he used a lot of linear transformations, and wrote them all out in matrices. Accordingly, he used a lot of those big sigmas.

When we’re typing nowadays, or when we write on a pad or on the board, this isn’t a problem. But remember that up until very recently, publications had to actually set type. Actual little pieces of metal with characters raised (and reversed!) on them would get slathered with ink and pressed to paper. Incidentally, this is why companies that produce fonts are called “type foundries”. They actually forged those metal bits with letter shapes in different styles, and sold sets of them to printers.

Now Einstein was using a lot of these big sigmas, and there were pages that had so many of them that the printer would run out! Even if they set one page at once and printed them off, they just didn’t have enough little pieces of metal with big sigmas on them to handle it. Clearly something needed to be done to cut down on demand for them.

Here we note that we’re always summing over some basis. Even if there’s no basis element in a formula — say, the formula for a matrix product — the summation is over the dimension of some vector space. We also notice that when we chose to write some of our indices as subscripts and some as superscripts, we’re always summing over one of each. We now adopt the convention that if we ever see a repeated index — once as a superscript and once as a subscript — we’ll read that as summing over an appropriate basis.

For example, when we wanted to write a vector v\in V, we had to take the basis \{f_j\} of V and write up the sum

\displaystyle v=\sum\limits_{j=1}^{\dim(V)}v^jf_j

but now we just write v^jf_j. The repeated index and the fact that we’re talking about a vector in V means we sum for j running from {1} to the dimension of V. Similarly we write out the value of a linear transformation on a basis vector: T(f_j)=t_j^kg_k. Here we determine from context that k should run from {1} to the dimension of W.

What about finding the coefficients of a linear transformation acting on a vector? Before we wrote this as

\displaystyle T(v)^k=\sum\limits_{j=1}^{\dim(V)}t_j^kv^j

where now we write the result as t_j^kv^j. Since the v^j are the coefficients of a vector in V, j must run from {1} to the dimension of V.

And similarly given linear transformations S:U\rightarrow V and T:V\rightarrow W represented (given choices of bases) by the matrices with components s_i^j and t_j^k, the matrix of their product is then written s_i^jt_j^k. Again, we determine from context that we should be summing j over a set indexing a basis for V.
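
As an aside, numpy's `einsum` is essentially the summation convention made executable: repeated index letters are summed over. A sketch with made-up matrices (note that each subscript pair names the (row, column) axes, with the "upper" output index first):

```python
import numpy as np

S = np.array([[1, 2],
              [3, 4],
              [5, 6]])       # s_i^j: a map U -> V, dim(U)=2, dim(V)=3
T = np.array([[1, 0, 2],
              [0, 1, 1]])    # t_j^k: a map V -> W, dim(W)=2
v = np.array([1, 2])         # v^i, a vector in U

# The composite's matrix s_i^j t_j^k: the repeated j is summed away.
composite = np.einsum('kj,ji->ki', T, S)
assert np.array_equal(composite, T @ S)

# Acting on a vector, t_j^k s_i^j v^i: both repeated indices are summed.
w = np.einsum('kj,ji,i->k', T, S, v)
assert np.array_equal(w, T @ (S @ v))
```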

One very important thing to note here is that it’s not going to matter what basis for V we use here! I’m not going to prove this quite yet, but built right into this notation is the fact that the composite of the two transformations is completely independent of the choice of basis of V. Of course, the matrix of the composite still depends on the bases of U and W we pick, but the dependence on V vanishes as we take the sum.

Einstein had a slightly easier time of things: he was always dealing with four-dimensional vector spaces, so all his indices had the same range of summation. We’ve got to pay some attention here and be careful about what vector space a given index is talking about, but in the long run it saves a lot of time.

May 21, 2008 Posted by | Fundamentals, Linear Algebra | 18 Comments

Matrices I

Yesterday we talked about the high-level views of linear algebra. That is, we’re discussing the category \mathbf{Vec}(\mathbb{F}) of vector spaces over a field \mathbb{F} and \mathbb{F}-linear transformations between them.

More concretely, now: we know that every vector space over \mathbb{F} is free as a module over \mathbb{F}. That is, every vector space has a basis — a set of vectors so that every other vector can be uniquely written as an \mathbb{F}-linear combination of them — though a basis is far from unique. Just how nonunique it is will be one of our subjects going forward.

Now if we’ve got a linear transformation T:V\rightarrow W from one finite-dimensional vector space to another, and if we have a basis \{f_j\}_{j=1}^{\mathrm{dim}(V)} of V and a basis \{g_k\}_{k=1}^{\mathrm{dim}(W)} of W, we can use these to write the transformation T in a particular form: as a matrix. Take the transformation and apply it to each basis element of V to get vectors T(f_j)\in W. These can be written uniquely as linear combinations

\displaystyle T(f_j)=\sum\limits_{k=1}^{\mathrm{dim}(W)}t_j^kg_k

for certain t_j^k\in\mathbb{F}. These coefficients, collected together, we call a matrix. They’re enough to calculate the value of the transformation on any vector v\in V, because we can write

\displaystyle v=\sum\limits_{j=1}^{\mathrm{dim}(V)}v^jf_j

We’re writing the indices of the components as superscripts here, just go with it. Then we can evaluate T(v) using linearity

\displaystyle T(v)=T\left(\sum\limits_{j=1}^{\mathrm{dim}(V)}v^jf_j\right)=\sum\limits_{j=1}^{\mathrm{dim}(V)}v^jT(f_j)=\sum\limits_{k=1}^{\mathrm{dim}(W)}\left(\sum\limits_{j=1}^{\mathrm{dim}(V)}t_j^kv^j\right)g_k

So the coefficients v^j defining the vector v\in V and the matrix coefficients t_j^k together give us the coefficients defining the vector T(v)\in W.

If we have another finite-dimensional vector space U with basis \{e_i\}_{i=1}^{\mathrm{dim}(U)} and another transformation S:U\rightarrow V then we have another matrix

\displaystyle S(e_i)=\sum_{j=1}^{\mathrm{dim}(V)}s_i^jf_j

Now we can compose these two transformations and calculate the result on a basis element

\displaystyle \left[T\circ S\right](e_i)=T\left(S(e_i)\right)=T\left(\sum_{j=1}^{\mathrm{dim}(V)}s_i^jf_j\right)=\sum_{j=1}^{\mathrm{dim}(V)}s_i^jT(f_j)=\sum_{k=1}^{\mathrm{dim}(W)}\left(\sum_{j=1}^{\mathrm{dim}(V)}t_j^ks_i^j\right)g_k

This last quantity in parens is then the matrix of the composite transformation T\circ S. Thus we can represent the operation of composition by this formula for matrix multiplication.
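
The claim is easy to test numerically: the columns of the product matrix are exactly the images of the basis vectors under the composite. Dimensions and entries below are made up for illustration.

```python
import numpy as np

S = np.array([[1, 0],
              [2, 1],
              [0, 3]])       # s_i^j: a map U -> V, dim(U)=2, dim(V)=3
T = np.array([[1, 1, 0],
              [0, 2, 1]])    # t_j^k: a map V -> W, dim(W)=2

TS = T @ S                   # the matrix of the composite T∘S

# Applying T∘S to each basis vector e_i of U reproduces column i of TS:
dimU = S.shape[1]
for i in range(dimU):
    e_i = np.zeros(dimU)
    e_i[i] = 1
    assert np.array_equal(TS @ e_i, T @ (S @ e_i))
```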

May 20, 2008 Posted by | Algebra, Linear Algebra | 7 Comments

Linear Algebra

Here we begin a discussion of linear algebra. There are three views on what this is all about.

The mid-level view is that we’re studying the properties of linear maps — homomorphisms — between abelian groups, and particularly between modules or vector spaces, which are just modules over a field. In particular we’ll focus on vector spaces over some arbitrary (but fixed!) field \mathbb{F}.

The high-level view is that what we’re really studying is the category of \mathbb{F}-modules. Categories are all about how morphisms between their objects interact, so this is what we’re really after here. And it turns out we already know a lot about these sorts of categories! Specifically, they’re abelian categories. In fact, since we’re working over a field \mathbb{F} (which is a commutative ring) the properties of \hom-functors tell us that \mathbb{F}-\mathbf{mod} is enriched over itself.

So this tells us that our category of vector spaces has a biproduct — the direct sum — and in particular a zero object — the trivial {0}-dimensional vector space \mathbf{0}. It also has a tensor product, which makes this a monoidal category, using the one-dimensional vector space \mathbb{F} itself as monoidal identity. We also know that kernels and cokernels exist, which then (along with biproducts) give us all finite limits and colimits.

The third viewpoint is that we’re talking about solving systems of linear equations, and that’s where “linear algebra” comes from. The connection between these abstract formulations and that concrete one is a bit mysterious at first blush, but we’ll start making it tomorrow.

May 19, 2008 Posted by | Algebra, Linear Algebra | 4 Comments

Commutativity in Series III

Okay, here’s the part I promised I’d finish last Friday. How do we deal with rearrangements that “go to infinity” more than once? That is, we chop up the infinite set of natural numbers into a bunch of other infinite sets, add each of these subseries up, and then add the results up. If the original series was absolutely convergent, we’ll get the same answer.

First of all, if a series \sum_{k=0}^\infty a_k converges absolutely, then so does any subseries \sum_{j=0}^\infty a_{p(j)}, where p is an injective (but not necessarily bijective!) function from the natural numbers to themselves. For instance, we could let p(j)=2j and add up all the even terms from the original series.

To see this, notice that at any finite n we have a maximum value N=\max\limits_{0\leq j\leq n}p(j). Then we find

\displaystyle\sum\limits_{j=0}^n\left|a_{p(j)}\right|\leq\sum\limits_{k=0}^N\left|a_k\right|\leq\sum\limits_{k=0}^\infty\left|a_k\right|
So the new sequence of partial sums of absolute values is increasing and bounded above, and thus converges.

Now let’s let p_0, p_1, p_2, and so on be a countable collection of functions defined on the natural numbers. We ask that

  • Each p_n is injective.
  • The image of p_n is a subset P_n\subseteq\mathbb{N}.
  • The collection \left\{P_0,P_1,P_2,...\right\} is a partition of \mathbb{N}. That is, these subsets are mutually disjoint, and their union is all of \mathbb{N}.

If \sum_{k=0}^\infty a_k is an absolutely convergent series, we define \left(b_n\right)_j=a_{p_n(j)} — the subseries defined by p_n. Then from what we said above, each \sum_{j=0}^\infty\left(b_n\right)_j is an absolutely convergent series whose sum we call s_n. We assert now that \sum_{n=0}^\infty s_n is an absolutely convergent series whose sum is the same as that of \sum_{k=0}^\infty a_k.

Let’s set t_m=\sum_{n=0}^m\left|s_n\right|. Since each \left|s_n\right| is bounded by the sum \sum_{j=0}^\infty\left|\left(b_n\right)_j\right| of absolute values of its subseries, we have

\displaystyle t_m\leq\sum\limits_{j=0}^\infty\left|\left(b_0\right)_j\right|+...+\sum\limits_{j=0}^\infty\left|\left(b_m\right)_j\right|=\sum\limits_{j=0}^\infty\left(\left|\left(b_0\right)_j\right|+...+\left|\left(b_m\right)_j\right|\right)

But this is just the sum of a bunch of absolute values from the original series, and so is bounded by \sum_{k=0}^\infty\left|a_k\right|. So the series of absolute values of s_n has bounded partial sums, and so \sum_{n=0}^\infty s_n converges absolutely. That it has the same sum as the original is another argument exactly analogous to (but more complicated than) the one for a simple rearrangement, and for associativity of absolutely convergent series.
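
A numeric sanity check of the claim, under the assumption that large partial sums approximate the limits well: split the absolutely convergent geometric series \sum(-1)^k/2^k into its even-indexed and odd-indexed subseries, sum each, and add the results.

```python
# Enough terms that floating-point partial sums have effectively converged.
N = 60
terms = [(-1) ** k / 2 ** k for k in range(N)]

total = sum(terms)
even_part = sum(terms[k] for k in range(0, N, 2))  # subseries p_0(j) = 2j
odd_part = sum(terms[k] for k in range(1, N, 2))   # subseries p_1(j) = 2j + 1

# Summing each subseries and then adding recovers the original sum...
assert abs((even_part + odd_part) - total) < 1e-12
# ...which is the geometric series value 1 / (1 - (-1/2)) = 2/3.
assert abs(total - 2 / 3) < 1e-12
```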

This pretty much wraps up all I want to say about calculus for now. I’m going to take a little time to regroup before I dive into linear algebra in more detail than the abstract algebra I covered before. But if you want to get ahead, go back and look over what I said about rings and modules. A lot of that will be revisited and fleshed out in the next sections.

May 12, 2008 Posted by | Analysis, Calculus | 3 Comments