The Unapologetic Mathematician

Mathematics for the interested outsider

Matrix Elements

Okay, back to linear algebra and inner product spaces. I want to look at the matrix of a linear map between finite-dimensional inner product spaces.

So, let’s say V and W are inner product spaces with orthonormal bases \left\{e_i\right\}_{i=1}^m and \left\{f_j\right\}_{j=1}^n, respectively, and let T:V\rightarrow W be a linear map from one to the other. We know that we can write down the matrix \left(t_i^j\right) for T, where the matrix entries are defined as the coefficients in the expansion

\displaystyle T(e_i)=t_i^kf_k

But now that we’ve got an inner product on W, it will be easy to extract these coefficients. Just consider the inner product

\displaystyle\begin{aligned}\left\langle f_j,T(e_i)\right\rangle&=\left\langle f_j,t_i^kf_k\right\rangle\\&=t_i^k\left\langle f_j,f_k\right\rangle\\&=t_i^k\delta_{jk}\\&=t_i^j\end{aligned}

Presto! We have a nice, neat function that takes a linear map T and gives us back the ij entry in its matrix — with respect to the appropriate bases, naturally.
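
To make this concrete, here is a minimal numerical sketch in plain Python. The map `T`, the rotated bases `e` and `f`, and the helper names `inner` and `rebuild` are all hypothetical choices made for illustration; none of them come from the discussion above. It computes each entry as \left\langle f_j,T(e_i)\right\rangle and then checks that these coefficients really do rebuild T(e_i).

```python
import math

def inner(x, y):
    """Standard real inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def T(v):
    """A hypothetical linear map R^2 -> R^3, chosen only for illustration."""
    x, y = v
    return [x + 2 * y, 3 * x - y, 4 * y]

# Orthonormal basis of V = R^2: the standard basis rotated by 30 degrees.
c, s = math.cos(math.pi / 6), math.sin(math.pi / 6)
e = [[c, s], [-s, c]]

# Orthonormal basis of W = R^3: rotate the first two axes by 45 degrees.
r = math.sqrt(0.5)
f = [[r, r, 0.0], [-r, r, 0.0], [0.0, 0.0, 1.0]]

# t[i][j] = <f_j, T(e_i)> is the coefficient of f_j in T(e_i).
t = [[inner(f_j, T(e_i)) for f_j in f] for e_i in e]

def rebuild(i):
    """Reassemble T(e_i) as the sum over j of t[i][j] * f_j."""
    return [sum(t[i][j] * f[j][k] for j in range(3)) for k in range(3)]

for i in range(2):
    assert all(math.isclose(a, b, abs_tol=1e-12)
               for a, b in zip(rebuild(i), T(e[i])))
```

The reconstruction only works because the f_j are orthonormal; with a skewed basis the inner products would no longer be the expansion coefficients.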

But this is also the root of a subtle, but important, shift in understanding what a matrix entry actually is. Up until now, we’ve thought of matrix entries as artifacts which happen to be useful for calculations. But now we’re very explicitly looking at the question “what scalar shows up in this slot of the matrix of a linear map with respect to these particular bases?” as a function. In fact, t_i^j is now not just some scalar value peculiar to the transformation at hand; it’s now a particular linear functional on the space of all transformations \hom(V,W).

And, really, what do the indices i and j matter? If we rearranged the bases we’d find the same function in a new place in the new array. We could have taken this perspective before, with any vector space, but what we couldn’t have asked before is this more general question: “Given a vector v\in V and a vector w\in W, how much of the image T(v) is made up of w?” This new question only asks about these two particular vectors, and doesn’t care about any of the other basis vectors that may (or may not!) be floating around. But in the context of an inner product space, this question has an answer:

\displaystyle\left\langle w,T(v)\right\rangle

Any function of this form we’ll call a “matrix element”. We can use such matrix elements to probe linear transformations T even without full bases to work with, sort of like the way we generalized “elements” of an abelian group to “members” of an object in an abelian category. This is especially useful when we move to the infinite-dimensional context and might find it hard to come up with a proper basis to make a matrix with. Instead, we can work with the collection of all matrix elements and use it in arguments in place of some particular collection of matrix elements which happen to come from particular bases.
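
As a small sketch of the "matrix element as a linear functional" idea, the following Python fixes two vectors and returns the function T\mapsto\left\langle w,T(v)\right\rangle. The maps `S`, `T`, and `combo`, the helper `matrix_element`, and the particular vectors are hypothetical, picked only so the arithmetic stays in exact integers. No basis is chosen anywhere.

```python
def inner(x, y):
    """Standard real inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def matrix_element(w, v):
    """Return the functional T |-> <w, T(v)> on linear maps T."""
    return lambda T: inner(w, T(v))

# Two hypothetical linear maps R^2 -> R^2.
def S(v):
    return [2 * v[0] + v[1], v[0] - v[1]]

def T(v):
    return [v[1], 3 * v[0]]

def combo(v):
    """The linear combination 5*S + 7*T, applied to v."""
    return [5 * a + 7 * b for a, b in zip(S(v), T(v))]

probe = matrix_element([1, -2], [3, 4])

# The matrix element respects linear combinations of transformations.
assert probe(combo) == 5 * probe(S) + 7 * probe(T)
```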

Now it would be really neat if matrix elements themselves formed a vector space, but the situation’s sort of like when we constructed tensor products. Matrix elements are like the “pure” tensors v\otimes w\in V\otimes W. They (far more than) span the space \hom(V,W)^* of all linear functionals on the space of linear transformations, just like pure tensors span the whole tensor product space. But almost all linear functionals have to be written as a nontrivial sum of matrix elements; they usually can’t be written with just one. Still, since they span we know that many properties which hold for all matrix elements will immediately hold for all linear functionals on \hom(V,W).

May 29, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Complex Numbers and Polar Coordinates

Forgot to hit “publish” earlier…

So we’ve seen that the unit complex numbers can be written in the form e^{i\theta} where \theta denotes the (signed) angle between the point on the circle and 1+0i. We’ve also seen that this view behaves particularly nicely with respect to multiplication: multiplying two unit complex numbers just adds their angles. Today I want to extend this viewpoint to the whole complex plane.

If we start with any nonzero complex number z=a+bi, we can find its absolute value \lvert z\rvert=\sqrt{a^2+b^2}. This is a positive real number which we’ll also call r. We can factor this out of z to find z=r\left(\frac{a}{r}+\frac{b}{r}i\right). The complex number in parentheses has unit absolute value, and so we can write it as e^{i\theta} for some \theta between -\pi and \pi. Thus we’ve written our complex number in the form

\displaystyle z=re^{i\theta}

where the positive real number r is the absolute value of z, and \theta — a real number in the range \left(-\pi,\pi\right] — is the angle z makes with the reference point 1+0i. But this is exactly how we define the polar coordinates (r,\theta) back in high school math courses.

Just like we saw for unit complex numbers, this notation is very well behaved with respect to multiplication. Given complex numbers r_1e^{i\theta_1} and r_2e^{i\theta_2} we calculate their product:

\displaystyle r_1e^{i\theta_1}r_2e^{i\theta_2}=\left(r_1r_2\right)e^{i\left(\theta_1+\theta_2\right)}

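That is, we multiply their lengths (as we already knew) and add their angles, just like before. This viewpoint also makes division simple:

\displaystyle\frac{r_1e^{i\theta_1}}{r_2e^{i\theta_2}}=\frac{r_1}{r_2}e^{i\left(\theta_1-\theta_2\right)}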

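In particular we see that

\displaystyle\frac{1}{re^{i\theta}}=\frac{1}{r}e^{-i\theta}=\frac{re^{-i\theta}}{r^2}=\frac{\bar{z}}{\lvert z\rvert^2}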

so multiplicative inverses are given in terms of complex conjugates and magnitudes as we already knew.
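
The multiplication, division, and inverse rules can be spot-checked with Python's standard `cmath` module, whose `rect` and `polar` functions convert between (r,\theta) and a+bi. This is only a sketch with arbitrarily chosen sample values; the angles are picked so that their sum and difference stay inside \left(-\pi,\pi\right], since `polar` always reports the angle in that range.

```python
import cmath
import math

z1 = cmath.rect(2.0, math.pi / 3)   # r = 2,   theta = pi/3
z2 = cmath.rect(0.5, math.pi / 4)   # r = 1/2, theta = pi/4

# Product: lengths multiply, angles add.
r, theta = cmath.polar(z1 * z2)
assert math.isclose(r, 2.0 * 0.5)
assert math.isclose(theta, math.pi / 3 + math.pi / 4)

# Quotient: lengths divide, angles subtract.
r, theta = cmath.polar(z1 / z2)
assert math.isclose(r, 2.0 / 0.5)
assert math.isclose(theta, math.pi / 3 - math.pi / 4)

# Inverse: reciprocal length, negated angle, which is conj(z) / |z|^2.
inv = 1 / z1
r, theta = cmath.polar(inv)
assert math.isclose(r, 0.5)
assert math.isclose(theta, -math.pi / 3)
assert cmath.isclose(inv, z1.conjugate() / abs(z1) ** 2)
```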

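Powers (including roots) are also easy, which gives rise to easy ways to remember all those messy double- and triple-angle formulæ from trigonometry:

\displaystyle\left(re^{i\theta}\right)^n=r^ne^{in\theta}

\displaystyle\begin{aligned}\cos(2\theta)+i\sin(2\theta)&=e^{2i\theta}=\left(e^{i\theta}\right)^2\\&=\left(\cos(\theta)+i\sin(\theta)\right)^2\\&=\left(\cos(\theta)^2-\sin(\theta)^2\right)+i\left(2\sin(\theta)\cos(\theta)\right)\end{aligned}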

Other angle addition formulæ should be similarly easy to verify from this point.
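
For readers who like to see numbers, here is a rough check of the double- and triple-angle formulæ obtained by expanding powers of \cos(\theta)+i\sin(\theta), along with one square root. The sample angle 0.7 is arbitrary and the variable names are illustrative only.

```python
import cmath
import math

theta = 0.7                      # arbitrary sample angle
z = cmath.exp(1j * theta)        # cos(theta) + i sin(theta)
c, s = math.cos(theta), math.sin(theta)

# Squaring: real and imaginary parts give the double-angle formulas.
assert cmath.isclose(z ** 2, complex(c * c - s * s, 2 * s * c))
assert math.isclose(math.cos(2 * theta), c * c - s * s)
assert math.isclose(math.sin(2 * theta), 2 * s * c)

# Cubing: (c + i s)^3 = (c^3 - 3 c s^2) + i (3 c^2 s - s^3).
assert math.isclose(math.cos(3 * theta), c ** 3 - 3 * c * s ** 2)
assert math.isclose(math.sin(3 * theta), 3 * c * c * s - s ** 3)

# A square root of 2 e^{i theta}: take the root of the length, halve the angle.
w = cmath.rect(math.sqrt(2.0), theta / 2)
assert cmath.isclose(w * w, 2 * z)
```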

In general, since we consider complex numbers multiplicatively so often it will be convenient to have this polar representation of complex numbers at hand. It will also generalize nicely, as we will see.

May 29, 2009 Posted by | Fundamentals, Numbers | 3 Comments

The Circle Group

Yesterday we saw that the unit-length complex numbers are all of the form e^{i\theta}, where \theta measures the oriented angle from 1+0i around to the point in question. Since the absolute value of a complex number is multiplicative, we know that the product of two unit-length complex numbers is again of unit length. We can also see this using the exponential property:

\displaystyle e^{i\theta_1}e^{i\theta_2}=e^{i(\theta_1+\theta_2)}

So multiplying two unit-length complex numbers corresponds to adding their angles.

That is, the complex numbers on the unit circle form a group under multiplication of complex numbers — a subgroup of the multiplicative group of the complex field — and we even have an algebraic description of this group. The function sending the real number \theta to the point on the circle e^{i\theta} is a homomorphism from the additive group of real numbers to the circle group. Since every point on the circle has such a representative, it’s an epimorphism. What is the kernel? It’s the collection of real numbers satisfying

\displaystyle e^{i\theta}=\cos(\theta)+i\sin(\theta)=1+0i

that is, \theta must be an integral multiple of 2\pi — an element of the subgroup 2\pi\mathbb{Z}\subseteq\mathbb{R}. So, algebraically, the circle group is the quotient \mathbb{R}/(2\pi\mathbb{Z}). Or, isomorphically, we can just write \mathbb{R}/\mathbb{Z}.
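
A quick floating-point sketch of this homomorphism and its kernel follows. The function names `wrap` and `wrap_unit` are made up for this example; `wrap_unit` rescales the real line so that the kernel is \mathbb{Z} instead of 2\pi\mathbb{Z}, which is one way to see the isomorphic description \mathbb{R}/\mathbb{Z}.

```python
import cmath
import math

def wrap(theta):
    """The map R -> circle sending theta to e^{i theta}."""
    return cmath.exp(1j * theta)

a, b = 1.2, 2.5

# Homomorphism: addition of angles becomes multiplication on the circle.
assert cmath.isclose(wrap(a + b), wrap(a) * wrap(b))

# Integral multiples of 2*pi land on 1, so shifting by them changes nothing.
for k in (-3, -1, 0, 2, 5):
    assert cmath.isclose(wrap(2 * math.pi * k), 1, abs_tol=1e-12)
    assert cmath.isclose(wrap(a + 2 * math.pi * k), wrap(a), abs_tol=1e-12)

# A real number that is not such a multiple is not in the kernel.
assert not cmath.isclose(wrap(1.0), 1, abs_tol=1e-12)

def wrap_unit(t):
    """Rescaled map R -> circle with kernel Z."""
    return cmath.exp(2j * math.pi * t)

assert cmath.isclose(wrap_unit(0.25), 1j, abs_tol=1e-12)
assert cmath.isclose(wrap_unit(3.25), wrap_unit(0.25), abs_tol=1e-12)
```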

Something important has happened here. We have in hand two distinct descriptions of the circle. One we get by putting the unit-length condition on points in the plane. The other we get by taking the real line and “wrapping” it around itself periodically. I haven’t really mentioned the topologies, but the first approach inherits the subspace topology from the topology on the complex numbers, while the second approach inherits the quotient topology from the topology on the real numbers. And it turns out that the identity map from one version of the circle to the other one is actually a homeomorphism, which further shows that the two descriptions give us “the same” result.

What’s really different between the two cases is how they generalize. I’ll probably come back to these in more detail later, but for now I’ll point out that the first approach generalizes to spheres in higher dimensions, while the second generalizes to higher-dimensional tori. Thus the circle is sometimes called the one-dimensional sphere S^1, and sometimes called the one-dimensional torus T^1, and each one calls to mind a slightly different vision of the same basic object of study.

May 27, 2009 Posted by | Algebra, Fundamentals, Group theory, Numbers | 3 Comments

Complex Numbers and the Unit Circle

When I first talked about complex numbers there was one perspective I put off, and now need to come back to. It makes deep use of Euler’s formula, which ties exponentials and trigonometric functions together in the relation

\displaystyle e^{i\theta}=\cos(\theta)+i\sin(\theta)

where we’ve written e for \exp(1) and used the exponential property.

Remember that we have a natural basis for the complex numbers as a vector space over the reals: \left\{1,i\right\}. If we ask that this natural basis be orthonormal, we get a real inner product on complex numbers, which in turn gives us lengths and angles. In fact, this notion of length is exactly that which we used to define the absolute value of a complex number, in order to get a topology on the field.

So what happens when we look at e^{i\theta}? First, we can calculate its length using this inner product, getting

\displaystyle\left\lvert e^{i\theta}\right\rvert=\sqrt{\cos(\theta)^2+\sin(\theta)^2}=1

by the famous trigonometric identity. That is, every complex number of the form e^{i\theta} lies a unit distance from the complex number 0.

In particular, 1+0i=e^{0i} is a nice reference point among such points. We can use it as a fixed post in the complex plane, and measure the angle it makes with any other point. For example, we can calculate the inner product

\displaystyle\left\langle1,e^{i\theta}\right\rangle=\left\langle1+0i,\cos(\theta)+i\sin(\theta)\right\rangle=\cos(\theta)

Since both vectors have unit length, this is the cosine of the angle between them, and thus we find that the point e^{i\theta} makes an angle \lvert\theta\rvert with our fixed post 1, at least for -\pi\leq\theta\leq\pi. We see that e^{i\theta} traces a circle by increasing the angle in one direction as \theta increases from 0 to \pi, and increasing the angle in the other direction as \theta decreases from 0 to -\pi. For values of \theta outside this range, we can use the fact that

\displaystyle e^{2\pi i}=\cos(2\pi)+i\sin(2\pi)=1+0i

to see that the function e^{i\theta} is periodic with period 2\pi. That is, we can add or subtract whatever multiple of 2\pi we need to move \theta within the range -\pi<\theta\leq\pi. Thus, as \theta varies the point e^{i\theta} traces out a circle of unit radius, going around and around with period 2\pi, and every point on the unit circle has a unique representative of this form with \theta in the given range.
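
The "add or subtract whatever multiple of 2\pi we need" step can be sketched in a few lines of Python. The function name `principal` is hypothetical; for comparison the sketch also uses the standard `cmath.phase`, which recovers an angle from a point on the circle.

```python
import cmath
import math

def principal(theta):
    """Shift theta by a multiple of 2*pi into the half-open range (-pi, pi]."""
    reduced = math.fmod(theta, 2 * math.pi)   # lies strictly between -2*pi and 2*pi
    if reduced > math.pi:
        reduced -= 2 * math.pi
    elif reduced <= -math.pi:
        reduced += 2 * math.pi
    return reduced

# Angles outside the range are pulled back in.
assert math.isclose(principal(7.0), 7.0 - 2 * math.pi)
assert math.isclose(principal(-4.0), 2 * math.pi - 4.0)

# The endpoint -pi is excluded; its representative is +pi.
assert principal(-math.pi) == math.pi

# The reduced angle names the same point on the circle as the original.
for theta in (7.0, -4.0, 0.5, 100.0):
    point = cmath.exp(1j * theta)
    assert cmath.isclose(cmath.exp(1j * principal(theta)), point)
    assert math.isclose(principal(theta), cmath.phase(point))
```

Away from the endpoints the floating-point version behaves as expected, but inputs that are only approximately odd multiples of \pi can round to either side of the boundary.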

May 26, 2009 Posted by | Fundamentals, Numbers | 3 Comments

Properties of Adjoints

Many of the properties of the adjoint construction follow immediately from the contravariant functoriality of the duality we used in its construction. But they can also be determined from the adjoint relation

\displaystyle\left\langle T(v),w\right\rangle_W=\left\langle v,T^*(w)\right\rangle_V

For example, if we have transformations S:U\rightarrow V and T:V\rightarrow W, then the adjoint of their composite is the composite of their adjoints in the opposite order: (TS)^*=S^*T^*. To check this, we write

\displaystyle\begin{aligned}\left\langle\left[TS\right](u),w\right\rangle_W&=\left\langle T\left(S(u)\right),w\right\rangle_W\\&=\left\langle S(u),T^*(w)\right\rangle_V\\&=\left\langle u,S^*\left(T^*(w)\right)\right\rangle_U\\&=\left\langle u,\left[S^*T^*\right](w)\right\rangle_U\end{aligned}

It’s pretty straightforward to see that I_V^*=I_V. Then, since T^{-1}T=I_V and TT^{-1}=I_W, we find that T^*(T^{-1})^*=I_V and (T^{-1})^*T^*=I_W, which shows that (T^{-1})^*=(T^*)^{-1}.

Similarly, it’s easy to show that (A+B)^*=A^*+B^*. But the process isn’t quite linear. When we work over the complex numbers, we find that (\lambda T)^*=\bar{\lambda}T^*:

\displaystyle\begin{aligned}\left\langle\left[\lambda T\right](v),w\right\rangle_W&=\left\langle T(\lambda v),w\right\rangle_W\\&=\left\langle\lambda v,T^*(w)\right\rangle_V\\&=\left\langle v,\bar{\lambda}T^*(w)\right\rangle_V\\&=\left\langle v,\left[\bar{\lambda}T^*\right](w)\right\rangle_V\end{aligned}

Now if we restrict our focus to endomorphisms of a single vector space V, we see that the adjoint construction gives us an involutory (since T^{**}=T), semilinear (since it applies the complex conjugate to scalar multiples) antiautomorphism of the algebra of endomorphisms of V. That is, it’s like an automorphism, except it reverses the order of multiplication.

In a way, then, the adjoint behaves sort of like the complex conjugate itself does for the algebra of complex numbers (over the complex numbers we don’t notice the order of multiplication, but work with me here, people). This analogy goes pretty far, as we’ll see.
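
These identities are easy to spot-check on matrices. The sketch below leans on a standard fact that is not derived in this post: with respect to orthonormal bases, the matrix of the adjoint is the conjugate transpose. The matrices `S` and `T`, the scalar `lam`, and the helpers `matmul`, `adjoint`, and `scale` are hypothetical, with small integer entries so that the comparisons are exact.

```python
def matmul(A, B):
    """Matrix product of nested-list matrices A (m x n) and B (n x p)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def adjoint(A):
    """Conjugate transpose (assumes matrices taken in orthonormal bases)."""
    return [[A[i][j].conjugate() for i in range(len(A))]
            for j in range(len(A[0]))]

def scale(lam, A):
    """Scalar multiple lam * A."""
    return [[lam * a for a in row] for row in A]

# Hypothetical matrices: S maps C^2 -> C^3 and T maps C^3 -> C^2.
S = [[1 + 2j, 0j], [3j, 1 - 1j], [2 + 0j, -1j]]
T = [[1j, 2 + 0j, 1 - 1j], [0j, 1 + 1j, 3 + 0j]]

# (TS)^* = S^* T^*: the order of composition reverses.
assert adjoint(matmul(T, S)) == matmul(adjoint(S), adjoint(T))

# (lam T)^* = conj(lam) T^*: semilinear rather than linear.
lam = 2 - 3j
assert adjoint(scale(lam, T)) == scale(lam.conjugate(), adjoint(T))

# T^** = T: the construction is an involution.
assert adjoint(adjoint(T)) == T
```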

May 25, 2009 Posted by | Algebra, Linear Algebra | 2 Comments

Adjoint Transformations

Since an inner product on a finite-dimensional vector space V is a bilinear form, it provides two isomorphisms from V to its dual V^*. And since an inner product is a symmetric bilinear form, these two isomorphisms are identical. But since duality is a (contravariant) functor, we have a dual transformation T^*:W^*\rightarrow V^* for every linear transformation T:V\rightarrow W. So what happens when we put these two together?

Say we start with linear transformation T:V\rightarrow W. We’ll build up a transformation from W to V which we’ll call the “adjoint” to T. First we have the isomorphism W\rightarrow W^*. Then we follow this with the dual transformation T^*:W^*\rightarrow V^*. Finally, we use the isomorphism V^*\rightarrow V. We’ll write T^* for this composite, and rely on context to tell us whether we mean the dual or the adjoint (but because of the isomorphisms they’re secretly the same thing).

So why is this the adjoint? Let’s say we have vectors v\in V and w\in W. Then it turns out that

\displaystyle\left\langle T(v),w\right\rangle_W=\left\langle v,T^*(w)\right\rangle_V

which should recall the relation between two adjoint functors. An important difference here is that there is no distinction between left- and right-adjoint transformations. The adjoint of an adjoint is the original transformation back again: \left(T^*\right)^*=T. This follows if we use the symmetry of the inner products on the relation above

\displaystyle\left\langle w,T(v)\right\rangle_W=\left\langle T^*(w),v\right\rangle_V=\left\langle w,T^{**}(v)\right\rangle_W

Then since \left\langle w,\left[T-T^{**}\right](v)\right\rangle_W=0 and the inner product on W is nondegenerate, we must have T-T^{**} sending every v to the zero vector in W. Thus T=T^{**}.
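
Here is a tiny numerical instance of the adjoint relation over the reals. It assumes the standard inner products on \mathbb{R}^2 and \mathbb{R}^3, in which case the adjoint is given by the transposed matrix (a standard fact, not something established in this post). The matrix `T` and the vectors `v` and `w` are arbitrary illustrative choices.

```python
def inner(x, y):
    """Standard real inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def apply(A, v):
    """Apply the matrix A (a list of rows) to the vector v."""
    return [inner(row, v) for row in A]

def transpose(A):
    """Transpose of a nested-list matrix."""
    return [list(col) for col in zip(*A)]

T = [[1, 2], [0, -1], [3, 4]]   # a hypothetical map R^2 -> R^3
v = [5, -2]                     # a vector in V = R^2
w = [1, 0, 2]                   # a vector in W = R^3

# <T(v), w>_W equals <v, T^*(w)>_V.
assert inner(apply(T, v), w) == inner(v, apply(transpose(T), w))

# Taking the adjoint twice gives back the original transformation.
assert transpose(transpose(T)) == T
```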

So let’s show this adjoint condition in the first place. On the left side, we have the result of applying the linear functional \langle\underline{\hphantom{X}},w\rangle_W\in W^* to the vector T(v). But this linear functional is simply the image of the vector w under the isomorphism W\rightarrow W^*. So on the left, we’ve calculated the result of first applying T to v, and then applying this linear functional.

But the way we defined the dual transformation was such that we can instead apply the dual T^* to the linear functional \langle\underline{\hphantom{X}},w\rangle_W, and then apply the resulting functional to v, and we’ll get the same result. And the isomorphism V^*\rightarrow V tells us that there is some vector v_1\in V so that the linear functional we’re now applying to v is \langle\underline{\hphantom{X}},v_1\rangle_V. That is, our value will be \langle v,v_1\rangle_V for some vector v_1. Which one? The one we defined as T^*(w).

I’ll admit that it sometimes takes a little getting used to the way adjoints and duals are the same, and also the subtleties of how they’re distinct. But it sinks in soon enough.

May 22, 2009 Posted by | Algebra, Linear Algebra | 7 Comments


If all goes according to plan, a link to this post should show up on my Twitter feed shortly. Thanks to twitterfeed.

May 22, 2009 Posted by | Uncategorized | Leave a comment

I’m a Twit

So now what?

May 22, 2009 Posted by | Uncategorized | 3 Comments

ARML Scrimmage Power Question

I helped the Howard County and Baltimore County ARML teams practice tonight by joining the group of local citizens and team alumni to field a scrimmage team. As usual, my favorite part is the power question. It follows, as printed, but less the (unnecessary) diagrams:

Consider the function

\displaystyle\phi(t)=\left(\frac{2t}{t^2+1},\frac{t^2-1}{t^2+1}\right)

which maps the real number t to a coordinate in the xy plane. Assume throughout that q, r, s, t, and u are real numbers.

(1) Compute \phi(1), \phi(1/2), \phi(2), \phi(-1), \phi(-1/2), and \phi(-2). Sketch a plot of these points, superimposed on the unit circle.

(2) Show that \phi is one-to-one. That is, show that if \phi(s)=\phi(t), then s=t.

(3) Let (x_\phi,y_\phi) be the intersection point between the unit circle and the line connecting (0,1) and (t,0). Prove that \phi(t)=(x_\phi,y_\phi).

(4) Show that (x,y) is an ordered pair of rational numbers on the unit circle different from (0,1) if and only if there is a rational number t such that \phi(t)=(x,y). (This result allows us to deduce that there are infinitely (countably) many rational points on the unit circle.)

According to problem 3, \phi(t) is a particular geometric mapping of a single point on the real line to the unit circle. Now, we will be concerned with the relationship between the pairs of points, which will lead to a way of doing arithmetic by geometry. Use these definitions:

  • Let \left\{\phi(s),\phi(t)\right\} be a “vertical pair” if either s=t=1, or s=t=-1, or st\neq0 and \phi(s) and \phi(t) are two different points on the same vertical line.
  • Let \left\{\phi(s),\phi(t)\right\} be a “horizontal pair” if either s=t=0, or \phi(s) and \phi(t) are two different points on the same horizontal line.
  • Let \left\{\phi(s),\phi(t)\right\} be a “diametric pair” if \phi(s) and \phi(t) are two different end points of the same diameter of the circle.

(5) (a) Prove that for all s and t, \left\{\phi(s),\phi(t)\right\} is a vertical pair if and only if st=1.
(b) Prove that for all s and t, \left\{\phi(s),\phi(t)\right\} is a horizontal pair if and only if s=-t.
(c) Determine and prove a relationship between s and t that is a necessary and sufficient condition for \left\{\phi(s),\phi(t)\right\} to be a diametric pair.

(6) (a) Suppose that \left\{\phi(s),\phi(t)\right\} is not a vertical pair. Then, the straight line through them (if \phi(s)=\phi(t), use the tangent line to the circle at that point) intersects the y-axis at the point (0,b). Find b in terms of s and t, and simplify and prove your answer.
(b) Draw the straight line through the points (1,0) and (0,b), where (0,b) is the point described in problem (6a). Let \phi(u) denote the point of intersection of this line and the circle. Prove that u=st.

(7) (a) Suppose that \left\{\phi(s),\phi(t)\right\} is not a horizontal pair. Then, the straight line through them (if \phi(s)=\phi(t), use the tangent line to the circle at that point) intersects the horizontal line y=1 at the point (a,1). Find a in terms of s and t, and simplify and prove your answer.
(b) Draw the straight line through the points (0,-1) and (a,1), where (a,1) is the point described in problem (7a). Let \phi(u) denote the point of intersection of this line and the circle. Prove that u=s+t.

(8) Suppose q, r, s, and t are distinct real numbers such that qr=st and such that the line containing \phi(q) and \phi(s) intersects the line containing \phi(r) and \phi(t). Find the y-coordinate of the intersection point in terms of s and t only.

(9) Let s and t be distinct real numbers such that st>0. Given only the unit circle, the x- and y-axes, the points \phi(s) and \phi(t), and a straightedge (but no compass), determine a method to construct the point \phi(\sqrt{st}) that uses no more than 5 line segments. Prove why the construction works and provide a sketch.

(10) Given only the unit circle, the x- and y-axes, the point (1,1), and a straightedge (but no compass), describe a method to construct the point \left(-\frac{2\sqrt{3}}{3},0\right).
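
For anyone who wants to experiment before proving anything, here is a sketch in exact rational arithmetic. It assumes that \phi is the map \phi(t)=\left(\frac{2t}{t^2+1},\frac{t^2-1}{t^2+1}\right), which is the formula that the geometric description in problem (3) forces; that formula is an assumption of this sketch. The checks are spot checks on sample values, not proofs.

```python
from fractions import Fraction

def phi(t):
    """Assumed form of phi: projection of (t, 0) onto the unit circle from (0, 1)."""
    t = Fraction(t)
    d = t * t + 1
    return (2 * t / d, (t * t - 1) / d)

samples = [Fraction(1), Fraction(1, 2), Fraction(2),
           Fraction(-1), Fraction(-1, 2), Fraction(-2), Fraction(3, 7)]

for t in samples:
    x, y = phi(t)
    # Rational inputs give rational points lying exactly on the unit circle.
    assert x * x + y * y == 1
    # (0, 1), (t, 0), and phi(t) are collinear, as in problem (3).
    assert t * (y - 1) + x == 0

s = Fraction(3, 7)

# When st = 1 the two points share an x-coordinate (compare problem 5a).
t = 1 / s
assert phi(s)[0] == phi(t)[0] and phi(s)[1] == -phi(t)[1]

# When s = -t the two points share a y-coordinate (compare problem 5b).
assert phi(s)[1] == phi(-s)[1] and phi(s)[0] == -phi(-s)[0]
```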

May 21, 2009 Posted by | Uncategorized | 2 Comments

Orthogonal Complementation is a Galois Connection

We now know how to take orthogonal complements of subspaces in an inner product space. It turns out that this process (and itself again) forms an antitone Galois connection.

Let’s just quickly verify the condition. We need to show that if U and W are subspaces of an inner-product space V, then U\subseteq W^\perp if and only if W\subseteq U^\perp. Clearly the symmetry of the situation shows us that we only need to check one direction. So if U\subseteq W^\perp, we know that W^{\perp\perp}\subseteq U^\perp (taking orthogonal complements reverses inclusions), and also that W\subseteq W^{\perp\perp}. And thus we see that W\subseteq U^\perp.

So what does this tell us? First of all, it gives us a closure operator — the double orthogonal complement. It also gives a sense of a “closed” subspace — we say that U is closed if U^{\perp\perp}=U.

But didn’t we know that U^{\perp\perp}=U? No, that only held for finite-dimensional vector spaces. The Galois connection, by contrast, holds for all inner product spaces. So if we have an infinite-dimensional vector space its lattice of subspaces may not be orthocomplemented. But its lattice of closed subspaces will be! So if we want to use an infinite-dimensional vector space to build up some analogue of classical logic, we might be able to make it work after all.
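
The formal pattern can be played with on a toy model. The sketch below is a stand-in and not the real thing: instead of subspaces it uses arbitrary subsets of a small finite set `X` of integer vectors, and `perp` collects the members of `X` orthogonal to everything in a given subset. All names are hypothetical. The same antitone Galois connection appears, along with the closure operator it induces.

```python
from itertools import combinations, product

# A finite universe of integer vectors, standing in for the whole space.
X = list(product((-1, 0, 1), repeat=2))

def inner(x, y):
    """Standard inner product on integer vectors."""
    return sum(a * b for a, b in zip(x, y))

def perp(U):
    """All members of X orthogonal to every member of U."""
    return frozenset(x for x in X if all(inner(x, u) == 0 for u in U))

# Every subset of X with at most two elements.
small_subsets = [frozenset(c) for k in range(3) for c in combinations(X, k)]

for U in small_subsets:
    # Double complement is a closure operator: it can only enlarge U ...
    assert U <= perp(perp(U))
    # ... and complements are already closed.
    assert perp(perp(perp(U))) == perp(U)
    for W in small_subsets:
        # The Galois connection: U inside perp(W) exactly when W inside perp(U).
        assert (U <= perp(W)) == (W <= perp(U))
```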

May 19, 2009 Posted by | Algebra, Linear Algebra | Leave a comment