# The Unapologetic Mathematician

## Orthonormal Bases

Now that we have the Gram-Schmidt process as a tool, we can use it to come up with orthonormal bases.

Any vector space $V$ with finite dimension $d$ has a finite basis $\left\{v_i\right\}_{i=1}^d$. This is exactly what it means for $V$ to have dimension $d$. And now we can apply the Gram-Schmidt process to turn this basis into an orthonormal basis $\left\{e_i\right\}_{i=1}^d$.

We also know that any linearly independent set can be expanded to a basis. In fact, we can also extend any orthonormal collection of vectors to an orthonormal basis. Indeed, if $\left\{e_i\right\}_{i=1}^n$ is an orthonormal collection, we can add the vectors $\left\{v_i\right\}_{i=n+1}^d$ to fill out a basis. Then when we apply the Gram-Schmidt process to this basis it will start with $e_1$, which is already normalized. It then moves on to $e_2$, which is already normalized and orthogonal to $e_1$, and so on. Each of the $e_i$ is left unchanged, and the $v_i$ are modified to make the whole collection orthonormal.

April 30, 2009 Posted by | Algebra, Linear Algebra | 5 Comments

## The Gram-Schmidt Process

Now that we have a real or complex inner product, we have notions of length and angle. This lets us define what it means for a collection of vectors to be “orthonormal”: each pair of distinct vectors is perpendicular, and each vector has unit length. In formulas, we say that the collection $\left\{e_i\right\}_{i=1}^n$ is orthonormal if $\langle e_i,e_j\rangle=\delta_{i,j}$. These can be useful things to have, but how do we get our hands on them?

It turns out that if we have a linearly independent collection of vectors $\left\{v_i\right\}_{i=1}^n$ then we can come up with an orthonormal collection $\left\{e_i\right\}_{i=1}^n$ spanning the same subspace of $V$. Even better, we can pick it so that the first $k$ vectors $\left\{e_i\right\}_{i=1}^k$ span the same subspace as $\left\{v_i\right\}_{i=1}^k$. The method goes back to Laplace and Cauchy, but gets its name from Jørgen Gram and Erhard Schmidt.

We proceed by induction on the number of vectors in the collection. If $n=1$, then we simply set

$\displaystyle e_1=\frac{v_1}{\lVert v_1\rVert}$

This “normalizes” the vector to have unit length, but doesn’t change its direction. It spans the same one-dimensional subspace, and since it’s alone it forms an orthonormal collection.

Now, let’s assume the procedure works for collections of size $n-1$ and start out with a linearly independent collection of $n$ vectors. First, we can orthonormalize the first $n-1$ vectors using our inductive hypothesis. This gives a collection $\left\{e_i\right\}_{i=1}^{n-1}$ which spans the same subspace as $\left\{v_i\right\}_{i=1}^{n-1}$ (and so on down, as noted above). But $v_n$ isn’t in the subspace spanned by the first $n-1$ vectors (or else the original collection wouldn’t have been linearly independent). So it points at least somewhat in a new direction.

To find this new direction, we define

$\displaystyle w_n=v_n-\langle e_1,v_n\rangle e_1-...-\langle e_{n-1},v_n\rangle e_{n-1}$

This vector will be orthogonal to all the vectors from $e_1$ to $e_{n-1}$, since for any such $e_j$ we can check

\displaystyle\begin{aligned}\langle e_j,w_n\rangle&=\langle e_j,v_n-\langle e_1,v_n\rangle e_1-...-\langle e_{n-1},v_n\rangle e_{n-1}\rangle\\&=\langle e_j,v_n\rangle-\langle e_1,v_n\rangle\langle e_j,e_1\rangle-...-\langle e_{n-1},v_n\rangle\langle e_j,e_{n-1}\rangle\\&=\langle e_j,v_n\rangle-\langle e_j,v_n\rangle=0\end{aligned}

where we use the orthonormality of the collection $\left\{e_i\right\}_{i=1}^{n-1}$ to show that most of these inner products come out to be zero.

So we’ve got a vector orthogonal to all the ones we collected so far, but it might not have unit length. So we normalize it:

$\displaystyle e_n=\frac{w_n}{\lVert w_n\rVert}$

and we’re done.
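
The whole procedure translates almost line-for-line into code. Here’s a sketch in Python, using plain lists as real vectors; the helper names `dot` and `gram_schmidt` are mine, not from any library:

```python
import math

def dot(v, w):
    # the standard real inner product <v, w>
    return sum(vi * wi for vi, wi in zip(v, w))

def gram_schmidt(vectors):
    # orthonormalize a linearly independent list of vectors;
    # the first k outputs span the same subspace as the first k inputs
    basis = []
    for v in vectors:
        # w_n = v_n - <e_1, v_n> e_1 - ... - <e_{n-1}, v_n> e_{n-1}
        w = list(v)
        for e in basis:
            c = dot(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        # normalize the new direction
        length = math.sqrt(dot(w, w))
        basis.append([wi / length for wi in w])
    return basis

e = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
```

Checking $\langle e_i,e_j\rangle=\delta_{i,j}$ on the output confirms the orthogonality computation above.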

April 28, 2009 Posted by | Algebra, Linear Algebra | 37 Comments

## The Parallelogram Law

There’s an interesting little identity that holds for norms — translation-invariant metrics on vector spaces over $\mathbb{R}$ or $\mathbb{C}$ — that come from inner products. Even more interestingly, it actually characterizes such norms.

Geometrically, if we have a parallelogram whose two sides from the same point are given by the vectors $v$ and $w$, then we can construct the two diagonals $v+w$ and $v-w$. It then turns out that the sum of the squares on all four sides is equal to the sum of the squares on the diagonals. We write this formally by saying

$\displaystyle\lVert v+w\rVert^2+\lVert v-w\rVert^2=2\lVert v\rVert^2+2\lVert w\rVert^2$

where we’ve used the fact that opposite sides of a parallelogram have the same length. Verifying this identity is straightforward, using the definition of the norm-squared:

\displaystyle\begin{aligned}\lVert v+w\rVert^2+\lVert v-w\rVert^2&=\langle v+w,v+w\rangle+\langle v-w,v-w\rangle\\&=\langle v,v\rangle+\langle v,w\rangle+\langle w,v\rangle+\langle w,w\rangle\\&+\langle v,v\rangle-\langle v,w\rangle-\langle w,v\rangle+\langle w,w\rangle\\&=2\langle v,v\rangle+2\langle w,w\rangle\\&=2\lVert v\rVert^2+2\lVert w\rVert^2\end{aligned}

On the other hand, what if we have a norm that satisfies this parallelogram law? Then we can use the polarization identities to define a unique inner product.

$\displaystyle\langle v,w\rangle=\frac{\lVert v+w\rVert^2-\lVert v-w\rVert^2}{4}+i\frac{\lVert v-iw\rVert^2-\lVert v+iw\rVert^2}{4}$

where we ignore the second term when working over real vector spaces.

However, if we have a norm that does not satisfy the parallelogram law and try to use it in these formulas, then the resulting form must fail to be an inner product. If we did get an inner product, then the norm would satisfy the parallelogram law, which it doesn’t.

Now, I haven’t given any examples of norms on vector spaces which don’t satisfy the parallelogram law, but they show up all the time in functional analysis. For now I just want to point out that such things do, in fact, exist.
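
To make that last point concrete, here is a small numerical sketch (the helper names are mine; the $\ell^1$ “taxicab” norm stands in for the norms from functional analysis that fail the law):

```python
import math

def norm2(v):
    # Euclidean norm, which comes from the usual inner product
    return math.sqrt(sum(x * x for x in v))

def norm1(v):
    # l^1 ("taxicab") norm, which does not come from an inner product
    return sum(abs(x) for x in v)

def parallelogram_defect(norm, v, w):
    # ||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2, which should be zero
    # exactly when the norm satisfies the parallelogram law
    add = [a + b for a, b in zip(v, w)]
    sub = [a - b for a, b in zip(v, w)]
    return norm(add)**2 + norm(sub)**2 - 2*norm(v)**2 - 2*norm(w)**2

v, w = [1.0, 0.0], [0.0, 1.0]
euclidean_defect = parallelogram_defect(norm2, v, w)  # vanishes
taxicab_defect = parallelogram_defect(norm1, v, w)    # nonzero
```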

April 24, 2009 Posted by | Algebra, Linear Algebra | 4 Comments

## The Polarization Identities

If we have an inner product on a real or complex vector space, we get a notion of length called a “norm”. It turns out that the norm completely determines the inner product.

Let’s take the sum of two vectors $v$ and $w$. We can calculate its norm-squared as usual:

\displaystyle\begin{aligned}\lVert v+w\rVert^2&=\langle v+w,v+w\rangle\\&=\langle v,v\rangle+\langle v,w\rangle+\langle w,v\rangle+\langle w,w\rangle\\&=\lVert v\rVert^2+\lVert w\rVert^2+\langle v,w\rangle+\overline{\langle v,w\rangle}\\&=\lVert v\rVert^2+\lVert w\rVert^2+2\Re\left(\langle v,w\rangle\right)\end{aligned}

where $\Re(z)$ denotes the real part of the complex number $z$. If $z$ is already a real number, it does nothing.

So we can rewrite this equation as

$\displaystyle\Re\left(\langle v,w\rangle\right)=\frac{1}{2}\left(\lVert v+w\rVert^2-\lVert v\rVert^2-\lVert w\rVert^2\right)$

If we’re working over a real vector space, this is the inner product itself. Over a complex vector space, this only gives us the real part of the inner product. But all is not lost! We can also work out

\displaystyle\begin{aligned}\lVert v+iw\rVert^2&=\langle v+iw,v+iw\rangle\\&=\langle v,v\rangle+\langle v,iw\rangle+\langle iw,v\rangle+\langle iw,iw\rangle\\&=\lVert v\rVert^2+\lVert iw\rVert^2+\langle v,iw\rangle+\overline{\langle v,iw\rangle}\\&=\lVert v\rVert^2+\lVert w\rVert^2+2\Re\left(i\langle v,w\rangle\right)\\&=\lVert v\rVert^2+\lVert w\rVert^2-2\Im\left(\langle v,w\rangle\right)\end{aligned}

where $\Im(z)$ denotes the imaginary part of the complex number $z$. The last equality holds because

$\displaystyle\Re\left(i(a+bi)\right)=\Re(ai-b)=-b=-\Im(a+bi)$

so we can write

$\displaystyle\Im\left(\langle v,w\rangle\right)=\frac{1}{2}\left(\lVert v\rVert^2+\lVert w\rVert^2-\lVert v+iw\rVert^2\right)$

We can also write these identities out in a couple other ways. If we started with $v-w$, we could find the identities

$\displaystyle\Re\left(\langle v,w\rangle\right)=\frac{1}{2}\left(\lVert v\rVert^2+\lVert w\rVert^2-\lVert v-w\rVert^2\right)$
$\displaystyle\Im\left(\langle v,w\rangle\right)=\frac{1}{2}\left(\lVert v-iw\rVert^2-\lVert v\rVert^2-\lVert w\rVert^2\right)$

Or we could combine both forms above to write

$\displaystyle\Re\left(\langle v,w\rangle\right)=\frac{1}{4}\left(\lVert v+w\rVert^2-\lVert v-w\rVert^2\right)$
$\displaystyle\Im\left(\langle v,w\rangle\right)=\frac{1}{4}\left(\lVert v-iw\rVert^2-\lVert v+iw\rVert^2\right)$

In all these ways we see that not only does an inner product on a real or complex vector space give us a norm, but the resulting norm completely determines the inner product. Different inner products necessarily give rise to different norms.
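
As a sanity check, the combined identities can be tested numerically. A sketch in Python, with `inner` conjugate-linear in the first slot to match the convention of these posts (the helper names are mine):

```python
def inner(v, w):
    # conjugate-linear in the first slot, linear in the second
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

def norm_sq(v):
    # <v, v> is always real
    return inner(v, v).real

def polarize(v, w):
    # recover <v, w> from norms alone, using the combined identities above
    re = (norm_sq([a + b for a, b in zip(v, w)])
          - norm_sq([a - b for a, b in zip(v, w)])) / 4
    im = (norm_sq([a - 1j * b for a, b in zip(v, w)])
          - norm_sq([a + 1j * b for a, b in zip(v, w)])) / 4
    return complex(re, im)

v = [1 + 2j, 0.5 - 1j]
w = [-1j, 3 + 0j]
recovered = polarize(v, w)
```

Here `recovered` agrees with `inner(v, w)` to rounding error, illustrating that the norm determines the inner product.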

April 23, 2009 Posted by | Algebra, Linear Algebra | 5 Comments

## Complex Inner Products

Now consider a complex vector space. We can define bilinear forms, and even ask that they be symmetric and nondegenerate. But there’s no way for such a form to be positive-definite. Indeed, we saw that there isn’t even a notion of “order” on the field of complex numbers. They do contain the real numbers as a subfield, but we can’t manage to stay in the positive real numbers. Indeed, if we have $\langle v,v\rangle=a+0i$ for some real $a\geq0$, then we also have $\langle iv,iv\rangle=i^2(a+0i)=-a+0i$. So it seems we aren’t going to get the same geometric interpretations this way.

But let’s slow down and look at a one-dimensional complex vector space — the field of complex numbers itself. We do have a notion of length here. We define the length of a complex number $z=a+bi$ as the square root of $\bar{z}z=(a-bi)(a+bi)=a^2+b^2$. This quantity is always a positive real number, and thus always has a square root. And it looks sort of like how we compute the squared length of a vector with a bilinear form. Indeed, if we think of $\mathbb{C}$ as a real vector space with basis $\{1,i\}$, it’s exactly the norm we get when we define this basis to be orthonormal. The only thing weird is that conjugation.

Well, let’s run with this a while. Given a complex vector space $V$, we want a form $\langle\underline{\hphantom{X}},\underline{\hphantom{X}}\rangle$ which is

• linear in the second slot — $\langle u,av+bw\rangle=a\langle u,v\rangle+b\langle u,w\rangle$
• conjugate symmetric — $\langle v,w\rangle=\overline{\langle w,v\rangle}$

Conjugate symmetry implies that the form is conjugate linear in the first slot — $\langle av+bw,u\rangle=\bar{a}\langle v,u\rangle+\bar{b}\langle w,u\rangle$ — and also that $\langle v,v\rangle=\overline{\langle v,v\rangle}$ is always real. This makes it reasonable to also ask that the form be

• positive definite — $\langle v,v\rangle>0$ for all $v\neq0$

This mixture of being linear in one variable and “half-linear” in the other makes the whole form “one and a half” times linear, or “sesquilinear”.

Anyhow, now we do get a notion of length, defined by setting $\lVert v\rVert^2=\langle v,v\rangle$ as before. What about angle? That will depend directly on the Cauchy-Schwarz inequality, assuming it holds. We’ll check that now.

Our previous proof doesn’t really work, since our scalars are now complex, and we can’t argue that certain polynomials have no zeroes. But we can modify it. We start similarly, calculating

$\displaystyle0\leq\langle v-tw,v-tw\rangle=\langle v,v\rangle-t\langle v,w\rangle-\bar{t}\langle w,v\rangle+\bar{t}t\langle w,w\rangle$

Now the Cauchy-Schwarz inequality is trivial if $w=0$, so we may assume $\langle w,w\rangle\neq0$, and set $t=\frac{\langle w,v\rangle}{\langle w,w\rangle}$. Then we see

\displaystyle\begin{aligned}0&\leq\langle v,v\rangle-\frac{\langle w,v\rangle\langle v,w\rangle}{\langle w,w\rangle}-\frac{\langle v,w\rangle\langle w,v\rangle}{\langle w,w\rangle}+\frac{\langle w,v\rangle\langle v,w\rangle}{\langle w,w\rangle}\\&=\langle v,v\rangle-\frac{\lvert\langle v,w\rangle\rvert^2}{\langle w,w\rangle}\end{aligned}

Multiplying through by $\langle w,w\rangle$ and rearranging, we find

$\displaystyle\lvert\langle v,w\rangle\rvert^2\leq\langle v,v\rangle\langle w,w\rangle$

which is the complex version of the Cauchy-Schwarz inequality. And then just as in the real case we can write it as

$\displaystyle\frac{\lvert\langle v,w\rangle\rvert^2}{\lVert v\rVert^2\lVert w\rVert^2}\leq1$

which implies that

$\displaystyle0\leq\frac{\lvert\langle v,w\rangle\rvert}{\lVert v\rVert\lVert w\rVert}\leq1$

which we can again interpret as the cosine of an angle.

So all the same notions of length and angle can be recovered from this sort of complex inner product.
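
A quick numerical illustration of the complex inequality (a sketch; `inner` is my helper, conjugate-linear in the first slot):

```python
def inner(v, w):
    # conjugate-linear in the first slot, linear in the second
    return sum(vi.conjugate() * wi for vi, wi in zip(v, w))

v = [1 + 1j, 2 - 1j, 0.5j]
w = [3 + 0j, -1j, 1 + 2j]

lhs = abs(inner(v, w)) ** 2
# <v,v> and <w,w> are real and positive by conjugate symmetry
rhs = inner(v, v).real * inner(w, w).real
```

For any choice of vectors, `lhs` never exceeds `rhs`, with equality exactly when $v$ and $w$ are linearly dependent.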

April 22, 2009 Posted by | Algebra, Linear Algebra | 13 Comments

## Inner Products and Lengths

We’re still looking at a real vector space $V$ with an inner product. We used the Cauchy-Schwarz inequality to define a notion of angle between two vectors.

$\displaystyle\cos(\theta)=\frac{\lvert\langle v,w\rangle\rvert}{\langle v,v\rangle^{1/2}\langle w,w\rangle^{1/2}}$

Let’s take a closer look at the terms in that denominator. What happens when we compute $\langle v,v\rangle$? Well, if we’ve got an orthonormal basis $\left\{e_i\right\}$ around and write $v=v^ie_i$ in components, we can write

$\displaystyle\langle v,v\rangle=\sum\limits_{i=1}^d\left(v^i\right)^2$

The $v^i$ are distances we travel in each of the mutually-orthogonal directions given by the vectors $e_i$. But then this formula looks a lot like the Pythagorean theorem about calculating the square of the resulting distance. It may make sense to define this as the square of the length of $v$, and so the quantities in the denominator above were the lengths of $v$ and $w$, respectively.

Let’s be a little more formal. We want to define something called a “norm”, which is a notion of length on a vector space. If we think of a vector $v$ as an arrow pointing from the origin (the zero vector) to the point at its tip, we should think of the norm $\lVert v\rVert$ as the distance between these two points. Similarly, the distance between the tips of $v$ and $w$ should be the length of the displacement vector $v-w$ which points from one to the other. But a notion of distance is captured in the idea of a metric! So whatever a norm is, it should give rise to a metric by defining the distance $d(v,w)$ as the norm of $v-w$.

Here are some axioms: A function from $V$ to $\mathbb{R}$ is a norm, written $\lVert v\rVert$, if

• For all vectors $v$ and scalars $c$, we have $\lVert cv\rVert=\lvert c\rvert\lVert v\rVert$.
• For all vectors $v$ and $w$, we have $\lVert v+w\rVert\leq\lVert v\rVert+\lVert w\rVert$.
• The norm $\lVert v\rVert$ is zero if and only if the vector $v$ is the zero vector.

The first of these is eminently sensible, stating that multiplying a vector by a scalar should multiply the length of the vector by the size (absolute value) of the scalar. The second is essentially the triangle inequality in a different guise, and the third says that nonzero vectors have nonzero lengths.

Putting these axioms together we can work out

$\displaystyle0=\lVert0\rVert=\lVert v-v\rVert\leq\lVert v\rVert+\lVert -v\rVert=\lVert v\rVert+\lvert-1\rvert\lVert v\rVert=2\lVert v\rVert$

And thus every vector’s norm is nonnegative. From here it’s straightforward to check the conditions in the definition of a metric.

All this is well and good, but does an inner product give rise to a norm? Well, the third condition is direct from the definiteness of the inner product. For the first condition, let’s check

$\displaystyle\sqrt{\langle cv,cv\rangle}=\sqrt{c^2\langle v,v\rangle}=\sqrt{c^2}\sqrt{\langle v,v\rangle}=\lvert c\rvert\sqrt{\langle v,v\rangle}$

as we’d hope. Finally, let’s check the triangle inequality. We’ll start with

\displaystyle\begin{aligned}\lVert v+w\rVert^2&=\langle v+w,v+w\rangle\\&=\langle v,v\rangle+2\langle v,w\rangle+\langle w,w\rangle\\&\leq\lVert v\rVert^2+2\lvert\langle v,w\rangle\rvert+\lVert w\rVert^2\\&\leq\lVert v\rVert^2+2\lVert v\rVert\lVert w\rVert+\lVert w\rVert^2\\&=\left(\lVert v\rVert+\lVert w\rVert\right)^2\end{aligned}

where the second inequality uses the Cauchy-Schwarz inequality. Taking square roots (which preserves order) gives us the triangle inequality, and thus verifies that we do indeed get a norm, and a notion of length.
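
These verifications are easy to replay numerically for the norm induced by an inner product. A brief sketch (helper names are mine):

```python
import math

def dot(v, w):
    # the standard real inner product
    return sum(a * b for a, b in zip(v, w))

def norm(v):
    # the norm induced by the inner product
    return math.sqrt(dot(v, v))

v, w, c = [1.0, -2.0, 3.0], [0.5, 4.0, -1.0], -2.5
vw = [a + b for a, b in zip(v, w)]
homogeneous = norm([c * a for a in v])         # equals |c| * norm(v)
triangle_slack = norm(v) + norm(w) - norm(vw)  # nonnegative
```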

April 21, 2009

## Inner Products and Angles

We again consider a real vector space $V$ with an inner product. We’re going to use the Cauchy-Schwarz inequality to give geometric meaning to this structure.

First of all, we can rewrite the inequality as

$\displaystyle\frac{\langle v,w\rangle^2}{\langle v,v\rangle\langle w,w\rangle}\leq1$

Since the inner product is positive definite, we know that this quantity will be positive. And so we can take its square root to find

$\displaystyle-1\leq\frac{\lvert\langle v,w\rangle\rvert}{\langle v,v\rangle^{1/2}\langle w,w\rangle^{1/2}}\leq1$

This range is exactly that of the cosine function. Let’s consider the cosine restricted to the interval $\left[0,\pi\right]$, where it’s injective. Here we can define an inverse function, the “arccosine”. Using the geometric view on the cosine, the inverse takes a value between $-1$ and ${1}$ and considers the point with that $x$-coordinate on the upper half of the unit circle. The arccosine is then the angle made between the positive $x$-axis and the ray through this point, as a number between ${0}$ and $\pi$.

So let’s take this arccosine function and apply it to the value above. We define the angle $\theta$ between vectors $v$ and $w$ by

$\displaystyle\cos(\theta)=\frac{\lvert\langle v,w\rangle\rvert}{\langle v,v\rangle^{1/2}\langle w,w\rangle^{1/2}}$

Some immediate consequences show that this definition makes sense. First of all, what’s the angle between $v$ and itself? We find

$\displaystyle\cos(\theta)=\frac{\lvert\langle v,v\rangle\rvert}{\langle v,v\rangle^{1/2}\langle v,v\rangle^{1/2}}=1$

and so $\theta=0$. A vector makes no angle with itself. Secondly, what if we take two vectors from an orthonormal basis $\left\{e_i\right\}$? We calculate

$\displaystyle\cos(\theta_{ij})=\frac{\lvert\langle e_i,e_j\rangle\rvert}{\langle e_i,e_i\rangle^{1/2}\langle e_j,e_j\rangle^{1/2}}=\delta_{ij}$

If we pick the same vector twice, we already know we get $\theta_{ii}=0$, but if we pick two different vectors we find that $\cos(\theta_{ij})=0$, and thus $\theta_{ij}=\frac{\pi}{2}$. That is, two different vectors in an orthonormal basis are perpendicular, or “orthogonal”.
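
This definition is easy to compute with. A sketch in Python (`angle` is my name for it; the clamp guards against tiny rounding overshoots past $1$):

```python
import math

def dot(v, w):
    # the standard real inner product
    return sum(a * b for a, b in zip(v, w))

def angle(v, w):
    # the angle defined via the arccosine, as above; the absolute value
    # in the numerator folds all angles into [0, pi/2]
    c = abs(dot(v, w)) / math.sqrt(dot(v, v) * dot(w, w))
    return math.acos(min(1.0, c))

theta_self = angle([1.0, 2.0], [1.0, 2.0])  # 0: no angle with itself
theta_axes = angle([1.0, 0.0], [0.0, 1.0])  # pi/2: orthogonal vectors
```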

April 17, 2009

## The Cauchy-Schwarz Inequality

Today I want to present a deceptively simple fact about spaces equipped with inner products. The Cauchy-Schwarz inequality states that

$\displaystyle\langle v,w\rangle^2\leq\langle v,v\rangle\langle w,w\rangle$

for any vectors $v,w\in V$. The proof uses a neat little trick. We take a scalar $t$ and construct the vector $v+tw$. Now the positive-definiteness, bilinearity, and symmetry of the inner product tells us that

$\displaystyle0\leq\langle v+tw,v+tw\rangle=\langle v,v\rangle+2\langle v,w\rangle t+t^2\langle w,w\rangle$

This is a quadratic function of the real variable $t$. It can have at most one zero, if there is some value $t_0$ such that $v+t_0w$ is the zero vector, but it definitely can’t have two zeroes. That is, it’s either a perfect square or an irreducible quadratic. Thus we consider the discriminant and conclude

$\displaystyle\left(2\langle v,w\rangle\right)^2-4\langle w,w\rangle\langle v,v\rangle\leq0$

which is easily seen to be equivalent to the Cauchy-Schwarz inequality above. As a side effect, we see that we only get an equality (rather than an inequality) when $v$ and $w$ are linearly dependent.
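
Both the inequality and the equality condition can be seen in a few lines (a sketch; `dot` is my helper for the standard inner product on $\mathbb{R}^3$):

```python
def dot(v, w):
    # the standard real inner product
    return sum(a * b for a, b in zip(v, w))

v = [1.0, -2.0, 4.0]
w = [3.0, 0.5, -1.0]
# <v,v><w,w> - <v,w>^2 is nonnegative by Cauchy-Schwarz
slack = dot(v, v) * dot(w, w) - dot(v, w) ** 2

u = [2.0 * x for x in v]  # linearly dependent on v
# ... and the slack vanishes exactly in the dependent case
equality_slack = dot(v, v) * dot(u, u) - dot(v, u) ** 2
```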

April 16, 2009 Posted by | Algebra, Linear Algebra | 5 Comments

## Real Inner Products

Now that we’ve got bilinear forms, let’s focus in on when the base field is $\mathbb{R}$. We’ll also add the requirement that our bilinear forms be symmetric. As we saw, a bilinear form $B:V\otimes V\rightarrow\mathbb{R}$ corresponds to a linear transformation $B_1:V\rightarrow V^*$. Since $B$ is symmetric, the matrix of $B_1$ must itself be symmetric with respect to any basis. So let’s try to put it into a canonical form!

We know that we can put $B$ into the almost upper-triangular form

$\displaystyle\begin{pmatrix}A_1&&*\\&\ddots&\\{0}&&A_m\end{pmatrix}$

but now all the blocks above the diagonal must be zero, since they have to equal the blocks below the diagonal. On the diagonal, the $1\times1$ blocks are fine, but the $2\times2$ blocks must themselves be symmetric. That is, they must look like

$\displaystyle\begin{pmatrix}a&b\\b&d\end{pmatrix}$

which gives a characteristic polynomial of $X^2-(a+d)X+(ad-b^2)$ for the block. But recall that we could only use this block if there were no eigenvalues. And, indeed, we can check

\displaystyle\begin{aligned}\tau^2-4\delta&=(a+d)^2-4(ad-b^2)\\&=a^2+2ad+d^2-4ad+4b^2\\&=a^2-2ad+d^2+b^2\\&=(a-d)^2+b^2\geq0\end{aligned}

The discriminant is nonnegative, so the block has real eigenvalues, and thus this $2\times2$ block will break down into two $1\times1$ blocks. Thus any symmetric real matrix can be diagonalized, which means that any symmetric real bilinear form has a basis with respect to which its matrix is diagonal.

Let $\left\{e_i\right\}$ be such a basis. To be explicit, this means that $\langle e_i,e_j\rangle=b_i\delta_{ij}$, where the $b_i$ are real numbers and $\delta_{ij}$ is the Kronecker delta: ${1}$ if its indices match, and ${0}$ if they don’t. But we still have some freedom. If I multiply $e_i$ by a scalar $c$, we find $\langle ce_i,ce_i\rangle=c^2b_i$. We can always find some $c$ so that $c^2=\frac{1}{|b_i|}$, and so we can always pick our basis so that $b_i$ is ${1}$, $-1$, or ${0}$. We’ll call such a basis “orthonormal”.

The number of diagonal entries $b_i$ with each of these three values won’t depend on the orthonormal basis we choose. The form is nondegenerate if and only if there are no ${0}$ entries on the diagonal. If there are, we can decompose $V$ as the direct sum of the subspace $\bar{V}$ on which the form is nondegenerate, and the remainder $W$ on which the form is completely degenerate. That is, $\langle w_1,w_2\rangle=0$ for all $w_1,w_2\in W$. We’ll only consider nondegenerate bilinear forms from here on out.

We write $p$ for the number of diagonal entries equal to ${1}$, and $q$ for the number equal to $-1$. Then the pair $(p,q)$ is called the signature of the form. Clearly for nondegenerate forms, $p+q=d$, the dimension of $V$. We’ll have reason to consider some different signatures in the future, but for now we’ll be mostly concerned with the signature $(d,0)$. In this case we call the form positive definite, since we can calculate

$\displaystyle\langle v,v\rangle=v^iv^j\langle e_i,e_j\rangle=v^iv^j\delta_{ij}=\sum\limits_{i=1}^d\left(v^i\right)^2$

The form is called “positive”, since this result is always nonnegative, and “definite”, since this result can only be zero if $v$ is the zero vector.

This is what we’ll call an inner product on a real vector space $V$ — a nondegenerate, positive definite, symmetric bilinear form $\langle\underbar{\hphantom{X}},\underbar{\hphantom{X}}\rangle:V\otimes V\rightarrow\mathbb{R}$. Notice that choosing such a form picks out a certain class of bases as orthonormal. Conversely, if we choose any basis $\left\{e_i\right\}$ at all we can create a form by insisting that this basis be orthonormal. Just define $\langle e_i,e_j\rangle=\delta_{ij}$ and extend by bilinearity.
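
Declaring a basis orthonormal and extending by bilinearity gives the familiar sum-of-products formula; more generally, a diagonal list of entries $1$, $-1$, or ${0}$ realizes any signature. A sketch (the names here are mine):

```python
def form(b, v, w):
    # symmetric bilinear form that is diagonal, with entries b_i,
    # with respect to the chosen (orthonormal) basis
    return sum(bi * vi * wi for bi, vi, wi in zip(b, v, w))

# signature (3, 0): positive definite, the inner-product case
euclidean = [1, 1, 1]
# e.g. a nondegenerate form of signature (1, 3), for contrast
mixed_signature = [1, -1, -1, -1]

v = [1.0, 2.0, 3.0]
positive = form(euclidean, v, v)  # a sum of squares: 1 + 4 + 9
u = [2.0, 1.0, 1.0, 1.0]
mixed = form(mixed_signature, u, u)  # 4 - 1 - 1 - 1, can be any sign
```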

April 15, 2009 Posted by | Algebra, Linear Algebra | 14 Comments

## Bilinear Forms

Now that we’ve said a lot about individual operators on vector spaces, I want to go back and consider some other sorts of structures we can put on the space itself. Foremost among these is the idea of a bilinear form. This is really nothing but a bilinear function to the base field: $B:V\times V\rightarrow\mathbb{F}$. Of course, this means that it’s equivalent to a linear function from the tensor square: $B:V\otimes V\rightarrow\mathbb{F}$.

Instead of writing this as a function, we will often use a slightly different notation. We write a bracket $B(v,w)=\langle v,w\rangle$, or sometimes $\langle v,w\rangle_B$ if we need to specify which of several bilinear forms under consideration we mean.

Another viewpoint comes from recognizing that we’ve got a duality for vector spaces. This lets us rewrite our bilinear form $B:V\otimes V\rightarrow\mathbb{F}$ as a linear transformation $B_1:V\rightarrow V^*$. We can view this as saying that once we pick one of the vectors $v\in V$, the bilinear form reduces to a linear functional $\langle v,\underbar{\hphantom{X}}\rangle:V\rightarrow\mathbb{F}$, which is a vector in the dual space $V^*$. Or we could focus on the other slot and define $B_2(v)=\langle\underbar{\hphantom{X}},v\rangle\in V^*$.

We know that the dual space of a finite-dimensional vector space has the same dimension as the space itself, which raises the possibility that $B_1$ or $B_2$ is an isomorphism from $V$ to $V^*$. If either one is, then both are, and we say that the bilinear form $B$ is nondegenerate.

We can also note that there is a symmetry on the category of vector spaces. That is, we have a linear transformation $\tau_{V,V}:V\otimes V\rightarrow V\otimes V$ defined by $\tau_{V,V}(v\otimes w)=w\otimes v$. This makes it natural to ask what effect this has on our form. Two obvious possibilities are that $B\circ\tau_{V,V}=B$ and that $B\circ\tau_{V,V}=-B$. In the first case we’ll call the bilinear form “symmetric”, and in the second we’ll call it “antisymmetric”. In terms of the maps $B_1$ and $B_2$, we see that composing $B$ with the symmetry swaps the roles of these two functions. For symmetric bilinear forms, $B_1=B_2$, while for antisymmetric bilinear forms we have $B_1=-B_2$.

This leads us to consider nondegenerate bilinear forms a little more. If $B_2$ is an isomorphism it has an inverse $B_2^{-1}$. Then we can form the composite $B_2^{-1}\circ B_1:V\rightarrow V$. If $B$ is symmetric then this composition is the identity transformation on $V$. On the other hand, if $B$ is antisymmetric then this composition is the negative of the identity transformation. Thus, the composite transformation measures how much the bilinear transformation diverges from symmetry. Accordingly, we call it the asymmetry of the form $B$.

Finally, if we’re working over a finite-dimensional vector space we can pick a basis $\left\{e_i\right\}$ for $V$, and get a matrix for $B$. We define the matrix entry $B_{ij}=\langle e_i,e_j\rangle_B$. Then if we have vectors $v=v^ie_i$ and $w=w^je_j$ we can calculate

$\displaystyle\langle v,w\rangle=\langle v^ie_i,w^je_j\rangle=v^iw^j\langle e_i,e_j\rangle=v^iw^jB_{ij}$

In terms of this basis and its dual basis $\left\{\epsilon^j\right\}$, we find that $B_1$ sends $v$ to the linear functional $B_1(v)=\langle v,\underbar{\hphantom{X}}\rangle=v^iB_{ij}\epsilon^j$. That is, the matrix can also be used to represent the partial maps $B_1$ and $B_2$. If $B$ is symmetric, then the matrix is symmetric $B_{ij}=B_{ji}$, while if it’s antisymmetric then $B_{ij}=-B_{ji}$.
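
The matrix formula $\langle v,w\rangle=v^iw^jB_{ij}$ is just a double sum, and symmetry or antisymmetry of the matrix shows up directly in the values. A sketch in Python (the helper name is mine):

```python
def form_value(B, v, w):
    # <v, w> = v^i w^j B_ij, summing over both indices
    return sum(v[i] * w[j] * B[i][j]
               for i in range(len(v)) for j in range(len(w)))

# a symmetric matrix gives a symmetric form ...
B_sym = [[2.0, 1.0], [1.0, 3.0]]
# ... and an antisymmetric matrix gives an antisymmetric one
B_anti = [[0.0, 1.0], [-1.0, 0.0]]

v, w = [1.0, 2.0], [3.0, -1.0]
s1, s2 = form_value(B_sym, v, w), form_value(B_sym, w, v)    # equal
a1, a2 = form_value(B_anti, v, w), form_value(B_anti, w, v)  # negatives
```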

April 14, 2009 Posted by | Algebra, Linear Algebra | 9 Comments