Now that we have the Gram-Schmidt process as a tool, we can use it to come up with orthonormal bases.
Any vector space $V$ with finite dimension $d$ has a finite basis $\{v_i\}_{i=1}^d$. This is exactly what it means for $V$ to have dimension $d$. And now we can apply the Gram-Schmidt process to turn this basis into an orthonormal basis $\{e_i\}_{i=1}^d$.
We also know that any linearly independent set can be expanded to a basis. In fact, we can also extend any orthonormal collection of vectors to an orthonormal basis. Indeed, if $\{e_i\}_{i=1}^n$ is an orthonormal collection, we can add the vectors $\{v_i\}_{i=n+1}^d$ to fill out a basis. Then when we apply the Gram-Schmidt process to this basis it will start with $e_1$, which is already normalized. It then moves on to $e_2$, which is already orthonormal with $e_1$, and so on. Each of the $e_i$ is left unchanged, and the $v_i$ are modified to make them orthonormal with the existing collection.
Now that we have a real or complex inner product, we have notions of length and angle. This lets us define what it means for a collection of vectors to be “orthonormal”: each pair of distinct vectors is perpendicular, and each vector has unit length. In formulas, we say that the collection $\{e_i\}_{i=1}^n$ is orthonormal if $\langle e_i, e_j\rangle = \delta_{i,j}$. These can be useful things to have, but how do we get our hands on them?
It turns out that if we have a linearly independent collection of vectors $\{v_i\}_{i=1}^n$ then we can come up with an orthonormal collection $\{e_i\}_{i=1}^n$ spanning the same subspace of $V$. Even better, we can pick it so that the first $k$ vectors $\{e_i\}_{i=1}^k$ span the same subspace as $\{v_i\}_{i=1}^k$. The method goes back to Laplace and Cauchy, but gets its name from Jørgen Gram and Erhard Schmidt.
We proceed by induction on the number of vectors in the collection. If $n = 1$, then we simply set
$$e_1 = \frac{v_1}{\lVert v_1\rVert}$$
This “normalizes” the vector to have unit length, but doesn’t change its direction. It spans the same one-dimensional subspace, and since it’s alone it forms an orthonormal collection.
Now, let’s assume the procedure works for collections of size $n-1$ and start out with a linearly independent collection of $n$ vectors. First, we can orthonormalize the first $n-1$ vectors using our inductive hypothesis. This gives a collection $\{e_i\}_{i=1}^{n-1}$ which spans the same subspace as $\{v_i\}_{i=1}^{n-1}$ (and so on down, as noted above). But $v_n$ isn’t in the subspace spanned by the first $n-1$ vectors (or else the original collection wouldn’t have been linearly independent). So it points at least somewhat in a new direction.
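To find this new direction, we define
$$w_n = v_n - \sum_{i=1}^{n-1}\langle e_i, v_n\rangle e_i$$
This vector will be orthogonal to all the vectors from $e_1$ to $e_{n-1}$, since for any such $e_j$ we can check
$$\langle e_j, w_n\rangle = \langle e_j, v_n\rangle - \sum_{i=1}^{n-1}\langle e_i, v_n\rangle\langle e_j, e_i\rangle = \langle e_j, v_n\rangle - \sum_{i=1}^{n-1}\langle e_i, v_n\rangle\delta_{j,i} = \langle e_j, v_n\rangle - \langle e_j, v_n\rangle = 0$$
where we use the orthonormality of the collection $\{e_i\}_{i=1}^{n-1}$ to show that most of these inner products come out to be zero.
So we’ve got a vector orthogonal to all the ones we collected so far, but it might not have unit length. It can’t be zero, since $v_n$ isn’t in the span of the earlier vectors, so we normalize it:
$$e_n = \frac{w_n}{\lVert w_n\rVert}$$
and we’re done.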
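The inductive construction above translates almost line for line into a loop. Here is a minimal sketch in plain Python, assuming vectors are lists of numbers and the inner product is the standard one on $\mathbb{R}^n$ or $\mathbb{C}^n$ (conjugate-linear in the first slot). The names `inner`, `norm`, and `gram_schmidt`, and the tolerance `1e-12` used to detect dependence, are my own choices for illustration.

```python
import math

def inner(u, v):
    """Standard inner product: conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm(v):
    """Length of v, the square root of <v, v> (which is always real)."""
    return math.sqrt(inner(v, v).real)

def gram_schmidt(vectors):
    """Orthonormalize a linearly independent list of vectors, in order."""
    basis = []
    for v in vectors:
        # w = v - sum_i <e_i, v> e_i, using the e_i collected so far.
        w = list(v)
        for e in basis:
            c = inner(e, v)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        length = norm(w)
        # Assumed tolerance: a (near-)zero w means v was in the earlier span.
        if length < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([wi / length for wi in w])
    return basis

if __name__ == "__main__":
    for e in gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]):
        print(e)
```

Because each new vector only has components along the earlier $e_i$ removed, the first $k$ outputs depend only on the first $k$ inputs, which is the "spans the same subspace" property from the text.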
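There’s an interesting little identity that holds for norms (translation-invariant metrics on vector spaces over $\mathbb{R}$ or $\mathbb{C}$) that come from inner products. Even more interestingly, it actually characterizes such norms.
Geometrically, if we have a parallelogram whose two sides from the same point are given by the vectors $v$ and $w$, then we can construct the two diagonals $v+w$ and $v-w$. It then turns out that the sum of the squares on all four sides is equal to the sum of the squares on the diagonals. We write this formally by saying
$$\lVert v+w\rVert^2 + \lVert v-w\rVert^2 = 2\lVert v\rVert^2 + 2\lVert w\rVert^2$$
where we’ve used the fact that opposite sides of a parallelogram have the same length. Verifying this identity is straightforward, using the definition of the norm-squared:
$$\begin{aligned}\lVert v+w\rVert^2 + \lVert v-w\rVert^2 &= \langle v+w, v+w\rangle + \langle v-w, v-w\rangle\\&= \langle v,v\rangle + \langle v,w\rangle + \langle w,v\rangle + \langle w,w\rangle + \langle v,v\rangle - \langle v,w\rangle - \langle w,v\rangle + \langle w,w\rangle\\&= 2\lVert v\rVert^2 + 2\lVert w\rVert^2\end{aligned}$$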
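On the other hand, what if we have a norm that satisfies this parallelogram law? Then we can use the polarization identities to define a unique inner product:
$$\langle v, w\rangle = \frac{\lVert v+w\rVert^2 - \lVert v-w\rVert^2}{4} + i\,\frac{\lVert v-iw\rVert^2 - \lVert v+iw\rVert^2}{4}$$
where we ignore the second term when working over real vector spaces.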
However, if we have a norm that does not satisfy the parallelogram law and try to use it in these formulas, then the resulting form must fail to be an inner product. If we did get an inner product, then the norm would satisfy the parallelogram law, which it doesn’t.
Now, I haven’t given any examples of norms on vector spaces which don’t satisfy the parallelogram law, but they show up all the time in functional analysis. For now I just want to point out that such things do, in fact, exist.
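For one small, concrete instance, the “taxicab” norm on $\mathbb{R}^2$ (the sum of the absolute values of the components) already fails the law. The sketch below measures the defect $\lVert v+w\rVert^2 + \lVert v-w\rVert^2 - 2\lVert v\rVert^2 - 2\lVert w\rVert^2$ for a given norm. The function names are mine, and the choice of the taxicab norm as the example is my addition rather than anything singled out above.

```python
def euclidean_norm(v):
    """The norm coming from the standard inner product."""
    return sum(abs(x) ** 2 for x in v) ** 0.5

def taxicab_norm(v):
    """Sum of absolute values; a norm that comes from no inner product."""
    return sum(abs(x) for x in v)

def parallelogram_defect(norm, v, w):
    """||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2; zero when the law holds."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(v) ** 2 - 2 * norm(w) ** 2

if __name__ == "__main__":
    v, w = [1.0, 0.0], [0.0, 1.0]
    # Zero up to floating-point rounding for the Euclidean norm.
    print(parallelogram_defect(euclidean_norm, v, w))
    # 4.0 for the taxicab norm: both diagonals have length 2, both sides length 1.
    print(parallelogram_defect(taxicab_norm, v, w))
```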
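Let’s take the sum of two vectors $v$ and $w$. We can calculate its norm-squared as usual:
$$\lVert v+w\rVert^2 = \langle v+w, v+w\rangle = \langle v,v\rangle + \langle v,w\rangle + \langle w,v\rangle + \langle w,w\rangle = \lVert v\rVert^2 + \lVert w\rVert^2 + 2\Re\left(\langle v,w\rangle\right)$$
where $\Re(z)$ denotes the real part of the complex number $z$. If $z$ is already a real number, it does nothing.
So we can rewrite this equation as
$$\Re\left(\langle v,w\rangle\right) = \frac{\lVert v+w\rVert^2 - \lVert v\rVert^2 - \lVert w\rVert^2}{2}$$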
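If we’re working over a real vector space, this is the inner product itself. Over a complex vector space, this only gives us the real part of the inner product. But all is not lost! We can also work out
$$\lVert v+iw\rVert^2 = \lVert v\rVert^2 + \lVert w\rVert^2 + 2\Re\left(i\langle v,w\rangle\right) = \lVert v\rVert^2 + \lVert w\rVert^2 - 2\Im\left(\langle v,w\rangle\right)$$
where $\Im(z)$ denotes the imaginary part of the complex number $z$. The last equality holds because
$$\Re\left(i(a+bi)\right) = \Re(ai - b) = -b = -\Im(a+bi)$$
so we can write
$$\Im\left(\langle v,w\rangle\right) = \frac{\lVert v\rVert^2 + \lVert w\rVert^2 - \lVert v+iw\rVert^2}{2}$$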
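We can also write these identities out in a couple other ways. If we started with $v-w$, we could find the identities
$$\Re\left(\langle v,w\rangle\right) = \frac{\lVert v\rVert^2 + \lVert w\rVert^2 - \lVert v-w\rVert^2}{2}\qquad\Im\left(\langle v,w\rangle\right) = \frac{\lVert v-iw\rVert^2 - \lVert v\rVert^2 - \lVert w\rVert^2}{2}$$
Or we could combine both forms above to write
$$\Re\left(\langle v,w\rangle\right) = \frac{\lVert v+w\rVert^2 - \lVert v-w\rVert^2}{4}\qquad\Im\left(\langle v,w\rangle\right) = \frac{\lVert v-iw\rVert^2 - \lVert v+iw\rVert^2}{4}$$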
In all these ways we see that not only does an inner product on a real or complex vector space give us a norm, but the resulting norm completely determines the inner product. Different inner products necessarily give rise to different norms.
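We can watch the norm determine the inner product numerically. This sketch rebuilds $\langle v,w\rangle$ using nothing but norms of combinations of $v$ and $w$, assuming the standard inner product on $\mathbb{C}^n$ that is linear in the second slot and conjugate-linear in the first. The helper names are mine.

```python
def inner(u, v):
    """Standard inner product: conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def norm_sq(v):
    """Norm-squared <v, v>, which is always a real number."""
    return inner(v, v).real

def polarize(v, w):
    """Recover <v, w> from norms alone via the combined polarization identities."""
    def ns(s):
        # ||v + s*w||^2 for a scalar s
        return norm_sq([x + s * y for x, y in zip(v, w)])
    real_part = (ns(1) - ns(-1)) / 4
    imag_part = (ns(-1j) - ns(1j)) / 4
    return complex(real_part, imag_part)

if __name__ == "__main__":
    v = [1 + 2j, 3 - 1j]
    w = [2 - 1j, 1j]
    print(inner(v, w), polarize(v, w))
```

Note that the signs in the imaginary part depend on which slot is the linear one; with the opposite convention the two norms in `imag_part` would trade places.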
Now consider a complex vector space. We can define bilinear forms, and even ask that they be symmetric and nondegenerate. But there’s no way for such a form to be positive-definite. Indeed, we saw that there isn’t even a notion of “order” on the field of complex numbers. They do contain the real numbers as a subfield, but we can’t manage to stay in the positive real numbers. Indeed, if we have $\langle v,v\rangle = a$ for some real $a > 0$, then we also have $\langle iv, iv\rangle = i^2\langle v,v\rangle = -a < 0$. So it seems we aren’t going to get the same geometric interpretations this way.
But let’s slow down and look at a one-dimensional complex vector space: the field $\mathbb{C}$ of complex numbers itself. We do have a notion of length here. We define the length of a complex number $z$ as the square root of $\bar{z}z$. This quantity is always a nonnegative real number, and thus always has a real square root. And it looks sort of like how we compute the squared length of a vector with a bilinear form. Indeed, if we think of $\mathbb{C}$ as a real vector space with basis $\{1, i\}$, it’s exactly the norm we get when we define this basis to be orthonormal. The only thing weird is that conjugation.
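Well, let’s run with this a while. Given a complex vector space $V$, we want a form $\langle\cdot,\cdot\rangle$ which is
- linear in the second slot: $\langle u, av+bw\rangle = a\langle u,v\rangle + b\langle u,w\rangle$
- conjugate symmetric: $\langle v,w\rangle = \overline{\langle w,v\rangle}$
Conjugate symmetry implies that the form is conjugate linear in the first slot, $\langle av+bw, u\rangle = \bar{a}\langle v,u\rangle + \bar{b}\langle w,u\rangle$, and also that $\langle v,v\rangle$ is always real, since it equals its own conjugate. This makes it reasonable to also ask that the form be
- positive definite: $\langle v,v\rangle > 0$ for all $v\neq 0$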
This mixture of being linear in one variable and “half-linear” in the other makes the whole form “one and a half” times linear, or “sesquilinear”.
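Our previous proof doesn’t really work, since our scalars are now complex, and we can’t argue that certain polynomials have no zeroes. But we can modify it. We start similarly, calculating
$$0 \leq \langle v+tw, v+tw\rangle = \langle v,v\rangle + t\langle v,w\rangle + \bar{t}\langle w,v\rangle + \bar{t}t\langle w,w\rangle$$
Now the Cauchy-Schwarz inequality is trivial if $w = 0$, so we may assume $w \neq 0$, and set $t = -\frac{\langle w,v\rangle}{\langle w,w\rangle}$. Then we see
$$0 \leq \langle v,v\rangle - \frac{\langle w,v\rangle\langle v,w\rangle}{\langle w,w\rangle} - \frac{\langle v,w\rangle\langle w,v\rangle}{\langle w,w\rangle} + \frac{\lvert\langle v,w\rangle\rvert^2}{\langle w,w\rangle} = \langle v,v\rangle - \frac{\lvert\langle v,w\rangle\rvert^2}{\langle w,w\rangle}$$
Multiplying through by $\langle w,w\rangle$ and rearranging, we find
$$\lvert\langle v,w\rangle\rvert^2 \leq \langle v,v\rangle\langle w,w\rangle$$
which is the complex version of the Cauchy-Schwarz inequality. And then just as in the real case we can write it as
$$\frac{\lvert\langle v,w\rangle\rvert^2}{\langle v,v\rangle\langle w,w\rangle} \leq 1$$
which implies that
$$0 \leq \frac{\lvert\langle v,w\rangle\rvert}{\lVert v\rVert\lVert w\rVert} \leq 1$$
which we can again interpret as the cosine of an angle.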
So all the same notions of length and angle can be recovered from this sort of complex inner product.
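As a sanity check on the algebra, here is a small sketch that evaluates both sides of the complex Cauchy-Schwarz inequality, along with the quantity $\langle v+tw, v+tw\rangle$ at the particular $t = -\langle w,v\rangle/\langle w,w\rangle$ used in the proof. It assumes the standard inner product on $\mathbb{C}^n$; the function names are mine.

```python
def inner(u, v):
    """Standard inner product: conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def cauchy_schwarz_gap(v, w):
    """<v,v><w,w> - |<v,w>|^2, which the inequality says is never negative."""
    return inner(v, v).real * inner(w, w).real - abs(inner(v, w)) ** 2

def residual_norm_sq(v, w):
    """<v + t w, v + t w> at t = -<w,v>/<w,w>; requires w to be nonzero."""
    t = -inner(w, v) / inner(w, w)
    r = [a + t * b for a, b in zip(v, w)]
    return inner(r, r).real

if __name__ == "__main__":
    v = [1 + 1j, 2 - 3j, 0.5j]
    w = [2j, 1 - 1j, 4]
    print(cauchy_schwarz_gap(v, w))
    # The proof's computation says this equals the gap above.
    print(residual_norm_sq(v, w) * inner(w, w).real)
```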
Let’s take a closer look at those terms in the denominator. What happens when we compute $\langle v,v\rangle$? Well, if we’ve got an orthonormal basis $\{e_i\}$ around and components $v^i$, we can write
$$\langle v,v\rangle = \sum_{i,j}v^iv^j\langle e_i, e_j\rangle = \sum_i\left(v^i\right)^2$$
The $v^i$ are distances we travel in each of the mutually-orthogonal directions given by the vectors $e_i$. But then this formula looks a lot like the Pythagorean theorem about calculating the square of the resulting distance. It may make sense to define this as the square of the length of $v$, and so the quantities in the denominator above were the lengths of $v$ and $w$, respectively.
Let’s be a little more formal. We want to define something called a “norm”, which is a notion of length on a vector space. If we think of a vector $v$ as an arrow pointing from the origin (the zero vector) to the point at its tip, we should think of the norm as the distance between these two points. Similarly, the distance between the tips of $v$ and $w$ should be the length of the displacement vector $v-w$ which points from one to the other. But a notion of distance is captured in the idea of a metric! So whatever a norm is, it should give rise to a metric by defining the distance $d(v,w)$ as the norm of $v-w$.
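Here are some axioms: A function from $V$ to $\mathbb{R}$ is a norm, written $\lVert v\rVert$, if
- For all vectors $v$ and scalars $c$, we have $\lVert cv\rVert = \lvert c\rvert\lVert v\rVert$.
- For all vectors $v$ and $w$, we have $\lVert v+w\rVert \leq \lVert v\rVert + \lVert w\rVert$.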
- The norm is zero if and only if the vector is the zero vector.
The first of these is eminently sensible, stating that multiplying a vector by a scalar should multiply the length of the vector by the size (absolute value) of the scalar. The second is essentially the triangle inequality in a different guise, and the third says that nonzero vectors have nonzero lengths.
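Putting these axioms together we can work out
$$0 = \lVert 0\rVert = \lVert v + (-v)\rVert \leq \lVert v\rVert + \lVert -v\rVert = \lVert v\rVert + \lvert -1\rvert\lVert v\rVert = 2\lVert v\rVert$$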
And thus every vector’s norm is nonnegative. From here it’s straightforward to check the conditions in the definition of a metric.
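All this is well and good, but does an inner product give rise to a norm, setting $\lVert v\rVert = \sqrt{\langle v,v\rangle}$? Well, the third condition is direct from the definiteness of the inner product. For the first condition, let’s check
$$\lVert cv\rVert = \sqrt{\langle cv, cv\rangle} = \sqrt{c^2\langle v,v\rangle} = \lvert c\rvert\sqrt{\langle v,v\rangle} = \lvert c\rvert\lVert v\rVert$$
as we’d hope. Finally, let’s check the triangle inequality. We’ll start with
$$\begin{aligned}\lVert v+w\rVert^2 &= \langle v+w, v+w\rangle = \lVert v\rVert^2 + 2\langle v,w\rangle + \lVert w\rVert^2\\&\leq \lVert v\rVert^2 + 2\lvert\langle v,w\rangle\rvert + \lVert w\rVert^2\\&\leq \lVert v\rVert^2 + 2\lVert v\rVert\lVert w\rVert + \lVert w\rVert^2 = \left(\lVert v\rVert + \lVert w\rVert\right)^2\end{aligned}$$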
where the second inequality uses the Cauchy-Schwarz inequality. Taking square roots (which preserves order) gives us the triangle inequality, and thus verifies that we do indeed get a norm, and a notion of length.
First of all, for nonzero vectors $v$ and $w$ we can rewrite the inequality as
$$\frac{\langle v,w\rangle^2}{\langle v,v\rangle\langle w,w\rangle} \leq 1$$
Since the inner product is positive definite, we know that the denominator is positive, so this quantity is nonnegative. And so we can take its square root to find
$$-1 \leq \frac{\langle v,w\rangle}{\langle v,v\rangle^{\frac{1}{2}}\langle w,w\rangle^{\frac{1}{2}}} \leq 1$$
This range is exactly that of the cosine function. Let’s consider the cosine restricted to the interval $[0,\pi]$, where it’s injective. Here we can define an inverse function, the “arccosine”. Using the geometric view on the cosine, the inverse takes a value between $-1$ and $1$ and considers the point with that $x$-coordinate on the upper half of the unit circle. The arccosine is then the angle made between the positive $x$-axis and the ray through this point, as a number between $0$ and $\pi$.
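So let’s take this arccosine function and apply it to the value above. We define the angle $\theta$ between vectors $v$ and $w$ by
$$\cos\theta = \frac{\langle v,w\rangle}{\langle v,v\rangle^{\frac{1}{2}}\langle w,w\rangle^{\frac{1}{2}}}$$
Some immediate consequences show that this definition makes sense. First of all, what’s the angle between $v$ and itself? We find
$$\cos\theta = \frac{\langle v,v\rangle}{\langle v,v\rangle^{\frac{1}{2}}\langle v,v\rangle^{\frac{1}{2}}} = 1$$
and so $\theta = 0$. A vector makes no angle with itself. Secondly, what if we take two vectors from an orthonormal basis $\{e_i\}$? We calculate
$$\cos\theta_{ij} = \frac{\langle e_i, e_j\rangle}{\langle e_i,e_i\rangle^{\frac{1}{2}}\langle e_j,e_j\rangle^{\frac{1}{2}}} = \delta_{ij}$$
If we pick the same vector twice, we already know we get $\theta_{ii} = 0$, but if we pick two different vectors we find that $\cos\theta_{ij} = 0$, and thus $\theta_{ij} = \frac{\pi}{2}$. That is, two different vectors in an orthonormal basis are perpendicular, or “orthogonal”.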
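The definition of the angle is easy to try out. A minimal sketch, assuming the standard dot product on $\mathbb{R}^n$ and nonzero vectors; the clamp to $[-1,1]$ is my own guard against floating-point rounding nudging the ratio just outside the range where the arccosine is defined.

```python
import math

def dot(v, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def angle(v, w):
    """Angle in [0, pi] between nonzero real vectors v and w."""
    c = dot(v, w) / math.sqrt(dot(v, v) * dot(w, w))
    # Cauchy-Schwarz puts c in [-1, 1]; clamp to absorb rounding error.
    return math.acos(max(-1.0, min(1.0, c)))

if __name__ == "__main__":
    print(angle([1.0, 0.0], [0.0, 1.0]))   # two orthonormal basis vectors
    print(angle([2.0, 1.0], [2.0, 1.0]))   # a vector with itself
    print(angle([1.0, 1.0], [1.0, 0.0]))
```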
Today I want to present a deceptively simple fact about spaces equipped with inner products. The Cauchy-Schwarz inequality states that
$$\langle v,w\rangle^2 \leq \langle v,v\rangle\langle w,w\rangle$$
for any vectors $v, w\in V$. The proof uses a neat little trick. We take a scalar $t$ and construct the vector $v+tw$. Now the positive-definiteness, bilinearity, and symmetry of the inner product tells us that
$$0 \leq \langle v+tw, v+tw\rangle = \langle v,v\rangle + 2\langle v,w\rangle t + \langle w,w\rangle t^2$$
This is a quadratic function of the real variable $t$. It can have at most one zero, if there is some value $t_0$ such that $v+t_0w$ is the zero vector, but it definitely can’t have two zeroes. That is, it’s either a perfect square or an irreducible quadratic. Thus we consider the discriminant and conclude
$$4\langle v,w\rangle^2 - 4\langle v,v\rangle\langle w,w\rangle \leq 0$$
which is easily seen to be equivalent to the Cauchy-Schwarz inequality above. As a side effect, we see that we only get an equality (rather than an inequality) when $v$ and $w$ are linearly dependent.
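To see the discriminant trick in action, this sketch expands $\langle v+tw, v+tw\rangle$ as a quadratic in $t$ and computes its discriminant, assuming the standard dot product on $\mathbb{R}^n$. The function names are mine.

```python
def dot(v, w):
    """Standard inner product on R^n."""
    return sum(a * b for a, b in zip(v, w))

def quadratic(v, w, t):
    """<v + t w, v + t w>, written out as a quadratic in t."""
    return dot(v, v) + 2 * dot(v, w) * t + dot(w, w) * t * t

def discriminant(v, w):
    """Discriminant of that quadratic; never positive, by the argument above."""
    return 4 * dot(v, w) ** 2 - 4 * dot(v, v) * dot(w, w)

if __name__ == "__main__":
    print(discriminant([1, 2], [3, -1]))  # independent vectors: strictly negative
    print(discriminant([1, 2], [2, 4]))   # dependent vectors: exactly zero
```

With integer components the arithmetic is exact, so the dependent pair gives a discriminant of exactly zero, matching the equality case.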
Now that we’ve got bilinear forms, let’s focus in on when the base field is $\mathbb{R}$. We’ll also add the requirement that our bilinear forms be symmetric. As we saw, a bilinear form corresponds to a linear transformation $B: V\to V^*$. Since the form is symmetric, the matrix of $B$ must itself be symmetric with respect to any basis. So let’s try to put it into a canonical form!
We know that we can put $B$ into the almost upper-triangular form
$$\begin{pmatrix}A_1&&*\\&\ddots&\\0&&A_m\end{pmatrix}$$
where each diagonal block $A_k$ is either $1\times1$ or $2\times2$, but now all the blocks above the diagonal must be zero, since they have to equal the blocks below the diagonal. On the diagonal, the $1\times1$ blocks are fine, but the $2\times2$ blocks must themselves be symmetric. That is, they must look like
$$\begin{pmatrix}a&b\\b&d\end{pmatrix}$$
which gives a characteristic polynomial of $X^2 - (a+d)X + (ad-b^2)$ for the block. But recall that we could only use this block if there were no eigenvalues. And, indeed, we can check
$$(a+d)^2 - 4(ad-b^2) = a^2 - 2ad + d^2 + 4b^2 = (a-d)^2 + 4b^2$$
The discriminant is never negative, so the block always has real eigenvalues and will break down into two $1\times1$ blocks. Thus any symmetric real matrix can be diagonalized, which means that any symmetric real bilinear form has a basis with respect to which its matrix is diagonal.
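The discriminant computation says a symmetric $2\times2$ real block always has real eigenvalues, and the quadratic formula hands them to us directly. A minimal sketch; the function name and the convention of returning the smaller eigenvalue first are my own.

```python
import math

def symmetric_2x2_eigenvalues(a, b, d):
    """Real eigenvalues of [[a, b], [b, d]], smaller one first."""
    # Discriminant of X^2 - (a + d) X + (a d - b^2); a sum of squares.
    disc = (a - d) ** 2 + 4 * b ** 2
    root = math.sqrt(disc)
    return ((a + d - root) / 2, (a + d + root) / 2)

if __name__ == "__main__":
    print(symmetric_2x2_eigenvalues(2, 1, 2))  # (1.0, 3.0)
```

Since `disc` is a sum of two squares, `math.sqrt` never sees a negative argument here, which is exactly the point of the calculation in the text.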
Let $\{e_i\}$ be such a basis. To be explicit, this means that $\langle e_i, e_j\rangle = b_i\delta_{ij}$, where the $b_i$ are real numbers and $\delta_{ij}$ is the Kronecker delta: $1$ if its indices match, and $0$ if they don’t. But we still have some freedom. If I multiply $e_i$ by a scalar $c$, we find $\langle ce_i, ce_i\rangle = c^2b_i$. Whenever $b_i\neq0$ we can always find some $c$ so that $c^2 = \frac{1}{\lvert b_i\rvert}$, and so we can always pick our basis so that $b_i$ is $1$, $-1$, or $0$. We’ll call such a basis “orthonormal”.
The number of diagonal entries with each of these three values won’t depend on the orthonormal basis we choose. The form is nondegenerate if and only if there are no $0$ entries on the diagonal. If not, we can decompose $V$ as the direct sum of a subspace on which the form is nondegenerate, and a remainder $W$ on which the form is completely degenerate. That is, $\langle v,w\rangle = 0$ for all $v,w\in W$. We’ll only consider nondegenerate bilinear forms from here on out.
We write $p$ for the number of diagonal entries equal to $1$, and $q$ for the number equal to $-1$. Then the pair $(p,q)$ is called the signature of the form. Clearly for nondegenerate forms, $p+q = d$, the dimension of $V$. We’ll have reason to consider some different signatures in the future, but for now we’ll be mostly concerned with the signature $(d,0)$. In this case we call the form positive definite, since writing $v = \sum_iv^ie_i$ we can calculate
$$\langle v,v\rangle = \sum_{i,j}v^iv^j\langle e_i,e_j\rangle = \sum_{i,j}v^iv^j\delta_{ij} = \sum_i\left(v^i\right)^2 \geq 0$$
The form is called “positive”, since this result is always nonnegative, and “definite”, since this result can only be zero if $v$ is the zero vector.
This is what we’ll call an inner product on a real vector space: a nondegenerate, positive definite, symmetric bilinear form $\langle\cdot,\cdot\rangle: V\times V\to\mathbb{R}$. Notice that choosing such a form picks out a certain class of bases as orthonormal. Conversely, if we choose any basis $\{e_i\}$ at all we can create a form by insisting that this basis be orthonormal. Just define $\langle e_i, e_j\rangle = \delta_{ij}$ and extend by bilinearity.
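The rescaling step that turns a diagonal form into one with entries $1$, $-1$, and $0$ is mechanical enough to sketch. Here I assume we’re handed the diagonal entries $b_i$ of the form in some diagonalizing basis; the function names and the tolerance `tol` for treating an entry as zero are hypothetical choices of mine.

```python
import math

def rescale(diagonal, tol=1e-12):
    """Scalars c_i and rescaled entries c_i^2 * b_i, each of which is 1, -1, or 0."""
    scalars, entries = [], []
    for b in diagonal:
        if abs(b) < tol:
            # Degenerate direction: no rescaling can change a zero entry.
            scalars.append(1.0)
            entries.append(0)
        else:
            c = 1 / math.sqrt(abs(b))  # so that c^2 = 1 / |b|
            scalars.append(c)
            entries.append(round(c * c * b))
    return scalars, entries

def signature(diagonal):
    """The pair (p, q): how many rescaled entries are +1 and how many are -1."""
    _, entries = rescale(diagonal)
    return entries.count(1), entries.count(-1)

if __name__ == "__main__":
    print(rescale([4.0, -0.25, 9.0]))
    print(signature([4.0, -0.25, 9.0]))  # (2, 1)
```

A form given this way is positive definite exactly when `signature` returns `(d, 0)` with `d` the number of entries.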
Now that we’ve said a lot about individual operators on vector spaces, I want to go back and consider some other sorts of structures we can put on the space itself. Foremost among these is the idea of a bilinear form. This is really nothing but a bilinear function to the base field: $B: V\times V\to\mathbb{F}$. Of course, this means that it’s equivalent to a linear function from the tensor square: $B: V\otimes V\to\mathbb{F}$.
Instead of writing this as a function, we will often use a slightly different notation. We write a bracket $B(v,w) = \langle v,w\rangle$, or sometimes $\langle v,w\rangle_B$, if we need to specify which of several forms is under consideration.
Another viewpoint comes from recognizing that we’ve got a duality for vector spaces. This lets us rewrite our bilinear form as a linear transformation $B_1: V\to V^*$. We can view this as saying that once we pick one of the vectors $v\in V$, the bilinear form reduces to a linear functional $\langle v,\cdot\rangle: V\to\mathbb{F}$, which is a vector in the dual space $V^*$. Or we could focus on the other slot and define $B_2(v) = \langle\cdot,v\rangle\in V^*$.
We know that the dual space of a finite-dimensional vector space has the same dimension as the space itself, which raises the possibility that $B_1$ or $B_2$ is an isomorphism from $V$ to $V^*$. If either one is, then both are, and we say that the bilinear form $B$ is nondegenerate.
We can also note that there is a symmetry on the category of vector spaces. That is, we have a linear transformation $\tau: V\otimes V\to V\otimes V$ defined by $\tau(v\otimes w) = w\otimes v$. This makes it natural to ask what effect this has on our form. Two obvious possibilities are that $B\circ\tau = B$ and that $B\circ\tau = -B$. In the first case we’ll call the bilinear form “symmetric”, and in the second we’ll call it “antisymmetric”. In terms of the maps $B_1$ and $B_2$, we see that composing $B$ with the symmetry swaps the roles of these two functions. For symmetric bilinear forms, $B_1 = B_2$, while for antisymmetric bilinear forms we have $B_1 = -B_2$.
This leads us to consider nondegenerate bilinear forms a little more. If $B_2$ is an isomorphism it has an inverse $B_2^{-1}$. Then we can form the composite $B_2^{-1}\circ B_1: V\to V$. If $B$ is symmetric then this composition is the identity transformation on $V$. On the other hand, if $B$ is antisymmetric then this composition is the negative of the identity transformation. Thus, the composite transformation measures how much the bilinear form diverges from symmetry. Accordingly, we call it the asymmetry of the form $B$.
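Finally, if we’re working over a finite-dimensional vector space we can pick a basis $\{e_i\}$ for $V$, and get a matrix for $B$. We define the matrix entry $B_{ij} = \langle e_i, e_j\rangle$. Then if we have vectors $v = \sum_iv^ie_i$ and $w = \sum_jw^je_j$ we can calculate
$$\langle v,w\rangle = \Big\langle\sum_iv^ie_i, \sum_jw^je_j\Big\rangle = \sum_{i,j}v^iw^j\langle e_i,e_j\rangle = \sum_{i,j}v^iw^jB_{ij}$$
In terms of this basis and its dual basis $\{\epsilon^j\}$, we find the image of the linear transformation $B_1(e_i) = \langle e_i,\cdot\rangle = \sum_jB_{ij}\epsilon^j$. That is, the matrix also can be used to represent the partial maps $B_1$ and $B_2$. If $B$ is symmetric, then the matrix is symmetric, $B_{ij} = B_{ji}$, while if it’s antisymmetric then $B_{ij} = -B_{ji}$.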
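Once we have the matrix, evaluating the form is just a double sum over components. A minimal sketch, assuming the matrix is a list of rows `B[i][j]` and vectors are lists of components in the chosen basis; all of the function names are mine.

```python
def bilinear(B, v, w):
    """<v, w> = sum over i, j of v^i w^j B_ij."""
    return sum(v[i] * w[j] * B[i][j]
               for i in range(len(v)) for j in range(len(w)))

def partial_map(B, v):
    """Coefficients of the functional <v, .> in the dual basis: sum_i v^i B_ij."""
    n = len(B)
    return [sum(v[i] * B[i][j] for i in range(n)) for j in range(n)]

def is_symmetric(B):
    n = len(B)
    return all(B[i][j] == B[j][i] for i in range(n) for j in range(n))

def is_antisymmetric(B):
    n = len(B)
    return all(B[i][j] == -B[j][i] for i in range(n) for j in range(n))

if __name__ == "__main__":
    B = [[0, 1], [-1, 0]]          # an antisymmetric form on a 2-dimensional space
    v, w = [1, 2], [3, 4]
    print(bilinear(B, v, w), bilinear(B, w, v))  # -2 2
    print(partial_map(B, v))
```

Swapping the arguments flips the sign here, as it must for an antisymmetric matrix, and pairing the output of `partial_map(B, v)` with the components of `w` gives back `bilinear(B, v, w)`.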