## The Gram-Schmidt Process

Now that we have a real or complex inner product, we have notions of length and angle. This lets us define what it means for a collection of vectors to be “orthonormal”: each pair of distinct vectors is perpendicular, and each vector has unit length. In formulas, we say that the collection is orthonormal if . These can be useful things to have, but how do we get our hands on them?

It turns out that if we have a linearly independent collection of vectors then we can come up with an orthonormal collection spanning the same subspace of . Even better, we can pick it so that the first vectors span the same subspace as . The method goes back to Laplace and Cauchy, but gets its name from Jørgen Gram and Erhard Schmidt.

We proceed by induction on the number of vectors in the collection. If , then we simply set

This “normalizes” the vector to have unit length, but doesn’t change its direction. It spans the same one-dimensional subspace, and since it’s alone it forms an orthonormal collection.

Now, lets assume the procedure works for collections of size and start out with a linearly independent collection of vectors. First, we can orthonormalize the first vectors using our inductive hypothesis. This gives a collection which spans the same subspace as (and so on down, as noted above). But isn’t in the subspace spanned by the first vectors (or else the original collection wouldn’t have been linearly independent). So it points at least somewhat in a new direction.

To find this new direction, we define

This vector will be orthogonal to all the vectors from to , since for any such we can check

where we use the orthonormality of the collection to show that most of these inner products come out to be zero.

So we’ve got a vector orthogonal to all the ones we collected so far, but it might not have unit length. So we normalize it:

and we’re done.

Are there any applications of Gram-Schmidt to e.g. polynomials or Hilbert spaces of L^2 summable analytic functions on a domain?

Comment by Zygmund | April 28, 2009 |

To finite-dimensional subspaces, sure, but the procedure is essentially finite.

Comment by John Armstrong | April 28, 2009 |

I don’t think that’s quite true. If you start with an infinite sequence (v_n), then you can certainly keep applying the Gram-Schmidt process.: at level n, you only need to know v_1 up to v_n, so it is “finite” in this sense. Essentially because of the (abstract form) of Parseval’s Theorem the infinite series you end up with is still nicely related to what you started with (same closed linear span, for example).

Lots of classical polynomails can be found in this case, if you start with [a,b] and some inner-product on the continuous functions of [a,b] (normally not just integration). For example, the Hermite polynomials, the Legendre polynomials etc.

Comment by Matt Daws | April 29, 2009 |

Well… the answer is “yes”. If H is a separable (pre)Hilbert space, and if you have a sequence x_n whose span is dense in H, then surely you can apply Gram-Schmidt to produce an orthonormal basis of H, in other words a complete/maximal orthonormal set. That would apply in particular to L^2 functions on a domain.

When applied to polynomials on [-1, 1], starting with basis x^n, this gives essentially the Legendre polynomials.

Comment by Todd Trimble | April 29, 2009 |

Sure, Matt, but at any point you’ve got a finite subspace. My point (and this also goes to Todd’s comment) is that you can apply Gram-Schmidt as far down the infinite sequence as you’d like, but you’ll never get them

alldone.Comment by John Armstrong | April 29, 2009 |

What?! It’s simple induction!

Comment by Todd Trimble | April 29, 2009 |

Induction tells you that for any

finiteyou can orthonormalize the first vectors, but not that you can orthonormalize an entire infinite sequence.Comment by John Armstrong | April 29, 2009 |

[…] Now that we have the Gram-Schmidt process as a tool, we can use it to come up with orthonormal […]

Pingback by Orthonormal Bases « The Unapologetic Mathematician | April 29, 2009 |

Sigh. Actually, technically this extremely commonplace type of construction is correctly called

recursive, but most non-logicians just call it a construction by induction. (One constructs things by recursion, but proves things by induction.)I’m a little surprised that you’re debating this, but let me try to keep this light and humorous. Please imagine me saying, in the friendliest way possible: I’m gonna have to get formal on yo’ ass!

I think you’re probably familiar with Lawvere’s universal characterization of the natural numbers. It says that given a set X, a point p_0: 1 –> X, and a function f: X –> X, there is a unique function h: N –> X such that h(0) = x_0 and h(n+1) = f(h(n)). Let’s take this as given.

Now on to Gram-Schmidt. Suppose given an inner product space H and a sequence x_n in H, which we’ll construe as an element of the function space H^N. Just for the sake of convenience, let me assume x_0 is a unit vector.

Put X = N x H^N. Then define f: X –> X by the rule

f(n; y_0, y_1, …) =

(n+1; z_m :=

N(y_m – Sum_{j < n+1} (y_m, y_j)y_j) )

where N( ) denotes normalization v |–> v/|v|, and I’ve used ( , ) inside the sum for inner product.

Then let p_0 in N x H^N be the point (0; x_0, x_1, …), where again x_j is the original sequence.

So, by the universal property, we get a function

N –> N x H^N.

Follow this by projection N x H^N –> H^N. We get a function

N –> H^N

which we can construe as a double sequence

N x N –> H.

Take the diagonal of this double sequence, i.e., compose with the diagonal map N –> N x N. We get a map N –> H.

Unless I’ve made an (easily correctable) mistake, this map is the orthonormal Gram-Schmidt sequence we’re after.

Does this help clear it up?

Comment by Todd Trimble | April 29, 2009 |

Todd, I understand your point, but I would say that the universal construction (which I used to define the natural numbers) shows that the result of applying Gram-Schmidt increasingly many times gives a unique sequence of vectors, but it doesn’t

constructsuch a sequence. This is the universal failing of universal properties.Gram-Schmidt constructs the first terms of the orthonormal basis for any finite , but it can’t get all of them at once. A high-level view (such as invoking the universal property) can show that they all exist, but doesn’t construct them all. Other views, tailored to specific circumstances, can give a closed form for the th vector, and we can check that the results satisfy orthonormality, but Gram-Schmidt only hints at the pattern, or it can be used to check the inductive step of one possible avenue of checking.

The point that I’m making is that Gram-Schmidt is a

processand aconstruction, not a mere theorem.Comment by John Armstrong | April 29, 2009 |

John, I have absolutely no idea what you’re talking about. I gave a rigorous explicit recursive

constructionof an orthonormal set in terms of a given sequence, which spans the same subspace. It’s an explicit construction, not just an existence statement. I can’t see a thing wrong with it.You surely agree (unless you’re a finitist and I’m learning this for the first time?!) that the recursion property of the natural numbers provides an absolutely standard typical way of constructing sequences. That’s how universal properties are used: to construct maps. For example, if you accept this universal characterization of N, as I thought you did, then you surely understand how it can be used to construct maps such as addition on N, etc., etc. The construction I gave above is no different in underlying principle.

In other words, reading over your first sentence [where you actually use the word “construction”, and then say it doesn’t construct]: are you saying you

disagreethat the universal property of N can be used to _construct_ addition on N — that all we get from it is a “hint of the pattern”? Are you disavowing general constructions of this type?I’d be amazed and dumbstruck if you did! At any rate, whatever the objection may be, it sounds more philosophical than mathematical (and not in keeping with the realist that I thought you were).

(Does anybody reading this understand what John is trying to say?)

Comment by Todd Trimble | April 29, 2009 |

It’s not that I’m a finitist. It’s more that I’m begin very careful about the distinction between “for any finite ” and “for all finite [at once]”. Yes, finitists reject one side of this distinction, but just because I don’t follow them that far doesn’t mean I can’t notice the distinction. I also accept the axiom of choice, while still noting where I use it.

The universal property you invoke just says that a certain function

exists. It doesn’t “construct” the function any more than the universal property of products in “constructs” functions to the product set. No, the construction in that case comes from an explicitrealizationof the categorical product as the Cartesian product. Different models will give different — uniquely isomorphic — functions.I’m not even sure that your construction works, modulo that invocation. The vector you normalize is

which I’m not certain is actually orthogonal to the earlier vectors. Maybe this comes out in the wash when you invoke the universal property, but it’s not clear to me.

Anyway, I don’t see what you’re getting so bent out of shape about. In practice you usually only need a finite-dimensional subspace in an application that calls for an explicit basis anyway. And there are

other waysto get your hands on bases and show that they’re orthonormal. I’m never denying the validity of any of your constructions. I’m just saying that they take the willingness to jump from “for any finite ” to “for all finite at once”, and that it’s worthwhile to call explicit attention that jump and not make it unless there’s no other way around.Comment by John Armstrong | April 29, 2009 |

If you’re not denying the validity of the constructions, then you’re not denying the validity of the statements made in comments 3 and 4, the way I see it. So then I don’t see why you felt a need to argue against them.

But you avoided my question: do you see addition of natural numbers as something globally defined by a recursive construction — “all at once” as you put it — or do you see it otherwise: “for any given pair (m, n), one can ‘construct’ m + n, but one can never define all such additions at once”. Because if you accept that global construction, then I can’t understand why you wouldn’t accept more elaborate applications of the same underlying recursive principle. That was the basic point of my comment 11.

I do grasp this distinction you’re making, but not why are you particularly drawing attention to it. For this type of thing may have been a burning issue in debates over potential and actual infinities more than a century ago, but it doesn’t really impinge on the way most mathematicians (except e.g. finitists) carry out discussions these days. In particular, I don’t understand why it should make any substantial difference in how one should reply to comment 1.

In other words, comments 3 and 4 stand and are completely valid, according to standard mathematics as practiced today. Case closed.I don’t see much substance in your paragraph 2. As you know, I have a stylistic preference for foundational systems like ETCS which stress universal properties (which by the way don’t assert mere existence, but unique existence of suitable maps). Such systems are in no way dependent on other systems like ZF, and in particular do not depend on “explicit realizations” (as you put it) of objects like cartesian products.

They are axiomatic theories, no more and no less than ZF, and are (for the purposes of discussing Gram-Schmidt, for example) equally as good. To repeat: they are neither more nor less “explicit” or “constructive” (in the way you are using the word) than ZF. (In other senses, one could say that certain topos-oriented forms of categorical set theory are *more* constructive, in that the axiom of choice is not assumed, and so on. See, I also pay attention to such issues.)Notice that in either case, when we read aloud the axioms of some set theory, we usually say phrases like “there exists”. It is customary not to read the same symbol as “we can construct” — even in the most constructive versions of set theory. But so what? In point of fact, we never in any literal sense ‘construct’ mathematical objects, and we don’t [or I don’t] positively assert that objects like N truly ‘exist’! All that’s philosophy, not mathematics.

But in ordinary language use, when it comes to applying statements of universal properties, which involveuniquedetermination of maps, it’s completely okay to say things like “using the universal property ___, we may construct… ” — no one ever bats an eye. I’m sure if I comb through UM, I could spot you saying the same type of thing. So I can’timaginewhy you’re quibbling over that of all things — all my constructions were completely constructive in the best mathematical sense of that word, and can be made perfectly rigorous in a perfectly acceptable ‘foundational system’ if you like, and that’s that. This is just completely standard stuff!And despite the occasional exclamation mark, I’m not getting bent out of shape. But you seemed to be arguing against the validity of 3 and 4, and for the sake of poor Zygmund in comment 1, I thought we should suss it out a bit. You say my constructions are valid, and I read that as your assenting to comments 3 and 4. Or are you actually still seriously debating this?

I’ll let you have the last word, if you wish. I’ve spent more time on what I think is a really silly discussion than I should have.

Comment by Todd Trimble | April 30, 2009 |

I still don’t feel entirely comfortable saying that it’s Gram-Schmidt doing the heavy lifting in comments 3 and 4. There are other ways around it, and to the extent Gram-Schmidt is useful we’re only actually using a finite-dimensional subspace anyhow. In fact, that’s exactly what Matt said, and what I agreed with there.

I really don’t think that there’s ever been as much difference here as you seem to think. And, by the way, statements that amount to “are you seriously questioning me” go a lot further than exclamation points in giving the impression you’re upset about it. If nothing else it’s not exactly a light touch.

Comment by John Armstrong | April 30, 2009 |

Okay. I said you could have the last word, but I’ll agree with you that I got a little heavy there. Shake?

Comment by Todd Trimble | April 30, 2009 |

Sure.

Comment by John Armstrong | April 30, 2009 |

Compare Paul Garrett’s notes

http://www.math.umn.edu/~garrett/m/fun/Notes/02_hsp.pdf

on Hilbert Space. Section 13: the Gram-Schmidt process.

(I don’t understand either why John Armstrong doesn’t want to call this the Gram-Schmidt process.)

Comment by Pete L. Clark | May 3, 2009 |

Well, for one thing Garrett is using a different definition of “basis”, where finite linear combinations are only required to be dense in the space. But I suppose little details like definitions don’t matter in mathematics, right? We can be as sloppy as engineers and everything still works the same.

I’ve already explained my position repeatedly, and Todd understands it even if he violently disagrees with the philosophy I’m acknowledging the existence of. If all you have is “me-too”ism, don’t bother. If you really want to understand, try reading.

Comment by John Armstrong | May 3, 2009 |

“Well, for one thing Garrett is using a different definition of “basis”, where finite linear combinations are only required to be dense in the space.”

Right, this is one of the two kinds of basis which is used in the study of topological vector spaces. The term “Hamel basis” is used for a basis in the sense of linear algebra. Usually though in the study of Hilbert spaces the former notion of basis is more useful. I brought up Hilbert space because Zygmund asked about it.

“But I suppose little details like definitions don’t matter in mathematics, right? We can be as sloppy as engineers and everything still works the same.”

I didn’t say this, nor do I agree with it (nor is it fair to engineers!).

I am surprised and disappointed that my brief comment engendered such a negative reaction. I am sorry to say that it may have the effect of discouraging me from posting here in the future.

Comment by P.L. Clark | May 3, 2009 |

Just to be clear, my beef wasn’t actually against any philosophy per se. I was disagreeing with the mathematical content of comments 5 and 7 according to standard mathematical practice (i.e., valid constructions and arguments according to an accepted axiomatic system like ZF or ETCS). That is, I was asserting you

can“get them all done” (cf. comment 5), and I was disagreeing with the asserted limitation on induction or recursion expressed in comment 7.I thought comment 9 had covered the underlying principle at stake, although I acknowledge a mistake in my “Gram-Schmidt formula”: I should have expressed the f: N x H^N –> N x H^N as

f(n; y_0, y_1, …) = (n+1, z_m := y_m if m != n+1,

z_m = N(y_m – Sum_{j < n+1} (y_m, y_j)y_j) if m = n+1)

I was trying to be too slick before and avoid a casewise definition. I’m sorry for the error. (I also had a typo, where I wrote an x_0 where I should have had a p_0).

I didn’t understand why you continued to argue back against this in comment 10 (where you wrote, “Gram-Schmidt constructs the first n terms of the orthonormal basis for any finite n, but it can’t get all of them at once”), which simply repeats what you said before, and which I thought was already effectively refuted in comment 9. For your continuing to argue that point only made sense to me if you were in effect

denyingthe recursive property of N (which I formulated in comment 9 a la Lawvere), a prospect which I obviously considered highly surprising. For it would fly in the face of valid constructions and arguments according to standardly agreed-upon axiomatic foundations (e.g. ZF or ETCS) of working mathematicians — the kind of mathematics you’ve been expounding all along.In other words, it seemed to me your comments 5, 7, and 10 could be defended only if you were

actively rejectingsuch standardly agreed-upon axiomatics (as a finitist might). If youweren’trejecting it, then those comments were simply wrong on mathematical grounds. That was the core thing I was arguing.John, I sincerely apologize for getting overly aggressive, but I just couldn’t understand why you were arguing in the way you were about the limitations of induction or recursion, especially in light of comment 9, which I thought should have settled the matter definitively. I was nonplussed and frustrated that you weren’t owning up to a simple mathematical mistake [we all make them], but just arguing back, and on things (like existence and construction) which IMHO were not pertinent to the core of the discussion.

But I shouldn’t have lost my cool, and again I’m sorry about that. If there’s some mathematical point that I didn’t make clear, I’ll be happy to try again.

Comment by Todd Trimble | May 3, 2009 |

Todd, I don’t think that in practice there’s any limitation beyond what you claim can be done. In practice, almost all applications I see only ever need a sufficiently large finite-dimensional subspace. This is why

in practicethere’s only a semantic difference between “for any finite ” and “for all finite [at once]”, and it’s a subtlety that there’s almost no harm in sweeping aside.But just like one day you may find yourself working in a topos that rejects the Axiom of Choice, you might find it useful to accept finitist restrictions for a moment.

Comment by John Armstrong | May 3, 2009 |

John, here’s what you said:

This is not a statement about what one needs in practice or whatnot. It’s a blunt positive declaration that flies in the face of the standard mathematics you’ve been expounding thus far.

It is true only to the extent that youWhich you haven’t. Not even now.declareyourself to be working in some nonstandard (e.g., finitist) context.I’m happy to work in whatever axiomatic framework seems useful for my purposes. You seem to believe that I have a bias against finitism. I don’t, but anyway that’s not the point. You’ve been expounding ordinary standard mathematics all along, and therefore that’s the context in which Zygmund’s question should have been addressed.

Comment by Todd Trimble | May 3, 2009 |

I’m usually making existence statements, not giving constructive processes. And I fully recognize the validity of using a constructive process, extrapolating a pattern, and then proving that the result of that extrapolation satisfies the desired property. I just say that that constitutes an extra step beyond the constructive process itself. You disagree (still rather sneeringly, it would seem), and that’s your prerogative.

Comment by John Armstrong | May 3, 2009 |

Do you still stand behind this other positive declaration?

Because aside from whatever semantic limitations you are imposing on the word “process”, this statement above sums up what in your line of argument seems obviously mistaken to me. It is this statement which prompted my comment 9.

Comment by Todd Trimble | May 3, 2009 |

I’ll state explicitly the clause “at once” before I agree. The constructive process is, as I’ve put it before, inherently finite. It can orthonormalize as large a finite number of the vectors in the sequence as you’d like, but by means of the process alone you will never reach the end of the sequence and have it all done.

Comment by John Armstrong | May 3, 2009 |

Read again carefully the exact quote, where neither the non-mathematical phrase “at once” appears, nor does the phrase “constructive process” (which according to your pronouncement can’t be finite, although others may choose not to accept that semantic limitation), and please tell me if you still stand by it. It’s a statement about what

induction(or perhaps better, recursion) can or cannot do.Comment by Todd Trimble | May 3, 2009 |

As written, I can see it being interpreted with neither of those clauses in mind. However, with those elisions replaced that has

alwaysbeen the position I’ve been holding. Gram-Schmidt on its own is a constructive process and cannot finish the infinite sequence.Comment by John Armstrong | May 4, 2009 |

I meant “can’t be infinite” in the second line.

Comment by Todd Trimble | May 4, 2009 |

Well, okay. I guess you understand clearly that contra comment 7, induction/recursion can do what I said it could do in comment 9, and that you can understand why comment 9 was a justifiable reaction to comment 7. To wit, that there

isa simple recursive construction using a formula based on [ahem] theideaof the Gram-Schmidt prescription, which produces an orthonormal sequence given an initial linearly independent sequence. This isn’t philosophy, and this isn’t semantics about terms like “process”. It’s straight mathematics.Naturally you are free to declare, if you really want, that “the Gram-Schmidt process” must by definition be finitary — this is just semantics over the term “process” (which AFAIK is not a term with a standard widely-accepted definition). It seems to me a little doctrinaire and self-limiting, and not in the spirit of recognizing that the

ideaof Gram-Schmidt can be applied to infinitary contexts as well (speaking back to comments 1 and 2), but hey, it’s your blog and your prerogative.Comment by Todd Trimble | May 4, 2009 |

I haven’t managed to grasp every twist and turn of this argument, but here’s an issue I’m having with the idea that Gram-Schmidt ‘gets done’. If we start with the linearly independent set , we can successively orthonormalize so as to get a sequence of increasingly inclusive subspaces with orthonormal bases, whose union will exist and be a subspace of the original that has an orthonormal basis. If I could prove that any vector in the original space appeared in this subspace, then I’d be entirely happy with the claim that Gram-Schmidt had been completed, but I can’t, so I’m not.

Comment by Avery Andrews | May 4, 2009 |

I mean for the non-parsing formula

Comment by Avery Andrews | May 4, 2009 |

Avery, the span of the set {v_i} equals the union of the spans of {v_i: i ≤ n}. When we apply Gram-Schmidt to the sequence to produce a new sequence {u_i}, we have

span {v_i: i ≤ n} = span {u_i: i ≤ n}

and so the union of the spans on the right equals the union of the spans on the left, which equals the span of the original set. Or was that what you were asking?

See also Garrett’s notes referred to in comment 17.

Comment by Todd Trimble | May 5, 2009 |

I don’t understand much of the Garrett notes (deficiency of time and background to crank thru details) but from the last paragraph on pg 10 it looks to me like the proof is inherently nonconstructive (not [union of chain = the whole space] => contradiction]; therefore [union of chain = whole space]), so if there is no constructive proof that the union of the chain is the whole space, could that perhaps justify John’s intuition here? Or am I way out in left field, which is sometimes the case :)

Comment by Avery Andrews | May 5, 2009 |

I think it’s much easier than that: every vector in the space spanned by the set {v_i} is by definition a

finitelinear combination, i.e., of the forma_0 v_0 + … + a_n v_n

for some choice of n and a_0, …, a_n, and therefore in Span(v_0, …, v_n), one of the finite-dimensional spaces in the collection we are taking the union of.

The word “choice” here is superfluous, since n is uniquely determined if we take it minimal, and then the a_0, …, a_n are uniquely determined by linear independence.

Comment by Todd Trimble | May 5, 2009 |

Just to add a little more to that: Garrett on page 10 is discussing different issues from what we’ve been discussing above. For Gram-Schmidt, look at his section 13 instead. To tie it together with my previous comment, you can assume the v_i are a countable linearly independent set whose span is dense in the given Hilbert space H. (By “dense”, I guess you know that we mean that we also think of H as a metric space where the metric is defined via the inner product, hence H carries a topology, and “dense set” carries the usual topological meaning: its closure is all of H. In other words, every element of H is a limit of a sequence of finite linear combinations of the v_i which is a Cauchy sequence with respect to the metric.)

Comment by Todd Trimble | May 5, 2009 |

[…] an orthonormal basis which also gives us an upper-triangular matrix. And of course, we’ll use Gram-Schmidt to do […]

Pingback by Upper-Triangular Matrices and Orthonormal Bases « The Unapologetic Mathematician | May 8, 2009 |

[…] this by picking a basis of and declaring it to be orthonormal. We don’t anything fancy like Gram-Schmidt, which is used to find orthonormal bases for a given inner product. No, we just define our inner […]

Pingback by Maschke’s Theorem « The Unapologetic Mathematician | September 28, 2010 |