# The Unapologetic Mathematician

## Polar Decomposition

Okay, let’s take the singular value decomposition and do something really neat with it. Specifically, we’ll start with an endomorphism $T:A\rightarrow A$ and we’ll write down its singular value decomposition

$\displaystyle T=V\Sigma W^*$

where $V$ and $W$ are in the unitary (orthogonal) group of $A$. So, as it turns out, $U=VW^*$ is also unitary. And $P=W\Sigma W^*$ is positive-semidefinite (since $\Sigma$ is). And, of course, since $W^*W=1_A$, we can write

$\displaystyle T=V\Sigma W^*=VW^*W\Sigma W^*=UP$

That is, any endomorphism can be written as the product of a unitary transformation and a positive-semidefinite one.

Remember that unitary transformations are like unit complex numbers, while positive-semidefinite transformations are like nonnegative real numbers. And so this “polar decomposition” is like the polar form of a complex number, where we write a complex number as the product of a unit complex number and a nonnegative real number.

We can recover the analogy like we did before, by taking determinants. We find

$\displaystyle\det(T)=\det(U)\det(P)=e^{i\theta}r$

since the determinant of a unitary transformation is a unit complex number, and the determinant of a positive-semidefinite transformation is a nonnegative real number. If $T$ is nonsingular, so $P$ is actually positive-definite, then $\det(P)$ will be strictly positive, so the determinant of $T$ will be nonzero.
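
As a small worked example (the matrix is an illustrative choice, not one taken from the discussion above), take the real transformation

$\displaystyle T=\begin{pmatrix}0&-2\\1&0\end{pmatrix}=\underbrace{\begin{pmatrix}0&-1\\1&0\end{pmatrix}}_{V}\underbrace{\begin{pmatrix}1&0\\{0}&2\end{pmatrix}}_{\Sigma}\underbrace{\begin{pmatrix}1&0\\{0}&1\end{pmatrix}}_{W^*}$

Then $U=VW^*=V$ is a quarter-turn rotation and $P=W\Sigma W^*=\Sigma$, so

$\displaystyle UP=\begin{pmatrix}0&-1\\1&0\end{pmatrix}\begin{pmatrix}1&0\\{0}&2\end{pmatrix}=\begin{pmatrix}0&-2\\1&0\end{pmatrix}=T$

and on determinants we get $\det(T)=2=1\cdot2=\det(U)\det(P)$, which is the polar form with $\theta=0$ and $r=2$.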

We could also define $P'=V\Sigma V^*$, so that $T=V\Sigma W^*=V\Sigma V^*VW^*=P'U$. This is the left polar decomposition (writing the positive-semidefinite part on the left), while the previous form is the right polar decomposition.

August 19, 2009 Posted by | Algebra, Linear Algebra | 7 Comments

## The Meaning of the SVD

We spent a lot of time yesterday working out how to write down the singular value decomposition of a transformation $M:A\rightarrow B$, writing

$\displaystyle M=U\Sigma V^*$

where $U$ and $V$ are unitary transformations on $B$ and $A$, respectively, and $\Sigma:A\rightarrow B$ is a “diagonal” transformation, in the sense that its matrix looks like

$\displaystyle\Sigma=\begin{pmatrix}D&0\\{0}&0\end{pmatrix}$

where $D$ really is a nonsingular diagonal matrix.

So what’s it good for?

Well, it’s a very concrete representation of the first isomorphism theorem. Every transformation is decomposed into a projection, an isomorphism, and an inclusion. But here the projection and the inclusion are built up into unitary transformations (as we verified is always possible), and the isomorphism is the $D$ part of $\Sigma$.

Incidentally, this means that we can read off the rank of $M$ from the number of rows in $D$, while the nullity is the number of zero columns in $\Sigma$.

More heuristically, this is saying we can break any transformation into three parts. First, $V^*$ picks out an orthonormal basis of “canonical input vectors”. Then $\Sigma$ handles the actual transformation, scaling the components in these directions, or killing them off entirely (for the zero columns). Finally, $U$ takes us out of the orthonormal basis of “canonical output vectors”. It tells us that if we’re allowed to pick the input and output bases separately, we kill off one subspace (the kernel) and can diagonalize the action on the remaining subspace.

The SVD also comes in handy for solving systems of linear equations. Let’s say we have a system written down as the matrix equation

$\displaystyle Mx+b=0$

where $M$ is the matrix of the system. If $b\in B$ is the zero vector we have a homogeneous system, and otherwise we have an inhomogeneous system. So let’s use the singular value decomposition for $M$:

$\displaystyle U\Sigma V^*x+b=0$

and then we find

$\displaystyle\Sigma V^*x=-U^*b$

So we can check ourselves by calculating $-U^*b$. If this extends into the zero rows of $\Sigma$ there’s no possible way to satisfy the equation. That is, we can quickly see if the system is unsolvable. On the other hand, if $-U^*b$ lies completely within the nonzero rows of $\Sigma$, it’s straightforward to solve this equation. We first write down the new transformation

$\displaystyle\Sigma^+=\begin{pmatrix}D^{-1}&0\\{0}&0\end{pmatrix}$

where it’s not quite apparent from this block form, but we’ve also taken a transpose. That is, there are as many columns in $\Sigma^+$ as there are rows in $\Sigma$, and vice versa. The upshot is that $\Sigma^+\Sigma:A\rightarrow A$ is a transformation which kills off the same kernel as $\Sigma$ does, but is otherwise the identity. Thus we can proceed

$\displaystyle\Sigma^+\Sigma V^*x=-\Sigma^+U^*b$

This $\Sigma^+$ “undoes” the scaling from $\Sigma$. We can also replace the lower rows of $-\Sigma^+U^*b$ with variables, since applying $\Sigma$ will kill them off anyway. Finally, we find

$\displaystyle x=-V\Sigma^+U^*b$

and, actually, a whole family of solutions for the variables we could introduce in the previous step. But this will at least give one solution, and then all the others differ from this one by a vector in $\mathrm{Ker}(M)$, as usual.
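
To make the recipe concrete, here is a minimal sketch in plain Python with exact rational arithmetic. The particular $U$, $\Sigma$, and $V$ (built from a 3-4-5 triangle), the constant `RANK`, and the helper names `transpose`, `matmul`, `matvec`, and `solve` are all assumptions chosen for illustration. In practice a numerical library would compute the decomposition for us.

```python
from fractions import Fraction as F

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    Yt = transpose(Y)
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]

def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]

# A rank-one example M = U Sigma V^* from R^3 to R^2 (real, so * is transpose).
U = [[F(3, 5), F(-4, 5)],
     [F(4, 5), F(3, 5)]]
V = [[F(3, 5), F(-4, 5), F(0)],
     [F(4, 5), F(3, 5), F(0)],
     [F(0), F(0), F(1)]]
Sigma = [[F(5), F(0), F(0)],
         [F(0), F(0), F(0)]]
# Sigma^+ inverts the nonzero block D = (5) and has the transposed shape.
Sigma_plus = [[F(1, 5), F(0)],
              [F(0), F(0)],
              [F(0), F(0)]]
RANK = 1  # size of the block D

M = matmul(matmul(U, Sigma), transpose(V))

def solve(b):
    """Return one solution of M x + b = 0, or None if there is none."""
    c = [-t for t in matvec(transpose(U), b)]   # c = -U^* b
    if any(c[RANK:]):                           # c reaches into the zero rows of Sigma
        return None
    return matvec(V, matvec(Sigma_plus, c))     # x = -V Sigma^+ U^* b

b = [F(-3), F(-4)]
x = solve(b)
residual = [mx + bi for mx, bi in zip(matvec(M, x), b)]
print(x, residual)
print(solve([F(1), F(0)]))
```

With $b=(-3,-4)$ the vector $-U^*b$ is $(5,0)$, which avoids the zero row, and the sketch returns $x=(3/5,4/5,0)$ with zero residual. With $b=(1,0)$ the second component of $-U^*b$ is nonzero, so `solve` reports that there is no solution.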

August 18, 2009

## The Singular Value Decomposition

Now the real and complex spectral theorems give nice decompositions of self-adjoint and normal transformations, respectively. Each one is of a similar form

$\displaystyle\begin{aligned}S&=O\Lambda O^*\\H&=U\Lambda U^*\end{aligned}$

where $O$ is orthogonal, $U$ is unitary, and $\Lambda$ (in either case) is diagonal. What we want is a similar decomposition for any transformation. And, in fact, we’ll get one that even works for transformations between different inner product spaces.

So let’s say we’ve got a transformation $M:A\rightarrow B$ (we’re going to want to save $U$ and $V$ to denote transformations). We also have its adjoint $M^*:B\rightarrow A$. Then $M^*M:A\rightarrow A$ is positive-semidefinite (and thus self-adjoint and normal), and so the spectral theorem applies. There must be a unitary transformation $V:A\rightarrow A$ (orthogonal, if we’re working with real vector spaces) so that

$\displaystyle V^*M^*MV=\begin{pmatrix}D&0\\{0}&0\end{pmatrix}$

where $D$ is a diagonal matrix with strictly positive entries.

That is, we can break $A$ up as the direct sum $A=A_1\oplus A_2$. The diagonal transformation $D:A_1\rightarrow A_1$ is positive-definite, while the restriction of $V^*M^*MV$ to $A_2$ is the zero transformation. We will restrict $V$ to each of these subspaces, giving $V_1:A_1\rightarrow A$ and $V_2:A_2\rightarrow A$, along with their adjoints $V_1^*:A\rightarrow A_1$ and $V_2^*:A\rightarrow A_2$. Then we can write

$\displaystyle\begin{pmatrix}V_1^*\\V_2^*\end{pmatrix}M^*M\begin{pmatrix}V_1&V_2\end{pmatrix}=\begin{pmatrix}V_1^*M^*MV_1&V_1^*M^*MV_2\\V_2^*M^*MV_1&V_2^*M^*MV_2\end{pmatrix}=\begin{pmatrix}D&0\\{0}&0\end{pmatrix}$

From this we conclude both that $V_1^*M^*MV_1=D$ and that $MV_2=0$. The latter follows because $V_2^*M^*MV_2=0$ gives $\lVert MV_2x\rVert^2=\langle x,V_2^*M^*MV_2x\rangle=0$ for every $x\in A_2$. We define $U_1=MV_1D^{-\frac{1}{2}}:A_1\rightarrow B$, where we get the last matrix by just taking the inverse of the square root of each of the diagonal entries of $D$ (this is part of why diagonal transformations are so nice to work with). Then we can calculate

$\displaystyle\begin{aligned}U_1D^\frac{1}{2}V_1^*&=MV_1D^{-\frac{1}{2}}D^\frac{1}{2}V_1^*\\&=MV_1V_1^*\\&=MV_1V_1^*+MV_2V_2^*\\&=M\left(V_1V_1^*+V_2V_2^*\right)\\&=MVV^*=M\end{aligned}$

This is good, but we don’t yet have unitary matrices in our decomposition. We do know that $V_1^*V_1=1_{A_1}$, and we can check that

$\displaystyle\begin{aligned}U_1^*U_1&=\left(MV_1D^{-\frac{1}{2}}\right)^*MV_1D^{-\frac{1}{2}}\\&=D^{-\frac{1}{2}}V_1^*M^*MV_1D^{-\frac{1}{2}}\\&=D^{-\frac{1}{2}}DD^{-\frac{1}{2}}=1_{A_1}\end{aligned}$

Now we know that we can use $V_2:A_2\rightarrow A$ to “fill out” $V_1$ to give the unitary transformation $V$. That is, $V_1^*V_1=1_{A_1}$ (as we just noted), $V_2^*V_2=1_{A_2}$ (similarly), $V_1^*V_2$ and $V_2^*V_1$ are both the appropriate zero transformation, and $V_1V_1^*+V_2V_2^*=1_A$. Notice that these are exactly stating that the adjoints $V_1^*$ and $V_2^*$ are the projection operators corresponding to the inclusions $V_1$ and $V_2$ in a direct sum representation of $A$ as $A_1\oplus A_2$. It’s clear from general principles that there must be some projections, but it’s the unitarity of $V$ that makes the projections be exactly the adjoints of the inclusions.

What we need to do now is to similarly fill out $U_1$ by supplying a corresponding $U_2:B_2\rightarrow B$ that will similarly “fill out” a unitary transformation $U$. But we know that we can do this! Pick an orthonormal basis of $A_1$ and hit it with $U_1$ to get a bunch of orthonormal vectors in $B$ (orthonormal because $U_1^*U_1=1_{A_1}$). Then fill these out to an orthonormal basis of all of $B$. Just set $B_2$ to be the span of all the new basis vectors, which is the orthogonal complement of the image of $U_1$, and let $U_2$ be the inclusion of $B_2$ into $B$. We can then combine to get a unitary transformation

$\displaystyle U=\begin{pmatrix}U_1&U_2\end{pmatrix}$

Finally, we define

$\displaystyle\Sigma=\begin{pmatrix}D^\frac{1}{2}&0\\{0}&0\end{pmatrix}$

where there are as many zero rows in $\Sigma$ as we needed to add to fill out the basis of $B$ (the dimension of $B_2$). I say that $U\Sigma V^*$ is our desired decomposition. Indeed, we can calculate

$\displaystyle\begin{aligned}U\Sigma V^*&=\begin{pmatrix}U_1&U_2\end{pmatrix}\begin{pmatrix}D^\frac{1}{2}&0\\{0}&0\end{pmatrix}\begin{pmatrix}V_1^*\\V_2^*\end{pmatrix}\\&=\begin{pmatrix}U_1D^\frac{1}{2}&0\end{pmatrix}\begin{pmatrix}V_1^*\\V_2^*\end{pmatrix}\\&=U_1D^\frac{1}{2}V_1^*=M\end{aligned}$

where $U$ and $V$ are unitary on $B$ and $A$, respectively, and $\Sigma$ is a “diagonal” transformation (not strictly speaking in the case where $A$ and $B$ have different dimensions).
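
To see the construction run on a small case (the matrix is an assumption, chosen so the arithmetic stays exact), take $A=B=\mathbb{R}^2$ and

$\displaystyle M=\begin{pmatrix}3&0\\4&0\end{pmatrix},\qquad M^*M=\begin{pmatrix}25&0\\{0}&0\end{pmatrix}$

Here $M^*M$ is already diagonal, so we can take $V=1_A$, with $D=(25)$, $V_1=\begin{pmatrix}1\\{0}\end{pmatrix}$, and $V_2=\begin{pmatrix}0\\1\end{pmatrix}$, and indeed $MV_2=0$. Then

$\displaystyle U_1=MV_1D^{-\frac{1}{2}}=\frac{1}{5}\begin{pmatrix}3\\4\end{pmatrix},\qquad U_2=\frac{1}{5}\begin{pmatrix}-4\\3\end{pmatrix}$

where $U_2$ is one of the two unit vectors orthogonal to $U_1$. Putting the pieces together,

$\displaystyle U\Sigma V^*=\frac{1}{5}\begin{pmatrix}3&-4\\4&3\end{pmatrix}\begin{pmatrix}5&0\\{0}&0\end{pmatrix}\begin{pmatrix}1&0\\{0}&1\end{pmatrix}=\begin{pmatrix}3&0\\4&0\end{pmatrix}=M$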

August 17, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

## The Real Spectral Theorem

Let’s take the last couple lemmas we’ve proven and throw them together to prove the real analogue of the complex spectral theorem. We start with a self-adjoint transformation $S:V\rightarrow V$ on a finite-dimensional real inner-product space $V$.

First off, since $S$ is self-adjoint, we know that it has an eigenvector $e_1$, which we can pick to have unit length (how?). The subspace $\mathbb{R}e_1$ is then invariant under the action of $S$. But then the orthogonal complement $V_1=\left(\mathbb{R}e_1\right)^\perp$ is also invariant under $S$. So we can restrict it to a transformation $S_1:V_1\rightarrow V_1$.

It’s not too hard to see that $S_1$ is also self-adjoint, and so it must have an eigenvector $e_2$, which will also be an eigenvector of $S$. And we’ll get an orthogonal complement $V_2$, and so on. Since every step we take reduces the dimension of the vector space we’re looking at by one, we must eventually bottom out. At that point, we have an orthonormal basis of eigenvectors for our original space $V$. Each eigenvector was picked to have unit length, and each one is in the orthogonal complement of those that came before, so they’re all orthogonal to each other.

Just like in the complex case, if we have a basis and a matrix already floating around for $S$, we can use this new basis to perform a change of basis, which will be orthogonal (not unitary in this case). That is, we can write the matrix of any self-adjoint transformation $S$ as $O\Lambda O^{-1}$, where $O$ is an orthogonal matrix and $\Lambda$ is diagonal. Alternately, since $O^{-1}=O^*$, we can think of this as $O\Lambda O^*$, in case we’re considering our transformation as representing a bilinear form (which self-adjoint transformations often are).

What if we’ve got this sort of representation? A transformation with a matrix of the form $O\Lambda O^*$ must be self-adjoint. Indeed, we can take its adjoint to find

$\displaystyle\left(O\Lambda O^*\right)^*=\left(O^*\right)^*\Lambda^*O^*=O\Lambda^*O^*$

but since $\Lambda$ is diagonal, it’s automatically symmetric, and thus represents a self-adjoint transformation. Thus if a real transformation has an orthonormal basis of eigenvectors, it must be self-adjoint.

Notice that this is a somewhat simpler characterization than in the complex case. This hinges on the fact that for real transformations taking the adjoint corresponds to simple matrix transposition, and every diagonal matrix is automatically symmetric. For complex transformations, taking the adjoint corresponds to conjugate transposition, and not all diagonal matrices are Hermitian. That’s why we had to expand to the broader class of normal transformations.
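
For a quick numerical sanity check, here is a minimal sketch in plain Python for the $2\times2$ case. It assumes a symmetric matrix $\begin{pmatrix}a&b\\b&d\end{pmatrix}$ and finds $O$ as a rotation through the angle that kills the off-diagonal entry. The function names `diagonalize_symmetric_2x2` and `reconstruct` are hypothetical, and this closed form is a swapped-in shortcut for small matrices, not the inductive argument above.

```python
import math

def diagonalize_symmetric_2x2(a, b, d):
    """Diagonalize S = [[a, b], [b, d]] by a rotation O, returning (O, eigenvalues)."""
    theta = 0.5 * math.atan2(2 * b, a - d)   # angle that kills the off-diagonal entry
    c, s = math.cos(theta), math.sin(theta)
    O = [[c, -s],
         [s,  c]]                            # columns are orthonormal eigenvectors
    lam1 = a * c * c + 2 * b * c * s + d * s * s
    lam2 = a * s * s - 2 * b * c * s + d * c * c
    return O, (lam1, lam2)

def reconstruct(O, lams):
    """Compute O Lambda O^T entry by entry."""
    return [[sum(O[i][k] * lams[k] * O[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = [[2.0, 1.0], [1.0, 2.0]]
O, lams = diagonalize_symmetric_2x2(S[0][0], S[0][1], S[1][1])
S_again = reconstruct(O, lams)
print(lams)
print(S_again)
```

For this $S$ the eigenvalues come out (up to rounding) as $3$ and $1$, with eigenvectors along $(1,1)$ and $(-1,1)$, and $O\Lambda O^*$ rebuilds $S$.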

August 14, 2009 Posted by | Algebra, Linear Algebra | 10 Comments

## Every Self-Adjoint Transformation has an Eigenvector

Okay, this tells us nothing in the complex case, but for real transformations we have no reason to assume that a given transformation has any eigenvalues at all. But if our transformation is self-adjoint it must have one.

When we found this in the complex case we saw that the characteristic polynomial had to have a root, since $\mathbb{C}$ is algebraically closed. It’s the fact that $\mathbb{R}$ isn’t algebraically closed that causes our trouble. But since $\mathbb{R}$ sits inside $\mathbb{C}$ we can consider any real polynomial as a complex polynomial. That is, the characteristic polynomial of our transformation, considered as a complex polynomial (whose coefficients just happen to all be real) must have a complex root.

This really feels like a dirty trick, so let’s try to put it on a bit firmer ground. We’re looking at a transformation $S:V\rightarrow V$ on a vector space $V$ over $\mathbb{R}$. What we’re going to do is “complexify” our space, so that we can use some things that only work over the complex numbers. To do this, we’ll consider $\mathbb{C}$ itself as a two-dimensional vector space over $\mathbb{R}$ and form the tensor product $V^\mathbb{C}=V\otimes_\mathbb{R}\mathbb{C}$. The transformation $S$ immediately induces a transformation $S^\mathbb{C}:V^\mathbb{C}\rightarrow V^\mathbb{C}$ by defining $S^\mathbb{C}(v\otimes z)=S(v)\otimes z$. It’s a complex vector space, since given a complex constant $c\in\mathbb{C}$ we can define the scalar product of $v\otimes z$ by $c$ as $v\otimes(cz)$. Finally, $S^\mathbb{C}$ is complex-linear since it commutes with our complex scalar product.

What have we done? Maybe it’ll be clearer if we pick a basis $\left\{e_i\right\}_{i=1}^n$ for $V$. That is, any vector in $V$ is a linear combination of the $e_i$ in a unique way. Then every (real) vector in $V^\mathbb{C}$ is a unique linear combination of $e_i\otimes1$ and $e_i\otimes i$ (this latter $i$ is the complex number, not the index; try to keep them separate). But as complex vectors, we have $e_i\otimes i=i(e_i\otimes1)$, and so every vector is a unique complex linear combination of the $e_i\otimes1$. It’s like we’ve kept the same basis, but just decided to allow complex coefficients too.

And what about the matrix of $S^\mathbb{C}$ with respect to this (complex) basis of $e_i\otimes1$? Well it’s just the same as the old matrix of $S$ with respect to the $e_i$! Just write

$\displaystyle S^\mathbb{C}(e_i\otimes1)=S(e_i)\otimes1=(s_i^je_j)\otimes1=s_i^j(e_j\otimes1)$

Then if $S$ is self-adjoint its matrix will be symmetric, and so will the matrix of $S^\mathbb{C}$, which must then be self-adjoint as well. And we can calculate the characteristic polynomial of $S$ from its matrix, so the characteristic polynomial of $S^\mathbb{C}$ will be the same — except it will be a complex polynomial whose coefficients all just happen to be real.

Okay so back to the point. Since $S^\mathbb{C}$ is a transformation on a complex vector space it must have an eigenvalue $\lambda$ and a corresponding eigenvector $v$. And I say that since $S^\mathbb{C}$ is self-adjoint, the eigenvalue $\lambda$ must be real. Indeed, we can calculate

$\displaystyle\lambda\langle v,v\rangle=\langle v,\lambda v\rangle=\langle v,S^\mathbb{C}(v)\rangle=\langle S^\mathbb{C}(v),v\rangle=\langle\lambda v,v\rangle=\bar{\lambda}\langle v,v\rangle$

and thus $\lambda=\bar{\lambda}$ (an eigenvector is nonzero, so we can divide by $\langle v,v\rangle$), so $\lambda$ is real.

Therefore, we have found a real number $\lambda$ so that when we plug it into the characteristic polynomial of $S^\mathbb{C}$, we get zero. But then we also get zero when we plug it into the characteristic polynomial of $S$, and thus it’s also an eigenvalue of $S$.

And so, finally, every self-adjoint transformation on a real vector space has at least one eigenvector.
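
The $2\times2$ case makes the contrast easy to see. The sketch below (plain Python; the function name `eigenvalues_2x2` and both sample matrices are assumptions for illustration) finds the roots of the characteristic polynomial over $\mathbb{C}$, just as the complexification argument does, and then looks at whether they happen to be real.

```python
import cmath

def eigenvalues_2x2(m):
    """Roots of the characteristic polynomial t^2 - tr(m) t + det(m), found over C."""
    (a, b), (c, d) = m
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)

rotation = [[0, -1], [1, 0]]     # not self-adjoint: no real eigenvalues
symmetric = [[2, 1], [1, 2]]     # self-adjoint: discriminant (a-d)^2 + 4b^2 >= 0

rot_eigs = eigenvalues_2x2(rotation)
sym_eigs = eigenvalues_2x2(symmetric)
print(rot_eigs)
print(sym_eigs)
```

The quarter-turn rotation has the purely imaginary roots $\pm i$, so it has no real eigenvector at all, while the symmetric matrix has the real roots $3$ and $1$. For a symmetric $\begin{pmatrix}a&b\\b&d\end{pmatrix}$ the discriminant is $(a-d)^2+4b^2\geq0$, which is the $2\times2$ shadow of the general statement.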

August 12, 2009 Posted by | Algebra, Linear Algebra | 1 Comment

## Invariant Subspaces of Self-Adjoint Transformations

Okay, today I want to nail down a lemma about the invariant subspaces (and, in particular, eigenspaces) of self-adjoint transformations. Specifically, the fact that the orthogonal complement of an invariant subspace is also invariant.

So let’s say we’ve got a subspace $W\subseteq V$ and its orthogonal complement $W^\perp$. We also have a self-adjoint transformation $S:V\rightarrow V$ so that $S(w)\in W$ for all $w\in W$. What we want to show is that for every $v\in W^\perp$, we also have $S(v)\in W^\perp$.

Okay, so let’s try to calculate the inner product $\langle S(v),w\rangle$ for an arbitrary $w\in W$.

$\displaystyle\langle S(v),w\rangle=\langle v,S(w)\rangle=0$

since $S$ is self-adjoint, $S(w)$ is in $W$, and $v$ is in $W^\perp$. Then since this is zero no matter what $w\in W$ we pick, we see that $S(v)\in W^\perp$. Neat!
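
Here is a small check of the lemma in plain Python with integer entries, together with a counterexample showing that self-adjointness is really needed. The matrices `S` and `T`, the vectors, and the helper names are all assumptions picked for illustration.

```python
def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Self-adjoint S with invariant subspace W = span{w}; W-perp is spanned by p1, p2.
S = [[2, 1, 0],
     [1, 2, 0],
     [0, 0, 5]]
w = [1, 1, 0]                    # S(w) = 3w, so W is invariant
p1, p2 = [1, -1, 0], [0, 0, 1]   # a basis of W-perp

stays_perp = [dot(matvec(S, p), w) for p in (p1, p2)]

# A non-self-adjoint shear T fixes span{e1}, but pushes e2 out of span{e1}-perp.
T = [[1, 1],
     [0, 1]]
e1, e2 = [1, 0], [0, 1]
leaks = dot(matvec(T, e2), e1)

print(stays_perp, leaks)
```

Both inner products in `stays_perp` vanish, so $S$ keeps $W^\perp$ inside itself. For the shear, $T(e_2)=(1,1)$ has a nonzero component along $e_1$, so the complement of its invariant subspace is not invariant.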

August 11, 2009 Posted by | Algebra, Linear Algebra | 1 Comment

## The Complex Spectral Theorem

We’re now ready to characterize those transformations on complex vector spaces which have a diagonal matrix with respect to some orthonormal basis. First of all, such a transformation must be normal. If we have a diagonal matrix we can find the matrix of the adjoint by taking its conjugate transpose, and this will again be diagonal. Since any two diagonal matrices commute, the transformation must commute with its adjoint, and is therefore normal.

On the other hand, let’s start with a normal transformation $N$ and see what happens as we try to diagonalize it. First, since we’re working over $\mathbb{C}$ here, we can pick an orthonormal basis that gives us an upper-triangular matrix and call the basis $\left\{e_i\right\}_{i=1}^n$. Now, I assert that this matrix already is diagonal when $N$ is normal.

Let’s write out the matrices for $N$

$\displaystyle\begin{pmatrix}a_{1,1}&\cdots&a_{1,n}\\&\ddots&\vdots\\{0}&&a_{n,n}\end{pmatrix}$

and $N^*$

$\displaystyle\begin{pmatrix}\overline{a_{1,1}}&&0\\\vdots&\ddots&\\\overline{a_{1,n}}&\cdots&\overline{a_{n,n}}\end{pmatrix}$

Now we can see that $N(e_1)=a_{1,1}e_1$, while $N^*(e_1)=\overline{a_{1,1}}e_1+\dots+\overline{a_{1,n}}e_n$. Since this basis is orthonormal, it’s easy to calculate the squared-lengths of these two:

$\displaystyle\begin{aligned}\lVert N(e_1)\rVert^2&=\lvert a_{1,1}\rvert^2\\\lVert N^*(e_1)\rVert^2&=\lvert a_{1,1}\rvert^2+\dots+\lvert a_{1,n}\rvert^2\end{aligned}$

But since $N$ is normal, these two must be the same. And so all the entries other than maybe $a_{1,1}$ in the first row of our matrix must be zero. We can then repeat this reasoning with the basis vector $e_2$, and reach a similar conclusion about the second row, and so on until we see that all the entries above the diagonal must be zero.

That is, not only is it necessary that a transformation be normal in order to diagonalize it, it’s also sufficient. Any normal transformation on a complex vector space has an orthonormal basis of eigenvectors.

Now if we have an arbitrary orthonormal basis — say $N$ is a transformation on $\mathbb{C}^n$ with the standard basis already floating around — we may want to work with the matrix of $N$ with respect to this basis. If this were our basis of eigenvectors, $N$ would have the diagonal matrix $\Lambda=\Bigl(\lambda_i\delta_{ij}\Bigr)$. But we may not be so lucky. Still, we can perform a change of basis using the basis of eigenvectors to fill in the columns of the change-of-basis matrix. And since we’re going from one orthonormal basis to another, this will be unitary!

Thus a normal transformation is not only equivalent to a diagonal transformation, it is unitarily equivalent. That is, the matrix of any normal transformation can be written as $U\Lambda U^{-1}$ for a diagonal matrix $\Lambda$ and a unitary matrix $U$. And any matrix which is unitarily equivalent to a diagonal matrix is normal. That is, if you take the subspace of diagonal matrices within the space of all matrices, then use the unitary group to act by conjugation on this subspace, the result is exactly the set of all normal matrices, which represent normal transformations.

Often, you’ll see this written as $U\Lambda U^*$, which is really the same thing of course, but there’s an interesting semantic difference. Writing it using the inverse is a similarity, which is our notion of equivalence for transformations. So if we’re thinking of our matrix as acting on a vector space, this is the “right way” to think of the spectral theorem. On the other hand, using the conjugate transpose is a congruence, which is our notion of equivalence for bilinear forms. So if we’re thinking of our matrix as representing a bilinear form, this is the “right way” to think of the spectral theorem. But of course since we’re using unitary transformations here, it doesn’t matter! Unitary equivalence of endomorphisms and of bilinear forms is exactly the same thing.
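
As a concrete instance, here is a minimal sketch in plain Python using built-in complex numbers. It takes the quarter-turn rotation, which is normal but not self-adjoint, and checks that $U\Lambda U^*$ rebuilds it. The eigenvectors $(1,-i)/\sqrt{2}$ and $(1,i)/\sqrt{2}$ are written in by hand rather than computed, and the helper names `matmul`, `adjoint`, and `close` are assumptions for illustration.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def adjoint(X):
    """Conjugate transpose of a square or rectangular matrix."""
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]

def close(X, Y, tol=1e-12):
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

# A quarter-turn rotation: normal (it is orthogonal) but not self-adjoint.
N = [[0j, -1 + 0j],
     [1 + 0j, 0j]]

r = 1 / math.sqrt(2)
U = [[r + 0j, r + 0j],          # columns: unit eigenvectors (1, -i)/sqrt(2), (1, i)/sqrt(2)
     [-1j * r, 1j * r]]
Lam = [[1j, 0j],                # matching eigenvalues i and -i
       [0j, -1j]]
I2 = [[1, 0], [0, 1]]

is_normal = close(matmul(adjoint(N), N), matmul(N, adjoint(N)))
is_unitary = close(matmul(adjoint(U), U), I2)
rebuilt = matmul(matmul(U, Lam), adjoint(U))   # U Lambda U^* = U Lambda U^{-1}
print(is_normal, is_unitary, close(rebuilt, N))
```

Notice that the eigenvalues $\pm i$ are not real, so no real orthogonal change of basis could diagonalize this rotation. Over $\mathbb{C}$, normality is all we need.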

August 10, 2009 Posted by | Algebra, Linear Algebra | 9 Comments

## Unitary and Orthogonal Matrices and Orthonormal Bases

I almost forgot to throw in this little observation about unitary and orthogonal matrices that will come in handy.

Let’s say we’ve got a unitary transformation $U$ and an orthonormal basis $\left\{e_i\right\}_{i=1}^n$. We can write down the matrix as before

$\displaystyle\begin{pmatrix}u_{1,1}&\cdots&u_{1,n}\\\vdots&\ddots&\vdots\\u_{n,1}&\cdots&u_{n,n}\end{pmatrix}$

Now, each column is a vector. In particular, it’s the result of transforming a basis vector $e_i$ by $U$.

$\displaystyle U(e_i)=u_{1,i}e_1+\dots+u_{n,i}e_n$

What do these vectors have to do with each other? Well, let’s take their inner products and find out.

$\displaystyle\langle U(e_i),U(e_j)\rangle=\langle e_i,e_j\rangle=\delta_{i,j}$

since $U$ preserves the inner product. That is, the columns of the matrix of $U$ form another orthonormal basis.

On the other hand, what if we have in mind some other orthonormal basis $\left\{f_j\right\}_{j=1}^n$? We can write each of these vectors out in terms of the original basis

$\displaystyle f_j=a_{1,j}e_1+\dots+a_{n,j}e_n$

and even get a change-of-basis transformation (like we did for general linear transformations) $A$ defined by

$\displaystyle A(e_j)=f_j=a_{1,j}e_1+\dots+a_{n,j}e_n$

so the $a_{i,j}$ are the matrix entries for $A$ with respect to the basis $\left\{e_i\right\}$. This transformation $A$ will then be unitary.

Indeed, take arbitrary vectors $v=v^ie_i$ and $w=w^je_j$. Their inner product is

$\displaystyle\langle v,w\rangle=\langle v^ie_i,w^je_j\rangle=\overline{v^i}w^j\langle e_i,e_j\rangle=\overline{v^i}w^j\delta_{i,j}$

On the other hand, after acting by $A$ we find

$\displaystyle\langle A(v),A(w)\rangle=\langle v^iA(e_i),w^jA(e_j)\rangle=\overline{v^i}w^j\langle f_i,f_j\rangle=\overline{v^i}w^j\delta_{i,j}$

since the basis $\left\{f_j\right\}$ is orthonormal as well.

To sum up: with respect to an orthonormal basis, the columns of a unitary matrix form another orthonormal basis. Conversely, writing any other orthonormal basis in terms of the original basis and using these coefficients as the columns of a matrix gives a unitary matrix. The same holds true for orthogonal matrices, with similar reasoning all the way through. And both of these are parallel to the situation for general linear transformations: the columns of an invertible matrix with respect to any basis form another basis, and conversely.
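
Both directions can be checked numerically on a small example. In the sketch below (plain Python with built-in complex numbers), the basis vectors `f1` and `f2`, the test vectors `v` and `w`, and the helper names are assumptions chosen for illustration. The inner product is conjugate-linear in the first slot, matching the convention used above.

```python
def inner(u, v):
    """Inner product, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def gram(vectors):
    return [[inner(u, v) for v in vectors] for u in vectors]

# Hypothetical orthonormal basis f_1, f_2 of C^2, written in the standard basis.
f1 = [0.6 + 0j, 0.8j]
f2 = [0.8j, 0.6 + 0j]

# Use the coefficients as the columns of the change-of-basis matrix A.
A = [[f1[0], f2[0]],
     [f1[1], f2[1]]]
columns = [[row[j] for row in A] for j in range(2)]
G = gram(columns)                # should be the identity matrix

# A unitary matrix preserves the inner product of arbitrary vectors.
v = [1 + 2j, 3 - 1j]
w = [2 - 1j, 1j]
before = inner(v, w)
after = inner(matvec(A, v), matvec(A, w))
print(G)
print(before, after)
```

The Gram matrix of the columns is the identity (up to rounding), and `before` agrees with `after`, which is exactly the statement that $A$ is unitary.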

August 7, 2009 Posted by | Algebra, Linear Algebra | 3 Comments

## Eigenvalues and Eigenvectors of Normal Transformations

Let’s say we have a normal transformation $N$. It turns out we can say some interesting things about its eigenvalues and eigenvectors.

First off, it turns out that the eigenvalues of $N^*$ are exactly the complex conjugates of those of $N$ (the same, if we’re working over $\mathbb{R}$). Actually, this isn’t even special to normal operators. Indeed, if $T-\lambda I_V$ has a nontrivial kernel, then we can take the adjoint to find that $T^*-\bar{\lambda}I_V$ must have a nontrivial kernel as well. But if our transformation is normal, it turns out that not only do we have conjugate eigenvalues, they correspond to the same eigenvectors as well!

To see this, we do almost the same thing as before. But we get more than just a nontrivial kernel this time. Given an eigenvector $v$ we know that $\left(N-\lambda I_V\right)v=0$, and so it must have length zero. But if $N$ is normal then so is $N-\lambda I_V$:

$\displaystyle\begin{aligned}\left(N-\lambda I_V\right)\left(N-\lambda I_V\right)^*&=\left(N-\lambda I_V\right)\left(N^*-\bar{\lambda}I_V\right)\\&=NN^*-\lambda I_VN^*-\bar{\lambda}NI_V+\lambda\bar{\lambda}I_VI_V\\&=N^*N-\lambda N^*I_V-\bar{\lambda}I_VN+\lambda\bar{\lambda}I_VI_V\\&=\left(N^*-\bar{\lambda}I_V\right)\left(N-\lambda I_V\right)\\&=\left(N-\lambda I_V\right)^*\left(N-\lambda I_V\right)\end{aligned}$

and so acting by $\left(N-\lambda I_V\right)^*$ gives the same length as acting by $\left(N-\lambda I_V\right)$. That is:

$\displaystyle0=\lVert\left(N-\lambda I_V\right)v\rVert=\lVert\left(N-\lambda I_V\right)^*v\rVert=\lVert N^*v-\bar{\lambda}v\rVert$

thus by the definiteness of length, we know that $N^*v-\bar{\lambda}v=0$. That is, $v$ is also an eigenvector of $N^*$, with eigenvalue $\bar{\lambda}$.

Then as a corollary we can find that not only are the eigenvectors corresponding to distinct eigenvalues linearly independent, they are actually orthogonal! Indeed, if $v$ and $w$ are eigenvectors of $N$ with distinct eigenvalues $\lambda$ and $\mu$, respectively, then we find

$\displaystyle\begin{aligned}(\lambda-\mu)\langle v,w\rangle&=\langle\bar{\lambda}v,w\rangle-\langle v,\mu w\rangle\\&=\langle N^*v,w\rangle-\langle v,Nw\rangle\\&=\langle v,Nw\rangle-\langle v,Nw\rangle=0\end{aligned}$

Since $\lambda-\mu\neq0$ we must conclude that $\langle v,w\rangle=0$, and that the two eigenvectors are orthogonal.
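
We can watch both facts happen on a small normal matrix. The sketch below (plain Python) assumes the $3\times3$ matrix $N=1+P$, where $P$ is a cyclic shift. Its eigenvectors are $(1,\omega^k,\omega^{2k})$ for a primitive cube root of unity $\omega$, with eigenvalues $1+\omega^k$. The matrix and the helper names are illustrative choices.

```python
import cmath

def matvec(X, v):
    return [sum(a * b for a, b in zip(row, v)) for row in X]

def inner(u, v):
    """Inner product, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

N = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
N_adj = [list(col) for col in zip(*N)]   # real entries, so the adjoint is the transpose

omega = cmath.exp(2j * cmath.pi / 3)     # primitive cube root of unity
eigenpairs = [(1 + omega**k, [1, omega**k, omega**(2 * k)]) for k in range(3)]

def is_eigenpair(X, lam, v, tol=1e-12):
    return all(abs(xv - lam * vi) < tol for xv, vi in zip(matvec(X, v), v))

# Each eigenvector of N is an eigenvector of N^* with the conjugate eigenvalue.
shared = all(is_eigenpair(N, lam, v) and is_eigenpair(N_adj, lam.conjugate(), v)
             for lam, v in eigenpairs)

# Eigenvectors for distinct eigenvalues are orthogonal.
overlaps = [abs(inner(eigenpairs[i][1], eigenpairs[j][1]))
            for i in range(3) for j in range(i + 1, 3)]
print(shared, overlaps)
```

The three eigenvalues $2$, $1+\omega$, and $1+\omega^2$ are distinct, and all three pairwise inner products vanish up to rounding.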

August 6, 2009

## Normal Transformations

All the transformations in our analogy, whether self-adjoint, unitary (or orthogonal), or even anti-self-adjoint (antisymmetric and “skew-Hermitian”, satisfying $T^*=-T$), satisfy one slightly subtle but very interesting property: they all commute with their adjoints. Self-adjoint and anti-self-adjoint transformations do because any transformation commutes with itself and also with its negative, since negation is just scalar multiplication. Orthogonal and unitary transformations do because every transformation commutes with its own inverse.

Now in general most pairs of transformations do not commute, so there’s no reason to expect this to happen commonly. Still, if we have a transformation $N$ so that $N^*N=NN^*$, we call it a “normal” transformation.

Let’s bang out an equivalent characterization of normal operators while we’re at it, so we can get an idea of what they look like geometrically. Take any vector $\lvert v\rangle$, hit it with $N$, and calculate its squared-length (I’m not specifying real or complex, since the notation is the same either way). We get

$\displaystyle\lVert\lvert N(v)\rangle\rVert^2=\langle N(v)\vert N(v)\rangle=\langle v\rvert N^*N\lvert v\rangle$

On the other hand, we could do the same thing but using $N^*$ instead of $N$.

$\displaystyle\lVert\lvert N^*(v)\rangle\rVert^2=\langle N^*(v)\vert N^*(v)\rangle=\langle v\rvert NN^*\lvert v\rangle$

But if $N$ is normal, then $N^*N$ and $NN^*$ are the same, and thus $\lVert\lvert N(v)\rangle\rVert^2=\lVert\lvert N^*(v)\rangle\rVert^2$ for all vectors $\lvert v\rangle$.

Conversely, if $\lVert\lvert N(v)\rangle\rVert^2=\lVert\lvert N^*(v)\rangle\rVert^2$ for all vectors $\lvert v\rangle$, then we can use the polarization identities to conclude that $N^*N=NN^*$.

So normal transformations are exactly those for which a vector ends up with the same length whether we apply the transformation or its adjoint. For self-adjoint and anti-self-adjoint transformations this is pretty obvious since they’re (almost) the same thing anyway. For orthogonal and unitary transformations, they don’t change the lengths of vectors at all, so this makes sense.

Just to be clear, though, there are matrices that are normal, but which aren’t any of the special kinds we’ve talked about so far. For example, the transformation represented by the matrix

$\displaystyle\begin{pmatrix}1&1&0\\{0}&1&1\\1&0&1\end{pmatrix}$

has adjoint

$\displaystyle\begin{pmatrix}1&0&1\\1&1&0\\{0}&1&1\end{pmatrix}$

and multiplying the two in either order gives

$\displaystyle\begin{pmatrix}2&1&1\\1&2&1\\1&1&2\end{pmatrix}$
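
This is easy to confirm by machine. Taking $N$ to be the first matrix displayed above, the following plain-Python sketch (the helper names `transpose` and `matmul` and the flag names are assumptions) checks with exact integer arithmetic that $N$ commutes with its adjoint and yet falls into none of the special classes.

```python
def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in transpose(Y)] for row in X]

N = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
N_adj = transpose(N)             # real matrix, so the adjoint is the transpose
identity = [[int(i == j) for j in range(3)] for i in range(3)]
negated = [[-x for x in row] for row in N]

is_normal = matmul(N_adj, N) == matmul(N, N_adj)
is_self_adjoint = N_adj == N
is_anti_self_adjoint = N_adj == negated
is_orthogonal = matmul(N_adj, N) == identity

print(is_normal, is_self_adjoint, is_anti_self_adjoint, is_orthogonal)
```

Only `is_normal` comes out true: both products equal the matrix with $2$ on the diagonal and $1$ everywhere else, while $N$ is neither symmetric, antisymmetric, nor orthogonal.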