The Unapologetic Mathematician

The Meaning of the SVD

We spent a lot of time yesterday working out how to write down the singular value decomposition of a transformation $M:A\rightarrow B$, writing

$\displaystyle M=U\Sigma V^*$

where $U$ and $V$ are unitary transformations on $B$ and $A$, respectively, and $\Sigma:A\rightarrow B$ is a “diagonal” transformation, in the sense that its matrix looks like

$\displaystyle\Sigma=\begin{pmatrix}D&0\\{0}&0\end{pmatrix}$

where $D$ really is a nonsingular diagonal matrix.

So what’s it good for?

Well, it’s a very concrete representation of the first isomorphism theorem. Every transformation is decomposed into a projection, an isomorphism, and an inclusion. But here the projection and the inclusion are built up into unitary transformations (as we verified is always possible), and the isomorphism is the $D$ part of $\Sigma$.

Incidentally, this means that we can read off the rank of $M$ from the number of rows in $D$, while the nullity is the number of zero columns in $\Sigma$.

More heuristically, this is saying we can break any transformation into three parts. First, $V^*$ picks out an orthonormal basis of “canonical input vectors”. Then $\Sigma$ handles the actual transformation, scaling the components in these directions, or killing them off entirely (for the zero columns). Finally, $U$ takes us out of the orthonormal basis of “canonical output vectors”. It tells us that if we’re allowed to pick the input and output bases separately, we kill off one subspace (the kernel) and can diagonalize the action on the remaining subspace.

The SVD also comes in handy for solving systems of linear equations. Let’s say we have a system written down as the matrix equation

$\displaystyle Mx+b=0$

where $M$ is the matrix of the system. If $b\in B$ is the zero vector we have a homogeneous system, and otherwise we have an inhomogeneous system. So let’s use the singular value decomposition for $M$:

$\displaystyle U\Sigma V^*x+b=0$

and then we find

$\displaystyle\Sigma V^*x=-U^*b$

So we can check ourselves by calculating $-U^*b$. If this extends into the zero rows of $\Sigma$ there’s no possible way to satisfy the equation. That is, we can quickly see if the system is unsolvable. On the other hand, if $-U^*b$ lies completely within the nonzero rows of $\Sigma$, it’s straightforward to solve this equation. We first write down the new transformation

$\displaystyle\Sigma^+=\begin{pmatrix}D^{-1}&0\\{0}&0\end{pmatrix}$

where it’s not quite apparent from this block form, but we’ve also taken a transpose. That is, there are as many columns in $\Sigma^+$ as there are rows in $\Sigma$, and vice versa. The upshot is that $\Sigma^+\Sigma:A\rightarrow A$ is a transformation which kills off the same kernel as $\Sigma$ does, but is otherwise the identity. Thus we can proceed

$\displaystyle\Sigma^+\Sigma V^*x=-\Sigma^+U^*b$

This $\Sigma^+$ “undoes” the scaling from $\Sigma$. We can also replace the lower rows of $-\Sigma^+U^*b$ with variables, since applying $\Sigma$ will kill them off anyway. Finally, we find

$\displaystyle x=-V\Sigma^+U^*b$

and, actually, a whole family of solutions for the variables we could introduce in the previous step. But this will at least give one solution, and then all the others differ from this one by a vector in $\mathrm{Ker}(M)$, as usual.