# The Unapologetic Mathematician

## Cramer’s Rule

We’re trying to invert a function $f:X\rightarrow\mathbb{R}^n$ which is continuously differentiable on some region $X\subseteq\mathbb{R}^n$. That is, we know that if $a$ is a point where $J_f(a)\neq0$, then there is a ball $N$ around $a$ where $f$ is one-to-one onto some neighborhood $f(N)$ around $f(a)$. Then if $y$ is a point in $f(N)$, we’ve got a system of equations $\displaystyle f^j(x^1,\dots,x^n)=y^j$

that we want to solve for all the $x^i$.

We know how to handle this if $f$ is defined by a linear transformation, represented by a matrix $A=\left(a_i^j\right)$: $\displaystyle\begin{aligned}f^j(x^1,\dots,x^n)=a_i^jx^i&=y^j\\Ax&=y\end{aligned}$

In this case, the Jacobian transformation is just the function $f$ itself, and so the Jacobian determinant $\det\left(a_i^j\right)$ is nonzero if and only if the matrix $A$ is invertible. And so our solution depends on finding the inverse $A^{-1}$ and solving $\displaystyle\begin{aligned}Ax&=y\\A^{-1}Ax&=A^{-1}y\\x&=A^{-1}y\end{aligned}$

This is the approach we’d like to generalize. But to do so, we need a more specific method of finding the inverse.

This is where Cramer’s rule comes in, and it starts by analyzing the way we calculate the determinant of a matrix $A$. This formula $\displaystyle\sum\limits_{\pi\in S_n}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots a_n^{\pi(n)}$

involves a sum over all the permutations $\pi\in S_n$, and we want to consider the order in which we add up these terms. If we fix an index $i$, we can factor out each matrix entry in the $i$th column: $\displaystyle\sum\limits_{j=1}^na_i^j\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots\widehat{a_i^j}\dots a_n^{\pi(n)}$

where the hat indicates that we omit the $i$th term in the product. For a given value of $j$, we can consider the restricted sum $\displaystyle A_j^i=\sum\limits_{\substack{\pi\in S_n\\\pi(i)=j}}\mathrm{sgn}(\pi)a_1^{\pi(1)}\dots\widehat{a_i^j}\dots a_n^{\pi(n)}$

which is $(-1)^{i+j}$ times the determinant of the $(i,j)$ “minor” of the matrix $A$. That is, if we strike out the row and column of $A$ which contain $a_i^j$ and take the determinant of the remaining $(n-1)\times(n-1)$ matrix, we multiply this by $(-1)^{i+j}$ to get $A_j^i$. These are the entries in the “adjugate” matrix $\mathrm{adj}(A)$.
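The claim that the restricted sum equals the signed minor can be checked directly in code. Here’s a minimal Python sketch (the names `sign`, `det`, `cofactor`, and `minor_cofactor` are mine, not from the post; a matrix is stored as a list of rows, so in the post’s convention the entry $a_i^j$ — column $i$, row $j$ — lives at `A[j][i]`):

```python
from itertools import permutations
from math import prod

def sign(pi):
    # sgn(pi) = (-1)^(number of inversions)
    inv = sum(1 for i in range(len(pi))
              for j in range(i + 1, len(pi)) if pi[i] > pi[j])
    return -1 if inv % 2 else 1

def det(A):
    # The permutation-sum formula: pi sends column k to row pi(k)
    n = len(A)
    return sum(sign(pi) * prod(A[pi[k]][k] for k in range(n))
               for pi in permutations(range(n)))

def cofactor(A, i, j):
    # A_j^i: the restricted sum over permutations with pi(i) = j,
    # omitting the factor a_i^j from each product
    n = len(A)
    return sum(sign(pi) * prod(A[pi[k]][k] for k in range(n) if k != i)
               for pi in permutations(range(n)) if pi[i] == j)

def minor_cofactor(A, i, j):
    # (-1)^(i+j) times the determinant with row j and column i struck out
    M = [[A[r][c] for c in range(len(A)) if c != i]
         for r in range(len(A)) if r != j]
    return (-1) ** (i + j) * det(M)
```

Checking `cofactor(A, i, j) == minor_cofactor(A, i, j)` over all index pairs confirms that the restricted sum really is the signed minor.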

What we’ve shown is that $\displaystyle A_j^ia_i^j=\det(A)$

(no summation on $i$). It’s not hard to show, however, that if we use a different row from the adjugate matrix we find $\displaystyle\sum\limits_{j=1}^nA_j^ka_i^j=\det(A)\delta_i^k$

That is, the adjugate times the original matrix is the determinant of $A$ times the identity matrix. And so if $\det(A)\neq0$ we find $\displaystyle A^{-1}=\frac{1}{\det(A)}\mathrm{adj}(A)$
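Here’s a short Python sketch of the adjugate identity (the names `minor`, `det`, `adjugate`, `inverse`, and `matmul` are mine; exact arithmetic with `Fraction` keeps the check honest):

```python
from fractions import Fraction

def minor(A, r, c):
    # Strike out row r and column c
    return [[A[i][j] for j in range(len(A)) if j != c]
            for i in range(len(A)) if i != r]

def det(A):
    # Cofactor expansion along the first column
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0] * det(minor(A, r, 0))
               for r in range(len(A)))

def adjugate(A):
    # Entry (i, j): (-1)^(i+j) times the minor struck at row j, column i
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)]
            for i in range(n)]

def inverse(A):
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(e, d) for e in row] for row in adjugate(A)]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]
```

Multiplying `adjugate(A)` by `A` should produce `det(A)` times the identity, which is exactly the identity proved above.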

So what does this mean for our system of equations? We can write $\displaystyle\begin{aligned}x&=\frac{1}{\det(A)}\mathrm{adj}(A)y\\x^i&=\frac{1}{\det(A)}A_j^iy^j\end{aligned}$

But how does this sum $A_j^iy^j$ differ from the one $A_j^ia_i^j$ we used before (without summing on $i$) to calculate the determinant of $A$? We’ve replaced the $i$th column of $A$ by the column vector $y$, and so this is just another determinant, taken after performing this replacement!
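In code, this column-replacement reading of Cramer’s rule is almost a one-liner. A sketch (the name `cramer_solve` is my own, not a standard API):

```python
from fractions import Fraction

def det(A):
    # Cofactor expansion along the first column
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** r * A[r][0]
               * det([row[1:] for k, row in enumerate(A) if k != r])
               for r in range(len(A)))

def cramer_solve(A, y):
    # x^i = det(A with its i-th column replaced by y) / det(A)
    d = det(A)
    if d == 0:
        raise ValueError("determinant is zero")
    n = len(A)
    def replaced(i):
        return [[y[r] if c == i else A[r][c] for c in range(n)]
                for r in range(n)]
    return [Fraction(det(replaced(i)), d) for i in range(n)]
```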

Here’s an example. Let’s say we’ve got a system written in matrix form $\displaystyle\begin{pmatrix}a&b\\c&d\end{pmatrix}\begin{pmatrix}x\\y\end{pmatrix}=\begin{pmatrix}u\\v\end{pmatrix}$

The entry in the $i$th row and $j$th column of the adjugate matrix is calculated by striking out the $i$th column and $j$th row of our original matrix, taking the determinant of the remaining matrix, and multiplying by $(-1)^{i+j}$. We get $\displaystyle\begin{pmatrix}d&-b\\-c&a\end{pmatrix}$

and thus we find $\displaystyle\begin{pmatrix}x\\y\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}d&-b\\-c&a\end{pmatrix}\begin{pmatrix}u\\v\end{pmatrix}=\frac{1}{ad-bc}\begin{pmatrix}ud-bv\\av-uc\end{pmatrix}$

where we note that $\displaystyle\begin{aligned}ud-bv&=\det\begin{pmatrix}u&b\\v&d\end{pmatrix}\\av-uc&=\det\begin{pmatrix}a&u\\c&v\end{pmatrix}\end{aligned}$

In other words, our solution is given by ratios of determinants: $\displaystyle\begin{aligned}x&=\frac{\det\begin{pmatrix}u&b\\v&d\end{pmatrix}}{\det\begin{pmatrix}a&b\\c&d\end{pmatrix}}\\y&=\frac{\det\begin{pmatrix}a&u\\c&v\end{pmatrix}}{\det\begin{pmatrix}a&b\\c&d\end{pmatrix}}\end{aligned}$

and similar formulae hold for larger systems of equations.
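A quick check of these ratio-of-determinants formulas in the $2\times2$ case (the function `solve_2x2` is my own name; `Fraction` keeps the arithmetic exact):

```python
from fractions import Fraction

def solve_2x2(a, b, c, d, u, v):
    # x = det([[u, b], [v, d]]) / det([[a, b], [c, d]])
    # y = det([[a, u], [c, v]]) / det([[a, b], [c, d]])
    D = a * d - b * c
    if D == 0:
        raise ValueError("singular system")
    return Fraction(u * d - b * v, D), Fraction(a * v - u * c, D)
```

Plugging the solution back into both equations recovers $u$ and $v$, as it should.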

November 17, 2009 - Posted by | Algebra, Linear Algebra

1. From a computational viewpoint, Cramer’s rule is nice for very small matrices, but in practice one should use Gaussian elimination or QR factorization to calculate the determinant and inverse of a matrix, or to solve a linear system. Comment by timur | November 17, 2009 | Reply

• Yes, but Gaussian elimination doesn’t tell you about the analytic properties of the inverse. Comment by Qiaochu Yuan | November 17, 2009 | Reply

• Going off on a tangent, whilst you’re right about computational requirements, Cramer’s rule is also worth knowing if you’re trying to simplify a linear system whose coefficients are themselves functions of other variables, since determinants have nice algebraic properties and interpretations as hyper-volumes. (This isn’t so much solving the linear system as simplifying it, say for large scale finite element models with structure. This is “algebraic form”, distinct from the “analytic properties” John Armstrong is using in his next post.)

It’s a shame that most mathematical degrees tend to briefly introduce Cramer’s rule almost as a historical artifact and then immediately introduce Gaussian elimination. Comment by davetweed | November 19, 2009 | Reply

2. Related to this wedge product thread, and this post, is the nice way that one can achieve the result of Cramer’s rule directly without cofactors, adjoints, or even determinants. Suppose you have $\begin{aligned}\begin{bmatrix}u \\ v\end{bmatrix}&=\begin{bmatrix}a & b \\ c & d\end{bmatrix}\begin{bmatrix}x \\ y\end{bmatrix} \\ &=\begin{bmatrix}a \\ c \end{bmatrix}x+\begin{bmatrix}b \\ d \end{bmatrix}y\end{aligned}$

Now wedge both sides with one of the vectors, eliminating it. For example $\begin{aligned}\begin{bmatrix}u \\ v\end{bmatrix}\wedge\begin{bmatrix}a \\ c\end{bmatrix}=y\begin{bmatrix}b \\ d \end{bmatrix}\wedge\begin{bmatrix}a \\ c\end{bmatrix}\end{aligned}$

If the system can be solved for $y$, the bivectors on the left- and right-hand sides differ only by a constant factor, and one can solve by division.
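Here’s a small Python sketch of this wedge trick in the plane (the names `wedge` and `solve_by_wedge` are mine; in 2-d the wedge of two vectors reduces to a single scalar, so “dividing bivectors” is ordinary division):

```python
def wedge(p, q):
    # In the plane, p wedge q is the scalar p[0]*q[1] - p[1]*q[0]
    return p[0] * q[1] - p[1] * q[0]

def solve_by_wedge(a, b, rhs):
    # rhs = x*a + y*b; wedging with a kills the a-term since a wedge a = 0,
    # and wedging with b kills the b-term
    x = wedge(rhs, b) / wedge(a, b)
    y = wedge(rhs, a) / wedge(b, a)
    return x, y
```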

I first saw this nicely illustrated in this online book by John Browne here:

Since the wedge calculation can be reduced to a determinant, at its core this is no different than Cramer’s rule, but I think it provides a nice conceptual clarity. It is also no more efficient, and no less numerically unstable, so you probably really want SVD for computational work. Comment by peeterjoot | November 17, 2009 | Reply

3. Qiaochu is closest to the mark, as will become apparent later this week. I don’t just want the inverse, but I want to know something about the form of the inverse. Comment by John Armstrong | November 18, 2009 | Reply

4. Oh, and I actually wanted to introduce the adjugate matrix for a completely unrelated(?) reason I’ll be getting to later. It’s something I only know a handful of people have thought about, and none very explicitly. Comment by John Armstrong | November 18, 2009 | Reply

5. […] system of equations, which has a unique solution since the determinant of its matrix is . We use Cramer’s rule to solve it, and get an expression for our difference quotient as a quotient of two determinants. […]

Pingback by The Inverse Function Theorem « The Unapologetic Mathematician | November 18, 2009 | Reply

6. […] of a product matrix as a (quadratic) polynomial in the entries of and . As for inversion, Cramer’s rule expresses the entries of the inverse matrix as the quotient of a (degree ) polynomial in the […]

Pingback by General Linear Groups are Lie Groups « The Unapologetic Mathematician | June 9, 2011 | Reply