The Pseudo-Inverse and Minimum Norm Least Square Solution
0. Introduction
In a previous post I presented a geometric derivation of ordinary least squares by means of projections and an orthogonal decomposition of the target space. The final part of that proof showed that a least-squares solution is not necessarily unique. But how should one choose among the least-squares solutions? One approach, given below, leads to the pseudo-inverse via minimum norm considerations. As in the previous post, the material is taken from the numerical linear algebra lecture notes of Prof. Wübbeling, a course I attended a couple of years ago.
1. Finding the minimum norm solution
Define $x^+$ as a minimum norm solution iff $x^+$ is a least-squares solution to $Ax = b$ and $\Vert x^+\Vert \leq \Vert x\Vert$ for all $x \in \mathbb{R}^n$ that are least-squares solutions to $Ax = b$. The idea is to project a least-squares solution $x$ onto $\ker(A)^{\perp} = \text{Im}(A^t)$, which determines $x^+$ uniquely.
Theorem. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then $x^+ \in \mathbb{R}^n$ is a minimum norm solution of $Ax = b$ iff
\begin{align}
A^tAx^+ &= A^tb \\
x^+ &\in \text{Im}(A^t)
\end{align}
The minimum norm solution is unique.
Proof. In the last article, I showed that $\mathbb{R}^m = \text{Im}(A) \oplus \ker(A^t)$ for $A: \mathbb{R}^n \to \mathbb{R}^m$. It analogously holds that $\mathbb{R}^n = \text{Im}(A^t) \oplus \ker(A)$ for $A^t: \mathbb{R}^m \to \mathbb{R}^n$. Hence, the least-squares solution $x$ can be decomposed into \begin{equation} x = \underbrace{u}_{\in\, \text{Im}(A^t)} + \underbrace{v}_{\in\, \ker(A)} \end{equation}
Now, choose $x^+ := u$, which implies $A(x - x^+) = Av = 0$ because $v \in \ker(A)$. This in turn implies \begin{equation} A^tAx^+ = A^tA(x^+ + (x - x^+)) = A^tAx = A^tb \end{equation} As a result, $x^+$ is also a least-squares solution. Since any other least-squares solution $y$ is of the form $y = x^+ + v$ where $v \in \ker(A)$, we calculate
\begin{equation} \Vert y\Vert^2_{2} = \Vert x^+ + v\Vert^2_{2} = \Vert x^+\Vert^2_{2} + \Vert v \Vert^2_{2} \geq \Vert x^+\Vert^2_{2} \end{equation}
so the second condition on $x^+$ is also fulfilled. Furthermore, $\Vert y \Vert = \Vert x^+ \Vert$ holds iff $v = 0$, i.e. iff $y = x^+$, and hence the uniqueness of $x^+.\square$
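As a quick numerical sanity check of the theorem, here is a minimal NumPy sketch (the matrix and vectors are made up for illustration): `np.linalg.lstsq` returns the minimum norm least-squares solution, which is orthogonal to the kernel of $A$ and has the smallest norm among all least-squares solutions.

```python
import numpy as np

# Rank-deficient system: column 3 = column 1 + column 2, so ker(A) != {0}
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# lstsq returns the minimum norm least-squares solution x^+
x_plus, *_ = np.linalg.lstsq(A, b, rcond=None)

# v spans ker(A): A @ v == 0, so y = x^+ + v is another least-squares solution
v = np.array([1.0, 1.0, -1.0])
assert np.allclose(A @ v, 0.0)
y = x_plus + v

# x^+ lies in Im(A^t) = ker(A)^perp, hence is orthogonal to v ...
assert abs(x_plus @ v) < 1e-10
# ... and by Pythagoras any other solution has strictly larger norm
assert np.linalg.norm(x_plus) < np.linalg.norm(y)
```

Here $b$ happens to lie in $\text{Im}(A)$, so $x^+$ even solves $Ax = b$ exactly; the minimum norm property is what singles it out among the infinitely many solutions.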
2. The pseudo-inverse and the minimum norm solution
Recall the original problem of determining a minimum norm solution $x^+$ for the linear equation $Ax = b$ with $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Building on the previous result, we then define the pseudo-inverse as the map $A^+$ that assigns to $b$ the minimum norm solution $x^+$, i.e. \begin{equation} A^+ : \mathbb{R}^m \to \mathbb{R}^n, b \mapsto x^+ \end{equation}
Theorem. Define $A^+$ as above. Then, it holds that
(1) if $A$ is invertible, then $A^+ = A^{-1}$;
(2) if $\text{rk}(A) = n$, i.e. $A$ is injective, then $A^+ = (A^tA)^{-1}A^t$;
(3) if $\text{rk}(A) = m$, i.e. $A$ is surjective, then $A^+ = A^t(AA^t)^{-1}$.
Proof. (1) is clear: if $A$ is invertible, then $Ax = b$ has the unique solution $x = A^{-1}b$, which is therefore also the minimum norm solution, i.e. $A^+b = A^{-1}b$.
With regard to (2), we observe that $\text{rk}(A) = n \iff \ker(A) = 0 \iff \ker(A^tA) = 0 \iff \text{rk}(A^tA) = n$. But $A^tA$ is a square $n \times n$ matrix, so $\text{rk}(A^tA) = n$ makes it invertible. In total, we arrive at $A^tAx^+ = A^tb \implies x^+ = (A^tA)^{-1}A^tb$.
For (3), we proceed analogously to (2), but instead of $A$ being injective we now have that $A$ is surjective, which implies that $A^t$ is injective and therefore $(A^t)^tA^t = AA^t$ is injective and, being square, invertible. Also, since $x^+ \in \text{Im}(A^t)$, i.e. $x^+ = A^tu$ for some $u \in \mathbb{R}^m$, we have \begin{equation} b = Ax^+ = AA^tu \implies x^+ = A^tu = A^t(AA^t)^{-1}b \qquad \square \end{equation}
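Cases (2) and (3) of the theorem can be checked numerically against NumPy's built-in pseudo-inverse; the random Gaussian matrices below (an illustrative choice, full rank almost surely) stand in for a generic injective and surjective $A$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Case (2): m > n, rk(A) = n (injective)  =>  A^+ b = (A^t A)^{-1} A^t b
A_inj = rng.standard_normal((5, 3))
b_inj = rng.standard_normal(5)
x_inj = np.linalg.solve(A_inj.T @ A_inj, A_inj.T @ b_inj)
assert np.allclose(x_inj, np.linalg.pinv(A_inj) @ b_inj)

# Case (3): m < n, rk(A) = m (surjective)  =>  A^+ b = A^t (A A^t)^{-1} b
A_sur = rng.standard_normal((3, 5))
b_sur = rng.standard_normal(3)
x_sur = A_sur.T @ np.linalg.solve(A_sur @ A_sur.T, b_sur)
assert np.allclose(x_sur, np.linalg.pinv(A_sur) @ b_sur)
```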
3. Computing the pseudo-inverse practically
Although the previous theorem gives analytically exact expressions for the pseudo-inverse in all relevant cases, these matrix formulas are ill-suited for computing it numerically. The reason is that forming $A^tA$ squares the condition number of the problem, so the resulting algorithm accumulates rounding errors and is not stable. Instead, we can derive an algorithm to compute the pseudo-inverse that draws on the QR-decomposition of a matrix.
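The squaring of the condition number is easy to observe in practice; a small sketch using the (notoriously ill-conditioned) Hilbert matrix as an example:

```python
import numpy as np

# 4x4 Hilbert matrix: a classic ill-conditioned test matrix
n = 4
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])

# Forming the normal equations squares the condition number:
# cond(A^t A) = cond(A)^2
print(np.linalg.cond(A))        # already large (~1.5e4)
print(np.linalg.cond(A.T @ A))  # roughly its square (~2.4e8)
```

With a modestly conditioned $A$ the normal equations may already lose half of the available floating-point precision, which is why QR-based methods are preferred.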
So, let $m \geq n$, assume that $\text{rk}(A) = n$, and let $A = QR$ with $Q \in \mathbb{R}^{m \times n}$ having orthonormal columns ($Q^tQ = I$) and $R \in \mathbb{R}^{n \times n}$ an upper triangular matrix. This implies $\text{rk}(R) = n = \text{rk}(R^t)$, so $R^t$ is invertible. Inserting the decomposition into the normal equations, we arrive at
\begin{equation} A^tAx^+ = A^tb \implies R^tQ^tQRx^+ = R^tQ^tb \implies R^tRx^+ = R^tQ^tb \implies \boxed{Rx^+ = Q^tb} \end{equation}
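The boxed triangular system is exactly how one solves the full-column-rank case in practice; a minimal NumPy sketch with an illustrative random matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 3))     # m > n, full column rank (almost surely)
b = rng.standard_normal(6)

# Thin QR: Q (m x n) with orthonormal columns, R (n x n) upper triangular
Q, R = np.linalg.qr(A)

# Solve R x^+ = Q^t b; a dedicated back-substitution routine would
# exploit the triangular structure, np.linalg.solve works regardless
x_plus = np.linalg.solve(R, Q.T @ b)

assert np.allclose(x_plus, np.linalg.lstsq(A, b, rcond=None)[0])
```

Note that $A^tA$ is never formed, so the condition number is not squared.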
Now let $m \leq n$ and $\text{rk}(A) = m$. Then, we consider the QR-decomposition of $A^t = QR$ where $Q \in \mathbb{R}^{n\times m}$ with $Q^tQ = I$ and $R \in \mathbb{R}^{m \times m}$ is an upper triangular matrix. In this case, we obtain
\begin{equation} x^+ = A^t(AA^t)^{-1}b = QR(R^tQ^tQR)^{-1}b = QR(R^tR)^{-1}b = QRR^{-1}(R^t)^{-1}b = Q(R^t)^{-1}b = Qy \end{equation}
with $R^ty = b$. In sum, we have for this case that
\begin{align} x^+ &= Qy \\ R^ty &= b \end{align}
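The surjective case translates into code just as directly; again a sketch with an illustrative random full-row-rank matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 6))     # m < n, full row rank (almost surely)
b = rng.standard_normal(3)

# QR-decompose A^t: Q (n x m) with orthonormal columns, R (m x m) upper triangular
Q, R = np.linalg.qr(A.T)

# Solve R^t y = b (a forward substitution, since R^t is lower triangular),
# then form the minimum norm solution x^+ = Q y
y = np.linalg.solve(R.T, b)
x_plus = Q @ y

assert np.allclose(A @ x_plus, b)                  # exact solution (A surjective)
assert np.allclose(x_plus, np.linalg.pinv(A) @ b)  # and of minimum norm
```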
4. Epilogue
Two concepts strongly related to the pseudo-inverse are Tikhonov regularisation and the singular value decomposition (SVD), both of which I treat elsewhere. In particular, it is possible to express the pseudo-inverse in terms of the singular values.