Feb 24

Linear Algebra: Orthogonal Projections

Mindli Team

AI-Generated Content


In engineering, you constantly face problems of approximation and noise reduction. Whether you're filtering a corrupted signal, compressing an image, or fitting a line to noisy sensor data, the mathematical machinery that enables you to find the best possible approximation is the orthogonal projection. This operation allows you to decompose a complex vector into a meaningful component within a subspace of interest and an irrelevant, orthogonal "error" component. Mastering projections transforms abstract vector spaces into a practical toolkit for simplifying and solving real-world problems.

The Geometry and Derivation of the Projection Formula

At its heart, an orthogonal projection maps a vector onto a subspace by dropping a perpendicular "shadow." The simplest case is projecting a vector b onto the line spanned by another vector a. The goal is to find the scalar multiple x̂a that is closest to b. "Closest" here is defined by minimizing the squared error ‖b − xa‖², which is equivalent to making the error vector b − x̂a perpendicular to a.

This orthogonality condition gives us the key equation aᵀ(b − x̂a) = 0. Solving for the scalar yields x̂ = (aᵀb)/(aᵀa). Therefore, the projection of b onto the line of a is p = x̂a = ((aᵀb)/(aᵀa)) a. The denominator aᵀa is the squared length ‖a‖². The component of b orthogonal to a is simply e = b − p. This geometric derivation, minimizing distance by enforcing orthogonality, is the foundational principle that generalizes to all subspaces.
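
As a quick sanity check, the one-dimensional formula can be computed directly in NumPy (the vectors a and b here are made up for illustration):

```python
import numpy as np

# Project b onto the line spanned by a (illustrative vectors).
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 0.0, 3.0])

x_hat = (a @ b) / (a @ a)   # scalar coefficient x̂ = aᵀb / aᵀa
p = x_hat * a               # projection of b onto the line
e = b - p                   # error component

print(p)       # [1. 2. 2.]
print(a @ e)   # 0.0: the error is perpendicular to a
```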

Projection onto Higher-Dimensional Subspaces and Planes

The logic extends seamlessly from lines to planes or any subspace defined by a basis. Suppose you want to project a vector b onto a plane (a 2D subspace) in ℝ³ spanned by two independent vectors a₁ and a₂. The projection will be a linear combination p = x̂₁a₁ + x̂₂a₂ that lies in the plane. The error vector b − p must be perpendicular to the entire plane, meaning it is orthogonal to both a₁ and a₂.

This gives two orthogonality conditions: a₁ᵀ(b − p) = 0 and a₂ᵀ(b − p) = 0. These equations reorganize into a system of normal equations. In matrix form, if A = [a₁ a₂] is the matrix whose columns are the basis vectors, this system is AᵀA x̂ = Aᵀb. The solution vector x̂ = (AᵀA)⁻¹Aᵀb contains the coefficients that produce the projection p = Ax̂. This framework works for projecting onto the column space of any matrix A with independent columns.
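
Here is a minimal NumPy version of this computation, with illustrative basis vectors chosen to span the xy-plane so the answer is easy to see:

```python
import numpy as np

# Project b onto the plane spanned by a1 and a2 (illustrative vectors).
a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([1.0, 1.0, 0.0])
b  = np.array([2.0, 3.0, 4.0])

A = np.column_stack([a1, a2])              # 3x2 matrix, basis as columns
x_hat = np.linalg.solve(A.T @ A, A.T @ b)  # normal equations AᵀA x̂ = Aᵀb
p = A @ x_hat                              # projection lies in the plane

print(p)              # [2. 3. 0.]: the z-component is removed
print(A.T @ (b - p))  # [0. 0.]: error orthogonal to both a1 and a2
```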

The Orthogonal Decomposition Theorem

The previous sections implicitly used a powerful, general result: the Orthogonal Decomposition Theorem. It states that for any vector b in ℝⁿ and any subspace W, b can be written uniquely as a sum b = p + e, where p is in W and e is in W⊥ (the orthogonal complement of W). The component p is precisely the orthogonal projection of b onto W, and it is the unique vector in W closest to b.

This decomposition is fundamental. The vector p represents the best approximation to b from within the subspace W. The orthogonal component e = b − p represents the unavoidable error or residual of this approximation. In engineering terms, if W is the space of valid signal models, p is the cleaned signal, and e is the removed noise.
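
A small NumPy sketch (with a made-up matrix and vector) verifies the decomposition: the two pieces add back to the original vector, the error is orthogonal to the subspace, and the projection beats any other candidate in the subspace:

```python
import numpy as np

# Decompose b into a piece in W = col(A) and a piece in W-perp
# (A and b are illustrative examples).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

p = A @ np.linalg.solve(A.T @ A, A.T @ b)  # projection onto W
e = b - p                                  # orthogonal component

print(np.allclose(A.T @ e, 0))  # True: e is orthogonal to every column of A
print(np.allclose(p + e, b))    # True: the pieces sum back to b

# p is the closest point in W: any other choice in W gives a larger error
other = A @ np.array([5.0, -2.0])
print(np.linalg.norm(b - other) > np.linalg.norm(e))  # True
```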

Projection Matrices and Their Key Properties

The formula p = A(AᵀA)⁻¹Aᵀb reveals a linear transformation: b ↦ Pb. The matrix that performs this transformation is the projection matrix P = A(AᵀA)⁻¹Aᵀ. This matrix projects any vector onto the column space of A.

Projection matrices have two distinctive algebraic properties that you should recognize instantly. First, they are idempotent, meaning P² = P. Projecting a vector that is already in the subspace does nothing. Second, they are symmetric, so Pᵀ = P. This symmetry is a direct consequence of the orthogonality of the projection and is evident in the formula P = A(AᵀA)⁻¹Aᵀ, whose transpose is the same expression.

A special case occurs when projecting onto a line in the direction of a unit vector u. Here, A becomes the single column u, AᵀA = uᵀu = 1, and the projection matrix simplifies to P = uuᵀ, which is an outer product. Understanding these properties helps you verify calculations and reason about the behavior of iterative algorithms that use projections.
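
Both properties, and the rank-one special case, are easy to confirm numerically (the matrix A and unit vector u below are arbitrary examples):

```python
import numpy as np

# Build P = A(AᵀA)⁻¹Aᵀ and check its algebraic properties.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(np.allclose(P @ P, P))  # True: idempotent
print(np.allclose(P.T, P))    # True: symmetric

# Special case: projecting onto the line of a unit vector u
u = np.array([3.0, 0.0, 4.0]) / 5.0          # unit vector, uᵀu = 1
P_line = np.outer(u, u)                      # P = uuᵀ, an outer product
print(np.allclose(P_line @ P_line, P_line))  # True: also idempotent
```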

Engineering Applications: Signal Processing and Data Approximation

Orthogonal projections are not just abstract exercises; they are the workhorse of many engineering systems. In signal processing, a common task is to separate a signal from noise. You can model the space of "clean" signals as a subspace W (e.g., signals of a certain frequency band). The received noisy signal is then projected onto W to obtain the best estimate of the original signal, effectively filtering out orthogonal noise components. This is the principle behind many filtering techniques, including those used in audio and image processing.
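
As a toy illustration of this filtering idea (the low-frequency cosine basis, signal, and noise level below are arbitrary choices, not a specific real-world system):

```python
import numpy as np

# Denoise by projecting a noisy signal onto a "clean" subspace:
# here, the span of a constant plus the first three cosine harmonics.
rng = np.random.default_rng(0)
n = 200
t = np.arange(n)

# Basis columns: cos(2*pi*k*t/n) for k = 0..3 (k = 0 is the constant)
A = np.column_stack([np.cos(2 * np.pi * k * t / n) for k in range(4)])

clean = 2.0 * np.cos(2 * np.pi * 2 * t / n)    # lies inside the subspace
noisy = clean + rng.normal(scale=0.5, size=n)  # add broadband noise

# Least-squares projection onto col(A) removes the orthogonal noise
x_hat, *_ = np.linalg.lstsq(A, noisy, rcond=None)
denoised = A @ x_hat

# The projected signal is much closer to the clean one than the raw input
print(np.linalg.norm(noisy - clean) > np.linalg.norm(denoised - clean))
```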

In data science and data approximation, linear regression is a direct application. Fitting a least-squares line ŷ = β₀ + β₁x to a set of data points is equivalent to projecting the data vector y onto the subspace spanned by the vector of ones (for the constant term β₀) and the vector of x-values (for the slope term β₁). The projection gives the predicted values ŷ on the best-fit line, and the orthogonal error vector y − ŷ contains the residuals, which are minimized in length.
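
Sketching this with NumPy on a small made-up data set (the `x` and `y` values are arbitrary):

```python
import numpy as np

# Least-squares line fit as a projection: project y onto the span of
# the all-ones vector and the x-values.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

A = np.column_stack([np.ones_like(x), x])  # columns: ones, x-values
beta = np.linalg.solve(A.T @ A, A.T @ y)   # [intercept, slope]
y_hat = A @ beta                           # predictions = projection of y
residuals = y - y_hat                      # orthogonal error vector

print(beta)                              # [1.1 1.1]
print(np.allclose(A.T @ residuals, 0))   # True: residuals ⊥ both columns
```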

Common Pitfalls

  1. Misapplying the Formula for Non-Orthogonal Bases: The matrix formula p = A(AᵀA)⁻¹Aᵀb works for any basis of the subspace. A common mistake is applying the one-vector formula ((aᵢᵀb)/(aᵢᵀaᵢ)) aᵢ to each basis vector separately and adding the results. This only yields the correct projection if the basis vectors are orthogonal to each other. If they are not, you must solve the normal equations or use the matrix formula.
  2. Confusing the Projection with the Coefficient Vector: The projection p is a vector in the original space (e.g., in ℝ³). The coefficient vector x̂ (e.g., in ℝ²) is a different object: it lists the coordinates of the projection *with respect to the basis* {a₁, a₂}. Don't report x̂ as the final answer when the problem asks for the projection vector itself.
  3. Forgetting that AᵀA Must Be Invertible: The formula x̂ = (AᵀA)⁻¹Aᵀb requires that AᵀA be invertible. This condition holds if and only if the columns of A are linearly independent. If you construct A with dependent columns, you have not defined a proper basis for the subspace, and the formula fails. Always check that your spanning vectors are independent.
  4. Assuming Projection Matrices are Invertible: A projection matrix P is never invertible (unless it projects onto the whole space, in which case it is just the identity matrix). Since it squashes any orthogonal component to zero, it has a nontrivial nullspace. Remember the idempotent property P² = P: if an inverse P⁻¹ existed, multiplying both sides by it would force P = I, a contradiction.
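
The first pitfall is easy to demonstrate numerically; in this sketch the vectors are chosen so the naive sum of one-vector projections visibly disagrees with the true projection:

```python
import numpy as np

# Pitfall: summing one-vector projections gives the wrong answer
# when the basis vectors are not orthogonal (illustrative vectors).
a1 = np.array([1.0, 0.0, 0.0])
a2 = np.array([1.0, 1.0, 0.0])  # not orthogonal to a1
b  = np.array([0.0, 2.0, 1.0])

def proj_line(b, a):
    """Projection of b onto the line spanned by a."""
    return (a @ b) / (a @ a) * a

wrong = proj_line(b, a1) + proj_line(b, a2)    # naive sum of projections
A = np.column_stack([a1, a2])
right = A @ np.linalg.solve(A.T @ A, A.T @ b)  # normal equations

print(wrong)  # [1. 1. 0.]
print(right)  # [0. 2. 0.]: the true projection onto the xy-plane
```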

Summary

  • The orthogonal projection of a vector b onto a subspace is the unique vector in that subspace closest to b, found by enforcing that the error vector e = b − p is perpendicular to the subspace.
  • The projection formula generalizes from a line to a subspace via the matrix equation p = A(AᵀA)⁻¹Aᵀb, which solves the normal equations AᵀA x̂ = Aᵀb.
  • The Orthogonal Decomposition Theorem guarantees any vector can be uniquely split into a projection (in the subspace) and an orthogonal error component, which is the foundation for best approximation.
  • Projection matrices are symmetric (Pᵀ = P) and idempotent (P² = P). They provide an algebraic tool for applying projections repeatedly.
  • Core engineering applications include least-squares data fitting (regression) and signal filtering, where projecting onto a "clean signal" subspace removes orthogonal noise.
