Feb 27

Numerical Linear Algebra

Mindli Team

AI-Generated Content

Numerical linear algebra is the engine behind virtually all large-scale scientific and engineering computing. Whether simulating fluid dynamics, training machine learning models, or optimizing complex systems, you are ultimately solving linear systems and eigenvalue problems. This field moves beyond theoretical existence proofs to deliver efficient, stable, and practical algorithms that can handle the immense matrices arising from modern discretizations of continuous phenomena.

Core Concept 1: Matrix Factorizations as Systematic Solvers

The first pillar of numerical linear algebra is replacing a difficult problem, solving Ax = b, with a sequence of easy ones. This is achieved through matrix factorizations, which decompose a matrix into a product of simpler matrices.

The LU factorization expresses a square matrix A as the product of a lower triangular matrix L and an upper triangular matrix U, such that A = LU. Solving Ax = b then becomes a two-step process: first solve Ly = b for y (forward substitution), then solve Ux = y for x (back substitution). The computational cost is roughly (2/3)n^3 operations for the factorization, which dominates the cost of the substitutions. This makes it highly efficient for solving multiple systems with the same matrix A but different right-hand sides b. Practical implementations use partial pivoting (row exchanges) to ensure numerical stability, resulting in PA = LU, where P is a permutation matrix.
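This factor-once, solve-many pattern can be sketched with SciPy (the 3x3 system below is purely illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Illustrative 3x3 system; lu_factor performs PA = LU with partial pivoting
A = np.array([[4.0, 3.0, 0.0],
              [6.0, 3.0, 1.0],
              [0.0, 2.0, 5.0]])
lu, piv = lu_factor(A)              # O(n^3) work, done once

# Reuse the factors for several right-hand sides: each solve is only O(n^2)
b1 = np.array([1.0, 2.0, 3.0])
b2 = np.array([0.0, 1.0, 0.0])
x1 = lu_solve((lu, piv), b1)        # forward then back substitution
x2 = lu_solve((lu, piv), b2)
```

The pivot indices `piv` record the row exchanges, so each subsequent solve applies the same permutation automatically.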

When A is rectangular or you require orthogonal transformations, the QR factorization is essential. It decomposes a matrix A into the product A = QR, where Q is an orthogonal matrix (Q^T Q = I) and R is upper triangular. For solving linear least-squares problems min ||Ax - b||, the orthogonality of Q is key. The problem simplifies to minimizing ||Rx - Q^T b||, which is solved easily by back substitution because R is triangular. The QR factorization is typically computed using Householder reflections or Givens rotations, which are numerically stable orthogonal transformations.
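A minimal sketch of QR-based least squares, using an illustrative random overdetermined system:

```python
import numpy as np
from scipy.linalg import solve_triangular

# Illustrative overdetermined system: 100 equations, 3 unknowns
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
b = rng.standard_normal(100)

# Reduced QR: Q (100x3) has orthonormal columns, R (3x3) is upper triangular
Q, R = np.linalg.qr(A)
# min ||Ax - b|| collapses to the triangular system R x = Q^T b
x = solve_triangular(R, Q.T @ b)
```

In practice you would call a library least-squares routine such as `np.linalg.lstsq`, which uses an orthogonal factorization internally; the sketch above just exposes the mechanism.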

For the important case where A is symmetric and positive definite (SPD), the Cholesky factorization provides a more efficient and stable alternative. It computes a unique lower triangular matrix L with positive diagonal entries such that A = L L^T. This factorization exploits the matrix's structure, using only about half the arithmetic operations and half the storage of LU. It is the method of choice for SPD systems arising in finite element analysis, covariance estimation, and many optimization problems.
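A short sketch with SciPy; the SPD matrix here is manufactured for illustration (any M M^T plus a positive diagonal shift is SPD):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Build an illustrative SPD matrix: M M^T plus a diagonal shift
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)

c, low = cho_factor(A)          # computes L with A = L L^T, ~half the work of LU
b = np.ones(5)
x = cho_solve((c, low), b)      # two triangular solves: L y = b, then L^T x = y
```

If A is not positive definite, `cho_factor` raises an error, which doubles as a cheap SPD test.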

Core Concept 2: Iterative Methods for Large, Sparse Systems

For massive systems where the dimension n can be in the millions or billions, direct factorizations like LU or Cholesky become infeasible due to memory and time constraints. Furthermore, these methods typically destroy the sparsity (the abundance of zero entries) of the matrix. Iterative methods instead construct a sequence of progressively better approximations x_1, x_2, ... without explicitly modifying the matrix A.

The Conjugate Gradient (CG) method is the premier iterative algorithm for symmetric positive definite systems. It is not merely an iterative solver but an optimal projection method. At step k, CG finds the best approximation within a Krylov subspace, the space spanned by b, Ab, A^2 b, ..., A^(k-1) b. It does so by generating search directions that are conjugate (or A-orthogonal) with respect to each other. In exact arithmetic, CG converges to the exact solution in at most n steps. However, its practical power lies in often achieving a sufficiently accurate solution in far fewer iterations, especially when paired with a good preconditioner. A classic application is solving the linear systems arising from the finite element discretization of elliptic PDEs like Poisson's equation.
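The classic test case can be sketched with SciPy's sparse CG on the one-dimensional Poisson (second-difference) matrix, which is SPD and tridiagonal:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# 1-D Poisson (second-difference) matrix: the classic SPD, sparse test case
n = 200
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# CG only needs matrix-vector products, so sparsity is fully exploited
x, info = cg(A, b, maxiter=1000)    # info == 0 signals convergence
```

Note that CG typically converges in far fewer than n iterations here; a preconditioner (passed via the `M` argument) would shrink the iteration count further on harder problems.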

For non-symmetric or indefinite systems, the Generalized Minimal Residual (GMRES) method is a workhorse. GMRES also builds solutions within a Krylov subspace but uses a different optimality criterion: at iteration k, it finds the vector x_k that minimizes the 2-norm of the residual r_k = b - A x_k. This minimization is performed over the growing Krylov subspace, which requires storing an orthonormal basis for that subspace. Consequently, GMRES can become memory-intensive after many iterations, often necessitating restarts. Its strength is its broad applicability to any nonsingular, non-symmetric matrix.
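A minimal sketch with SciPy's restarted GMRES; the non-symmetric tridiagonal matrix below is an illustrative stand-in for a convection-diffusion discretization:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# Illustrative non-symmetric tridiagonal matrix (convection-diffusion-like)
n = 200
A = diags([-1.3, 2.0, -0.7], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# restart=50 caps the stored Krylov basis at 50 vectors to bound memory,
# discarding the subspace and starting fresh after each cycle
x, info = gmres(A, b, restart=50)
```

The restart parameter trades memory for convergence speed: a smaller basis uses less storage but discards information that full GMRES would exploit.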

Core Concept 3: Analyzing Numerical Stability and Sensitivity

An algorithm that produces the correct answer for a theoretical problem may fail miserably on a computer due to rounding errors. Therefore, analyzing the numerical stability of an algorithm and the inherent condition number of a problem is crucial.

The condition number quantifies the sensitivity of a problem's output to small changes in its input. For the linear system Ax = b, the condition number with respect to perturbations in A is denoted κ(A). For an induced matrix norm, it is given by κ(A) = ||A|| · ||A^-1||. A large condition number (say, 10^10 in double precision) indicates an ill-conditioned problem, where tiny errors in A or b (from measurement or rounding) can cause enormous errors in the computed solution x. The condition number is a property of the matrix itself, not the algorithm used to solve it.
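The effect is easy to see with the Hilbert matrix, a standard ill-conditioned example (the perturbation size and seed below are arbitrary):

```python
import numpy as np
from scipy.linalg import hilbert

# Hilbert matrices are a standard ill-conditioned example
A = hilbert(8)
kappa = np.linalg.cond(A)           # around 1e10 for n = 8

x_true = np.ones(8)
b = A @ x_true
x1 = np.linalg.solve(A, b)

# A perturbation of b at the 1e-10 level moves the solution dramatically:
# the relative change in x is amplified by up to a factor of kappa
db = 1e-10 * np.random.default_rng(2).standard_normal(8)
x2 = np.linalg.solve(A, b + db)
rel_change = np.linalg.norm(x2 - x1) / np.linalg.norm(x1)
```

Here the relative perturbation in b is around 1e-10, yet the relative change in the solution is many orders of magnitude larger, exactly as the bound with κ(A) predicts.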

Numerical stability assesses whether an algorithm introduces more sensitivity than the underlying problem warrants. A backward stable algorithm produces a computed solution x̂ that is the exact solution to a slightly perturbed problem (A + ΔA) x̂ = b + Δb, where the perturbations ΔA and Δb are tiny relative to A and b. The error in the solution can then be bounded by the condition number: ||x̂ - x|| / ||x|| is on the order of κ(A) · ε, where ε is the machine precision (about 10^-16 in double precision). QR factorization with Householder reflections is backward stable for solving least-squares problems. LU with partial pivoting is stable in practice, though not perfectly backward stable for all matrices. These stability guarantees are why we favor these algorithms over theoretically equivalent but unstable ones (like Gaussian elimination without pivoting).
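The distinction between backward and forward error can be sketched numerically: on an ill-conditioned Hilbert system, pivoted LU (what `np.linalg.solve` uses) leaves a residual near machine precision, while the forward error is inflated by the condition number:

```python
import numpy as np
from scipy.linalg import hilbert

# LU with partial pivoting (np.linalg.solve) on an ill-conditioned system
A = hilbert(10)
x_true = np.ones(10)
b = A @ x_true
x_hat = np.linalg.solve(A, b)

# Backward error: the normalized residual sits near machine precision ...
backward = np.linalg.norm(A @ x_hat - b) / (np.linalg.norm(A) * np.linalg.norm(x_hat))
# ... while the forward error is amplified by kappa(A), roughly 1e13 here
forward = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
```

The solver did its job as well as floating point allows; the gap between the two error measures is the problem's conditioning, not a flaw in the algorithm.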

Common Pitfalls

  1. Ignoring Matrix Structure and Properties: Applying a general LU solver to a symmetric positive definite matrix wastes a factor of two in computation and storage and forfeits the superior stability of Cholesky. Similarly, using a method for dense matrices on a sparse one will consume catastrophic amounts of memory. Always identify key properties (SPD, banded, sparse) first.
  2. Misinterpreting Iterative Method Convergence: Expecting Conjugate Gradient to work on a non-symmetric system will lead to failure. Furthermore, slow convergence of any iterative method is often a symptom of a high condition number. The remedy is not to run more iterations but to apply an effective preconditioner to reduce the effective κ(A).
  3. Overlooking Numerical Stability: Implementing the mathematically correct formula does not guarantee a numerically reliable algorithm. For example, explicitly forming the normal equations A^T A x = A^T b to solve a least-squares problem is both inefficient and unstable, as it essentially squares the condition number. Stable algorithms like those using orthogonal transformations (QR) or pivoting (LU) are preferred.
  4. Confusing Residual with Error: In iterative methods, you monitor the norm of the residual r_k = b - A x_k. A small residual does not necessarily mean a small error x_k - x, especially for ill-conditioned systems. The residual can be small while the error remains large, another manifestation of a large condition number.
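The condition-squaring in pitfall 3 can be made concrete with an illustrative polynomial fit (the sampling grid and degree are arbitrary choices for the sketch):

```python
import numpy as np

# Polynomial least-squares fit where the monomial columns are nearly dependent
t = np.linspace(0.0, 1.0, 50)
A = np.vander(t, 10)                 # monomial basis, degree 9
b = np.sin(2.0 * np.pi * t)

kappa = np.linalg.cond(A)
kappa_ne = np.linalg.cond(A.T @ A)   # roughly kappa**2: digits of accuracy lost

# Stable route: orthogonal-transformation-based least squares
x_qr = np.linalg.lstsq(A, b, rcond=None)[0]
# Unstable route: explicitly forming and solving the normal equations
x_ne = np.linalg.solve(A.T @ A, A.T @ b)
```

With κ(A) already around 1e6 here, the normal-equations matrix has a condition number near 1e13, so roughly half the available double-precision digits are gone before the solve even starts.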

Summary

  • Matrix factorizations (LU, QR, Cholesky) transform direct solving into efficient, stable triangular solves. Choose LU for general square systems, QR for least-squares or when stability is paramount, and Cholesky for symmetric positive definite systems.
  • Iterative methods (Conjugate Gradient, GMRES) are indispensable for large, sparse systems. They work by building optimal approximations within Krylov subspaces and require effective preconditioning to converge quickly.
  • The condition number κ(A) measures the inherent sensitivity of a linear system to perturbations. An ill-conditioned problem (κ(A) very large) will be difficult to solve accurately with any algorithm.
  • Numerical stability describes an algorithm's tendency not to amplify rounding errors beyond the level dictated by the condition number. Backward stability is the gold standard, ensuring small errors relative to the problem's sensitivity.
  • The path to an efficient and accurate solution requires matching the algorithm to the matrix's structure, understanding the convergence properties of iterative solvers, and respecting the principles of numerical analysis.
