Linear Algebra: Least Squares and Data Fitting
In engineering and science, you rarely have perfect data. Measurements are inherently noisy, and the relationships between variables are often complex. The method of least squares provides a powerful, systematic way to extract the best possible mathematical model from imperfect experimental data. This process, known as data fitting or regression, is the cornerstone of predictive analytics, control systems, signal processing, and virtually every field that relies on empirical observation. Mastering least squares is not just about solving equations; it's about turning uncertainty into actionable insight.
The Least Squares Principle
The core idea of least squares is both elegant and practical. When you have more data points (equations) than unknown parameters in your model, you have an overdetermined system of equations. Such a system typically has no exact solution. The least squares approach redefines "best solution" as the set of parameters that minimizes the sum of the squared differences between the observed data and the values predicted by your model.
These differences are called residuals. Formally, for a set of $n$ data points, if your model predicts a value $\hat{y}_i$ and you observe $y_i$, the residual for the $i$-th point is $r_i = y_i - \hat{y}_i$. The least squares solution minimizes the sum of squared residuals: $S = \sum_{i=1}^{n} r_i^2$. Squaring the residuals ensures they are positive, penalizes larger errors more severely, and makes the resulting mathematics analytically tractable through calculus.
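As a minimal numeric sketch of this definition, using hypothetical data and an arbitrary candidate model $y = 2x + 1$:

```python
import numpy as np

# Hypothetical measurements and an arbitrary candidate model y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

y_hat = 2.0 * x + 1.0   # model predictions
r = y - y_hat           # residuals r_i = y_i - y_hat_i
ssr = np.sum(r**2)      # sum of squared residuals S
```

Least squares searches over all candidate parameters for the pair that makes this sum as small as possible.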
Linear Regression as Least Squares
The most common application is linear regression, where you fit a straight-line model $y = \beta_0 + \beta_1 x$ to your data. Here, the parameters are the intercept $\beta_0$ and the slope $\beta_1$. The objective is to find the values of $\beta_0$ and $\beta_1$ that minimize: $S(\beta_0, \beta_1) = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$.
By taking partial derivatives of $S$ with respect to $\beta_0$ and $\beta_1$, setting them to zero, and solving the resulting system, you arrive at the classic formulas for the least-squares line. This process directly generalizes to more complex models through a unified matrix framework.
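The classic closed-form formulas can be evaluated directly; this sketch applies them to hypothetical data:

```python
import numpy as np

# Hypothetical noisy measurements of a roughly linear relationship.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.9, 3.1, 5.0, 6.8, 9.2])

n = len(x)
# Closed-form least-squares slope and intercept, obtained by setting
# dS/d(beta0) = dS/d(beta1) = 0 and solving the 2x2 system.
beta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
beta0 = (np.sum(y) - beta1 * np.sum(x)) / n
```

For this data the formulas give a slope of 2.03 and an intercept of 0.94, matching what `np.polyfit(x, y, 1)` returns.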
Polynomial Fitting by Least Squares
What if your data suggests a curved relationship, like the trajectory of a projectile or the stress-strain behavior of a material? Polynomial fitting extends linear regression by using a polynomial model of degree $m$: $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$.
The model is still linear in its parameters $a_0, a_1, \dots, a_m$, even though it is nonlinear in $x$. This is a crucial distinction: least squares applies to any model that is linear in the parameters. You set up a system where each data point gives one equation: $y_i = a_0 + a_1 x_i + a_2 x_i^2 + \dots + a_m x_i^m + r_i$. The goal remains to minimize the sum of the squared residuals $r_i$. The challenge now is solving for the parameters efficiently and stably.
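One standard way to set up such a system is a Vandermonde-style matrix with one column per power of $x$; a sketch with hypothetical projectile-style data:

```python
import numpy as np

# Hypothetical data that is roughly quadratic in x.
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([0.1, 2.2, 3.1, 2.9, 1.0])

m = 2  # polynomial degree
# Each row of X is [1, x_i, x_i^2]: the model is linear in a0, a1, a2.
X = np.vander(x, m + 1, increasing=True)
coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)  # [a0, a1, a2]
```

The coefficients agree with `np.polyfit(x, y, 2)` (which lists them from highest degree down).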
Multivariate Regression
Real engineering systems often depend on multiple factors. Multivariate regression allows you to fit a model that depends on several independent variables. For example, the efficiency of a cooling system might depend on fan speed ($x_1$), coolant temperature ($x_2$), and ambient pressure ($x_3$). A linear multivariate model would be: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$.
This is where the matrix formulation becomes essential. You can write the entire set of equations for all data points as a single matrix equation: $\mathbf{y} = X\boldsymbol{\beta} + \mathbf{r}$.
- $\mathbf{y}$ is the vector of observed outputs.
- $X$ is the design matrix. Its first column is often all ones (for the intercept $\beta_0$), and each subsequent column contains the data (or transformed data, like $x^2$) for one predictor variable.
- $\boldsymbol{\beta}$ is the vector of unknown parameters.
- $\mathbf{r}$ is the vector of residuals.
The least squares objective is to minimize $\|\mathbf{y} - X\boldsymbol{\beta}\|^2$, the squared Euclidean norm of the residual vector.
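Assembling the design matrix and solving is only a few lines; a sketch using hypothetical cooling-system data (the numbers are illustrative, not real measurements):

```python
import numpy as np

# Hypothetical data: fan speed, coolant temperature, ambient pressure, efficiency.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([20.0, 22.0, 19.0, 25.0, 21.0])
x3 = np.array([1.0, 1.1, 0.9, 1.0, 1.2])
y = np.array([5.1, 7.2, 8.8, 11.5, 13.0])

# Design matrix: a column of ones for the intercept, one column per predictor.
X = np.column_stack([np.ones_like(x1), x1, x2, x3])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # [b0, b1, b2, b3]
resid = y - X @ beta
```

A defining property of the least squares solution is that the residual vector is orthogonal to every column of $X$, which makes a handy sanity check.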
The Normal Equations
The standard analytical path to the least squares solution is through the normal equations. By applying calculus to the matrix objective function, you find that the minimizing vector must satisfy:

$X^T X \hat{\boldsymbol{\beta}} = X^T \mathbf{y}$

These are the normal equations. If the columns of $X$ are linearly independent (meaning your predictor variables aren't redundant), then $X^T X$ is invertible. The solution is then:

$\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}$

Here, $\hat{\boldsymbol{\beta}}$ denotes the estimated parameter vector. While mathematically correct, directly computing $(X^T X)^{-1}$ can be numerically unstable for ill-conditioned problems (e.g., with highly correlated inputs or polynomials of high degree), because forming $X^T X$ squares the condition number of $X$.
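A sketch of solving the normal equations on synthetic data (note it solves the linear system rather than explicitly forming the inverse):

```python
import numpy as np

# Synthetic data from a known line y = 1 + 2x plus small noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 20)
X = np.column_stack([np.ones(20), x])
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 20)

# Solve the normal equations X^T X beta = X^T y.
# solve() is preferable to computing inv(X^T X) explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```

On this well-conditioned problem the result matches `np.linalg.lstsq`; the divergence between the two approaches only appears as conditioning worsens.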
The QR Approach for Numerical Stability
For robust, production-grade computation, engineers use matrix factorizations. The QR approach is the gold standard. The idea is to factor the design matrix $X$ into the product of an orthogonal matrix $Q$ and an upper-triangular matrix $R$: $X = QR$.
Substituting this into the normal equations and using the property $Q^T Q = I$ simplifies them dramatically to:

$R\hat{\boldsymbol{\beta}} = Q^T \mathbf{y}$

Because $R$ is triangular, this system can be solved efficiently and accurately by back substitution. The QR decomposition avoids forming the numerically problematic matrix $X^T X$ entirely, leading to superior numerical stability. Most scientific computing libraries (like MATLAB and NumPy/SciPy) use a QR-based algorithm (often via Householder reflections) to solve least squares problems.
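The QR route can be sketched directly with NumPy's thin QR factorization (here `np.linalg.solve` stands in for a dedicated triangular back-substitution routine such as SciPy's `solve_triangular`):

```python
import numpy as np

# Synthetic data from known parameters [0.5, 2.0, -1.0] plus small noise.
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(30), rng.uniform(0, 1, 30), rng.uniform(0, 1, 30)])
y = X @ np.array([0.5, 2.0, -1.0]) + rng.normal(0, 0.05, 30)

# Thin QR factorization: X = Q R with Q^T Q = I and R upper triangular.
Q, R = np.linalg.qr(X)
# Solve R beta = Q^T y; a triangular solver would exploit R's structure.
beta_qr = np.linalg.solve(R, Q.T @ y)
```

The answer agrees with `np.linalg.lstsq`, but the route never forms $X^T X$.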
Residual Analysis and Model Validation
Finding the parameters is only half the battle. You must validate your model. Residual analysis is the primary tool. After computing $\hat{\boldsymbol{\beta}}$, calculate the fitted values $\hat{y}_i = (X\hat{\boldsymbol{\beta}})_i$ and the residuals $r_i = y_i - \hat{y}_i$.
You should then inspect these residuals. In a good fit, residuals should:
- Have a mean near zero.
- Exhibit no obvious pattern when plotted against the fitted values or predictor variables (random scatter).
- Approximate a normal distribution (for making statistical inferences).
Systematic patterns in the residual plot indicate a poor model—perhaps you need a higher-order polynomial, a different model form, or have omitted a key variable. The root-mean-square error (RMSE), $\sqrt{\frac{1}{n}\sum_{i=1}^{n} r_i^2}$, gives a single metric for the average prediction error.
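The residual checks above can be sketched numerically; this example fits a line to hypothetical data and computes the residual mean and RMSE:

```python
import numpy as np

# Hypothetical, roughly linear data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.2, 1.1, 1.9, 3.2, 3.8, 5.1])

X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_fit = X @ beta
r = y - y_fit
rmse = np.sqrt(np.mean(r**2))
# For a healthy fit: residual mean near zero, no trend when r is
# plotted against x or y_fit, and no single dominant residual.
```

Including an intercept column guarantees the residual mean is (numerically) zero, so the informative checks are the plots and the RMSE magnitude.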
Applications to Experimental Data Fitting in Engineering
The power of least squares is realized in concrete applications. Consider an engineer characterizing a new material. They perform a tensile test, collecting data points of stress ($\sigma$) and strain ($\epsilon$). The relationship is often nonlinear. They might fit a quadratic model $\sigma = a_0 + a_1\epsilon + a_2\epsilon^2$ to determine the modulus of elasticity (related to the linear coefficient $a_1$, the slope at zero strain) and understand the onset of plastic deformation.
In signal processing, you might fit a sinusoidal model to noisy sensor data to estimate an oscillation's amplitude, frequency, and phase. In control systems, system identification uses input-output data to fit a dynamic model (like an ARX model) for controller design. The process is always the same: choose a physically meaningful model structure, formulate the design matrix $X$, and solve the least squares problem to find the parameters that best explain your observations.
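As a sketch of the sinusoidal case: if the angular frequency $\omega$ is assumed known, the model $y = A\sin(\omega t) + B\cos(\omega t)$ is linear in $A$ and $B$, so ordinary least squares applies (the frequency and signal values below are made up for illustration):

```python
import numpy as np

# Assumed-known frequency: a hypothetical 1.5 Hz oscillation.
omega = 2 * np.pi * 1.5
rng = np.random.default_rng(2)
t = np.linspace(0, 2, 200)
# Synthetic noisy signal with true A = 1.3, B = -0.7.
y = 1.3 * np.sin(omega * t) - 0.7 * np.cos(omega * t) + rng.normal(0, 0.05, t.size)

# Design matrix with one column per basis function: model is linear in A, B.
X = np.column_stack([np.sin(omega * t), np.cos(omega * t)])
AB, *_ = np.linalg.lstsq(X, y, rcond=None)
amplitude = np.hypot(AB[0], AB[1])  # amplitude of the combined sinusoid
```

If the frequency is also unknown, the problem becomes nonlinear in the parameters and requires nonlinear least squares or a frequency search.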
Common Pitfalls
- Overfitting: Using a polynomial degree that is too high ($m$ approaching the number of data points $n$) will make the model fit the noise in your training data perfectly, resulting in a complex curve that performs poorly on new data. Correction: Use validation techniques, prefer simpler models, and monitor the residual plot for randomness, not zero error.
- Ignoring Residual Analysis: A low error metric doesn't guarantee a good model. A systematic pattern in the residuals means the model is missing a key feature of the data. Correction: Always plot and analyze residuals. They are your guide to model improvement.
- Ignoring Numerical Stability: For problems with many variables or poorly scaled data, blindly using the normal-equation formula can lead to significant numerical error or complete failure. Correction: Use built-in library functions (e.g., `np.linalg.lstsq`), which employ stable algorithms like QR or SVD, or explicitly scale your data before fitting.
- Misinterpreting Correlation as Causation: Least squares finds mathematical association, not cause-and-effect. A good fit between, say, coffee consumption and productivity does not prove one causes the other. Correction: Ground your model in domain knowledge and experimental design. The model is a tool for description and prediction within the context it was derived.
Summary
- The least squares method finds the best-fitting model by minimizing the sum of the squared residuals—the differences between observed and predicted values.
- Linear regression, polynomial fitting, and multivariate regression are all specific cases of this general framework, unified by the matrix equation $\mathbf{y} = X\boldsymbol{\beta} + \mathbf{r}$.
- The solution can be derived analytically via the normal equations, $X^T X \hat{\boldsymbol{\beta}} = X^T \mathbf{y}$, but for numerical stability, the QR decomposition approach ($R\hat{\boldsymbol{\beta}} = Q^T \mathbf{y}$) is preferred in practice.
- Residual analysis is non-negotiable for model validation; it checks for randomness and ensures the model adequately captures the data's structure.
- This methodology is fundamental to experimental data fitting in engineering, from calibrating sensors and characterizing materials to identifying systems for control design.