Least Squares Curve Fitting
In engineering and science, you often collect data and need to find a mathematical model that best describes the underlying relationship. Least squares curve fitting is the foundational technique for this task, finding the model parameters that minimize the sum of the squared differences between the observed data and the model's predictions. This method provides an objective, optimal fit that is computationally tractable and forms the backbone of regression analysis, calibration, and predictive modeling across every engineering discipline, from signal processing to structural analysis.
The Foundational Idea: Minimizing Squared Residuals
At its core, least squares fitting is an optimization problem. You start with a set of data points $(x_i, y_i)$, $i = 1, \dots, n$, and a proposed model $\hat{y} = f(x)$, a function that predicts $y$ based on $x$ and a set of unknown parameters $a_0, a_1, \dots, a_m$. The difference between the observed value $y_i$ and the predicted value $f(x_i)$ is called the residual, $e_i = y_i - f(x_i)$.
The goal is to find the parameter values that make the model predictions as close as possible to the data. The least squares criterion achieves this by minimizing the sum of squared residuals (SSR):

$$SSR = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[ y_i - f(x_i) \right]^2$$
Squaring the residuals has two key effects: it ensures every error contributes a nonnegative amount (so a negative residual doesn't cancel a positive one), and it penalizes larger errors more severely. This yields a unique solution for many common model forms and has desirable statistical properties if the errors in the data are random and normally distributed.
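The SSR criterion is straightforward to compute directly. Here is a minimal sketch (the data values, the `ssr` helper, and the `line` model are made up for illustration) showing how a good parameter guess yields a much smaller SSR than a poor one:

```python
import numpy as np

def ssr(params, model, x, y):
    """Sum of squared residuals for a candidate parameter set.

    model(x, params) returns the predicted y-values; each residual
    is the observed value minus the prediction.
    """
    residuals = y - model(x, params)
    return np.sum(residuals**2)

# Hypothetical straight-line model y = a0 + a1*x
line = lambda x, p: p[0] + p[1] * x

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.1, 1.9, 4.1, 5.9])  # roughly y = 2x

print(ssr([0.0, 2.0], line, x, y))  # near-correct slope: SSR = 0.04
print(ssr([0.0, 1.0], line, x, y))  # poor slope: SSR = 13.64
```

Least squares fitting is simply the search for the parameter vector that drives this quantity to its minimum.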
Linear Least Squares and the Normal Equations
The most common application is linear least squares, where the model is a linear combination of the parameters. A quintessential example is fitting a straight line, $y = a_0 + a_1 x$. Here, $f(x) = a_0 + a_1 x$, and the parameters $a_0$ (intercept) and $a_1$ (slope) are unknown.
To find the optimal $a_0$ and $a_1$, you set the partial derivatives of the SSR with respect to each parameter to zero. This process leads to a system of normal equations. For the line example, the normal equations are:

$$n a_0 + \left( \sum x_i \right) a_1 = \sum y_i$$
$$\left( \sum x_i \right) a_0 + \left( \sum x_i^2 \right) a_1 = \sum x_i y_i$$
Solving this 2x2 system gives you the familiar formulas for the slope and intercept. This framework extends elegantly to any model linear in its parameters, such as a polynomial. Polynomial regression fits a model like $y = a_0 + a_1 x + a_2 x^2 + \dots + a_m x^m$. Although the model is nonlinear in $x$, it is linear in the parameters $a_0, \dots, a_m$, so the same least squares principle applies.
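Solving the 2x2 system by elimination yields the familiar closed-form slope and intercept formulas. The sketch below (the `fit_line` helper and the data are invented for illustration) computes both from the sums that appear in the normal equations:

```python
import numpy as np

def fit_line(x, y):
    """Solve the 2x2 normal equations for a straight-line fit.

    Returns (a0, a1) for the model y = a0 + a1*x using the
    classic closed-form formulas.
    """
    n = len(x)
    sx, sy = np.sum(x), np.sum(y)
    sxx, sxy = np.sum(x * x), np.sum(x * y)
    a1 = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope
    a0 = (sy - a1 * sx) / n                        # intercept
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.0, 9.8])  # roughly y = 2x

a0, a1 = fit_line(x, y)
print(a0, a1)  # intercept 0.15, slope 1.95
```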
You can express any linear least squares problem in matrix form. Define the design matrix $A$, where each row corresponds to a data point and each column to a basis function (e.g., for a quadratic: $1$, $x$, $x^2$). Let $\mathbf{y}$ be the vector of observed $y$-values. The parameter vector $\mathbf{a}$ that minimizes $\|\mathbf{y} - A\mathbf{a}\|^2$ is given by the solution to the matrix normal equation:

$$A^T A \,\mathbf{a} = A^T \mathbf{y}$$
Solving for $\mathbf{a}$ gives $\mathbf{a} = (A^T A)^{-1} A^T \mathbf{y}$. This is a powerful result, as it allows you to fit complex linear models (including multivariable ones) using straightforward matrix algebra.
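The matrix formulation makes polynomial fitting a few lines of code. Here is a minimal sketch (data values invented, chosen to lie near $y = 1 + x^2$) that builds the design matrix for a quadratic and solves the normal equations:

```python
import numpy as np

# Fit a quadratic y = a0 + a1*x + a2*x^2 using the matrix normal
# equations A^T A a = A^T y. Data are illustrative.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 5.0, 10.1, 16.9])  # roughly y = 1 + x^2

# Design matrix: one column per basis function (1, x, x^2).
A = np.column_stack([np.ones_like(x), x, x**2])

# Solve A^T A a = A^T y directly. In production code,
# np.linalg.lstsq(A, y) is preferred: it avoids forming the
# potentially ill-conditioned matrix A^T A at all.
a = np.linalg.solve(A.T @ A, A.T @ y)
print(a)  # approximately [1, 0, 1]
```

Note the design choice flagged in the comment: solvers like `np.linalg.lstsq` work on $A$ directly via orthogonal factorizations, which is numerically safer than forming $A^T A$.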
Analyzing Residuals for Model Adequacy
Finding the best-fit parameters is only half the job. You must then assess whether the model is adequate. Residual analysis is the primary tool for this. After fitting, calculate the residuals $e_i = y_i - \hat{y}_i$, where $\hat{y}_i$ is the fitted value.
Plotting residuals versus the independent variable $x_i$ (or versus the fitted values $\hat{y}_i$) is crucial. A good fit will show residuals randomly scattered around zero with constant variance (homoscedasticity). Patterns in this plot, like a curve, a funnel shape, or systematic streaks, indicate model inadequacy. A curved pattern suggests a higher-order term (e.g., a quadratic) is missing. A funnel shape indicates non-constant variance, violating a key least squares assumption.
You should also check a histogram or a normal probability plot of the residuals. The least squares method is most statistically efficient if the errors (and thus the residuals) are approximately normally distributed. Significant skew or outliers can suggest problems with the data or the model. Outliers, points with exceptionally large residuals, can disproportionately influence the fit; it's essential to investigate whether they are data errors or indicate a need for a different model.
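As a quick numerical illustration of the "curved pattern" diagnostic (using made-up, noiseless data so the effect is unmistakable), the sketch below fits a straight line to data that is actually quadratic and inspects the residual signs:

```python
import numpy as np

# Fit a straight line to data that is actually quadratic, then
# inspect the residuals: a systematic sign pattern (positive at the
# ends, negative in the middle) flags a missing higher-order term.
x = np.linspace(0, 4, 9)
y = 1.0 + x**2          # noiseless quadratic, for illustration

A = np.column_stack([np.ones_like(x), x])   # straight-line design matrix
a, *_ = np.linalg.lstsq(A, y, rcond=None)
residuals = y - A @ a

print(np.sign(residuals))
# Ends positive, middle negative: the run structure a residual
# plot would show visually as a curve.
```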
Nonlinear Least Squares and the Gauss-Newton Method
Many engineering models are inherently nonlinear in their parameters. Examples include exponential decay ($y = a_0 e^{a_1 x}$) or a sinusoidal model ($y = a_0 \sin(a_1 x + a_2)$). Here, the sum-of-squared-residuals landscape is not a simple parabola, and no closed-form solution like the normal equations exists.
Nonlinear least squares relies on iterative, numerical methods, with the Gauss-Newton method being a standard approach. The strategy is to approximate the nonlinear model with a linear one at each iteration. You start with an initial guess for the parameters, $\mathbf{a}^{(0)}$. The nonlinear model is then linearized using a first-order Taylor series expansion around the current guess.
This linearization creates a design matrix $Z$, called the Jacobian, which contains the partial derivatives of the model with respect to each parameter, evaluated at the current guess and each data point. This turns the problem into a linear least squares problem for a small parameter update $\Delta\mathbf{a}$:

$$Z^T Z \,\Delta\mathbf{a} = Z^T \mathbf{e}$$

where $\mathbf{e}$ is the vector of residuals at the current guess.
You solve this linear system for $\Delta\mathbf{a}$ using methods for linear least squares, then update the parameters: $\mathbf{a}^{(k+1)} = \mathbf{a}^{(k)} + \Delta\mathbf{a}$. This process repeats until the parameter changes become negligibly small, indicating convergence to a local minimum of the SSR. The success of Gauss-Newton depends heavily on a good initial guess; poor guesses can lead to convergence to a suboptimal local minimum or failure to converge at all.
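The iteration above can be sketched in a few lines for the exponential-decay model. Everything here is illustrative: the `gauss_newton_exp` helper, the synthetic noiseless data, and the starting guess are all assumptions, and a production code would add safeguards (step damping, as in Levenberg-Marquardt):

```python
import numpy as np

def gauss_newton_exp(x, y, a, n_iter=50, tol=1e-10):
    """Gauss-Newton iteration for the model y = a0 * exp(a1 * x).

    `a` is the initial parameter guess [a0, a1]; a reasonable
    starting guess is essential for convergence.
    """
    a = np.asarray(a, dtype=float)
    for _ in range(n_iter):
        pred = a[0] * np.exp(a[1] * x)
        e = y - pred                      # residuals at current guess
        # Jacobian Z: partial derivatives of the model w.r.t. a0, a1
        Z = np.column_stack([np.exp(a[1] * x),
                             a[0] * x * np.exp(a[1] * x)])
        # Linear least squares for the update: Z^T Z da = Z^T e
        da, *_ = np.linalg.lstsq(Z, e, rcond=None)
        a = a + da
        if np.max(np.abs(da)) < tol:      # converged
            break
    return a

# Synthetic data from y = 3 * exp(-0.5 x); recover the parameters.
x = np.linspace(0, 5, 20)
y = 3.0 * np.exp(-0.5 * x)
print(gauss_newton_exp(x, y, a=[2.0, -0.3]))  # approx [3.0, -0.5]
```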
Common Pitfalls
- Overfitting with High-Order Polynomials: It's tempting to keep adding polynomial terms to make the curve pass through more points. A model like a 10th-order polynomial can fit a 12-point dataset nearly perfectly (SSR near zero), but it will oscillate wildly between points and fail to predict new data. This is overfitting. Always prefer the simplest model that adequately describes the trend. Use residual analysis and consider the physical plausibility of the model.
- Ignoring Residual Patterns: Declaring the job done after obtaining the parameter values is a major error. A low SSR does not automatically mean a good model. You must examine residual plots. Failing to detect a curved pattern means you might miss a systematic bias in your predictions, rendering your model unreliable for its intended use.
- Misapplying Linear Least Squares to Nonlinear Problems: You cannot directly use the linear normal equations for models nonlinear in their parameters (e.g., $y = a_0 e^{a_1 x}$). Attempting to do so will yield incorrect, meaningless parameters. Recognize the structure of your model and apply nonlinear least squares methods like Gauss-Newton when required.
- Forgetting to Assess the Conditioning of $A^T A$: In linear least squares, solving the normal equations requires inverting $A^T A$. If your basis functions (e.g., $x$, $x^2$, $x^3$) are highly correlated for your data, this matrix can be ill-conditioned (nearly singular). This makes the solution for $\mathbf{a}$ extremely sensitive to tiny errors or noise in the data, leading to numerically unstable and unreliable parameters. Using polynomial bases with centered and scaled $x$-values can help mitigate this.
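The conditioning pitfall is easy to demonstrate numerically. In this sketch (the data range is invented to make the problem deliberately bad: a narrow interval far from zero), centering and scaling $x$ shrinks the condition number of $A^T A$ by many orders of magnitude:

```python
import numpy as np

# Condition number of A^T A for a cubic basis, before and after
# centering and scaling x. Large condition numbers mean the normal
# equations amplify noise in the data.
x = np.linspace(100, 110, 50)            # narrow range far from zero
A_raw = np.column_stack([x**0, x, x**2, x**3])

xs = (x - x.mean()) / x.std()            # centered and scaled
A_scaled = np.column_stack([xs**0, xs, xs**2, xs**3])

print(np.linalg.cond(A_raw.T @ A_raw))      # astronomically large
print(np.linalg.cond(A_scaled.T @ A_scaled))  # modest
```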
Summary
- Least squares fitting is the process of optimizing a model's parameters by minimizing the sum of squared residuals between the model predictions and the observed data.
- For models linear in their parameters (like straight lines or polynomials), the optimal parameters are found analytically by forming and solving the normal equations, often expressed in the compact matrix form $A^T A \,\mathbf{a} = A^T \mathbf{y}$.
- Residual analysis—plotting and examining the differences between data and the fit—is non-negotiable for diagnosing model adequacy, checking assumptions, and detecting outliers or missing model terms.
- Models that are nonlinear in their parameters require iterative numerical solutions, such as the Gauss-Newton method, which repeatedly linearizes the problem to converge on an optimal parameter set.
- Key practical dangers include overfitting complex models to limited data, ignoring diagnostic residual patterns, and failing to account for numerical ill-conditioning in the solution process.