Mar 1

Bayesian Structural Time Series

Mindli Team

AI-Generated Content


In an era driven by data, understanding the past and predicting the future of a time series—be it website traffic, product sales, or economic indicators—is a fundamental business and scientific task. Traditional models often require rigid assumptions, making them brittle in the face of real-world complexity. Bayesian Structural Time Series (BSTS) models offer a powerful, flexible alternative by decomposing a series into interpretable components like trend and seasonality, while rigorously quantifying uncertainty through Bayesian inference. This approach not only provides superior forecasts but also enables robust assessment of interventions, answering critical "what-if" questions.

The Structural Framework: Decomposing Time Series

At its heart, a BSTS model is a state-space model. This means it assumes your observed time series data is generated by a set of underlying, unobserved (or "latent") states that evolve over time according to simple rules. You don't directly see the trend or seasonality; you see their sum plus some noise. The BSTS framework builds your final model by adding together these latent components.

The most common components are:

  • Local Level: This is the simplest form of a trend—a mean value that can slowly drift over time. It's useful for series that don't exhibit strong growth or decline but still change.
  • Local Linear Trend: This component adds a slope to the level, allowing for trends that grow or shrink. Both the level and the slope are allowed to evolve, making this trend adaptive to changes in direction.
  • Seasonal Component: This models repeating patterns, such as weekly, monthly, or quarterly cycles. A BSTS model typically represents seasonality using a dummy variable formulation, which is more flexible than a simple fixed sinusoid and can handle changing seasonal patterns.
  • Regression Component: This is what sets BSTS apart for many business applications. You can incorporate external predictor variables (e.g., marketing spend, competitor prices, weather) into the model. The key is that BSTS uses a special technique to select which predictors are truly important.

The general model formulation combines these. If y_t is your observed data at time t, the model can be written as:

y_t = μ_t + τ_t + βᵀx_t + ε_t

Here, μ_t is the local level (whose local trend is carried by a slope term δ_t in its state equation), τ_t is the seasonal effect, x_t is a vector of regression covariates with coefficients β, and ε_t is observation noise. Each of these components (except the regression coefficients β) has its own state equation that governs how it evolves from time t to t+1.
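To make the decomposition concrete, here is a minimal NumPy simulation of a series generated by this kind of model. All variances, the seasonal pattern, and the covariate (a stand-in for something like marketing spend) are made-up illustrative values, not fitted quantities:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 210

# Local linear trend: the level mu_t drifts, and its slope delta_t drifts too.
mu, delta = np.zeros(T), np.zeros(T)
for t in range(1, T):
    delta[t] = delta[t - 1] + rng.normal(0, 0.01)        # slope random walk
    mu[t] = mu[t - 1] + delta[t - 1] + rng.normal(0, 0.1)

# Weekly seasonal effect (period 7), held fixed here for simplicity.
season = np.tile([1.5, 0.8, 0.2, -0.4, -0.9, -1.3, 0.1], T // 7)[:T]

# One regression covariate with a known coefficient (illustrative values).
x = rng.normal(0, 1, T)
beta = 0.6

# Observation equation: sum of latent components plus noise.
y = mu + season + beta * x + rng.normal(0, 0.3, T)
```

Fitting a BSTS model runs this logic in reverse: given only `y` (and `x`), infer the latent level, slope, and seasonal paths.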

Bayesian Inference and Uncertainty Quantification

Instead of providing single "best guess" estimates like classical models, BSTS treats all unknown quantities—the states, the trend volatility, the regression coefficients—as probability distributions. You start with a prior distribution that encapsulates your beliefs before seeing the data (e.g., "the seasonal effect is probably small"). After observing the data, you compute the posterior distribution using Bayes' theorem. This posterior is your complete summary of uncertainty about the model.

The primary computational tool for this is the Kalman filter, which sequentially updates the state estimates as new data arrives, and its companion, the Kalman smoother, which provides the best estimate of the state at each time point using the entire dataset. For BSTS, we use these tools within a Markov Chain Monte Carlo (MCMC) algorithm. MCMC draws thousands of samples from the complex posterior distribution. We then use these samples for all downstream tasks.
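The Kalman filter's predict-then-update cycle is easiest to see in the simplest case, the local level model. This is a bare NumPy sketch, not a production implementation; the variance arguments are assumed known, whereas BSTS samples them within MCMC:

```python
import numpy as np

def local_level_filter(y, sigma_obs2, sigma_level2, m0=0.0, P0=1e6):
    """Kalman filter for the local level model:
    y_t = mu_t + eps_t,  mu_t = mu_{t-1} + eta_t."""
    m, P = m0, P0                         # current state mean and variance
    means, variances = [], []
    for obs in y:
        # Predict: the level is a random walk, so only its variance grows.
        P = P + sigma_level2
        # Update: blend the prediction with the new observation.
        K = P / (P + sigma_obs2)          # Kalman gain
        m = m + K * (obs - m)
        P = (1 - K) * P
        means.append(m)
        variances.append(P)
    return np.array(means), np.array(variances)
```

The diffuse prior variance `P0=1e6` encodes near-total ignorance about the starting level; the first observation then dominates the initial estimate.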

This leads to one of BSTS's greatest strengths: posterior predictive intervals. When making a forecast, we don't just get a single line. For each future time point, we generate thousands of possible outcomes based on the posterior samples, resulting in a full predictive distribution. This allows you to say, "There's a 90% chance that sales next quarter will fall within this interval," providing an honest and actionable picture of forecast uncertainty.
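The mechanics of building a predictive interval from posterior samples can be sketched as follows. For brevity, the "posterior draws" here are faked by jittering an assumed last level and slope rather than taken from a real MCMC run:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, horizon = 4000, 8
last_level, last_slope = 100.0, 0.5       # illustrative posterior means

# Each posterior draw implies one possible future path of the series.
paths = np.empty((n_draws, horizon))
for i in range(n_draws):
    level = last_level + rng.normal(0, 2)       # parameter uncertainty (faked)
    slope = last_slope + rng.normal(0, 0.2)
    for h in range(horizon):
        slope += rng.normal(0, 0.1)             # slope innovation
        level += slope + rng.normal(0, 1)       # level innovation
        paths[i, h] = level + rng.normal(0, 1)  # observation noise

# Pointwise 90% posterior predictive interval at each forecast horizon.
lo, hi = np.percentile(paths, [5, 95], axis=0)
```

Note how the interval widens with the horizon: slope uncertainty compounds, which is exactly the honest behavior a single-line forecast hides.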

Spike-and-Slab Priors for Intelligent Variable Selection

Including many regression covariates risks overfitting—creating a model that memorizes noise in the historical data rather than capturing the true signal. BSTS elegantly solves this using spike-and-slab priors for the regression coefficients β.

This prior is a mixture of two distributions:

  1. A "spike" (e.g., a point mass at zero) representing the probability that a coefficient is exactly zero and the variable should be excluded.
  2. A "slab" (e.g., a wide normal distribution) representing the distribution of the coefficient's value if the variable is included.

During the MCMC sampling, the model explores different combinations of variables. The final output tells you the posterior inclusion probability for each predictor—the percentage of samples in which its coefficient was non-zero. A variable with a 95% inclusion probability is almost certainly important, while one with a 10% probability is likely irrelevant. This automates robust variable selection within the modeling process itself.
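Computing inclusion probabilities from MCMC output is just counting. In this toy sketch the coefficient draws are synthesized directly (with made-up inclusion rates and covariate names) rather than produced by a real sampler:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
# Toy MCMC output: rows are posterior draws, columns are coefficients.
# Under a spike-and-slab prior, a draw is exactly zero whenever the
# variable is excluded in that sample.
names = ["marketing_spend", "competitor_price", "weather_index"]
draws = np.column_stack([
    np.where(rng.random(n) < 0.95, rng.normal(0.6, 0.1, n), 0.0),
    np.where(rng.random(n) < 0.10, rng.normal(0.0, 0.3, n), 0.0),
    np.where(rng.random(n) < 0.50, rng.normal(0.2, 0.2, n), 0.0),
])

# Posterior inclusion probability: fraction of draws with a non-zero coefficient.
inclusion = (draws != 0).mean(axis=0)
for name, p in zip(names, inclusion):
    print(f"{name}: {p:.2f}")
```

In this synthetic example the first covariate would be kept with near certainty and the second discarded, mirroring the 95%-versus-10% reading described above.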

Causal Impact Analysis for Intervention Assessment

A direct and powerful application of BSTS is causal impact analysis. Imagine your company launched a major marketing campaign. Did it actually increase sales? To answer this, you need to estimate what sales would have been in the absence of the campaign (the counterfactual).

The BSTS approach is to:

  1. Build a model using data from the pre-intervention period, often incorporating covariates from other, unrelated time series (e.g., sales in a control market or broader industry indices) that are predictive of your target series.
  2. Use this fitted model to forecast the counterfactual series for the post-intervention period, along with its prediction intervals.
  3. Compare the actual observed data to this counterfactual forecast.

The difference between the actual and predicted series is the estimated causal impact. Because you have full posterior distributions, you can calculate the probability that the effect was positive and provide credible intervals for the cumulative lift in sales. This provides a statistically rigorous alternative to simpler pre-post comparisons.
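The three steps above can be sketched end to end. To keep the example self-contained, plain least squares on a control series stands in for the full BSTS model, and the data (including a true lift of 5 units injected after the intervention) are simulated:

```python
import numpy as np

rng = np.random.default_rng(3)
T, t0 = 120, 90                     # t0 marks the intervention date
control = 50 + np.cumsum(rng.normal(0, 0.5, T))   # unaffected control market
target = 1.2 * control + rng.normal(0, 1, T)
target[t0:] += 5.0                  # true causal lift after the intervention

# 1. Fit on the pre-intervention period only.
X_pre = np.column_stack([np.ones(t0), control[:t0]])
coef, *_ = np.linalg.lstsq(X_pre, target[:t0], rcond=None)

# 2. Forecast the counterfactual for the post-intervention period.
X_post = np.column_stack([np.ones(T - t0), control[t0:]])
counterfactual = X_post @ coef

# 3. Compare actual data to the counterfactual forecast.
pointwise_effect = target[t0:] - counterfactual
cumulative_lift = pointwise_effect.sum()
print(f"average effect: {pointwise_effect.mean():.2f}")
```

A real BSTS analysis would replace step 1's regression with the full state-space model, so the counterfactual comes with posterior draws and the lift with a credible interval rather than a point estimate.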

Comparing BSTS Flexibility with ARIMA and Prophet

Understanding where BSTS fits among popular time series methods is crucial.

  • vs. ARIMA: ARIMA models are powerful for stationary series and are defined by their correlation structure (autoregressive and moving average terms). They are less interpretable, as you cannot directly extract a "trend" or "seasonality" component. BSTS is generally more flexible for handling complex, non-stationary trends and integrating external regressors in a principled way. ARIMA requires manual specification of orders (p,d,q), while BSTS's Bayesian framework automates much of the complexity through priors.
  • vs. Prophet: Prophet, developed by Facebook, is also an additive model with trend, seasonality, and holiday components. It is designed for robustness and ease of use, automatically handling missing data and outliers. BSTS is more statistically formal, providing full uncertainty quantification via Bayesian inference and sophisticated regression with spike-and-slab. Prophet is often faster and more "off-the-shelf," while BSTS offers greater modeling control and rigor for causal inference at the cost of more computational and statistical expertise.

Common Pitfalls

  1. Ignoring MCMC Diagnostics: MCMC is a sampling algorithm, and it can fail. Failing to check diagnostics like trace plots for mixing or the Gelman-Rubin statistic can lead to basing conclusions on unreliable samples that don't represent the true posterior. Always verify convergence.
  2. Overparameterizing the Local Trend: Using a local linear trend when a local level is sufficient adds unnecessary parameters and volatility, leading to overly wide and unhelpful prediction intervals. Start simple and add complexity only if the data supports it.
  3. Misinterpreting Inclusion Probabilities: A high posterior inclusion probability for a covariate does not prove causation in an observational setting. It indicates statistical association within the model. For causal claims, the design (like in causal impact analysis) is paramount.
  4. Using Inappropriate Comparison Series in Causal Impact: The power of causal impact hinges on having good predictor series. If your control series are also affected by the intervention or are unrelated to your target, the counterfactual forecast will be poor, invalidating the results.
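The Gelman-Rubin check from pitfall 1 is straightforward to compute from raw chains. This sketch uses the classic (non-split) R-hat formula on synthetic draws; the "bad" case fakes two chains stuck in a different region:

```python
import numpy as np

def gelman_rubin(chains):
    """Gelman-Rubin R-hat for an (n_chains, n_samples) array of draws."""
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n         # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(4)
good = rng.normal(0, 1, (4, 1000))            # four well-mixed chains
bad = good + np.array([[0.0], [0.0], [3.0], [3.0]])  # two chains stuck elsewhere
print(gelman_rubin(good), gelman_rubin(bad))
```

Values near 1.0 indicate the chains agree; a common rule of thumb is to distrust anything above roughly 1.1 and keep sampling.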

Summary

  • BSTS models decompose a time series into intuitive latent components (level, trend, seasonality) and can incorporate external regressors, all within a probabilistic state-space framework.
  • It uses Bayesian inference and MCMC sampling to produce full posterior distributions, enabling honest posterior predictive intervals that quantify forecast uncertainty.
  • The spike-and-slab prior automatically performs robust variable selection, yielding posterior inclusion probabilities that measure each predictor's importance.
  • A primary application is causal impact analysis, which uses a BSTS model on pre-intervention data to forecast a counterfactual, providing a rigorous statistical assessment of an intervention's effect.
  • Compared to ARIMA, BSTS offers more interpretable component decomposition and integrated regression. Compared to Prophet, it offers fuller Bayesian uncertainty quantification and more formal mechanisms for variable selection and causal inference.
