Geographically Weighted Regression

Traditional regression models assume that the relationships you are modeling are constant across space—a one-size-fits-all equation for your entire study area. In reality, processes like housing markets, disease spread, and environmental pollution are intensely local. Geographically Weighted Regression (GWR) is a spatial analysis technique that recognizes this by allowing relationships to vary from place to place. Instead of producing one "global" set of coefficients, GWR generates a unique local regression equation for every location in your dataset, providing a nuanced map of how predictors influence an outcome geographically. This makes it an indispensable tool for any analyst working with data tied to location, from urban planning to epidemiology.

The Limitation of Global Models and the GWR Solution

A standard ordinary least squares (OLS) regression generates a single, global model. For example, if you're modeling house prices, a global model might tell you that, on average, an additional bedroom adds $50,000 to a home's value across an entire city. Spatial heterogeneity—the concept that processes and relationships change over space—challenges this assumption. The value added by a bedroom might be much higher in a dense, desirable downtown neighborhood than in a distant suburb.

GWR addresses this by calibrating a separate regression model at each location (usually each data point). The key idea is that observations nearer to the target location have more influence on the local model's parameters than those farther away. This creates a moving window of analysis that sweeps across the map, calculating a bespoke set of coefficients—slopes and intercepts—for every point. The result is not a single equation, but a surface of equations that visually and quantitatively reveals where and how relationships change.
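
The moving-window calibration can be written as a weighted least-squares problem at each location $i$: $\hat{\beta}_i = (X^{T} W_i X)^{-1} X^{T} W_i y$, where $W_i$ is a diagonal matrix of distance-based weights. The following is a minimal NumPy sketch of that idea on synthetic data, not a production implementation. The function name `gwr_coefficients`, the Gaussian weighting, the bandwidth of 1.5, and the simulated slope surface are all assumptions made for illustration.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Fit one weighted least-squares model per observation (Gaussian weights)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])          # design matrix with intercept
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # nearer points weigh more
        XtW = Xd.T * w                                   # X^T W_i without forming W_i
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Synthetic data (assumed): the slope on x grows from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
true_slope = 1.0 + 0.3 * coords[:, 0]
y = 2.0 + true_slope * x + rng.normal(scale=0.1, size=200)

betas = gwr_coefficients(coords, x, y, bandwidth=1.5)
# betas[:, 0] holds the local intercepts, betas[:, 1] the local slopes.
```

Each row of `betas` is the "bespoke" coefficient set for one location, so plotting `betas[:, 1]` against the coordinates gives the coefficient surface described above.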

Kernel Functions and Bandwidth Selection

The "geographically weighted" part of GWR is controlled by a kernel function. This function determines how the influence of neighboring data points decays with distance from the regression point. Two common choices are the Gaussian kernel, whose weights decrease smoothly with distance but never reach zero, and the bi-square kernel, whose weights decrease smoothly and fall to exactly zero at the bandwidth distance.
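
As a small illustration, the two kernels can be written in a few lines. This sketch uses one common parameterization of each (software packages differ slightly in how they scale the bandwidth), and the function names and example distances are assumptions.

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    """Gaussian weights: smooth decay, never exactly zero."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Bi-square weights: smooth decay, exactly zero at and beyond the bandwidth."""
    u = d / bandwidth
    return np.where(u < 1.0, (1.0 - u ** 2) ** 2, 0.0)

# Distances in meters from a regression point (illustrative values).
distances = np.array([0.0, 250.0, 500.0, 1000.0])
gaussian_weights = gaussian_kernel(distances, 500.0)
bisquare_weights = bisquare_kernel(distances, 500.0)
```

With a 500 m bandwidth, an observation 1,000 m away still receives a small Gaussian weight but a bi-square weight of zero, which is why the bi-square kernel gives each local model a hard spatial cutoff.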

The most critical parameter in GWR is the bandwidth, which defines the spatial scale of the kernel. It answers the question: "How far should we look for neighbors to build each local model?" A bandwidth that is too small results in a very localized model, which may be overly sensitive to noise in the data. A bandwidth that is too large produces results very similar to a global OLS model, defeating the purpose of GWR.

Bandwidth selection is typically automated through criteria like the Akaike Information Criterion (AIC, usually in its small-sample corrected form, AICc) or cross-validation. AIC looks for the bandwidth that best balances goodness of fit against model complexity, while cross-validation looks for the bandwidth that minimizes out-of-sample prediction error. Either way, the aim is a model that captures genuine spatial variation without overfitting. For example, in a densely sampled urban dataset, an optimal bandwidth might be 500 meters, while for a state-level study of agricultural yields, it might be 50 kilometers.
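
A hedged sketch of cross-validated bandwidth selection follows: for each candidate bandwidth, every observation is predicted from a local model that excludes it, and the bandwidth with the smallest total squared error is kept. The candidate grid, the Gaussian kernel, and the synthetic data are assumptions; real software typically uses a numerical search rather than a fixed grid.

```python
import numpy as np

def loo_cv_score(coords, X, y, bandwidth):
    """Sum of squared leave-one-out prediction errors for one bandwidth."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    sq_err = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        w[i] = 0.0                                   # leave observation i out
        XtW = Xd.T * w
        beta = np.linalg.lstsq(XtW @ Xd, XtW @ y, rcond=None)[0]
        sq_err += (y[i] - Xd[i] @ beta) ** 2
    return sq_err

# Synthetic data (assumed): slope on x varies from west to east.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x + rng.normal(scale=0.1, size=200)

candidates = [0.5, 1.0, 2.0, 4.0, 8.0, 50.0]        # 50.0 is effectively global
scores = {bw: loo_cv_score(coords, x, y, bw) for bw in candidates}
best_bw = min(scores, key=scores.get)
```

The largest candidate behaves almost like a global OLS fit, so comparing its score with the others shows how much predictive accuracy is gained by letting the coefficients vary.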

Mapping and Interpreting Spatially Varying Coefficients

The primary output of a GWR analysis is a set of localized coefficient estimates. Each predictor variable you include will yield its own map of coefficients. Interpreting these maps is the core analytical task.

Instead of a single number stating "bedrooms coefficient = $50,000," you get a map where every point has its own value, say, ranging from $20,000 to $120,000. You can map these values directly, using a color gradient to show where the relationship is strong and positive, weak, or even negative. This visualization instantly highlights hotspots of influence. For instance, a map of coefficients for "distance to highway" in a pollution study might show strong negative values (closer = more pollution) near industrial corridors, but weak or non-significant relationships in rural, upwind areas.

Statistical significance is also local in GWR. You can create companion maps showing where each local coefficient is statistically significant (e.g., p < 0.05). This prevents you from over-interpreting noisy coefficient estimates in regions with little data or weak relationships.
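
One way to build such a companion map is to compute a local t-value for each coefficient. The sketch below is an illustration under stated assumptions: it uses a Gaussian kernel, estimates the error variance with the effective degrees of freedom $n - \mathrm{tr}(S)$ (one of several conventions in use), and treats $|t| > 1.96$ as a rough stand-in for p < 0.05. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def gwr_local_t(coords, X, y, bandwidth):
    """Local coefficients and their t-values for a basic Gaussian-kernel GWR."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    k = Xd.shape[1]
    betas = np.empty((n, k))
    unscaled_var = np.empty((n, k))
    trace_S = 0.0
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xd.T * w
        C = np.linalg.solve(XtW @ Xd, XtW)        # (k, n): maps y to local betas
        betas[i] = C @ y
        unscaled_var[i] = np.sum(C ** 2, axis=1)  # diagonal of C C^T
        trace_S += Xd[i] @ C[:, i]                # i-th diagonal of the hat matrix
    resid = y - np.sum(Xd * betas, axis=1)
    sigma2 = resid @ resid / (n - trace_S)        # error variance estimate
    t_values = betas / np.sqrt(sigma2 * unscaled_var)
    return betas, t_values

# Synthetic data (assumed): the slope is negative in the west, near zero in
# the middle of the study area, and positive in the east.
rng = np.random.default_rng(2)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 1.0 + 0.3 * (coords[:, 0] - 5.0) * x + rng.normal(scale=0.5, size=200)

betas, t_values = gwr_local_t(coords, x, y, bandwidth=1.5)
significant = np.abs(t_values[:, 1]) > 1.96       # mask for the slope map
```

Plotting `betas[:, 1]` only where `significant` is true gives a coefficient map that leaves out locations where the local estimate cannot be distinguished from zero.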

Comparing GWR with Global Regression and Mixed GWR

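A vital step is to formally compare your GWR model with a global OLS model. A clearly lower AIC value for the GWR model (a difference of more than about 3 is a common rule of thumb) suggests it provides a better fit to the data by accounting for spatial heterogeneity, even after its extra complexity is penalized. You can also compare the $R^{2}$ values, but treat them with caution: GWR usually reports a higher $R^{2}$ simply because it uses more effective parameters, so the AIC comparison is the fairer test.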
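
The comparison can be sketched with the corrected AIC commonly used for GWR, $\mathrm{AICc} = n \ln(\hat{\sigma}^{2}) + n \ln(2\pi) + n \, \frac{n + \mathrm{tr}(S)}{n - 2 - \mathrm{tr}(S)}$, where $\hat{\sigma}^{2}$ is the residual sum of squares divided by $n$ and $S$ is the hat matrix. Applying the same formula to both models keeps the comparison consistent. The helper names, kernel, bandwidth, and data below are assumptions for illustration.

```python
import numpy as np

def gwr_hat_matrix(coords, Xd, bandwidth):
    """Hat matrix S of a Gaussian-kernel GWR, so that fitted values are S @ y."""
    n = len(Xd)
    S = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xd.T * w
        S[i] = Xd[i] @ np.linalg.solve(XtW @ Xd, XtW)
    return S

def aicc(y, S):
    """Corrected AIC computed from a hat matrix (works for OLS and GWR)."""
    n = len(y)
    resid = y - S @ y
    sigma2 = resid @ resid / n
    tr = np.trace(S)                               # effective number of parameters
    return n * np.log(sigma2) + n * np.log(2 * np.pi) + n * (n + tr) / (n - 2 - tr)

# Synthetic data (assumed): slope on x varies from west to east.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(200, 2))
x = rng.normal(size=200)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x + rng.normal(scale=0.1, size=200)
Xd = np.column_stack([np.ones(200), x])

S_ols = Xd @ np.linalg.solve(Xd.T @ Xd, Xd.T)      # global OLS hat matrix
S_gwr = gwr_hat_matrix(coords, Xd, bandwidth=1.5)
aicc_ols, aicc_gwr = aicc(y, S_ols), aicc(y, S_gwr)
```

Because the simulated relationship really does vary across space, `aicc_gwr` comes out lower than `aicc_ols` here. On data with a spatially constant relationship, the complexity penalty would tend to favor the global model instead.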

However, not all relationships in a system may vary geographically. Mixed GWR (or Semi-parametric GWR) is an extension that allows you to specify some predictor variables as "global" (fixed across space) and others as "local" (varying across space). For example, in a real estate model, you might let the coefficients for "bedrooms" and "square footage" vary locally, but keep the coefficient for a national interest rate variable global. This mixed approach provides a more parsimonious and often more interpretable model when you have theoretical or empirical reasons to believe certain processes are not location-dependent.
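
A rough sketch of one way to calibrate a mixed model is a two-stage estimator: first smooth the local part out of both the outcome and the global predictors, estimate the global coefficients from what remains, then fit an ordinary GWR to the outcome with the global effect subtracted. This is an illustrative implementation under assumed names and synthetic data, not the routine of any particular package.

```python
import numpy as np

def fit_mixed_gwr(coords, X_local, X_global, y, bandwidth):
    """Two-stage mixed GWR: global coefficients first, then local ones."""
    n = len(y)
    Xl = np.column_stack([np.ones(n), X_local])    # locally varying terms
    Xg = np.asarray(X_global).reshape(n, -1)       # globally fixed terms
    operators = []                                 # per-location (k, n) solvers
    S = np.empty((n, n))                           # hat matrix of the local part
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = Xl.T * w
        C = np.linalg.solve(XtW @ Xl, XtW)
        operators.append(C)
        S[i] = Xl[i] @ C
    R = np.eye(n) - S                              # removes the local part
    beta_global = np.linalg.lstsq(R @ Xg, R @ y, rcond=None)[0]
    y_partial = y - Xg @ beta_global               # outcome minus global effect
    beta_local = np.array([C @ y_partial for C in operators])
    return beta_global, beta_local

# Synthetic data (assumed): x1 has a spatially varying slope, x2 a fixed one.
rng = np.random.default_rng(4)
coords = rng.uniform(0, 10, size=(200, 2))
x1, x2 = rng.normal(size=200), rng.normal(size=200)
y = 2.0 + (1.0 + 0.3 * coords[:, 0]) * x1 + 0.7 * x2 + rng.normal(scale=0.1, size=200)

beta_global, beta_local = fit_mixed_gwr(coords, x1, x2, y, bandwidth=1.5)
```

Here `beta_global` is a single estimate for the coefficient on `x2`, while `beta_local` holds one intercept and one `x1` slope per location.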

Common Pitfalls

Ignoring Bandwidth Sensitivity: Treating the default bandwidth as absolute is a major mistake. Always test how sensitive your results are to different bandwidth selection methods (AIC vs. CV) and kernel types. The substantive story should be reasonably robust to minor changes in this parameter.
Misinterpreting Local Coefficients: A local coefficient map shows association, not necessarily causation, and the association is specific to that local model. Do not interpret a high local coefficient in isolation; always consider what other variables are in the model and the characteristics of that specific locality. Correlation between local coefficients themselves can also occur and needs careful examination.
Overlooking Computational and Statistical Issues: With large datasets, GWR can become computationally intensive. Furthermore, like any local regression, it can be susceptible to outliers and multicollinearity within local subsets of data. Always check for local collinearity and ensure your sample is sufficiently dense to support local estimation across the entire study area.
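
The local collinearity check can be sketched with local condition numbers: at each location, weight the design matrix by the square root of the kernel weights, scale its columns to unit length, and take the ratio of the largest to the smallest singular value. Values above roughly 30 are a common rule of thumb for problematic collinearity. The function name, kernel, bandwidth, and the synthetic data (two predictors that are nearly identical only in the eastern half of the study area) are assumptions.

```python
import numpy as np

def local_condition_numbers(coords, X, bandwidth):
    """Condition number of the kernel-weighted design matrix at each location."""
    n = len(coords)
    Xd = np.column_stack([np.ones(n), X])
    cond = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        Xw = Xd * np.sqrt(w)[:, None]              # apply weights row by row
        Xw = Xw / np.linalg.norm(Xw, axis=0)       # scale columns to unit length
        s = np.linalg.svd(Xw, compute_uv=False)    # singular values, descending
        cond[i] = s[0] / s[-1]
    return cond

# Synthetic data (assumed): x2 nearly duplicates x1 in the east only.
rng = np.random.default_rng(5)
coords = rng.uniform(0, 10, size=(200, 2))
x1 = rng.normal(size=200)
independent = rng.normal(size=200)
x2 = np.where(coords[:, 0] > 5, x1 + 0.01 * independent, independent)

cond = local_condition_numbers(coords, np.column_stack([x1, x2]), bandwidth=1.0)
east_median = np.median(cond[coords[:, 0] > 8])
west_median = np.median(cond[coords[:, 0] < 2])
```

A global diagnostic would blend the two halves together, whereas mapping `cond` shows that the local models are unreliable in the east and well conditioned in the west. Rerunning the analysis with other bandwidths or a different kernel is a simple way to apply the sensitivity check from the first pitfall.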

Summary

Geographically Weighted Regression (GWR) is a local modeling technique that produces a unique set of regression coefficients for each location, directly addressing the problem of spatial heterogeneity in relationships.
The model's behavior is governed by a kernel function and a crucial bandwidth parameter, which defines the spatial scale of influence and must be carefully selected using criteria like AIC.
The core output is maps of spatially varying coefficients, which allow for the visual and quantitative exploration of how predictor-outcome relationships change across geography.
GWR should be compared to a global OLS model to justify its use, and Mixed GWR can be employed when some relationships are appropriately modeled as global.
GWR has powerful applications in fields like real estate pricing (valuing amenities locationally), environmental science (modeling varying pollution drivers), and public health (identifying place-specific risk factors for disease).
