Regression Discontinuity Design in Practice
Regression Discontinuity Design (RDD) is a powerful quasi-experimental method that allows researchers to estimate causal effects when treatment is assigned based on whether a numeric running variable crosses a pre-determined cutoff. By comparing outcomes for units just above and just below this threshold, RDD mimics a randomized experiment at the cutoff, providing credible causal evidence in policy evaluation, economics, and education research. Mastering its implementation involves careful statistical choices and rigorous validity checks to transform a simple discontinuity into a defensible causal claim.
The Core Logic of RDD: Exploiting a Cutoff
The fundamental insight of RDD is that units just above and just below a treatment threshold are, in expectation, nearly identical in all observable and unobservable characteristics. The sole systematic difference is their treatment status. For example, students scoring 90 or above out of 100 receive a scholarship (treatment), while those scoring 89 or below do not (control). If we observe a "jump" or discontinuity in outcomes (e.g., college graduation rates) precisely at this score threshold, that jump can be plausibly attributed to the treatment. The variable that determines assignment (like the test score) is called the running variable or forcing variable. The validity of the entire design rests on the assumption that individuals cannot precisely manipulate their score to fall on a specific side of the cutoff, an assumption you must test.
Sharp vs. Fuzzy RDD Designs
You will encounter two primary RDD types, distinguished by how strictly the treatment rule is applied. In a Sharp RDD, treatment assignment is a deterministic function of the running variable. All units on one side of the cutoff receive treatment, and all on the other side do not. The probability of treatment jumps from 0 to 1 at the threshold. This design is less common in practice because rules often have exceptions.
A Fuzzy RDD acknowledges imperfect compliance with the assignment rule. The probability of receiving treatment increases discontinuously at the cutoff, but the jump is smaller than the full 0-to-1 switch of a sharp design. For instance, some students scoring above the threshold may not take up the scholarship, while some below might receive it through other means. Here, you must use a two-stage approach analogous to Instrumental Variables (IV). The first stage uses an indicator for crossing the cutoff as an instrument to predict treatment receipt, and the second stage estimates the causal effect of the treatment on the outcome for "compliers": units whose treatment status changes because they cross the cutoff.
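The two-stage logic can be sketched as a ratio of two jumps: the jump in the outcome at the cutoff (reduced form) divided by the jump in treatment probability (first stage). The following is a minimal illustration on simulated data, not a production estimator. The function names `jump_at_cutoff` and `fuzzy_rd`, the uniform kernel, the bandwidth of 0.5, and every number in the simulation are assumptions made for this example.

```python
import numpy as np

def jump_at_cutoff(x, v, cutoff, bandwidth):
    """Local linear estimate of the jump in v at the cutoff (uniform kernel)."""
    xc = np.asarray(x, dtype=float) - cutoff
    v = np.asarray(v, dtype=float)
    keep = np.abs(xc) <= bandwidth
    xc, v = xc[keep], v[keep]
    above = (xc >= 0).astype(float)
    # Columns: intercept, jump at cutoff, slope below, change in slope above.
    design = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return coef[1]

def fuzzy_rd(x, y, treated, cutoff, bandwidth):
    """Wald-type fuzzy RD estimate: reduced-form jump / first-stage jump."""
    first_stage = jump_at_cutoff(x, treated, cutoff, bandwidth)
    reduced_form = jump_at_cutoff(x, y, cutoff, bandwidth)
    return reduced_form / first_stage, first_stage, reduced_form

# Simulated data (hypothetical): take-up is 20% below the cutoff and 80% above,
# and treatment raises the outcome by 2.0 for everyone.
rng = np.random.default_rng(1)
n = 20_000
x = rng.uniform(-1, 1, n)
p_treat = 0.2 + 0.6 * (x >= 0)
treated = (rng.uniform(size=n) < p_treat).astype(float)
y = 1.0 + 0.5 * x + 2.0 * treated + rng.normal(0, 0.5, n)

late, first_stage, reduced_form = fuzzy_rd(x, y, treated, cutoff=0.0, bandwidth=0.5)
print(f"first stage: {first_stage:.3f}  reduced form: {reduced_form:.3f}  LATE: {late:.3f}")
```

With a single instrument and the same local linear specification in both stages, this ratio should coincide with the two-stage least squares point estimate. Standard errors are omitted here; in applied work they would come from an IV routine or a dedicated RD package.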
Estimation: Local Linear Regression and Bandwidth Selection
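You do not compare all treated and control units; you focus on a window, or bandwidth, around the cutoff. Observations far from the cutoff inform the estimate only through the assumed functional form, so using all data can introduce bias if the relationship between the running variable and the outcome is misspecified. The standard approach is to fit separate local linear regressions on either side of the cutoff and take the difference between the two fitted values at the cutoff as the estimate. Straight lines fitted within the chosen bandwidth behave better at the boundary than higher-order polynomials, which can swing sharply near the cutoff.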
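A minimal sketch of local linear estimation for a sharp design is shown here. Fitting one regression with an above-cutoff indicator and its interaction with the centered running variable is equivalent to fitting separate lines on each side. The function name `local_linear_rd`, the uniform kernel, the bandwidth of 8 points, and the simulated scholarship data (true jump of 0.2 at a score of 90) are all assumptions for illustration.

```python
import numpy as np

def local_linear_rd(x, y, cutoff, bandwidth):
    """Sharp RD estimate: jump in the fitted outcome at the cutoff.

    Fits y = a + tau*D + b*(x - c) + g*D*(x - c) by least squares using only
    observations with |x - c| <= bandwidth (a uniform kernel), where D = 1
    at or above the cutoff. Returns tau.
    """
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(xc) <= bandwidth
    xc, y = xc[keep], y[keep]
    above = (xc >= 0).astype(float)
    design = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

# Simulated data (hypothetical): test scores from 0 to 100, scholarship at 90+.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 5000)
y = 0.5 + 0.004 * x + 0.2 * (x >= 90) + rng.normal(0, 0.1, x.size)

tau_hat = local_linear_rd(x, y, cutoff=90, bandwidth=8)
print(f"estimated jump at the cutoff: {tau_hat:.3f}")
```

Applied work often uses a triangular kernel, which gives more weight to observations nearest the cutoff; the uniform kernel is used here only to keep the example short.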
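The choice of bandwidth is a bias-variance trade-off. A smaller bandwidth uses data closer to the cutoff, reducing bias from model misspecification but increasing variance (less precise estimates). A larger bandwidth improves precision but risks introducing bias. The recommended practice is to select the bandwidth with a data-driven procedure rather than by hand. The most widely used selectors choose the bandwidth that minimizes an approximation to the mean squared error (MSE) of the discontinuity estimate at the cutoff, as in the Imbens and Kalyanaraman procedure and its later refinements; leave-one-out cross-validation restricted to observations near the cutoff is an older alternative. In both cases the target is the error of the estimated jump at the boundary, not the overall predictive fit across the observed data.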
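The trade-off can be seen by re-estimating the same discontinuity over several bandwidths. The sketch below does not implement an optimal bandwidth selector; it only reports the estimate, a heteroskedasticity-robust (HC1) standard error, and the sample size at each bandwidth. The helper name `rd_fit`, the bandwidth grid, and the simulated data (true jump of 0.3 with a cubic trend, chosen so that wide windows are biased) are assumptions for illustration.

```python
import numpy as np

def rd_fit(x, y, cutoff, bandwidth):
    """Local linear sharp RD with a uniform kernel.

    Returns (estimate, HC1 standard error, number of observations used).
    """
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(xc) <= bandwidth
    xc, y = xc[keep], y[keep]
    above = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    n, k = X.shape
    # Sandwich covariance with the HC1 small-sample correction.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta[1], np.sqrt(cov[1, 1]), n

# Simulated data (hypothetical): curvature away from the cutoff biases wide windows.
rng = np.random.default_rng(2)
x = rng.uniform(-1, 1, 4000)
y = 0.3 * (x >= 0) + x + x ** 3 + rng.normal(0, 0.2, x.size)

results = {h: rd_fit(x, y, cutoff=0.0, bandwidth=h) for h in (0.1, 0.25, 0.5, 1.0)}
for h, (est, se, n_used) in results.items():
    print(f"h={h:<5} estimate={est:6.3f}  se={se:.3f}  n={n_used}")
```

In this simulation the narrowest window should give a noisy estimate near the true jump, while the widest should give a precise estimate that is pulled away from it by the cubic trend. The standard errors here ignore the bias term entirely, which is one reason dedicated RD software reports bias-corrected inference.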
Visualization and Robustness Checks
Before any complex estimation, you must visualize the discontinuity. Create a binned scatterplot of the mean outcome against the running variable. Superimpose the local linear regression lines on either side of the cutoff. A clear visual jump at the cutoff is the first piece of evidence. Next, you must conduct formal manipulation testing of the running variable. Plot a histogram of the running variable. If individuals can manipulate their score, you might see heaping or an unusual density of observations just on the beneficial side of the cutoff. McCrary’s density test is a formal statistical test for this.
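The numbers behind both diagnostics can be computed in a few lines. In the sketch below, `binned_means` returns the points for a binned scatterplot using bins that never straddle the cutoff, and `density_ratio_check` is a deliberately crude stand-in for a density test: it compares counts in narrow windows on either side of the cutoff. It is not McCrary's test, and it assumes the density is roughly flat within the window. Both function names, the bin width, the window, and the simulated data are assumptions for illustration.

```python
import numpy as np

def binned_means(x, y, cutoff, bin_width):
    """Mean outcome in equal-width bins aligned so that no bin straddles the cutoff."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.floor((x - cutoff) / bin_width).astype(int)  # bin 0 starts at the cutoff
    bins = np.unique(idx)                                 # only non-empty bins
    centers = cutoff + (bins + 0.5) * bin_width
    means = np.array([y[idx == b].mean() for b in bins])
    return centers, means

def density_ratio_check(x, cutoff, window):
    """Share of observations within `window` of the cutoff that lie at or above it,
    with a normal-approximation z-statistic against an even 50/50 split."""
    x = np.asarray(x, dtype=float)
    below = np.sum((x >= cutoff - window) & (x < cutoff))
    above = np.sum((x >= cutoff) & (x < cutoff + window))
    n = below + above
    share = above / n
    z = (share - 0.5) / np.sqrt(0.25 / n)
    return share, z

# Simulated data (hypothetical): no manipulation, then a manipulated copy in which
# units just below the cutoff push themselves just above it.
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, 10_000)
y = 0.4 * (x >= 0) + 0.5 * x + rng.normal(0, 0.3, x.size)
x_sorted = np.where((x >= -0.03) & (x < 0), x + 0.03, x)

centers, means = binned_means(x, y, cutoff=0.0, bin_width=0.05)
share_clean, z_clean = density_ratio_check(x, cutoff=0.0, window=0.1)
share_manip, z_manip = density_ratio_check(x_sorted, cutoff=0.0, window=0.1)
print(f"clean: share above={share_clean:.3f}, z={z_clean:.2f}")
print(f"manipulated: share above={share_manip:.3f}, z={z_manip:.2f}")
```

The `centers` and `means` arrays can be passed to any plotting library to draw the binned scatterplot. A large positive z-statistic in the second case reflects the excess mass just above the cutoff.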
Placebo tests are essential for validating your design. You should check for discontinuities at "fake" thresholds where no treatment is assigned. If you find significant jumps at these placebo cutoffs, it suggests your observed discontinuity might be spurious. Furthermore, you should test for discontinuities in pre-determined covariates (e.g., age, prior income) at the true cutoff. In a valid RDD, these covariates should evolve smoothly; a jump would indicate systematic differences between groups, violating the design's assumption.
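Both checks reuse the same estimator with a different cutoff or a different dependent variable. The sketch below reports t-statistics for the jump at two fake cutoffs and for a pre-determined covariate at the true cutoff. Each placebo cutoff is estimated using only observations on its own side of the true cutoff, so that the real discontinuity cannot contaminate the placebo fit. The names `rd_fit` and `placebo_t_stats`, the covariate `age`, the fake cutoffs, and the simulated data are assumptions for illustration.

```python
import numpy as np

def rd_fit(x, y, cutoff, bandwidth):
    """Local linear sharp RD (uniform kernel): returns (estimate, HC1 standard error)."""
    xc = np.asarray(x, dtype=float) - cutoff
    y = np.asarray(y, dtype=float)
    keep = np.abs(xc) <= bandwidth
    xc, y = xc[keep], y[keep]
    above = (xc >= 0).astype(float)
    X = np.column_stack([np.ones_like(xc), above, xc, above * xc])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    n, k = X.shape
    cov = XtX_inv @ (X.T @ (X * resid[:, None] ** 2)) @ XtX_inv * n / (n - k)
    return beta[1], np.sqrt(cov[1, 1])

def placebo_t_stats(x, y, true_cutoff, fake_cutoffs, bandwidth):
    """t-statistic of the estimated jump at each fake cutoff."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    stats = {}
    for c in fake_cutoffs:
        # Keep only the side of the true cutoff on which the fake cutoff lies.
        side = x < true_cutoff if c < true_cutoff else x >= true_cutoff
        est, se = rd_fit(x[side], y[side], c, bandwidth)
        stats[c] = est / se
    return stats

# Simulated data (hypothetical): a real jump of 0.5 at zero, a smooth covariate.
rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 8000)
age = 40 + 3 * x + rng.normal(0, 5, x.size)
y = 0.5 * (x >= 0) + 0.8 * x + rng.normal(0, 0.3, x.size)

est, se = rd_fit(x, y, cutoff=0.0, bandwidth=0.3)
true_t = est / se
placebo_t = placebo_t_stats(x, y, true_cutoff=0.0, fake_cutoffs=[-0.5, 0.5], bandwidth=0.3)
cov_est, cov_se = rd_fit(x, age, cutoff=0.0, bandwidth=0.3)
covariate_t = cov_est / cov_se
print(f"true cutoff t={true_t:.1f}, placebo t={placebo_t}, covariate t={covariate_t:.2f}")
```

In a design that passes these checks, the t-statistic at the true cutoff should stand out while the placebo and covariate statistics stay within the usual range of sampling noise. When many placebo cutoffs or covariates are tested, a few nominally significant results are expected by chance alone.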
Practical Application in Policy and Education
Consider evaluating a merit-based college scholarship where funding is awarded to students with a GPA of 3.5 or higher. This is a classic setup for a Sharp RDD if the rule is strictly enforced. You would collect GPA (the running variable) and a later outcome like graduation (the outcome). Using local linear regression with an appropriate bandwidth around 3.5, you estimate the average causal effect of the scholarship on graduation rates for students at the margin.
In a Fuzzy RDD example, a school district might implement an intensive tutoring program for students scoring below a cutoff on a diagnostic test, but some parents might opt their children out. Here, the first-stage regression would show the jump in the probability of receiving tutoring at the cutoff. The second stage, using the cutoff as an instrument, yields the Local Average Treatment Effect (LATE) of tutoring on final exam scores for the subset of students whose participation was determined by the rule.
Common Pitfalls
- Ignoring Bandwidth Sensitivity: Reporting results from a single, arbitrarily chosen bandwidth is insufficient. Always present a sensitivity analysis, showing how your estimate changes across a range of plausible bandwidths. A robust finding should be stable across reasonable choices.
- Mis-specifying the Functional Form: Using a global high-order polynomial over the entire data range can lead to misleading estimates, as it can produce false discontinuities. Prefer local linear or quadratic regression with a data-driven bandwidth. Always check if your conclusions hold with different polynomial orders locally.
- Failing to Test for Manipulation: Overlooking a density test is a major oversight. If participants sorted themselves around the cutoff, the groups are no longer comparable, and your estimate is biased. This check is non-negotiable for credible RDD.
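- Confusing Fuzzy RDD with a Simple Comparison: In a fuzzy design, you cannot simply compare outcomes above and below the cutoff. The jump in outcomes at the cutoff measures only the effect of crossing the threshold (an intention-to-treat effect), which is diluted because not every unit's treatment status changes there. You must scale it by the jump in treatment probability, which is what the two-stage IV estimator does, to recover the causal effect of treatment for compliers.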
Summary
- RDD identifies causal effects by comparing units just above and just below a cutoff that determines treatment eligibility, leveraging the local randomness of assignment near the threshold.
- Sharp RDD assumes perfect compliance with the rule, while Fuzzy RDD uses the cutoff as an instrument to estimate the effect for compliers, requiring a two-stage estimation procedure.
- Local linear regression within a bandwidth chosen by a data-driven procedure is the standard for estimation, limiting bias from the underlying trend.
- Validity hinges on rigorous checks: visualize the discontinuity, test for manipulation of the running variable (e.g., McCrary test), and run placebo tests at false cutoffs and on pre-treatment covariates.
- Always conduct bandwidth sensitivity analyses and be cautious of functional form assumptions to ensure your estimated discontinuity is robust and credible.