Repeated Measures ANOVA
When you track changes in the same group of people over time or under different experimental conditions, you need a statistical method that respects the non-independence of those measurements. Repeated measures ANOVA is the specialized tool for this job, allowing you to isolate true changes from background noise with greater sensitivity. It's a cornerstone for analyzing pre-post tests, longitudinal studies, and within-subjects experiments.
Understanding the Within-Subjects Design
A within-subjects design (or repeated measures design) is a research structure where the same participants are measured under all levels of an independent variable. For example, you might measure participants' reaction times to three different visual stimuli, or test their blood pressure before, during, and after a specific intervention. The core advantage of this design is its control for individual differences. Because each person serves as their own control, variability caused by inherent differences between people (like baseline metabolism, skill level, or personality) is removed from the error term.
This reduction in error variance directly translates to increased statistical power. In practical terms, you are more likely to detect a genuine effect of your experimental manipulation with a smaller sample size compared to an equivalent between-subjects design. However, this efficiency comes with a critical statistical responsibility: you must account for the correlation between measurements taken from the same individual. Repeated measures ANOVA correctly models this correlation, whereas a standard one-way ANOVA would violate the assumption of independent observations, leading to inflated Type I error rates.
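This power gain is easy to demonstrate with a short simulation. The sketch below (hypothetical data and parameters) analyzes the same simulated scores twice: once as if they came from independent groups, and once as paired measurements. Because the paired analysis removes the large between-person variability from the error term, it yields a far stronger test statistic for the same underlying effect:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 20
subject_baseline = rng.normal(100, 15, n)  # large individual differences
condition_effect = 5                       # true within-subject effect
noise = rng.normal(0, 3, (2, n))           # small measurement noise

cond_a = subject_baseline + noise[0]
cond_b = subject_baseline + condition_effect + noise[1]

# Between-subjects analysis: individual differences inflate the error term
t_ind, p_ind = stats.ttest_ind(cond_a, cond_b)

# Within-subjects analysis: each person serves as their own control
t_rel, p_rel = stats.ttest_rel(cond_a, cond_b)

print(f"independent-samples t: t = {t_ind:.2f}, p = {p_ind:.4f}")
print(f"paired-samples t:      t = {t_rel:.2f}, p = {p_rel:.4f}")
```

With these settings, the paired test detects the 5-point effect easily, while the independent-samples test is swamped by the 15-point spread in individual baselines.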
From One-Way ANOVA to the Repeated Measures Model
To appreciate the repeated measures model, it helps to contrast it with its simpler cousin. A standard one-way ANOVA partitions total variability into two components: variability between groups and variability within groups (error). The formula for the total sum of squares is SS_total = SS_between + SS_within. In this model, all variability not explained by the group factor is lumped into a single error term.
A one-factor repeated measures ANOVA makes a more nuanced partition. It recognizes that the total variability can be split into three parts: variability due to the within-subjects factor (e.g., time or condition), variability due to systematic differences between subjects, and the remaining within-subject error. The conceptual formula becomes SS_total = SS_subjects + SS_within, and further, SS_within = SS_conditions + SS_error. By extracting and removing the variability attributable to stable differences between individuals (SS_subjects), the error term (SS_error) is purified, containing only the unsystematic fluctuation within each person across conditions. This is why the F-test in repeated measures ANOVA, calculated as F = MS_conditions / MS_error, is often more powerful.
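As a concrete sketch of this partition, the following example (hypothetical scores, using NumPy and SciPy) computes each sum of squares directly from a subjects-by-conditions array and forms the F-ratio:

```python
import numpy as np
from scipy import stats

# Rows = subjects, columns = conditions (hypothetical scores)
scores = np.array([
    [8, 10, 12],
    [6,  9, 11],
    [7,  8, 10],
    [9, 12, 13],
    [5,  7,  9],
], dtype=float)
n, k = scores.shape
grand_mean = scores.mean()

# Partition the total sum of squares
ss_total = ((scores - grand_mean) ** 2).sum()
ss_subjects = k * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_conditions = n * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_error = ss_total - ss_subjects - ss_conditions  # SS_total = SS_subjects + SS_conditions + SS_error

df_conditions = k - 1
df_error = (n - 1) * (k - 1)
ms_conditions = ss_conditions / df_conditions
ms_error = ss_error / df_error
F = ms_conditions / ms_error  # F = MS_conditions / MS_error
p = stats.f.sf(F, df_conditions, df_error)
print(f"F({df_conditions}, {df_error}) = {F:.2f}, p = {p:.5f}")
```

Note that removing SS_subjects from the denominator is exactly what makes this F-ratio larger than the one a between-subjects ANOVA would produce on the same numbers.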
The Sphericity Assumption and Corrections
The most critical and often overlooked assumption in repeated measures ANOVA is sphericity (also known as circularity). Sphericity requires that the variances of the differences between all possible pairs of within-subject conditions are equal. In simpler terms, the correlation between your repeated measurements should be roughly similar across all condition pairings. Violating this assumption increases the risk of false positives (Type I errors).
You formally assess sphericity using Mauchly's test of sphericity. A significant p-value (typically p < .05) indicates a violation of the sphericity assumption. When sphericity is violated, you must adjust the degrees of freedom for your F-test to create a more conservative criterion for significance.
Two common corrections are applied:
- Greenhouse-Geisser correction: Uses an epsilon (ε) value to adjust the degrees of freedom downward (both df are multiplied by ε). This is the most frequently used and generally recommended correction as it is quite conservative.
- Huynh-Feldt correction: A less conservative adjustment. It is often used when the Greenhouse-Geisser epsilon is relatively high (e.g., above 0.75), where the Greenhouse-Geisser correction can be unnecessarily strict.
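The Greenhouse-Geisser epsilon can be estimated directly from the sample covariance matrix of the conditions. The sketch below uses the standard textbook formula (double-center the covariance matrix, then take the squared trace over (k-1) times the sum of squared entries); the data are hypothetical, and the estimate always lies between 1/(k-1) and 1:

```python
import numpy as np

def greenhouse_geisser_epsilon(scores):
    """Estimate Greenhouse-Geisser epsilon from a subjects x conditions array.

    epsilon = 1 means sphericity holds perfectly; the lower bound
    is 1 / (k - 1) for k conditions.
    """
    k = scores.shape[1]
    S = np.cov(scores, rowvar=False)  # k x k covariance of the conditions
    # Double-center the covariance matrix
    row_mean = S.mean(axis=1, keepdims=True)
    col_mean = S.mean(axis=0, keepdims=True)
    Sc = S - row_mean - col_mean + S.mean()
    return np.trace(Sc) ** 2 / ((k - 1) * (Sc ** 2).sum())

# Hypothetical demo: 12 subjects, 4 conditions, with a shared subject effect
rng = np.random.default_rng(0)
demo = rng.normal(0, 1, (12, 4)) + rng.normal(0, 1, (12, 1))
eps = greenhouse_geisser_epsilon(demo)
print(f"estimated epsilon = {eps:.3f}")
```

To apply the correction, multiply both degrees of freedom by epsilon before computing the p-value, i.e., evaluate F against df of ε(k-1) and ε(n-1)(k-1).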
In practice, when reporting your results, you would state: "A repeated measures ANOVA showed a significant effect of time, F(2, 28) = 8.93, p = .001. The sphericity assumption was violated (Mauchly's W = .65, p = .02), therefore Greenhouse-Geisser corrected values are reported (ε = .73, p = .003)."
Interpreting Results and Post-Hoc Tests
Assume you conducted a study on the effect of a memory training program. You tested 15 participants' recall scores at three time points: Baseline, Post-Training, and 3-Month Follow-Up. Your repeated measures ANOVA returns a significant main effect for Time.
The significant F-test tells you that at least one time point differs from the others, but not which ones. To pinpoint the specific differences, you must conduct post-hoc pairwise comparisons. Given the correlated data, you should use post-hoc tests designed for repeated measures, such as the Bonferroni adjustment or paired-samples t-tests with a corrected alpha level (e.g., Holm-Bonferroni).
When interpreting the output, you will examine the estimated marginal means for each condition. You might find that the mean recall score increased significantly from Baseline to Post-Training (p < .001) and remained significantly higher than Baseline at Follow-Up (p = .01), but that the drop from Post-Training to Follow-Up was not statistically significant (p = .15). This pattern tells a clear story about the intervention's immediate and somewhat sustained effect.
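A follow-up analysis of this kind can be sketched as Bonferroni-adjusted paired t-tests over all pairs of time points. The data below are simulated stand-ins for the hypothetical memory study (the Holm variant mentioned above would instead sort the raw p-values and apply stepwise multipliers):

```python
import numpy as np
from itertools import combinations
from scipy import stats

rng = np.random.default_rng(7)
n = 15
# Hypothetical recall scores at three time points
baseline = rng.normal(50, 8, n)
post = baseline + rng.normal(10, 4, n)      # training effect
followup = baseline + rng.normal(7, 4, n)   # partially sustained effect

data = {"Baseline": baseline, "Post": post, "FollowUp": followup}
pairs = list(combinations(data, 2))

# Paired t-tests respect the correlated (within-subject) structure
raw_p = [stats.ttest_rel(data[a], data[b]).pvalue for a, b in pairs]

# Bonferroni: multiply each p-value by the number of comparisons (cap at 1)
bonf_p = [min(p * len(pairs), 1.0) for p in raw_p]
for (a, b), p in zip(pairs, bonf_p):
    print(f"{a} vs {b}: Bonferroni-adjusted p = {p:.4f}")
```

Because each p-value is multiplied by the number of comparisons, the familywise Type I error rate stays at the nominal alpha level across all three tests.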
Common Pitfalls
Ignoring the sphericity assumption. Running a standard repeated measures ANOVA without checking Mauchly's test is a major error. Always inspect this test and apply the Greenhouse-Geisser or Huynh-Feldt correction if the assumption is violated. Reporting the uncorrected degrees of freedom in the face of a significant Mauchly's test misrepresents your analysis.
Using between-subjects post-hoc tests. Applying post-hoc tests like Tukey's HSD, which are designed for independent groups, to repeated measures data will give inaccurate results. You must use comparisons that account for the paired nature of the data, such as Bonferroni-adjusted paired t-tests.
Misinterpreting a significant main effect. A significant F-test for your within-subjects factor does not mean all conditions differ from each other. It only indicates that not all group means are equal. You must conduct and report follow-up paired comparisons to describe the specific pattern of results.
Overlooking interaction effects in factorial designs. When you have more than one within-subjects factor (e.g., Time and Drug Dose), the most interesting result is often the interaction. Failing to plot and probe a significant interaction can lead you to miss the core finding of your experiment, such as a treatment working only at later time points.
Summary
- Repeated measures ANOVA is used for analyzing data where the same participants are measured under multiple conditions or time points, efficiently controlling for individual differences and increasing statistical power.
- The method hinges on the sphericity assumption, which must be tested using Mauchly's test. If violated, corrections like Greenhouse-Geisser must be applied to avoid an increased risk of Type I error.
- A significant main effect requires post-hoc pairwise comparisons that are appropriate for correlated data (e.g., Bonferroni-adjusted paired t-tests) to determine where the specific differences lie.
- This design is powerful for tracking change over time and is more sensitive than between-subjects designs for detecting the effects of an intervention when participant variability is high.