Statistics: Effect Sizes and Practical Significance

Mindli Team

Statistical significance tells you whether an effect exists, but it says nothing about the size or importance of that effect. Effect sizes are a family of statistics that quantify the magnitude of a research finding, allowing you to move beyond the simplistic "yes/no" of a p-value to assess the real-world impact and practical significance of your results. Mastering effect sizes is essential for conducting meaningful research, performing accurate meta-analyses, and making informed decisions in fields like medicine, psychology, and public policy, where the size of an effect is often more critical than its mere existence.

The Limitation of p-Values and the Necessity of Effect Sizes

A p-value is the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true. While useful as a gatekeeper against random noise, it has severe limitations. A p-value is heavily influenced by sample size: a trivially small effect can be statistically significant with a massive sample, while a large, important effect may be non-significant with a small sample. Fixation on the "p < 0.05" threshold often leads to dichotomous thinking that obscures the actual data. Effect sizes solve this problem by providing a scale-free measure of magnitude. They answer the question a p-value cannot: not "is this result unlikely under chance?" but "how big is the effect?" This shift from significance to magnitude is the cornerstone of modern statistical interpretation, emphasizing estimation (using confidence intervals) over mere null hypothesis testing.
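
To see the sample-size dependence concretely, here is a minimal sketch (assuming SciPy is available; all the numbers are hypothetical) in which an identical standardized difference flips from non-significant to highly significant purely because the sample grows:

```python
# A minimal sketch (requires SciPy) of the sample-size problem: the same
# standardized difference of d = 0.2 is non-significant with 50 people per
# group but highly significant with 2,000. All numbers are hypothetical.
from scipy import stats

mean_diff, sd = 0.2, 1.0  # a fixed "small" effect: d = 0.2
for n in (50, 2000):
    t, p = stats.ttest_ind_from_stats(mean1=mean_diff, std1=sd, nobs1=n,
                                      mean2=0.0, std2=sd, nobs2=n)
    print(f"n per group = {n:4d}   d = 0.20   p = {p:.4f}")
```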

Core Effect Size Measures for Different Research Designs

The appropriate effect size depends on your research question and data structure. Three of the most fundamental measures are Cohen's d, the correlation coefficient, and the odds ratio.

Cohen's d for Group Differences: When comparing two group means (e.g., treatment vs. control), Cohen's d is the standardized mean difference. It is calculated by taking the difference between the two means and dividing by the pooled standard deviation: d = (M1 − M2) / SD_pooled. This creates a metric expressed in standard deviation units. For interpretation, Jacob Cohen offered rough benchmarks: d = 0.2 is considered a small effect, d = 0.5 medium, and d = 0.8 large. Imagine a tutoring program raises test scores by 5 points. If the standard deviation of scores is 25 points, d = 5/25 = 0.2 (a small effect). If the standard deviation is 5 points, d = 5/5 = 1.0 (a large effect). The same raw difference has a drastically different standardized magnitude based on the variability within the population.
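
As a minimal sketch (assuming NumPy; the simulated score arrays are hypothetical stand-ins for real data), the tutoring example can be checked numerically:

```python
# A minimal sketch (assuming NumPy) of the pooled-SD Cohen's d described
# above; the simulated score arrays are hypothetical stand-ins for data.
import numpy as np

def cohens_d(group1, group2):
    """Standardized mean difference: (M1 - M2) / pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = (((n1 - 1) * np.var(group1, ddof=1)
                   + (n2 - 1) * np.var(group2, ddof=1))
                  / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / np.sqrt(pooled_var)

# A 5-point raw gain against a spread of ~25 points, as in the tutoring
# example; the expected d is about 5/25 = 0.2.
rng = np.random.default_rng(0)
tutored = rng.normal(loc=75, scale=25, size=500)
control = rng.normal(loc=70, scale=25, size=500)
print(f"d = {cohens_d(tutored, control):.2f}")
```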

Correlation Coefficients for Relationships: For assessing the strength and direction of a linear relationship between two continuous variables, the Pearson correlation coefficient (r) is the go-to effect size. It ranges from -1 to +1, with values closer to the extremes indicating stronger relationships. Cohen suggested r = 0.1 is small, r = 0.3 is medium, and r = 0.5 is large. Crucially, r² (the coefficient of determination) tells you the proportion of variance in one variable explained by the other. An r = 0.3 might seem modest, but r² = 0.09 indicates the predictor explains only 9% of the variance, highlighting that even a "medium" correlation leaves most of the story unexplained.
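
A minimal sketch (requires SciPy; the simulated data are hypothetical) of computing r and r²:

```python
# A minimal sketch (requires SciPy) of Pearson's r and the coefficient of
# determination r^2 on simulated, hypothetical data with a built-in
# roughly "medium" linear relationship.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=1000)
y = 0.3 * x + rng.normal(size=1000)  # true r is roughly 0.29

r, p = stats.pearsonr(x, y)
print(f"r = {r:.2f}, r^2 = {r**2:.2f} (~{r**2:.0%} of variance explained)")
```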

Odds Ratios for Binary Outcomes: In studies with categorical outcomes, like clinical trials measuring disease (yes/no), the odds ratio (OR) is a key measure. It compares the odds of an event occurring in one group to the odds in another group. An OR of 1 means no difference between groups. An OR of 2 means the event is twice as likely (in odds terms) in the first group. For instance, if 20% of a treatment group recovers (odds = 0.20/0.80 = 0.25) versus 10% of a control group (odds = 0.10/0.90 ≈ 0.11), the OR is 0.25/0.11 ≈ 2.25. This indicates the treatment group's odds of recovery are 2.25 times higher. Interpreting ORs requires care, as "doubling the odds" is not the same as doubling the probability, especially for common events.
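
The arithmetic is simple enough to verify directly; here is a minimal sketch using the (hypothetical) recovery proportions from the example:

```python
# A minimal sketch of the odds-ratio arithmetic from the recovery example;
# the recovery proportions are hypothetical.
def odds(p):
    """Convert a probability to odds: p / (1 - p)."""
    return p / (1 - p)

p_treatment, p_control = 0.20, 0.10
print(f"treatment odds = {odds(p_treatment):.2f}")          # 0.25
print(f"control odds   = {odds(p_control):.2f}")            # 0.11
print(f"OR = {odds(p_treatment) / odds(p_control):.2f}")    # 2.25
```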

Precision and Confidence Intervals Around Effect Sizes

An effect size calculated from a sample is a point estimate—your best single guess of the population effect. However, it is surrounded by uncertainty. A confidence interval (CI) around an effect size quantifies this uncertainty, providing a range of plausible values for the true population effect. A 95% CI means that if you repeated the study 100 times, the calculated interval would contain the true population effect size in about 95 of those studies.

Reporting CIs is non-negotiable for good practice. A wide CI indicates low precision (often due to a small sample), while a narrow CI indicates high precision. Crucially, examine where the interval lies in relation to both zero and your benchmark for practical significance. A Cohen's d of 0.4 with a 95% CI of [−0.05, 0.85] is imprecise (the true effect could be very small or very large) and still includes the possibility of a negligible effect (since it crosses zero). This is far more informative than simply reporting a p-value.
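
One common way to obtain such an interval is a percentile bootstrap; the sketch below (assuming NumPy, with hypothetical simulated data) is one reasonable approach, not the only one, since analytic approximations for the CI of d also exist:

```python
# A minimal sketch of a percentile-bootstrap 95% CI around Cohen's d
# (assuming NumPy, with hypothetical simulated data).
import numpy as np

def cohens_d(a, b):
    n1, n2 = len(a), len(b)
    pooled_var = (((n1 - 1) * np.var(a, ddof=1) + (n2 - 1) * np.var(b, ddof=1))
                  / (n1 + n2 - 2))
    return (np.mean(a) - np.mean(b)) / np.sqrt(pooled_var)

rng = np.random.default_rng(7)
treat = rng.normal(0.4, 1.0, 40)     # small hypothetical study
control = rng.normal(0.0, 1.0, 40)

# Resample each group with replacement and recompute d many times.
boot = [cohens_d(rng.choice(treat, size=len(treat)),
                 rng.choice(control, size=len(control)))
        for _ in range(5000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"d = {cohens_d(treat, control):.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

Note how a small sample yields a wide interval: the point estimate alone would badly overstate how much is actually known.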

Interpreting Practical Significance: From Magnitude to Meaning

Practical significance asks whether the estimated effect size is large enough to be meaningful in the real world. This is a contextual, value-based judgment that statistics alone cannot provide. It requires domain expertise and consideration of costs, benefits, and potential consequences.

Consider a new drug that significantly reduces headache pain with a Cohen's d of 0.1. While statistically different from zero, this is a very small effect. If the drug is inexpensive and has no side effects, it may be practically useful. If it is extremely costly and has severe side effects, the same small effect is not practically significant. Similarly, a correlation of r = 0.1 between a personality test and job performance may be statistically significant with a large sample of employees, but it is likely too small to be useful for making hiring decisions. You must interpret the magnitude of the effect size within the specific context of your field, weighing the practical impact against the implementation costs and risks.

Common Pitfalls

1. Confusing Statistical and Practical Significance.

  • The Mistake: Declaring a finding "important" simply because p < 0.05, or dismissing a non-significant result as "no effect" without examining the effect size and its CI.
  • The Correction: Always report and interpret the effect size and its confidence interval. A statistically significant result with a trivial effect size is not practically important. Conversely, a non-significant result with a wide CI that includes large effect values indicates the study was inconclusive, not that no effect exists.

2. Misinterpreting Odds Ratios as Risk Ratios.

  • The Mistake: Stating "the risk doubled" when you have an OR of 2. This is only approximately true for rare outcomes. For common outcomes, the actual increase in probability (risk) is much less dramatic.
  • The Correction: For ORs, be precise with language: "The odds of the outcome were twice as high." When possible and appropriate, convert odds ratios to more intuitively understood metrics like absolute risk reduction or number needed to treat (NNT); a conversion sketch follows this list of pitfalls.

3. Over-Reliance on Generic Benchmarks.

  • The Mistake: Using Cohen's benchmarks (d = 0.2, d = 0.5, d = 0.8) as universal, context-free rules for labeling effects as small, medium, or large.
  • The Correction: Treat these benchmarks as starting points, not gospel. A d of 0.2 may be revolutionary in some fields (e.g., certain areas of physics) and mundane in others (e.g., educational interventions). Ground your interpretation in the existing literature of your specific discipline.

4. Ignoring the Confidence Interval.

  • The Mistake: Reporting only the point estimate of an effect size (e.g., d = 0.45), which presents it as an exact known quantity.
  • The Correction: Always report the confidence interval (e.g., d = 0.45, 95% CI [0.15, 0.75]). This shows the precision of your estimate and allows readers to see the full range of plausible effects, which is essential for cumulative science and decision-making.
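
As promised in pitfall 2, here is a minimal sketch of translating an odds ratio into risk-based language; the risk-ratio formula is the standard Zhang and Yu conversion, and the 40% baseline risk is a hypothetical illustration:

```python
# A minimal sketch of converting an odds ratio into risk-based metrics,
# as urged in pitfall 2. The risk-ratio formula is the standard Zhang-Yu
# conversion; the 40% baseline risk is a hypothetical illustration.
def or_to_rr(odds_ratio, baseline_risk):
    """Risk ratio implied by an odds ratio at a given baseline risk:
    RR = OR / (1 - p0 + p0 * OR)."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)

p0, OR = 0.40, 2.0               # common outcome, "doubled odds"
rr = or_to_rr(OR, p0)
p1 = rr * p0                     # risk in the exposed/treated group
arr = p1 - p0                    # absolute risk difference
nnt = 1 / abs(arr)               # number needed to treat
print(f"OR = {OR:.1f} at a {p0:.0%} baseline risk:")
print(f"  RR  = {rr:.2f} (risk rises to {p1:.0%}, not to {OR * p0:.0%})")
print(f"  absolute risk difference = {arr:+.0%}, NNT ~ {nnt:.0f}")
```

For this common outcome, "doubled odds" corresponds to a risk ratio of only about 1.43, which is exactly why odds-ratio language needs care.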

Summary

  • Effect sizes (like Cohen's d, r, and odds ratios) quantify the magnitude of a finding, providing crucial information that a p-value alone cannot.
  • Always report a confidence interval around your effect size to communicate the precision of your estimate and the range of plausible values for the true population effect.
  • Practical significance is a contextual judgment about whether an effect is meaningful in the real world; it requires integrating the effect size with domain knowledge, costs, and benefits.
  • Avoid common interpretive errors, such as treating odds ratios as risk ratios or using generic benchmarks without consideration for your specific field of study.
  • By prioritizing effect sizes and their confidence intervals, you shift focus from the simplistic question of "Is there an effect?" to the more meaningful questions of "How large is the effect?" and "What does it mean?"
