Mar 5

Effect Size Measures and Practical Significance

Mindli Team

AI-Generated Content

While a p-value tells you whether an observed effect is likely due to chance, it reveals nothing about the importance or magnitude of that effect. This is where effect size measures become essential. They are standardized metrics that quantify the strength or magnitude of a relationship or difference, moving your analysis beyond the simplistic question of "is there an effect?" to the more meaningful "how large is the effect?" Understanding and reporting effect sizes is fundamental for conducting meaningful research, performing meta-analyses (which statistically combine results from multiple studies), and making data-informed decisions in business, science, and policy. This guide will equip you to calculate, interpret, and communicate the most common effect sizes, ensuring your statistical conclusions are both rigorous and relevant.

Correlation Coefficients: Measuring Relationship Strength

One of the most intuitive effect size measures is the correlation coefficient, most commonly Pearson's r. It quantifies the strength and direction of a linear relationship between two continuous variables. The value of r ranges from -1 to +1. A value of +1 indicates a perfect positive linear relationship, -1 a perfect negative linear relationship, and 0 indicates no linear relationship.

The coefficient is calculated as the covariance of the two variables divided by the product of their standard deviations. For a sample, the formula is:

r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[ Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)² ]

While the sign indicates direction, the absolute value indicates strength. For interpretation, Jacob Cohen's benchmarks are widely cited: |r| = .10 is considered a small effect, .30 a medium effect, and .50 a large effect. Imagine a study finds a correlation of r = .34 between hours of practice and musical performance score. This is a medium-sized effect, suggesting practice is meaningfully related to performance, but other factors also play a substantial role. It's crucial to also report a confidence interval for the effect size (e.g., 95% CI [.20, .48]), which communicates the precision of your estimate and shows the range of plausible true effect sizes.
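
As a minimal sketch of these two steps, the snippet below computes a sample Pearson r from the formula above and a 95% CI via the standard Fisher z-transformation. The practice/performance data are made up for illustration:

```python
import math

def pearson_r(x, y):
    """Sample Pearson r: covariance divided by the product of the SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% CI for r using the Fisher z-transformation."""
    z = math.atanh(r)              # transform r to z (approximately normal)
    se = 1 / math.sqrt(n - 3)      # standard error of z
    lo, hi = z - z_crit * se, z + z_crit * se
    return math.tanh(lo), math.tanh(hi)   # transform back to the r scale

# Hypothetical data: hours of practice vs. performance score
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 74, 79]
r = pearson_r(hours, score)
lo, hi = r_confidence_interval(r, len(hours))
```

Note that with only 8 observations the interval is wide, which is exactly the kind of imprecision a CI is meant to expose.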

Cohen's d: Standardized Mean Differences

When comparing the means of two groups (e.g., a treatment vs. control group), Cohen's d is the go-to effect size. It represents the difference between two means in terms of standard deviation units. The formula for the most common version is:

d = (M₁ − M₂) / s_pooled

where s_pooled is the pooled standard deviation of both groups. This standardization allows you to compare effects across studies that used different measurement scales. For instance, a d of 0.5 means the group means differ by half a standard deviation.

Cohen's benchmarks are: d = 0.2 (small), 0.5 (medium), and 0.8 (large). Consider a clinical trial testing a new drug on blood pressure. A statistically significant result with p < .05 and d = 0.2 indicates a real but small average reduction. However, if the same p-value came with d = 1.2, it would signal a very large, clinically important reduction. This distinction is the heart of practical significance: the real-world importance of the finding. A new teaching method might yield a statistically significant improvement in test scores (p < .05), but if d = 0.1, the actual improvement is tiny and may not justify the cost of implementation.
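
A short sketch of the pooled-SD calculation, using invented blood pressure readings for the two groups:

```python
import math

def cohens_d(group1, group2):
    """Standardized mean difference: (M1 - M2) / pooled SD."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    # Sample variances (n - 1 denominator)
    v1 = sum((x - m1) ** 2 for x in group1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in group2) / (n2 - 1)
    # Pooled SD weights each variance by its degrees of freedom
    s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (m1 - m2) / s_pooled

# Hypothetical systolic blood pressure (mmHg) after treatment vs. control
control = [130, 128, 133, 127, 131, 129]
treatment = [120, 118, 125, 122, 119, 121]
d = cohens_d(control, treatment)  # positive: control pressure is higher
```

Because d is unit-free, the same function works whether the outcome is mmHg, test points, or reaction times.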

Eta-Squared and Related Measures: Effects in ANOVA

When your analysis involves comparing more than two group means using Analysis of Variance (ANOVA), you need an effect size that accounts for variance explained. Eta-squared (η²) is a common measure representing the proportion of total variance in the dependent variable that is attributable to a factor.

It is calculated as:

η² = SS_factor / SS_total

where SS_factor is the sum of squares for the factor you're examining and SS_total is the total sum of squares. Its value ranges from 0 to 1. For interpretation, η² = .01 is considered a small effect, .06 a medium effect, and .14 a large effect. In a one-way ANOVA studying the effect of fertilizer type (A, B, C) on plant growth, an η² of .10 would indicate that 10% of the total variation in plant height is explained by the type of fertilizer used, a moderate effect.

A related and often preferred measure is partial eta-squared (η²_p), used in factorial ANOVA (with multiple factors), which isolates the variance explained by one factor while controlling for others. Reporting the confidence interval for these measures is just as critical as for correlations or Cohen's d.
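
For a one-way design the calculation is straightforward; the sketch below uses hypothetical plant heights for the three fertilizers (in a one-way ANOVA, η² and partial η² coincide because there is only one factor):

```python
def eta_squared(groups):
    """One-way eta-squared: SS_between / SS_total."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # Between-groups SS: each group's mean deviation, weighted by group size
    ss_between = sum(
        len(g) * ((sum(g) / len(g)) - grand_mean) ** 2 for g in groups
    )
    return ss_between / ss_total

# Hypothetical plant heights (cm) under fertilizers A, B, C
fert_a = [20, 22, 19, 21]
fert_b = [25, 27, 26, 24]
fert_c = [30, 29, 31, 28]
eta2 = eta_squared([fert_a, fert_b, fert_c])
```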

Odds Ratios: Effect Sizes for Binary Outcomes

For categorical outcomes, especially binary outcomes (e.g., disease/no disease, pass/fail), the odds ratio (OR) is a fundamental effect size. It compares the odds of an event occurring in one group to the odds of it occurring in another group. The odds of an event is the probability of the event divided by the probability of the non-event: odds = P(event) / (1 − P(event)).

If you have a 2x2 contingency table, the odds ratio is calculated as:

OR = (a/b) / (c/d) = (a · d) / (b · c)

where a and b are the event and non-event counts in Group 1, and c and d are the corresponding counts in Group 2. An OR of 1 means the odds are equal in both groups (no effect). An OR greater than 1 means the odds are higher in the first group, and an OR less than 1 means the odds are lower.

Interpretation is different from previous measures. For example, in studying a new drug, an OR of 2.5 for "recovery" means the odds of recovery for the treatment group are 2.5 times the odds for the control group. There are no universal benchmarks for small/medium/large ORs, as context is king. An OR of 1.2 for a major health outcome could be hugely important for public policy, while an OR of 3.0 for a trivial outcome might be irrelevant. Therefore, communicating practical significance here involves contextualizing the odds ratio in terms of base rates and real-world impact.
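
A minimal sketch of the 2x2 calculation, with a CI computed on the log-odds scale (the standard Woolf approximation); the trial counts are invented:

```python
import math

def odds_ratio(a, b, c, d):
    """OR from a 2x2 table: (a/b) / (c/d) = (a*d) / (b*c).
    a, b = event / non-event counts in Group 1 (e.g., treatment);
    c, d = event / non-event counts in Group 2 (e.g., control)."""
    return (a * d) / (b * c)

def or_confidence_interval(a, b, c, d, z_crit=1.96):
    """Approximate 95% CI: the log-OR is roughly normal with
    SE = sqrt(1/a + 1/b + 1/c + 1/d)."""
    log_or = math.log(odds_ratio(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se)

# Hypothetical trial: 50/80 treated patients recover vs. 30/80 controls
or_value = odds_ratio(50, 30, 30, 50)
lo, hi = or_confidence_interval(50, 30, 30, 50)
```

The CI is built on the log scale because the sampling distribution of the raw OR is heavily skewed; exponentiating the endpoints returns the interval to the OR scale.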

Common Pitfalls

  1. Confusing Statistical Significance with Practical Importance: The most dangerous pitfall is declaring a finding "significant" and stopping the analysis. A result can be statistically significant (low p-value) yet have a minuscule effect size (e.g., d = 0.1 or η² = 0.01) that is meaningless in practice. Always report and interpret the effect size.
  2. Misapplying Benchmark Guidelines: Cohen's benchmarks (small, medium, large) are useful heuristics, especially in novel research areas. However, they are not universal rules. An effect considered "small" in physics might be "huge" in social psychology. You must interpret effect sizes within the context of your specific field, prior research, and the potential consequences of the finding.
  3. Reporting Effect Sizes Without Confidence Intervals: An effect size is a point estimate from your sample data. Without a confidence interval, you cannot assess its precision or stability. A large effect size with a very wide CI (e.g., d = 0.8, 95% CI [-0.1, 1.7]) is an unreliable finding. The CI tells you the range within which the true population effect size likely lies.
  4. Using the Wrong Effect Size for the Design: Using Cohen's d for a correlation, or a correlation for an ANOVA, misrepresents your analysis. Match the effect size to your research design: correlations for associations, Cohen's d for two-group mean comparisons, η² for ANOVA, and odds ratios for binary outcomes.

Summary

  • Effect size measures, such as correlation r, Cohen's d, eta-squared (η²), and odds ratios (OR), quantify the magnitude of a relationship or difference, answering "how much?" rather than just "is it there?"
  • Always report confidence intervals for effect sizes to convey the precision and reliability of your estimate, moving beyond a single point value.
  • While heuristic benchmarks (small, medium, large) exist, the final interpretation must be grounded in practical significance—the real-world importance, cost, and applicability of the finding within your specific domain.
  • Statistical significance (p-value) and practical significance (effect size) are complementary concepts. A result must be evaluated on both dimensions for truly informed decisions in research, business, and policy.
