AP Statistics: Paired t-Tests

When you want to compare two related measurements—like a patient's blood pressure before and after a medication, or the fuel efficiency of the same car using two different fuels—you need a statistical test that respects that inherent connection. The paired t-test is precisely that tool. It moves beyond comparing two separate groups to focus on the change or difference within naturally linked pairs, providing a more precise and powerful analysis for dependent data. Mastering this procedure is crucial for the AP Statistics exam and forms a foundational skill for interpreting real-world experiments in fields from medicine to engineering.

The Core Intuition: Why Pairing Matters

Imagine testing two different teaching methods. If you randomly assign one class to Method A and a different class to Method B, you are comparing independent samples. Differences in final scores could be due to the teaching method or to pre-existing differences between the two groups of students. Now, imagine using both methods on the same class, with each student experiencing both in a random order. Here, each student serves as their own control. You are no longer comparing Class A to Class B; you are comparing Method A to Method B within each student. This is the essence of a paired design.

The statistical power comes from focusing on the difference for each pair. By analyzing these differences, you effectively "cancel out" the subject-to-subject variability that clouds comparisons between two independent groups. For the AP exam, the first step is always to recognize the paired structure: look for key phrases like "before and after," "matched pairs," "twins," or "repeated measures" on the same subjects.

The Step-by-Step Procedure

Once you've identified paired data, the procedure converts a two-sample problem into a one-sample problem on the differences. The workflow is easiest to follow with a concrete example.

Scenario: An engineer tests a new alloy coating designed to reduce wear on machine gears. Ten gears are selected. Each gear's wear (in micrometers) is measured after a standard test, first without the coating and then with the coating. This creates ten paired measurements (Without, With).

Calculate the Differences: For each pair (gear), compute the difference. It is critical to define the direction consistently (e.g., $d = Wear_{Without} - Wear_{With}$ ). A positive $d$ would indicate reduced wear with the coating.
Compute Mean and Standard Deviation: Calculate the mean difference ($\bar{x}_{d}$) and the standard deviation of the differences ($s_{d}$). These are your key summary statistics.
State Hypotheses: The null hypothesis always states that the true mean difference ( $μ_{d}$ ) is zero, implying no average effect.

$H_{0} : μ_{d} = 0$
$H_{a} : μ_{d} > 0$ (one-sided, if testing for a reduction) or $H_{a} : μ_{d} \neq 0$ (two-sided, if testing for any change)

Calculate the Test Statistic: The paired t-test statistic is calculated as if the differences are a single sample:

$t = \frac{\bar{x}_{d} - 0}{s_{d} / \sqrt{n}}$ where $n$ is the number of pairs. Note the denominator uses $s_{d} / \sqrt{n}$, the standard error of the mean difference.

Find the P-value: Use the t-distribution with $df = n - 1$ degrees of freedom to find the P-value corresponding to the calculated $t$ -statistic.
Conclusion in Context: Compare the P-value to your significance level (e.g., $α = 0.05$ ). Reject or fail to reject $H_{0}$ , and state what this means regarding the original question (e.g., "We have convincing statistical evidence that the new alloy coating reduces mean gear wear.").
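The workflow above can be sketched in Python using only the standard library. This is a minimal illustration under stated assumptions, not the exam procedure (where a calculator or table supplies the P-value): the t-distribution tail area is approximated by integrating its density with Simpson's rule, the helper names `t_upper_tail` and `paired_t_test` are hypothetical, and the wear values in the usage lines are made up rather than taken from the scenario.

```python
import math
import statistics


def t_upper_tail(t: float, df: int, steps: int = 20_000) -> float:
    """P(T >= t) for Student's t with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must be
    even) and uses the symmetry of the distribution about 0.
    """
    # Log-gamma avoids the overflow math.gamma would hit for large df.
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(x: float) -> float:
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3                    # area = P(0 <= T <= |t|)
    upper = max(0.0, 0.5 - area)     # P(T >= |t|), clamped against rounding
    return upper if t >= 0 else 1.0 - upper


def paired_t_test(without, with_, alternative="greater"):
    """One-sample t-test on d = without - with_, testing H0: mu_d = 0."""
    if len(without) != len(with_):
        raise ValueError("paired data need one 'with' value per 'without' value")
    d = [a - b for a, b in zip(without, with_)]   # one difference per pair
    n = len(d)
    d_bar = statistics.mean(d)                     # mean difference
    s_d = statistics.stdev(d)                      # sample SD (divides by n - 1)
    t = (d_bar - 0) / (s_d / math.sqrt(n))         # test statistic
    df = n - 1
    if alternative == "greater":                   # Ha: mu_d > 0
        p = t_upper_tail(t, df)
    elif alternative == "two-sided":               # Ha: mu_d != 0
        p = min(1.0, 2 * t_upper_tail(abs(t), df))
    else:
        raise ValueError("alternative must be 'greater' or 'two-sided'")
    return d_bar, s_d, t, df, p


if __name__ == "__main__":
    # Hypothetical wear (micrometers) for ten gears; not data from the text.
    without = [14.2, 11.8, 15.1, 12.6, 13.9, 10.7, 16.3, 12.2, 14.8, 13.1]
    with_ = [12.0, 10.9, 12.8, 11.1, 12.5, 10.2, 13.9, 11.5, 12.6, 12.0]
    d_bar, s_d, t, df, p = paired_t_test(without, with_, alternative="greater")
    print(f"mean diff = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.3f}, df = {df}, P = {p:.5f}")
```

Outside the exam setting, a statistics library would normally be used instead; SciPy's `scipy.stats.ttest_rel`, for example, runs the same paired test.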

Verifying the Necessary Conditions

No inference procedure is valid without checking its underlying assumptions. For the paired t-test, the conditions apply to the differences, not the original paired measurements.

Paired Data: The design must involve natural pairs, matched pairs, or repeated measures from the same unit. This is the fundamental condition you verify first.
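Randomness: The data must come from a random sample or a randomized experiment. For our gear example, the ten gears should be a random sample from the production line. Ideally, the order of testing (coating vs. no coating) would also be randomized for each gear; with the fixed without-then-with order described in the scenario, any effect of testing order cannot be separated from the effect of the coating.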
Normality: The sampling distribution of the sample mean difference ($\bar{x}_{d}$) should be approximately Normal. This can be satisfied if:

The population of differences is Normally distributed (check with a graph of the differences like a histogram or Normal probability plot), OR
The number of pairs is large ($n \geq 30$ is the usual rule of thumb), by the Central Limit Theorem.

For small samples, a graph of the differences should show no strong skewness or outliers.

Independence of Differences: Individual differences should be independent of each other. This is often justified by the random sampling or randomization in the design.
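For small samples, one part of the Normality check can be screened automatically: the sketch below flags differences outside the common 1.5 × IQR fences. It is an assumption-laden helper, not a replacement for graphing the differences. Python's `statistics.quantiles` uses an interpolation rule that can give slightly different quartiles from a calculator's, and the screen says nothing about skewness. The function name `iqr_outliers` and the sample differences are hypothetical.

```python
import statistics


def iqr_outliers(diffs):
    """Return the differences outside the 1.5 * IQR fences (a rough outlier screen)."""
    # statistics.quantiles defaults to the 'exclusive' method; calculators may
    # use a different quartile rule, so the fences can differ slightly.
    q1, _, q3 = statistics.quantiles(diffs, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [d for d in diffs if d < low or d > high]


# Hypothetical differences for ten pairs; one value is far from the rest.
diffs = [1, 2, 2, 3, 3, 3, 4, 4, 5, 30]
flagged = iqr_outliers(diffs)
if flagged:
    print("Possible outliers in the differences:", flagged)
```

With only ten pairs, a flagged value like this would make the t-procedure questionable and is worth inspecting on a dotplot or boxplot of the differences.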

Interpretation and Connection to Confidence Intervals

A statistically significant paired t-test gives convincing evidence that an effect exists, but not how large it is. To estimate the size of that effect, construct a confidence interval for the mean difference $μ_{d}$. The formula is: $\bar{x}_{d} \pm t^{*} \left(\frac{s_{d}}{\sqrt{n}}\right)$ where $t^{*}$ is the critical value from the t-distribution with $n - 1$ df for your desired confidence level.

Interpreting this interval is a key exam skill. For a 95% CI of (1.5, 4.2) micrometers in our gear example, you would say: "We are 95% confident that the true mean reduction in wear due to the coating is between 1.5 and 4.2 micrometers." Notice the connection to the test: a 95% interval that lies entirely above zero corresponds to rejecting $H_{0} : μ_{d} = 0$ in a two-sided test at $α = 0.05$, and therefore also in a one-sided test for a positive mean difference at that level. If the interval contains zero, you would not reject the null hypothesis of no difference at $α = 0.05$ (two-sided).
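The link between the interval and the test can be checked numerically. Because $t = \bar{x}_{d} / (s_{d}/\sqrt{n})$, the interval $\bar{x}_{d} \pm t^{*}(s_{d}/\sqrt{n})$ excludes zero exactly when $|t| > t^{*}$. The sketch below assumes $t^{*}$ is read from a t-table (3.182 for 95% confidence with 3 df); the function name `paired_ci` and the four pairs are hypothetical.

```python
import math
import statistics


def paired_ci(without, with_, t_star):
    """CI for mu_d with d = without - with_, using a supplied critical value t*."""
    d = [a - b for a, b in zip(without, with_)]
    d_bar = statistics.mean(d)
    se = statistics.stdev(d) / math.sqrt(len(d))   # standard error of d-bar
    margin = t_star * se
    return d_bar - margin, d_bar + margin, d_bar / se  # (lower, upper, t statistic)


# Hypothetical four pairs; t* = 3.182 is the 95% critical value for df = 3.
lower, upper, t = paired_ci([5, 7, 9, 11], [4, 5, 8, 9], t_star=3.182)
excludes_zero = lower > 0 or upper < 0
# The interval excludes 0 exactly when |t| > t*, i.e. when the two-sided test rejects.
assert excludes_zero == (abs(t) > 3.182)
print(f"95% CI: ({lower:.3f}, {upper:.3f}), t = {t:.3f}")
```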

Common Pitfalls

Using the Wrong Test Formula: The most frequent error is applying the formula for a two-sample t-test for independent means to paired data. This mistake ignores the pairing: because the two measurements within a pair are usually positively correlated, the two-sample formula overstates the standard error and can sharply reduce the power to detect a real effect. Always ask: "Are the data paired?" If yes, you must work with the differences.
Misapplying the Normality Condition: Students often check normality for the two original samples instead of the single set of differences. The assumption is about the distribution of the differences, not the individual measurements. A graph of the two original samples can look very non-Normal, but the differences can be perfectly suitable for a t-test.
Inconsistent Direction in Differences: If you define $d = Before - After$ for some pairs and $d = After - Before$ for others, your mean difference will be meaningless. Choose one consistent direction for all pairs and note it in your hypotheses.
Confusing "Matched Pairs" with "Independent Groups": Just because two samples have the same number of observations does not mean they are paired. Pairing requires a specific, pre-existing link between each observation in Sample A and a particular observation in Sample B. Without that explicit link, the data are independent.
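The first pitfall can be seen numerically. The sketch below computes two standard errors for the same made-up measurements (hypothetical numbers chosen so the pairs are strongly correlated): the paired standard error $s_{d}/\sqrt{n}$, and the unpooled two-sample standard error $\sqrt{s_{1}^{2}/n_{1} + s_{2}^{2}/n_{2}}$ that the independent-samples test would use. The function names are hypothetical.

```python
import math
import statistics


def paired_se(first, second):
    """Standard error of the mean difference, s_d / sqrt(n)."""
    d = [a - b for a, b in zip(first, second)]
    return statistics.stdev(d) / math.sqrt(len(d))


def two_sample_se(first, second):
    """Unpooled two-sample SE, which (wrongly here) treats the samples as independent."""
    return math.sqrt(statistics.variance(first) / len(first)
                     + statistics.variance(second) / len(second))


# Hypothetical wear values: units differ a lot from one another, but every
# unit changes by only 2 or 3 between the two conditions.
without = [20, 35, 50, 65, 80]
with_ = [18, 33, 47, 63, 77]
mean_diff = statistics.mean(without) - statistics.mean(with_)

for label, se in (("paired", paired_se(without, with_)),
                  ("two-sample", two_sample_se(without, with_))):
    print(f"{label:>10}: SE = {se:.3f}, t = {mean_diff / se:.2f}")
```

The mean difference is identical in both lines; only the standard error changes. Ignoring the pairing lets the large unit-to-unit spread swamp the small, consistent within-unit change.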

Summary

The paired t-test is used for dependent data from matched pairs, natural pairs, or repeated measures on the same subjects. Its power comes from analyzing the difference within each pair, which controls for variability between subjects.
The procedure converts a two-sample problem into a one-sample t-test on the differences. The test statistic is $t = \bar{x}_{d} / (s_{d} / \sqrt{n})$ with $df = n - 1$.
Critical conditions must be verified: the data must be paired, the differences should come from a random sample/experiment and an approximately Normal distribution (especially for small n), and the differences must be independent.
Always supplement a significant test with a confidence interval for the mean difference to estimate the size of the effect in context.
The most common mistake is using an independent two-sample t-test on paired data, which invalidates the analysis. Always identify the data structure first.
