AP Statistics: Confidence Intervals for Means

When you only have sample data, how can you make a reliable guess about an entire population's average? Confidence intervals for a population mean provide the answer, offering a principled way to quantify uncertainty in your estimate. Mastering this technique is essential for drawing valid conclusions from data in fields from engineering to public health, moving beyond a single sample mean to a statement about where the true parameter likely lies.

The Logic of Estimation: From Point to Interval

A sample mean ( $\bar{x}$ ) is a point estimate: a single best guess for the population mean ( $μ$ ). However, a different sample would yield a different $\bar{x}$ . A confidence interval accounts for this sampling variability by creating a range (an interval) of plausible values for $μ$ . The "confidence" level, typically 90%, 95%, or 99%, expresses the long-run success rate of the method: if you were to take many samples and build an interval from each, that percentage of intervals would capture the true $μ$ . It is incorrect to say "there is a 95% probability the population mean is in my interval"; the probability statement is about the method, not any single, fixed interval.
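The long-run reading of the confidence level can be checked with a small simulation. The sketch below is illustrative only: the population (Normal with mean 240 and standard deviation 10), the helper name `coverage_rate`, and the number of trials are all assumptions, and 2.131 is the 95% t critical value for 15 degrees of freedom.

```python
import random
import statistics

def coverage_rate(mu, sigma, n, t_star, trials, seed=0):
    """Fraction of simulated t-intervals that capture the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # s / sqrt(n)
        margin = t_star * se
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: Normal with mu = 240 and sigma = 10.
# t* = 2.131 is the 95% critical value for df = 15 (n = 16).
rate = coverage_rate(mu=240, sigma=10, n=16, t_star=2.131, trials=2000)
print(f"Share of intervals capturing mu: {rate:.3f}")
```

The printed share should land close to 0.95. Any single interval either contains 240 or it does not; only the method has a 95% success rate.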

The t-Distribution: The Key When Sigma is Unknown

To build the interval, you need a sampling distribution. If you knew the population standard deviation ( $σ$ ), you would use the Normal distribution via the z-score. In reality, $σ$ is almost always unknown, so you must estimate it with the sample standard deviation ( $s$ ). Substituting $s$ for $σ$ adds extra variability, and the t-distribution accounts for it. It is similar in shape to the Normal distribution (symmetric and bell-shaped) but has thicker tails, so its critical values are larger than the corresponding Normal ones. The thickness depends on the degrees of freedom (df), calculated as $df = n - 1$ for one mean. With more data (higher df), the t-distribution converges to the Normal.
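To see the thicker tails numerically, the sketch below compares the height of the t density with the standard Normal density at a point far from the center. The function names and the choice of evaluation point (3) are assumptions made for illustration; the density formula is the standard one for the t-distribution.

```python
import math

def normal_pdf(x):
    """Standard Normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    # lgamma keeps the normalizing constant from overflowing for large df
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

for df in (3, 15, 100):
    print(f"t, df = {df:>3}: density at 3 = {t_pdf(3.0, df):.5f}")
print(f"Normal     : density at 3 = {normal_pdf(3.0):.5f}")
```

The tail density shrinks toward the Normal value as df grows, which is the convergence described above.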

Conditions for Valid Use

Constructing a valid t-interval requires checking three key conditions:

Random: The data must come from a random sample or a randomized experiment. This guards against selection bias, so the sample can reasonably stand in for the population.
Independent: Individual observations are independent. This is generally satisfied if sampling is random and the sample size is less than 10% of the population (the 10% Condition).
Normal/Large Sample: The sampling distribution of $\bar{x}$ must be approximately Normal. This can be satisfied if:

The population distribution is Normal (rarely known).
The sample data appear roughly symmetric and unimodal without strong outliers (check a graph).
The sample size is large ( $n \geq 30$ ), invoking the Central Limit Theorem, which states that the sampling distribution of the sample mean becomes approximately Normal as $n$ grows, whatever the shape of the population (provided its variance is finite).
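The large-sample condition can be illustrated by simulation. The sketch below draws sample means from a strongly right-skewed population and measures how lopsided their distribution is. The exponential population, the helper names, and the repetition count are assumptions chosen for illustration.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; near 0 for a symmetric distribution."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Means of `reps` samples of size n from a right-skewed population."""
    return [statistics.mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(1)  # fixed seed so the run is repeatable
skew_small = skewness(sample_means(2, 5000, rng))
skew_large = skewness(sample_means(30, 5000, rng))
print(f"Skewness of sample means, n = 2 : {skew_small:.2f}")
print(f"Skewness of sample means, n = 30: {skew_large:.2f}")
```

The skewness of the sample means drops sharply as n rises from 2 to 30, even though the population itself stays just as skewed.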

Constructing the Interval: A Step-by-Step Process

The formula for a confidence interval for a population mean is:

$\text{statistic} \pm (\text{critical value}) \times (\text{standard error of the statistic})$

Which translates to:

$\bar{x} \pm t^{*} \times \frac{s}{\sqrt{n}}$

Where:

$\bar{x}$ is the sample mean.
$t^{*}$ (t-star) is the critical value from the t-distribution with $n - 1$ degrees of freedom for your chosen confidence level.
$\frac{s}{\sqrt{n}}$ is the standard error (SE) of the sample mean, estimating the variability of $\bar{x}$ from sample to sample.
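As a minimal sketch, the formula can be turned into a small function that works from raw data. The function name `t_interval` and the five measurements are hypothetical; the critical value is passed in by the caller (here 2.776, the 95% value for 4 degrees of freedom from a t-table).

```python
import math
import statistics

def t_interval(data, t_star):
    """Return (lower, upper) for xbar ± t* · s / sqrt(n)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)   # sample SD, divides by n - 1
    se = s / math.sqrt(n)        # standard error of the mean
    margin = t_star * se
    return xbar - margin, xbar + margin

# Hypothetical measurements: n = 5, so df = 4 and t* = 2.776 for 95%.
data = [10.2, 9.8, 10.5, 10.1, 9.9]
low, high = t_interval(data, t_star=2.776)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Note that the standard error divides $s$ by the square root of $n$, not by $n$ itself.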

Example: An engineer measures the tensile strength of 16 randomly selected alloy strips. The sample mean is 242.6 MPa, and the sample standard deviation is 9.5 MPa. Construct a 95% CI for the true mean tensile strength.

Check Conditions: A random sample is stated. Independence is reasonable if 16 strips are fewer than 10% of all strips produced. Because the sample size (16) is below 30, we must check the sample data for approximate normality (assume a dotplot shows no strong skew or outliers). Conditions are met.
Calculate Components:

$\bar{x} = 242.6$ , $s = 9.5$ , $n = 16$ .
$df = 16 - 1 = 15$ .
For a 95% CI with $df = 15$ , $t^{*} = 2.131$ (from a t-table or calculator).
$SE = \frac{s}{\sqrt{n}} = \frac{9.5}{\sqrt{16}} = \frac{9.5}{4} = 2.375$ .

Compute the Interval:

Margin of Error (ME): $t^{*} \times SE = 2.131 \times 2.375 \approx 5.06$ .
CI: $242.6 \pm 5.06 = (237.54, 247.66)$ .
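The arithmetic in this example can be reproduced without a t-table. The sketch below finds $t^{*}$ by numerically integrating the t density and searching for the cutoff; the helper names, the Simpson step count, and the search range are assumptions, and a calculator or statistics package would normally do this step.

```python
import math

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def central_area(t, df, steps=2000):
    """Area under the t density between -t and t (Simpson's rule)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # doubled because the density is symmetric

def t_critical(confidence, df):
    """Bisection search for t* whose central area equals `confidence`."""
    lo, hi = 0.0, 20.0  # assumes t* is below 20, as it is for this example
    for _ in range(60):
        mid = (lo + hi) / 2
        if central_area(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

xbar, s, n = 242.6, 9.5, 16          # values from the example
t_star = t_critical(0.95, n - 1)
se = s / math.sqrt(n)
margin = t_star * se
print(f"t* = {t_star:.3f}, SE = {se:.3f}, ME = {margin:.2f}")
print(f"95% CI: ({xbar - margin:.2f}, {xbar + margin:.2f})")
```

The computed critical value agrees with the table value 2.131 to three decimals, and the interval matches the hand calculation.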

Interpretation in Context

The correct interpretation has four parts: confidence level, parameter, interval, and context. For our example: "We are 95% confident that the interval from 237.54 to 247.66 megapascals captures the true mean tensile strength of all alloy strips from this process."

Notice what this does not say: It does not say 95% of the sample data or 95% of all strips are in this range. The interval is about the population mean.

Common Pitfalls

Misinterpreting the Confidence Level: Stating "There is a 95% chance the mean is between..." is a common mistake. The population mean is fixed; your interval either contains it or it doesn't. The 95% refers to the reliability of the interval-building process.

Ignoring Conditions: Applying the t-interval formula to data from a voluntary response survey (failing randomness) or to heavily skewed small-sample data (failing normality) invalidates the result. Always check conditions first.

Confusing Standard Deviation ( $s$ ) and Standard Error ( $s / \sqrt{n}$ ): The margin of error uses the standard error, which divides the sample standard deviation by the square root of the sample size. Using $s$ alone in the formula creates an incorrectly wide interval.

Forgetting the t-Distribution: Using a $z^{*}$ (Normal) critical value when $σ$ is unknown, especially for small samples, yields an interval that is too narrow and overstates your precision. Always use $t^{*}$ when estimating a mean with $s$ .
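The last two pitfalls are easy to quantify with the tensile-strength numbers ( $s = 9.5$ , $n = 16$ ). The sketch below is a simple comparison; the variable names are illustrative, and 1.960 is the familiar 95% Normal critical value.

```python
import math

s, n = 9.5, 16
t_star = 2.131   # 95% critical value, t-distribution with df = 15
z_star = 1.960   # 95% critical value, standard Normal

se = s / math.sqrt(n)

me_correct = t_star * se   # t* with the standard error
me_using_s = t_star * s    # pitfall: s used in place of s / sqrt(n)
me_using_z = z_star * se   # pitfall: z* used in place of t*

print(f"Correct margin of error : {me_correct:.2f}")
print(f"Using s instead of SE   : {me_using_s:.2f}  (far too wide)")
print(f"Using z* instead of t*  : {me_using_z:.2f}  (too narrow)")
```

With only 16 observations, swapping in $z^{*}$ shrinks the margin by roughly 8%, which is the overstated precision the pitfall warns about.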

Summary

Confidence intervals for a mean estimate a population parameter ( $μ$ ) with a stated level of confidence, providing a range of plausible values that accounts for sampling error.
The t-distribution, specified by degrees of freedom ( $df = n - 1$ ), is used instead of the Normal distribution when the population standard deviation is unknown and estimated by the sample standard deviation ( $s$ ).
Valid construction requires verifying Random, Independent, and Normal/Large Sample conditions; the interval is $\bar{x} \pm t^{*} (s / \sqrt{n})$ .
The correct interpretation is, "We are C% confident that the interval from [lower bound] to [upper bound] captures the true mean of [context]."
The margin of error ( $t^{*} \times SE$ ) decreases with larger sample sizes, leading to more precise intervals, and increases with higher confidence levels, leading to wider, more conservative intervals.
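The last point can be made concrete with a few table values. In the sketch below, the sample standard deviation is held at 9.5 (an assumption, since a real resample would give a different $s$ ), and the critical values are standard t-table entries for the stated degrees of freedom.

```python
import math

def margin_of_error(t_star, s, n):
    """t* times the standard error s / sqrt(n)."""
    return t_star * s / math.sqrt(n)

s = 9.5  # held fixed for illustration

# t* at df = 15 for three confidence levels (n = 16)
by_level = {"90%": 1.753, "95%": 2.131, "99%": 2.947}
level_margins = [margin_of_error(t, s, 16) for t in by_level.values()]

# 95% t* at df = n - 1 for growing sample sizes
by_n = {16: 2.131, 31: 2.042, 61: 2.000, 121: 1.980}
n_margins = [margin_of_error(t, s, n) for n, t in by_n.items()]

for level, me in zip(by_level, level_margins):
    print(f"n = 16, {level} confidence: ME = {me:.2f}")
for n, me in zip(by_n, n_margins):
    print(f"95% confidence, n = {n:>3}: ME = {me:.2f}")
```

Larger samples help twice: the standard error shrinks and the critical value drifts down toward the Normal value.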
