AP Statistics: Chi-Square Test for Goodness of Fit
The Chi-Square Test for Goodness of Fit is a fundamental tool for making data-driven decisions when your question is categorical. Whether you're testing if a die is fair, if customer preferences match market predictions, or if a genetic cross follows Mendelian ratios, this test provides a rigorous, statistical framework to evaluate how well your observed data align with a hypothesized model. Mastering it is essential for the AP exam and forms a cornerstone for more advanced statistical analysis in engineering, social sciences, and biology.
Understanding the Hypotheses and Conditions
Every statistical test begins with a clear statement of what you are investigating. For a Chi-Square Test for Goodness of Fit, you are testing a hypothesis about the distribution of proportions across one categorical variable.
You start by stating your hypotheses. The null hypothesis (H₀) specifies the hypothesized proportion for each category. For example, for a fair six-sided die, H₀ would state that the proportion for each face is 1/6. The alternative hypothesis (Hₐ) is that at least one of the hypothesized proportions is incorrect; the observed distribution does not fit the proposed model.
However, you cannot perform the test unless your data meet certain conditions. These are non-negotiable and must be checked:
- Random: The data must come from a random sample or a randomized experiment.
- Large Counts: All expected counts must be at least 5. The expected count for each category is calculated as npᵢ, where n is the total sample size and pᵢ is the hypothesized proportion for that category.
- Independent: Individual observations are independent. This is generally satisfied if the sample size is less than 10% of the population when sampling without replacement.
If these conditions are met, the sampling distribution of the test statistic can be approximated by a Chi-Square distribution, allowing you to calculate a valid p-value.
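As a quick sketch of the Large Counts check (the sample size n = 60 and the fair-die model here are assumed for illustration, not from an exam problem), the expected counts can be computed directly:

```python
# Hypothetical example: n = 60 die rolls tested against a fair-die model.
n = 60
hypothesized = {face: 1 / 6 for face in range(1, 7)}  # p_i = 1/6 per face

# Expected count for each category is n * p_i.
expected = {face: n * p for face, p in hypothesized.items()}
large_counts_ok = all(count >= 5 for count in expected.values())

print(round(expected[1], 6))  # 10.0 for every face
print(large_counts_ok)        # True: 10 >= 5 for all six categories
```

If any category fell below 5, the chi-square approximation would not be valid for this sample.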
Calculating the Chi-Square Statistic and Degrees of Freedom
The core of the test is calculating a single number that summarizes how much your observed data deviate from what was expected. This is the chi-square statistic, denoted χ².
The formula is χ² = Σ (O − E)² / E, where O is the observed count and E is the expected count for each category.
You calculate this for every category. Here’s a step-by-step breakdown using a simple example: Suppose you flip a coin 100 times, hypothesizing it is fair (p = 0.5 for each side). You observe 55 heads and 45 tails.
- Observed (O): Heads = 55, Tails = 45.
- Expected (E): For a fair coin, you expect 50 heads and 50 tails (100 * 0.5).
- Calculate for Heads: (55 − 50)²/50 = 25/50 = 0.5.
- Calculate for Tails: (45 − 50)²/50 = 25/50 = 0.5.
- Chi-Square Statistic: χ² = 0.5 + 0.5 = 1.0.
This statistic follows a Chi-Square distribution. The specific shape of this distribution is determined by the degrees of freedom (df). For a goodness of fit test, the degrees of freedom is the number of categories minus one. In our coin example, there are 2 categories (Heads, Tails), so df = 2 − 1 = 1. If you were testing a fair die with 6 categories, df would be 6 − 1 = 5. The degrees of freedom are crucial for finding the p-value.
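The coin example above can be reproduced in a few lines of Python (a sketch of the same arithmetic, not an exam requirement):

```python
# Coin-flip example from the text: 55 heads and 45 tails in 100 flips,
# tested against a fair-coin model (p = 0.5 per side).
observed = {"heads": 55, "tails": 45}
n = sum(observed.values())                       # 100
expected = {side: n * 0.5 for side in observed}  # 50 heads, 50 tails

# chi-square = sum over categories of (O - E)^2 / E
chi_square = sum(
    (observed[side] - expected[side]) ** 2 / expected[side] for side in observed
)
df = len(observed) - 1

print(chi_square)  # 0.5 + 0.5 = 1.0
print(df)          # 2 categories - 1 = 1
```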
Finding the P-Value and Making a Conclusion
The p-value is the probability, assuming the null hypothesis H₀ is true, of obtaining a chi-square statistic at least as extreme as the one calculated from your sample data. A small p-value provides evidence against H₀, suggesting the observed data do not fit the hypothesized distribution.
To find the p-value, you use the chi-square statistic and the degrees of freedom. On the AP exam, you will use the chi-square table provided on the formula sheet. Locate the row corresponding to your df, then find where your calculated χ² value falls among the critical values in that row. The p-value is roughly the area to the right of your statistic. For our coin example with χ² = 1.0 and df = 1, the table shows a critical value of 1.323 for a right-tail area of 0.25. Our statistic is smaller, so the p-value is greater than 0.25. In practice, you would use calculator software (e.g., χ²cdf(1.0, 999, 1)) to get a precise p-value of about 0.317.
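If a calculator isn't handy, the right-tail probability for df = 1 happens to have a closed form, P(χ² > x) = erfc(√(x/2)); a minimal sketch for the coin example using only the Python standard library (this shortcut works only for df = 1):

```python
import math

def chi2_sf_df1(x):
    """Right-tail probability P(chi-square > x) when df = 1."""
    return math.erfc(math.sqrt(x / 2))

p_value = chi2_sf_df1(1.0)   # coin example: chi-square = 1.0, df = 1
print(round(p_value, 3))     # 0.317, agreeing with the calculator result

# Decision at alpha = 0.05: p-value > alpha, so fail to reject H0.
reject_h0 = p_value < 0.05
print(reject_h0)             # False
```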
Your conclusion has two interconnected parts:
- Contextual: In the context of the problem, state whether you reject or fail to reject H₀.
- Evidential: Provide statistical justification based on the p-value and a pre-determined significance level (α), often 0.05.
For the coin, the p-value of about 0.317 is much greater than α = 0.05, so you *fail to reject H₀*. You do not have sufficient statistical evidence to conclude the coin is unfair. It is vital never to "accept" the null or claim the data "prove" the model is correct; you can only state that the data are not inconsistent with the model.
Common Pitfalls
Misapplying the Test to Non-Categorical Data: The goodness of fit test is only for counts in categories. You cannot use it to test if a distribution of quantitative data (like heights) is normal—that requires a different procedure. Confusing these is a classic exam trap.
Ignoring the Large Counts Condition: This is the most frequently overlooked step. If any expected count is below 5, the chi-square approximation is not valid, and your p-value will be unreliable. The solution is to combine categories if it makes logical sense (e.g., combining "Strongly Disagree" and "Disagree") or to collect more data.
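A hypothetical illustration of the combining fix (the survey proportions below are invented): with n = 100 and a model that puts only 4% of responses in "Strongly Disagree," that category's expected count falls below 5, and merging it into "Disagree" restores the condition:

```python
n = 100
model = {"Agree": 0.50, "Neutral": 0.30, "Disagree": 0.16, "Strongly Disagree": 0.04}

expected = {cat: n * p for cat, p in model.items()}
violates = min(expected.values()) < 5   # "Strongly Disagree" expects about 4
print(violates)                         # True: chi-square approximation invalid

# Combine the two lowest response categories into one.
model["Disagree"] += model.pop("Strongly Disagree")   # hypothesized 0.20 total
expected = {cat: n * p for cat, p in model.items()}
print(all(e >= 5 for e in expected.values()))         # True after merging
```

Remember that merging reduces the number of categories, so the degrees of freedom shrink accordingly.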
Misstating the Conclusion in the Alternative: The alternative hypothesis (Hₐ) for this test is always "at least one proportion is different." It does not specify which one or how. A significant result tells you the model doesn't fit, but you must then look back at the component calculations to see which categories contributed most to the large χ² value to understand the nature of the discrepancy.
Confusing Goodness of Fit with Other Chi-Square Tests: The Chi-Square Test for Goodness of Fit analyzes the distribution of one categorical variable. Do not confuse it with the Chi-Square Test for Homogeneity or the Chi-Square Test for Independence, which both analyze the relationship between two categorical variables. Knowing which test to use is half the battle.
Summary
- The Chi-Square Test for Goodness of Fit formally tests whether the observed distribution of a single categorical variable matches a hypothesized distribution defined in the null hypothesis (H₀).
- Valid inference requires checking the Random, Large Counts (all expected counts ≥ 5), and Independent conditions. The test statistic χ² = Σ (O − E)² / E measures the total standardized squared deviation of observed from expected counts.
- The degrees of freedom is calculated as (number of categories - 1) and, together with the statistic, is used to find the p-value from a chi-square distribution.
- A small p-value (typically compared to α = 0.05) provides evidence against H₀, leading you to conclude the observed data do not fit the hypothesized model. Always state your conclusion in context and justify it with the p-value.
- Avoid common errors by ensuring your data are categorical counts, rigorously checking the Large Counts condition, and correctly identifying this as a one-variable test distinct from other chi-square procedures.