Mar 5

AP Statistics: Chi-Square Tests for Independence and Goodness of Fit

Mindli Team

AI-Generated Content


Chi-square tests are your essential toolkit for answering questions with categorical data. Whether you’re testing if a die is fair or investigating whether political affiliation is associated with views on a policy, these procedures transform counts and categories into statistically sound conclusions. Mastering them is not just about passing the AP exam—it’s about gaining a powerful way to analyze the world where numbers represent groups, not measurements.

Foundations of Categorical Data Analysis

Before diving into calculations, you must understand the landscape of categorical data. Categorical data places individuals into groups based on qualities, such as gender (male/female/non-binary), survey response (agree/neutral/disagree), or species. Unlike quantitative data, you can’t calculate a mean for these categories; you can only count how many fall into each group. The chi-square tests use these observed counts to test hypotheses about distributions and relationships. The core logic is always to compare what you actually observed in your sample with what you would expect to observe if a specific hypothesis were true. A large discrepancy between observed and expected counts provides evidence against the null hypothesis.

All chi-square tests share a common test statistic, which quantifies the total discrepancy between observed and expected counts. The formula for the chi-square statistic is:

χ² = Σ (O − E)² / E

where O represents the observed count in a category or table cell, and E represents the corresponding expected count under the null hypothesis. You sum this calculation over all categories or cells. As the differences between observed and expected counts grow, the statistic increases. This statistic follows a chi-square distribution, a family of right-skewed distributions defined by its degrees of freedom (df). The degrees of freedom control the shape of the distribution and are calculated differently for the two main types of tests you'll encounter.
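To make the formula concrete, here is a minimal Python sketch (the function name is my own, not part of any standard library) that sums (O − E)² / E over matching lists of observed and expected counts:

```python
def chi_square_stat(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories or cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# A fair-die check: 60 rolls, so each face has an expected count of 10.
stat = chi_square_stat([8, 12, 9, 11, 10, 10], [10] * 6)
# stat == 1.0, since (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1.0
```

The same function works unchanged for goodness-of-fit tests and, once a two-way table is flattened into a list of cells, for tests of independence.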

The Chi-Square Goodness-of-Fit Test

Use the chi-square goodness-of-fit test when you have a single categorical variable from a single population. You test a hypothesis about the distribution of this variable—that is, the proportion of the population that falls into each category. For example, is a six-sided die fair? Does the distribution of blood types in a hospital match the known national proportions?

The procedure follows a clear workflow. First, state your hypotheses. The null hypothesis (H₀) specifies the claimed or hypothesized distribution of proportions across the categories. The alternative hypothesis (Hₐ) simply states that the distribution is different from the one specified in H₀. Next, calculate expected counts. If your null hypothesis states that proportion pᵢ of the population is in category i, and you have a total sample size of n, then the expected count for that category is n × pᵢ.

The degrees of freedom for this test is the number of categories minus one: df = k − 1. This makes intuitive sense: once you know the counts for k − 1 categories and the total sample size, the count for the last category is fixed. After calculating the chi-square statistic using the formula, you use the chi-square distribution with the correct df to find a p-value: the probability of getting a statistic as extreme as or more extreme than the one from your sample, assuming H₀ is true. A small p-value provides evidence against the null hypothesis.

Worked Example: A candy company claims its mix is 20% blue, 20% orange, 20% green, 20% yellow, and 20% red. You take a random sample of 100 candies and get counts of 18, 19, 22, 21, and 20 respectively.

  • H₀: The distribution of colors is as claimed (20% each).
  • Hₐ: The distribution of colors is different from the claim.
  • Expected counts: For each color, 100 × 0.20 = 20.
  • df = 5 − 1 = 4.
  • Calculate χ² = (18−20)²/20 + (19−20)²/20 + (22−20)²/20 + (21−20)²/20 + (20−20)²/20 = 0.5.
  • A χ² value of 0.5 with 4 df yields a very large p-value (approximately 0.97), providing no evidence against the company's claim.
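The candy example can be reproduced in a few lines of plain Python. One detail worth hedging: for an even number of degrees of freedom the chi-square survival function has a closed form, and for df = 4 it is e^(−x/2)(1 + x/2), which supplies the p-value here without any statistics library.

```python
import math

observed = [18, 19, 22, 21, 20]
claimed = [0.20] * 5                      # the company's hypothesized proportions
n = sum(observed)                         # sample size: 100 candies
expected = [n * p for p in claimed]       # expected count = n * p_i = 20 per color

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))  # 0.5

# Survival function of the chi-square distribution with df = 4:
# P(X >= x) = exp(-x/2) * (1 + x/2)
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)
print(round(chi2, 3), round(p_value, 3))  # 0.5 0.974
```

In practice you would reach for a library routine (for example, SciPy's `scipy.stats.chisquare`) rather than the closed form, but spelling it out makes every step of the workflow visible.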

The Chi-Square Test for Independence

The chi-square test for independence is used when you have two categorical variables measured on the same individuals. You want to know if there is an association or relationship between the variables. For instance, is handedness (left/right) independent of gender? Is movie preference (comedy/drama/action) associated with age group?

Here, your data is organized in a two-way table (contingency table). The null hypothesis (H₀) states that there is no association between the two variables; they are independent. The alternative (Hₐ) states that an association does exist. The calculation of expected counts changes. Under the assumption of independence, the expected count for a cell in row i and column j is:

Expected count = (row i total × column j total) / table total

This formula applies the multiplication rule for independent events: the probability of being in a cell is the probability of being in that row times the probability of being in that column.

The degrees of freedom for a test of independence is df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns. After calculating the statistic by summing over all cells, you find the p-value using the appropriate chi-square distribution.
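Since mixing up the two df formulas is such a common error, it can help to keep them as two explicitly named helpers (names are my own):

```python
def df_goodness_of_fit(k):
    """Goodness-of-fit: k categories -> k - 1 degrees of freedom."""
    return k - 1

def df_independence(r, c):
    """Independence: r x c table -> (r - 1)(c - 1) degrees of freedom."""
    return (r - 1) * (c - 1)

df_gof = df_goodness_of_fit(5)   # 4, as in the candy example
df_ind = df_independence(2, 2)   # 1, as in a 2x2 table
```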

Worked Example: A survey asks 300 students about their pet preference (Dog/Cat) and their class year (Freshman/Sophomore). You observe the following:

              Dog   Cat   Row Total
  Freshman     80    40         120
  Sophomore    70   110         180
  Col Total   150   150         300
  • H₀: Pet preference and class year are independent.
  • Hₐ: An association exists.
  • Calculate expected for Freshman/Dog cell: (120 × 150) / 300 = 60.
  • Similarly: Freshman/Cat = 60, Sophomore/Dog = 90, Sophomore/Cat = 90.
  • df = (2 − 1)(2 − 1) = 1.
  • Calculate χ² = (80−60)²/60 + (40−60)²/60 + (70−90)²/90 + (110−90)²/90 ≈ 22.22.
  • This large χ² with 1 df gives a very small p-value, providing strong evidence of an association between class year and pet preference.
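The independence test above can also be checked in plain Python. One assumption worth flagging: for df = 1 the chi-square survival function equals erfc(√(x/2)), which lets the standard library's `math.erfc` supply the p-value.

```python
import math

observed = [[80, 40], [70, 110]]                    # rows: Freshman, Sophomore
row_totals = [sum(row) for row in observed]         # [120, 180]
col_totals = [sum(col) for col in zip(*observed)]   # [150, 150]
grand = sum(row_totals)                             # 300

# Expected count for cell (i, j) = row_i total * col_j total / table total
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))     # ~22.22

# df = (2-1)(2-1) = 1; for df = 1, P(X >= x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(chi2 / 2))            # far below 0.001
```

A library routine such as SciPy's `scipy.stats.chi2_contingency` does all of this in one call (note that it applies a continuity correction to 2×2 tables by default, which changes the statistic slightly).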

Common Pitfalls

Violating the Large Counts Condition: The most common mistake is failing to check that all expected counts are at least 5. This condition is necessary for the chi-square distribution to be a good approximation for the sampling distribution of the statistic. If you have a cell with an expected count of 3, the test results are not reliable. On the AP exam, stating this condition is crucial for earning full credit on a free-response question. The condition applies to expected counts, not observed counts.
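A condition check like this is easy to automate. Here is a small sketch (helper name is my own) that inspects a flat list of expected counts before you trust the chi-square approximation:

```python
def large_counts_ok(expected_counts, threshold=5):
    """Return True only if every EXPECTED count meets the threshold."""
    return all(e >= threshold for e in expected_counts)

ok = large_counts_ok([60, 60, 90, 90])    # True: chi-square is appropriate
bad = large_counts_ok([3, 10, 12, 25])    # False: one cell expects only 3
```

Note that the function is fed expected counts, never observed ones, mirroring the point above.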

Misinterpreting a Significant Result: Finding a small p-value and rejecting the null hypothesis in a test for independence only tells you that an association exists. It does not tell you the nature or strength of that association. You cannot conclude that one variable "causes" the other. To understand the pattern, you must go back to the two-way table and calculate conditional distributions or examine the cells that contributed most to the large statistic.

Using the Wrong Degrees of Freedom: Confusing the formulas for degrees of freedom between the goodness-of-fit test (df = k − 1) and the test for independence (df = (r − 1)(c − 1)) is a costly error. Always identify which test you are performing first. For a single categorical variable, use k − 1. For a two-way table analyzing two variables, use (r − 1)(c − 1).

Stating a Conclusion Out of Context: The final step is to state your conclusion in the context of the problem. For a test of independence, avoid generic phrases like "reject the null." Instead, say, "There is convincing statistical evidence that an association exists between class year and pet preference." For a goodness-of-fit test, say, "We do not have convincing evidence that the distribution of candy colors differs from the company's claim." Omitting this context will lose points on the exam.

Summary

  • Chi-square tests analyze categorical data. The goodness-of-fit test examines a single categorical variable against a hypothesized distribution, while the test for independence examines the association between two categorical variables.
  • The core calculation compares observed and expected counts using the formula χ² = Σ (O − E)² / E. A large statistic relative to its degrees of freedom leads to a small p-value.
  • Degrees of freedom differ by test: df = k − 1 for goodness-of-fit (with k categories) and df = (r − 1)(c − 1) for tests of independence (for an r × c table).
  • Always verify the Large Counts Condition: All expected counts must be at least 5 for the test to be valid.
  • Interpretation is key: A significant test for independence indicates an association, not causation. Always state your final conclusion in the context of the original problem.
  • On the AP exam, clearly outline your steps: state hypotheses, check conditions, calculate, find the p-value, and conclude in context. Showing your work for expected counts is often required.
