IB Math AI: Statistical Tests and Analysis
AI-Generated Content
Statistical testing transforms raw data into credible evidence, forming the backbone of empirical research in fields from economics to medicine. For IB Math Applications and Interpretation HL, mastering these tests is not just an exam requirement but a critical skill for making informed, data-driven decisions in an increasingly quantitative world. You will learn to move beyond descriptive statistics and use inferential methods to test claims, identify patterns, and draw conclusions about populations from samples.
The Logic of Hypothesis Testing: From Question to Decision
Every statistical test begins with a structured investigative process known as hypothesis testing. This formal procedure allows you to use sample data to evaluate a claim about a population. The claim you test against is called the null hypothesis (H₀), which typically states that there is no effect, no difference, or no association. For instance, H₀ might claim that the mean height of a plant species is 15 cm, or that two variables are independent. The alternative hypothesis (H₁ or Hₐ) represents what you seek evidence for; it is the opposite of the null, stating that an effect, difference, or association exists.
The strength of evidence against the null hypothesis is measured by the p-value. Technically, the p-value is the probability of obtaining your observed sample results, or something more extreme, assuming the null hypothesis is true. A small p-value indicates that your observed data would be very unlikely under the null hypothesis. You compare this p-value to a pre-determined significance level (α), which is the threshold for rejecting H₀. Common choices for α are 0.05 (5%) or 0.01 (1%). If the p-value ≤ α, you reject the null hypothesis in favor of the alternative. If the p-value > α, you fail to reject the null. This does not prove H₀ is true, but rather that the data does not provide strong enough evidence against it.
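The decision rule is simple enough to express in a few lines of code. Here is a minimal Python sketch; the two p-values passed in at the end are chosen purely for illustration:

```python
def decide(p_value, alpha=0.05):
    """Standard decision rule: reject H0 when the p-value
    is at or below the significance level alpha."""
    if p_value <= alpha:
        return "reject H0"
    return "fail to reject H0"

print(decide(0.031))  # reject H0 (0.031 <= 0.05)
print(decide(0.21))   # fail to reject H0 (0.21 > 0.05)
```

Note that the boundary case p-value = α falls on the "reject" side under the convention used here; in practice, p-values land exactly on α so rarely that the convention seldom matters.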
Comparing Means: The t-Test
When your research question involves comparing averages, the t-test is a fundamental tool. It is used to test hypotheses about population means. The most common types you will encounter are the one-sample t-test and the two-sample t-test. The one-sample t-test assesses whether the mean of a single group differs from a hypothesized value. For example, you might test if the average reaction time of drivers using a new hands-free device is different from the established safe standard of 1.2 seconds.
The two-sample t-test compares the means of two independent groups, such as testing if the average test score for students who received a new tutoring method (μ₁) is greater than the average for those who did not (μ₂). Here, H₀: μ₁ = μ₂ and H₁: μ₁ > μ₂ for a one-tailed test. The test statistic for a two-sample t-test is calculated as:

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

where x̄₁ and x̄₂ represent the sample means, s₁² and s₂² the sample variances, and n₁ and n₂ the sample sizes. You then compare the calculated t-value to a critical value from the t-distribution (with degrees of freedom determined by the test type and sample sizes) or, more commonly in IB, use your calculator or software to find the associated p-value directly. The decision rule based on the p-value and α applies exactly as in the general hypothesis testing logic.
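The two-sample t statistic can be computed with Python's standard library. This is a sketch of the calculation only; the score samples are made up for illustration:

```python
import math
from statistics import mean, variance

def two_sample_t(x1, x2):
    """t statistic for two independent samples: the difference in
    sample means divided by its estimated standard error."""
    n1, n2 = len(x1), len(x2)
    s1, s2 = variance(x1), variance(x2)  # sample variances (n - 1 denominator)
    return (mean(x1) - mean(x2)) / math.sqrt(s1 / n1 + s2 / n2)

# Hypothetical test scores: tutored group vs. control group
tutored = [78, 85, 90, 72, 88, 81]
control = [70, 75, 80, 68, 77, 74]
t = two_sample_t(tutored, control)  # roughly 2.54 for these samples
```

On the exam you would obtain the t-value and p-value directly from your GDC, but working through the formula once by hand (or in code) makes the calculator output much easier to interpret.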
Assessing Distributions and Associations: Chi-Squared Tests
While t-tests deal with means, chi-squared tests (χ² tests) are used for categorical data. There are two primary types: the chi-squared test for goodness of fit and the chi-squared test for independence.
The chi-squared test for goodness of fit determines how well a sample distribution of a single categorical variable matches a hypothesized population distribution. Imagine a candy company claims its bag contains 30% red, 30% yellow, 20% green, and 20% blue candies. You can buy a bag, count the candies of each color, and use a goodness of fit test to see if your sample proportions are consistent with the company's claim. The null hypothesis states that the observed frequencies follow the claimed distribution.
The chi-squared test for independence examines whether two categorical variables are associated in a population. For instance, you might want to know if smartphone brand preference (e.g., Brand A, B, C) is independent of age group (e.g., Teen, Adult, Senior) based on a survey sample. The null hypothesis (H₀) is that the two variables are independent. Both tests use the same core calculation for the test statistic:

χ² = Σ (fₒ − fₑ)² / fₑ

where fₒ represents the observed frequency in each category or cell of a contingency table, and fₑ represents the expected frequency calculated under the assumption that H₀ is true. For the test of independence, fₑ = (row total × column total) / (grand total) for each cell. The calculated χ² value is then compared to the chi-squared distribution with the appropriate degrees of freedom to obtain a p-value. A large χ² value, leading to a small p-value, provides evidence against the null hypothesis, suggesting a poor fit or an association between variables.
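To make the goodness-of-fit formula concrete, here is a short Python sketch using the candy example above; the observed counts for a bag of 100 candies are hypothetical:

```python
def chi_squared_stat(observed, expected):
    """Sum of (f_o - f_e)^2 / f_e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Claimed split: 30% / 30% / 20% / 20% of a 100-candy bag
expected = [30, 30, 20, 20]
observed = [36, 27, 18, 19]  # hypothetical counts from one bag
x2 = chi_squared_stat(observed, expected)  # 1.75 for these counts
# df = 4 - 1 = 3; the critical value at alpha = 0.05 is 7.815,
# so this sample is consistent with the company's claim
```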
Application to Real-World Data: Drawing Valid Conclusions
The ultimate goal is to apply these tests correctly to analyze real-world data and draw valid, justified conclusions. This involves more than just performing calculations; it requires careful planning and interpretation. First, you must identify the correct test based on your data type and research question: use a t-test for quantitative data involving means, and a chi-squared test for categorical data involving frequencies or associations.
Consider a scenario where a city wants to know if a new traffic law has reduced the average number of accidents per month from the historical mean of 20. This calls for a one-sample t-test with H₀: μ = 20 and H₁: μ < 20. After collecting data for 12 months post-law, you calculate the sample mean and standard deviation, then compute the t-statistic and p-value. If the p-value is less than α, you reject H₀ and conclude there is significant evidence that the average number of accidents has decreased.
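The traffic-law scenario can be sketched in Python; the twelve monthly accident counts below are invented for illustration:

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t = (sample mean - mu0) / (s / sqrt(n))"""
    n = len(sample)
    return (mean(sample) - mu0) / (stdev(sample) / math.sqrt(n))

# Hypothetical monthly accident counts for the 12 months after the law
accidents = [18, 15, 19, 16, 17, 20, 14, 18, 16, 17, 15, 19]
t = one_sample_t(accidents, 20)  # about -5.59 for these counts
# df = 11; the one-tailed critical value at alpha = 0.05 is -1.796,
# so for this made-up data H0 would be rejected
```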
For a chi-squared application, suppose a researcher surveys 300 people to see if exercise frequency (Daily, Weekly, Rarely) is associated with stress level (High, Medium, Low). After organizing the data into a 3×3 contingency table, you perform a chi-squared test for independence. If the p-value is 0.21, which is greater than α = 0.05, you fail to reject H₀. Your conclusion is that the sample data does not provide significant evidence of an association between exercise frequency and stress level. Crucially, you must report your findings in context, mentioning the test used, the p-value, the significance level, and the practical implication of the statistical decision.
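The independence calculation can also be sketched in code. The table below uses hypothetical counts chosen to be close to independent, so the statistic stays small, mirroring the fail-to-reject outcome in the scenario:

```python
def chi2_independence(table):
    """Chi-squared statistic for a contingency table; each expected
    count is row_total * column_total / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand
            x2 += (obs - exp) ** 2 / exp
    return x2

# Hypothetical 3x3 table (rows: Daily / Weekly / Rarely exercise;
# columns: High / Medium / Low stress), 300 respondents in total
table = [[30, 35, 35],
         [35, 30, 35],
         [35, 35, 30]]
x2 = chi2_independence(table)  # 1.5 for these counts
# df = (3-1)*(3-1) = 4; the critical value at alpha = 0.05 is 9.488,
# so there is no significant evidence of an association
```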
Common Pitfalls
- Misinterpreting "Fail to Reject H₀" as Proof of H₀: A high p-value does not prove the null hypothesis is true. It only indicates insufficient evidence against it based on your sample. For example, if a t-test yields a p-value greater than α, you cannot conclude "the means are equal." You should state, "There is not enough evidence to conclude a difference in means exists."
- Ignoring Test Assumptions: Each test relies on certain conditions for valid results. For t-tests, common assumptions include the data being approximately normally distributed (especially important for small samples) and, for two-sample tests, homogeneity of variances. Chi-squared tests require that expected frequencies are sufficiently large (typically all fₑ ≥ 5). Violating these can lead to inaccurate p-values. Always check assumptions before proceeding.
- Confusing Test Types: Using a chi-squared test for goodness of fit when you need a test for independence, or vice versa, is a fundamental error. Remember: goodness of fit compares one variable to a distribution; independence tests the relationship between two variables. Carefully frame your null hypothesis to select the correct test.
- Data Dredging without a Hypothesis: Conducting multiple tests on the same dataset without a prior hypothesis increases the chance of a Type I error (falsely rejecting H₀). If you test 20 different variables at α = 0.05, you'd expect about one significant result by chance alone. Pre-specify your primary hypothesis and use correction methods if multiple comparisons are unavoidable.
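The arithmetic behind the multiple-testing warning is worth seeing once. Assuming 20 independent tests in which every null hypothesis is actually true:

```python
alpha = 0.05
tests = 20

# Expected number of false positives: 20 * 0.05 = 1
expected_false_positives = tests * alpha

# Probability of at least one false positive across all 20 tests
p_any_false_positive = 1 - (1 - alpha) ** tests  # about 0.64

# Bonferroni correction: divide alpha by the number of tests
alpha_per_test = alpha / tests  # 0.0025 per-test threshold
```

So even with no real effects anywhere, there is roughly a 64% chance that at least one test comes out "significant", which is why corrections like Bonferroni tighten the per-test threshold.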
Summary
- Hypothesis testing is a structured process for evaluating claims using sample data, centered on comparing a p-value to a significance level (α) to decide whether to reject the null hypothesis (H₀) in favor of the alternative hypothesis (H₁).
- The t-test is used for inferences about population means, with variants for one sample or two independent samples, requiring you to calculate a t-statistic and assess it against the t-distribution.
- Chi-squared tests analyze categorical data: the goodness of fit test compares observed frequencies to a theoretical distribution, while the test for independence assesses whether two categorical variables are related.
- Valid application requires choosing the correct test based on data type and question, checking assumptions, and interpreting results in context—never equating a lack of significance with proof of no effect.
- Always report the statistical test, p-value, significance level, and your conclusion in plain language relevant to the original research problem.