Feb 24

IB AA: Statistical Tests and Inference

Mindli Team

AI-Generated Content


Statistical inference transforms raw data into meaningful conclusions, allowing you to make decisions and predictions in the face of uncertainty. Whether testing a new drug, analyzing economic trends, or validating a scientific model, the techniques of hypothesis testing, correlation, and regression form the backbone of quantitative reasoning. Mastering these tools is not just about passing exams; it's about developing a critical, evidence-based mindset for interpreting the world.

Foundations of Hypothesis Testing

At the core of statistical inference lies hypothesis testing, a formal procedure for using sample data to evaluate claims about a population. You begin by stating two competing hypotheses: the null hypothesis (H₀), which represents a default position of "no effect" or "no difference," and the alternative hypothesis (H₁), which is what you seek evidence for. The outcome of a test is determined by analyzing the probability of observing your sample data, or something more extreme, assuming the null hypothesis is true.

This probability is quantified by the p-value. Formally, the p-value is the probability of obtaining a test statistic at least as extreme as the one calculated from your sample, given that H₀ is true. A small p-value indicates that your observed data is unlikely under the null hypothesis. You compare this p-value to a pre-determined significance level (α), often 0.05. If the p-value ≤ α, you reject the null hypothesis in favor of the alternative. If the p-value > α, you fail to reject the null hypothesis. Crucially, failing to reject H₀ does not prove it is true; it simply means there isn't sufficient evidence against it based on this sample.
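The logic above can be made concrete with a simulation. This minimal sketch (with hypothetical numbers: 60 heads in 100 coin flips) estimates a two-sided p-value by simulating the null hypothesis of a fair coin and counting how often a result at least as extreme as the observed one appears:

```python
import random

random.seed(1)  # fixed seed so the simulation is repeatable

observed_heads = 60   # hypothetical sample: 60 heads in 100 flips
n_flips, alpha = 100, 0.05

# Simulate H0 (a fair coin) many times and count outcomes at least as
# extreme as the observed one, measured as distance from the expected 50.
trials = 10_000
extreme = 0
for _ in range(trials):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    if abs(heads - 50) >= abs(observed_heads - 50):
        extreme += 1

p_value = extreme / trials
print(f"estimated p-value: {p_value:.3f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```

Note that the estimate lands near 0.057, just above α = 0.05: a borderline result that illustrates why a p-value should be read as a degree of evidence rather than a hard verdict.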

Tests for Categorical Data: Chi-Squared

When your data involves counts or frequencies across categories, the chi-squared (χ²) test is the appropriate tool. There are two primary types you must distinguish. The chi-squared test for goodness of fit assesses how well an observed frequency distribution matches an expected distribution. For example, you might test if the colors of candies in a bag follow the proportions advertised by the manufacturer. The test statistic is calculated as:

χ² = Σ (Oᵢ − Eᵢ)² / Eᵢ

where Oᵢ and Eᵢ are the observed and expected frequencies for category i.
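The candy example can be computed directly. This sketch uses made-up counts and advertised proportions; the critical value 5.991 is the standard table value for α = 0.05 with 2 degrees of freedom:

```python
# Chi-squared goodness-of-fit statistic for a hypothetical bag of 200 candies
# advertised as 50% red, 30% green, 20% blue (illustrative numbers).
observed = {"red": 92, "green": 66, "blue": 42}
advertised = {"red": 0.50, "green": 0.30, "blue": 0.20}

n = sum(observed.values())
expected = {c: p * n for c, p in advertised.items()}   # 100, 60, 40

chi_sq = sum((observed[c] - expected[c]) ** 2 / expected[c] for c in observed)
print(f"chi-squared = {chi_sq:.2f}")   # 1.34

# Critical value for alpha = 0.05 with df = 3 - 1 = 2 is 5.991 (from tables).
print("reject H0" if chi_sq > 5.991 else "fail to reject H0")
```

Since 1.34 < 5.991, there is no significant evidence that the bag deviates from the advertised proportions.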

The chi-squared test for independence determines if there is a significant association between two categorical variables in a contingency table. For instance, you could test whether phone brand preference (Apple, Android) is independent of age group (Teen, Adult). The null hypothesis here is that the two variables are independent. For both tests, you compare the calculated statistic to a critical value from the chi-squared distribution, with degrees of freedom determined by the structure of the data: the number of categories minus one for goodness of fit, and (rows − 1) × (columns − 1) for independence.
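For the independence test, the expected count in each cell comes from the marginal totals: E = (row total × column total) / grand total. A minimal sketch with a hypothetical 2×2 table for the phone-preference example:

```python
# Hypothetical 2x2 contingency table:
# rows = brand (Apple, Android), columns = age group (Teen, Adult).
table = [[30, 20],   # Apple:  30 teens, 20 adults
         [20, 30]]   # Android: 20 teens, 30 adults

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand = sum(row_totals)

# Under independence, E = (row total * column total) / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi_sq = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
             for i in range(2) for j in range(2))
df = (2 - 1) * (2 - 1)
print(f"chi-squared = {chi_sq:.2f}, df = {df}")
```

Here every expected count is 25, giving χ² = 4.0 with df = 1, which exceeds the 0.05 critical value of 3.841, so these (invented) data would suggest brand preference and age group are associated.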

Tests for Quantitative Data: The t-Test

When comparing means, the t-test is your go-to method. The one-sample t-test checks if the mean of a single sample differs significantly from a hypothesized population mean. More commonly in the IB syllabus, you will use the two-sample or unpaired t-test. This test compares the means of two independent groups to see if they are statistically different from each other, such as comparing the average test scores of students who used a new study technique versus those who used a traditional method.

The test statistic follows the formula:

t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)

where x̄₁ and x̄₂ represent the sample means, s₁² and s₂² the sample variances, and n₁ and n₂ the sample sizes. The calculated t-value is then compared to the t-distribution. A key assumption for the standard t-test is that the data in each group are approximately normally distributed, which is especially important with small sample sizes.
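The formula above translates directly into code. This sketch uses invented scores for the study-technique example; `statistics.variance` computes the sample variance (dividing by n − 1), matching the s² in the formula:

```python
from statistics import mean, variance
from math import sqrt

# Hypothetical exam scores: new study technique vs traditional method.
group1 = [78, 85, 90, 72, 88, 81]
group2 = [70, 75, 82, 68, 77, 74]

m1, m2 = mean(group1), mean(group2)
v1, v2 = variance(group1), variance(group2)   # sample variances (n - 1)
n1, n2 = len(group1), len(group2)

# t = (mean difference) / standard error of the difference
t = (m1 - m2) / sqrt(v1 / n1 + v2 / n2)
print(f"t = {t:.2f}")
```

The resulting t ≈ 2.34 would then be compared against the appropriate t-distribution critical value (or, in practice, read off a GDC along with its p-value).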

Measuring Relationships: Correlation

To quantify the strength and direction of a linear relationship between two quantitative variables, you use the correlation coefficient, specifically Pearson's r. Its value ranges from −1 to +1. An r value of +1 indicates a perfect positive linear relationship, −1 a perfect negative linear relationship, and 0 indicates no linear correlation. It's vital to remember that correlation measures linear association only; variables with a strong nonlinear relationship can still have r near 0.

More importantly, correlation does not imply causation. Just because two variables move together (e.g., ice cream sales and drowning incidents) does not mean one causes the other. They may both be influenced by a lurking variable (like hot weather). The square of the correlation coefficient, r², called the coefficient of determination, tells you the proportion of variance in one variable that is predictable from the other.
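Pearson's r can be computed from first principles as the sum of cross-deviations divided by the geometric mean of the squared deviations. A sketch with hypothetical paired data (hours studied vs exam score):

```python
from math import sqrt
from statistics import mean

# Hypothetical paired data: hours studied (x) vs exam score (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 58, 61, 70, 73, 79]

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # cross-deviations
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / sqrt(sxx * syy)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

Here r ≈ 0.993, so r² ≈ 0.986: about 99% of the variance in these (invented) scores is explained by hours studied.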

Modeling Relationships: Linear Regression

While correlation assesses the strength of a relationship, linear regression models it to make predictions. The goal is to find the line of best fit that minimizes the sum of the squared vertical distances between the observed data points and the line. This method is called the method of least squares.

The equation of the regression line of y on x is y = ax + b, where:

  • y is the dependent (response) variable.
  • x is the independent (explanatory) variable.
  • a is the slope, representing the change in y for a one-unit increase in x.
  • b is the y-intercept, the predicted value of y when x = 0.

The values of a and b are calculated using formulas derived from the least squares criterion:

a = r × (s_y / s_x),  b = ȳ − a x̄

where s_x and s_y are the standard deviations of the x and y variables, r is the correlation coefficient, and x̄ and ȳ are the sample means. Once you have the line, you can use it for interpolation (predicting within the range of your data), but you should be extremely cautious about extrapolation (predicting outside that range), as the relationship may not hold.
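The least-squares slope and intercept can be computed directly from the deviations (an equivalent form of a = r × s_y / s_x). This sketch reuses the hypothetical hours-studied data and ends with an interpolated prediction inside the data's range:

```python
from statistics import mean

# Hypothetical paired data: hours studied (x) vs exam score (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 58, 61, 70, 73, 79]

mx, my = mean(x), mean(y)

# Least-squares slope (equivalent to a = r * s_y / s_x) and intercept.
a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b = my - a * mx

print(f"regression line: y = {a:.2f}x + {b:.2f}")
print(f"predicted score for 4.5 hours: {a * 4.5 + b:.1f}")   # interpolation
```

Predicting at x = 4.5 is interpolation (4.5 lies between 1 and 6 hours); predicting at, say, x = 40 hours would be reckless extrapolation with this model.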

Common Pitfalls

Misinterpreting the p-value: The most common error is believing the p-value is the probability that the null hypothesis is true. It is not. It is the probability of the data given the null hypothesis. Similarly, a p-value of 0.06 does not mean the result is "insignificant" while 0.04 is "significant"; both indicate a gradient of evidence against H₀, with the 0.05 cutoff being a convention rather than a sharp boundary.

Confusing test types: Using a chi-squared test for independence when you need a goodness-of-fit test, or applying a t-test to categorical data, invalidates your analysis. Always match the test to your data type (categorical vs. quantitative) and research question (comparing distributions, testing independence, or comparing means).

Assuming correlation means causation: This is a fundamental logical error. Always consider the possibility of confounding variables or mere coincidence before concluding that one variable causes another based solely on a correlation coefficient.

Extrapolating recklessly with regression: A regression model is only validated within the range of the data used to create it. Predicting far outside this range assumes the linear relationship continues indefinitely, which is often false and can lead to nonsensical predictions.

Summary

  • Hypothesis testing is a structured process of using sample data to evaluate claims, centered on interpreting the p-value in relation to a significance level (α).
  • Use the chi-squared test for categorical data: the goodness-of-fit test compares observed to expected distributions, while the test for independence checks for associations between two variables.
  • The t-test is used to compare the means of two independent groups, with its test statistic following a t-distribution.
  • Correlation (r) measures the strength and direction of a linear relationship, but it is critical to remember it does not prove causation.
  • Linear regression by least squares finds the line of best fit (y = ax + b) for prediction, with the slope indicating the change in the response variable per unit change in the explanatory variable.
