UK A-Level: Hypothesis Testing

Hypothesis testing is the formal statistical process for making decisions using data, moving beyond intuition to quantified evidence. Whether in scientific research, quality control, or social studies, it provides a structured framework to determine if an observed effect is genuine or merely due to random chance. Mastering this topic is crucial for the Statistics components of your A-Level, as it forms the backbone of inferential statistics and data-driven conclusions.

Foundations: Stating Hypotheses and Setting Significance

Every hypothesis test begins with a clear statement of two competing claims. The null hypothesis, denoted $H_{0}$, is a statement of "no effect," "no change," or the status quo you aim to test against. For example, if testing whether a coin is biased, $H_{0}$ would be $p = 0.5$, where $p$ is the probability of landing on heads. The alternative hypothesis, denoted $H_{1}$, is what you suspect might be true instead. It could be one-sided (e.g., $p > 0.5$, suggesting the coin is biased towards heads) or two-sided (e.g., $p \neq 0.5$, suggesting the coin is biased, but in an unknown direction).

Before collecting data, you must set a significance level, denoted by $α$. This is the probability you are willing to accept of rejecting $H_{0}$ when it is actually true (a Type I error). In A-Level, this is typically 5% ($α = 0.05$), though 1% and 10% are also used. Choosing a 5% level means you accept a 5% risk of concluding there is an effect when there isn't one.
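A quick simulation shows what a 5% significance level means in practice.

The sketch below repeatedly draws samples from a population in which $H_{0}$ is true. It then counts how often a one-tailed z-test at level $α$ wrongly rejects $H_{0}$. The population values (mean 500g, standard deviation 4g, samples of 10) are illustrative assumptions borrowed from the bag-filling setting used later in these notes. The function name is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def simulate_type_i_error_rate(mu0=500.0, sigma=4.0, n=10, alpha=0.05,
                               trials=20_000, seed=1):
    """Estimate how often a lower-tailed z-test rejects H0 when H0 is TRUE."""
    rng = random.Random(seed)                   # seeded so runs are repeatable
    critical_z = NormalDist().inv_cdf(alpha)    # about -1.645 when alpha = 0.05
    rejections = 0
    for _ in range(trials):
        # Sample from the H0 population itself, so every rejection is a Type I error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
        if z <= critical_z:
            rejections += 1
    return rejections / trials

rate = simulate_type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land near 0.05. Over many repetitions, a test at the 5% level rejects a true $H_{0}$ about one time in twenty.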

The choice between a one-tailed test and a two-tailed test depends on your $H_{1}$. A one-tailed test (e.g., $H_{1}: p > 0.5$) looks for evidence of an effect in only one direction. A two-tailed test (e.g., $H_{1}: p \neq 0.5$) looks for evidence in either direction. This choice directly determines the critical region, which is the set of all test statistic values that would lead you to reject $H_{0}$.

The Mechanics: Critical Regions and P-Values

There are two equivalent approaches to making a decision in a hypothesis test: the critical value method and the p-value approach.

The critical value method involves pre-determining the critical region based on your significance level and the distribution of your test statistic (e.g., Binomial or Normal). If your observed test statistic falls within this extreme region, you reject $H_{0}$ . For a two-tailed test at the 5% level, the critical region is split, with 2.5% in each tail. The boundaries of this region are the critical values.
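For a Normal test statistic, the critical values at a given $α$ come from the inverse of the standard normal CDF.

A minimal sketch using the Python standard library's `statistics.NormalDist` shows how a two-tailed test splits $α$ between the tails. The function name `z_critical_values` and its `tail` labels are illustrative choices, not a standard API.

```python
from statistics import NormalDist

def z_critical_values(alpha=0.05, tail="two"):
    """Critical value(s) of N(0, 1) for a test at significance level alpha.

    tail: "lower" (H1: mu < mu0), "upper" (H1: mu > mu0) or "two" (H1: mu != mu0).
    """
    z = NormalDist()  # standard normal distribution N(0, 1)
    if tail == "lower":
        return (z.inv_cdf(alpha),)
    if tail == "upper":
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        # Split alpha equally: alpha/2 in each tail.
        return (z.inv_cdf(alpha / 2), z.inv_cdf(1 - alpha / 2))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

for tail in ("lower", "upper", "two"):
    print(tail, [round(c, 4) for c in z_critical_values(0.05, tail)])
```

This prints $-1.6449$ and $1.6449$ for the one-tailed tests and $\pm 1.96$ for the two-tailed test. Halving $α$ per tail pushes each boundary further out.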

The p-value approach is often more informative. The p-value is the probability of obtaining a test statistic at least as extreme as the one you observed, assuming $H_{0}$ is true. You then compare this p-value directly with your significance level $α$: if the p-value $\leq α$, you reject $H_{0}$. A small p-value indicates that your observed data is unlikely under the null hypothesis, providing evidence against it.
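When the test statistic is a z-score, "at least as extreme" turns into a tail probability of $N(0, 1)$.

The Python sketch below assumes a z statistic. It shows how that tail is chosen for each form of $H_{1}$ and how the decision rule compares the p-value with $α$. The function names are hypothetical, and the observed value $z = 2.1$ is an invented example.

```python
from statistics import NormalDist

def z_p_value(z_obs, tail="two"):
    """P-value for an observed z-score, assuming H0 is true so Z ~ N(0, 1)."""
    phi = NormalDist().cdf
    if tail == "lower":   # H1: parameter < hypothesised value
        return phi(z_obs)
    if tail == "upper":   # H1: parameter > hypothesised value
        return 1 - phi(z_obs)
    if tail == "two":     # H1: parameter != hypothesised value
        # Extreme results count in both directions, so double the smaller tail.
        return 2 * min(phi(z_obs), 1 - phi(z_obs))
    raise ValueError("tail must be 'lower', 'upper' or 'two'")

def decide(p_value, alpha=0.05):
    """Reject H0 when the p-value is at most the significance level."""
    return "reject H0" if p_value <= alpha else "do not reject H0"

p = z_p_value(2.1, tail="upper")
print(round(p, 4), decide(p))
```

For $z = 2.1$ with $H_{1}$ in the upper direction, the p-value is about 0.0179. That is below 0.05, so $H_{0}$ would be rejected at the 5% level.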

For example, if testing $H_{0}: p = 0.5$ against $H_{1}: p > 0.5$ and you get a p-value of 0.03 with $α = 0.05$, you would reject $H_{0}$. The p-value of 0.03 means there is only a 3% chance of seeing your result (or something more extreme) if the coin were actually fair.

Applying the Framework: Binomial Hypothesis Tests

Binomial hypothesis tests are used when you have a single variable with two outcomes (success/failure) from a fixed number of independent trials, and you are testing the population proportion (probability of success), $p$ .

The test statistic, $X$, is simply the observed number of successes, which follows a $B(n, p)$ distribution under $H_{0}$. Your job is to decide whether this observed count is too extreme to be plausible under $H_{0}$.
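The p-value for a one-tailed Binomial test is a tail probability of $B(n, p)$, evaluated at the value of $p$ stated in $H_{0}$.

A minimal Python sketch using only `math.comb` is shown below. The function names are hypothetical, and so is the data in the usage line (15 heads in 20 tosses of a coin).

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), found by summing exact binomial probabilities."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

def binomial_p_value(x_obs, n, p0, tail):
    """One-tailed p-value for an observed count x_obs, assuming H0: p = p0."""
    if tail == "lower":  # H1: p < p0, so "at least as extreme" means x_obs or fewer
        return binomial_cdf(x_obs, n, p0)
    if tail == "upper":  # H1: p > p0, so "at least as extreme" means x_obs or more
        return 1 - binomial_cdf(x_obs - 1, n, p0)
    raise ValueError("tail must be 'lower' or 'upper'")

# Hypothetical data: 15 heads in 20 tosses, testing H0: p = 0.5 against H1: p > 0.5.
print(round(binomial_p_value(15, 20, 0.5, "upper"), 4))
```

This prints 0.0207, which is $P(X \geq 15)$ under $B(20, 0.5)$. At the 5% level, these hypothetical tosses would count as significant evidence of bias towards heads.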

Worked Example: A company claims 80% of its light bulbs last over 1000 hours. A consumer group tests 20 bulbs and finds 14 last that long. Test the company's claim at the 5% significance level.

Define: Let $p$ be the true proportion lasting over 1000h.
Hypotheses: $H_{0}: p = 0.8$, $H_{1}: p < 0.8$ (one-tailed, as we're checking if the claim is exaggerated).
Assume $X \sim B(20, 0.8)$ under $H_{0}$, where $X$ is the number lasting over 1000h.
Observed: $X = 14$.
p-value = $P(X \leq 14)$ under $B(20, 0.8)$. Using tables or a calculator: $P(X \leq 14) \approx 0.1958$.
Since $0.1958 > 0.05$, the result is not significant.
Conclusion: There is insufficient evidence at the 5% level to reject $H_{0}$. The sample does not disprove the company's claim.
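The same example can be worked with the critical value method. Because the Binomial distribution is discrete, the lower critical region is the largest set of low counts whose total probability under $H_{0}$ does not exceed $α$.

Below is a minimal Python sketch for this example. The function names are illustrative, not from any library.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

def lower_critical_value(n, p0, alpha):
    """Largest c with P(X <= c) <= alpha under H0: p = p0, or None if none exists."""
    c = None
    for k in range(n + 1):
        if binomial_cdf(k, n, p0) <= alpha:
            c = k
        else:
            break  # the CDF never decreases, so no larger k can qualify
    return c

n, p0, alpha, x_obs = 20, 0.8, 0.05, 14
c = lower_critical_value(n, p0, alpha)
if c is None:
    print("No critical region exists at this significance level")
else:
    print(f"Critical region: X <= {c}")
    print(f"Actual significance level: {binomial_cdf(c, n, p0):.4f}")
    print("Reject H0" if x_obs <= c else "Do not reject H0")
```

The sketch finds the critical region $X \leq 12$, with an actual significance level of about 3.21%. Adding $X = 13$ would push the tail probability above 5%. The observed $X = 14$ lies outside the region, which agrees with the p-value conclusion.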

Applying the Framework: Normal Distribution Hypothesis Tests

Normal distribution hypothesis tests are used when testing a population mean, $μ$ , and you either know the population variance $σ^{2}$ or are using a large sample (so the sample mean's distribution is approximately normal by the Central Limit Theorem).

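Here, your test statistic is a z-score, $z = \frac{\bar{x} - μ}{σ / \sqrt{n}}$, where $\bar{x}$ is the sample mean, $μ$ is the mean stated in $H_{0}$, and $σ / \sqrt{n}$ is the standard error of the sample mean. It follows a standard normal distribution, $N(0, 1)$, if $H_{0}$ is true. You compare this calculated $z$ value to critical values from the $N(0, 1)$ tables.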
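The formula translates directly into code.

In the minimal Python sketch below, $σ$ is assumed known, and the function name and the figures in the usage lines are hypothetical. It also shows how the standard error $σ / \sqrt{n}$ shrinks as $n$ grows.

```python
from math import sqrt

def z_statistic(sample_mean, mu0, sigma, n):
    """Standardise a sample mean under H0: mu = mu0, with sigma assumed known."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    return (sample_mean - mu0) / standard_error

# Hypothetical figures: the same 2-unit difference at two sample sizes.
print(z_statistic(52, 50, 6, 36))   # 2 / (6 / 6)  = 2.0
print(z_statistic(52, 50, 6, 144))  # 2 / (6 / 12) = 4.0
```

Quadrupling the sample size halves the standard error. The same observed difference therefore gives a $z$ twice as large, so it is more likely to fall in the critical region.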

Worked Example: A machine fills bags to a nominal mean of 500g. The known standard deviation is 4g. A random sample of 10 bags has a mean mass of 498g. Test at the 5% level if the machine is underfilling.

Define: Let $μ$ be the true population mean mass.
Hypotheses: $H_{0} : μ = 500$ , $H_{1} : μ < 500$ (one-tailed).
Under $H_{0}$, the sample mean $\bar{X}$ has distribution $N(500, 4^{2}/10)$. The test statistic is:

$z = \frac{498 - 500}{4/\sqrt{10}} = \frac{-2}{1.265} \approx -1.58$

The critical value for a one-tailed 5% test is $z = -1.6449$.
Since $-1.58 > -1.6449$, the test statistic is not in the critical region.
Alternatively, p-value = $P(Z \leq -1.58) \approx 0.0571 > 0.05$.
Conclusion: Insufficient evidence to reject $H_{0}$ at the 5% level. The machine is not shown to be underfilling.
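The whole calculation can be checked in a few lines using Python's `statistics.NormalDist`. This is a sketch that simply restates the worked example's figures as variables.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 500, 4, 10, 498, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))   # test statistic under H0
critical = NormalDist().inv_cdf(alpha)  # lower-tail critical value for H1: mu < 500
p_value = NormalDist().cdf(z)           # P(Z <= z)

print(f"z = {z:.2f}, critical value = {critical:.4f}, p-value = {p_value:.4f}")
print("Reject H0" if p_value <= alpha else "Do not reject H0")
```

Using the unrounded $z \approx -1.5811$ gives a p-value of about 0.0569, rather than the 0.0571 read from the rounded $z = -1.58$. Both are above 0.05, so the decision not to reject $H_{0}$ is unchanged.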

Common Pitfalls

Confusing $H_{0}$ and $H_{1}$: A common mistake is to treat the test as a way to "prove" $H_{1}$. The test is carried out assuming $H_{0}$ is true, and you only reject $H_{0}$ if the evidence against it is strong enough. The claim you are investigating usually belongs in $H_{1}$, but you cannot prove it; a significant result only provides evidence in its support.

Misinterpreting "Accept $H_{0}$ ": When your test is not significant, the correct phrasing is "There is insufficient evidence to reject $H_{0}$ " or "We do not reject $H_{0}$ ." You never "accept $H_{0}$ " as definitively true. A non-significant result might simply mean your sample size was too small to detect an effect that actually exists.
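Sample size alone can make the difference between a significant and a non-significant result.

The Python sketch below keeps the light bulb example's observed proportion (70% lasting over 1000h) and the hypotheses $H_{0}: p = 0.8$, $H_{1}: p < 0.8$. It compares the real sample of 20 with a larger sample of 100. The sample of 100 is hypothetical, added only for comparison.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p) ** (n - r) for r in range(k + 1))

# Same observed proportion (70%) at two sample sizes; n = 100 is hypothetical.
for n, x_obs in [(20, 14), (100, 70)]:
    p_value = binomial_cdf(x_obs, n, 0.8)  # H0: p = 0.8, H1: p < 0.8
    verdict = "significant" if p_value <= 0.05 else "not significant"
    print(f"n = {n:3d}, X = {x_obs:3d}: p-value = {p_value:.4f} ({verdict})")
```

With $n = 20$ the result is not significant, but the same 70% proportion with $n = 100$ gives a p-value below 0.05. A non-significant result therefore does not show that $H_{0}$ is true.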

Incorrect Tail Selection: Using a one-tailed test when a two-tailed test is appropriate inflates your chance of a false positive (a Type I error). If the question or context does not specify a direction of interest (e.g., "test if the mean has changed"), you must use a two-tailed test. Reserve one-tailed tests for when you are only interested in a deviation in one specific direction (e.g., "test if the mean has increased").
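The choice of tail can change the decision for the same data.

In the Python sketch below, the observed z-score of $-1.8$ is a hypothetical value. The one-tailed p-value counts only the lower tail, while the two-tailed p-value counts both tails.

```python
from statistics import NormalDist

phi = NormalDist().cdf
z_obs = -1.8  # hypothetical observed z-score

one_tailed = phi(z_obs)                           # H1: mu < mu0
two_tailed = 2 * min(phi(z_obs), 1 - phi(z_obs))  # H1: mu != mu0

print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

The one-tailed p-value is about 0.036, which is significant at 5%. The two-tailed p-value is about 0.072, which is not. Picking the one-tailed test only because the data happen to fall in that direction would overstate the evidence.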

Mixing up Binomial $p$ and Significance Level $α$ : In Binomial tests, $p$ is the population proportion you are testing (e.g., $H_{0} : p = 0.3$ ). The significance level $α$ (e.g., 0.05) is the probability threshold for rejection. They are completely different concepts. Ensure your final comparison is between a p-value and $α$ , not between a sample proportion and $α$ .

Summary

Hypothesis testing is a formal, four-step process: state hypotheses ( $H_{0}$ and $H_{1}$ ), select a significance level ( $α$ ), calculate a test statistic and corresponding p-value, and make a contextual conclusion.
The p-value measures the strength of evidence against $H_{0}$. If the p-value $\leq α$, reject $H_{0}$ in favour of $H_{1}$.
For Binomial tests, the test statistic is the count of successes, and probabilities are found using the $B(n, p)$ distribution under $H_{0}$.
For Normal tests concerning a mean, the test statistic is a z-score, calculated using the sample mean, hypothesised mean, and standard error, then compared to the standard normal distribution.
Your conclusion must be in the context of the original problem and must correctly reflect the statistical decision—never "prove" or definitively "accept" a hypothesis.