IB Mathematics: Probability Distributions
Probability distributions provide the mathematical framework for understanding and quantifying uncertainty, a core pillar of statistical analysis. In IB Mathematics, mastering these distributions is not just about passing exams—it’s about developing the analytical toolkit to model real-world phenomena, from genetics and finance to quality control and scientific research.
Foundations: Random Variables and Distribution Basics
A random variable is a numerical outcome of a random process. We categorize them as either discrete (taking on specific, countable values) or continuous (taking on any value within an interval). The behavior of a random variable is described by its probability distribution, which specifies the likelihood of its possible outcomes.
For any discrete random variable X, two fundamental concepts define its distribution. The expected value, denoted E(X) or μ, is the long-run average value of the variable if the experiment is repeated many times. It’s calculated as a weighted average: E(X) = Σ x·P(X = x). The variance, denoted Var(X) or σ², measures the spread or dispersion of the values around the expected value. It is calculated as Var(X) = E(X²) − [E(X)]² = Σ x²·P(X = x) − μ². The square root of the variance is the standard deviation, σ, which is in the original units of the data. Understanding these calculations is non-negotiable for analyzing any distribution's central tendency and variability.
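The expected-value and variance formulas above translate directly into code. A minimal Python sketch, using a fair six-sided die as the example (the die is my illustration, not from the text):

```python
# Expected value and variance of a discrete random variable,
# illustrated with a fair six-sided die: values 1..6, each with probability 1/6.
values = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6

mean = sum(x * p for x, p in zip(values, probs))                    # E(X) = Σ x·P(X = x)
variance = sum(x**2 * p for x, p in zip(values, probs)) - mean**2   # Var(X) = E(X²) − [E(X)]²
std_dev = variance ** 0.5                                           # σ = √Var(X)

print(mean)      # 3.5
print(variance)  # ≈ 2.9167  (35/12)
```

Note that the variance uses the computational form E(X²) − [E(X)]², which is usually faster by hand than summing squared deviations directly.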
Discrete Probability Distributions
The Binomial Distribution
The binomial distribution models the number of successes in a fixed number of independent trials. It applies when you have:
- A fixed number of trials, n.
- Each trial has only two outcomes: success (probability p) or failure (probability 1 − p).
- Trials are independent.
- The probability p is constant for each trial.
If X ~ B(n, p), then the probability of exactly k successes is given by the formula: P(X = k) = C(n, k) p^k (1 − p)^(n−k), where C(n, k) = n! / (k!(n − k)!) is the binomial coefficient. The expected value and variance are neatly defined: E(X) = np and Var(X) = np(1 − p).
Example Application: A drug is 80% effective. In a trial of 10 patients, what is the probability exactly 7 are cured?
Here, n = 10, p = 0.8, and k = 7. Using the formula: P(X = 7) = C(10, 7)(0.8)^7(0.2)^3 ≈ 0.2013.
You would typically use your GDC's binomial probability density function (binompdf) for such calculations.
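If you want to verify the GDC's binompdf answer by hand, the binomial formula translates directly into Python (math.comb supplies the binomial coefficient):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Drug-trial example from the text: n = 10 patients, p = 0.8, exactly k = 7 cured.
prob = binom_pmf(7, 10, 0.8)
print(round(prob, 4))  # 0.2013
```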
The Poisson Distribution
The Poisson distribution models the number of events occurring in a fixed interval of time or space. It is defined by a single parameter, λ (lambda), which represents the average rate of occurrence. Key conditions are:
- Events occur independently.
- The average rate (λ) is constant.
- Two events cannot occur at exactly the same instant.
If X ~ Po(λ), the probability of observing k events is: P(X = k) = (λ^k · e^(−λ)) / k!. Its expected value and variance are identical: E(X) = Var(X) = λ. This property is a quick check for whether a dataset might be Poisson-distributed.
Example Application: A call center receives an average of 4 calls per minute. The probability of receiving exactly 6 calls in a given minute is: P(X = 6) = (4^6 · e^(−4)) / 6! ≈ 0.1042.
Use poissonpdf on your GDC. The Poisson distribution is excellent for modeling rare events like system failures or arrivals at a service point.
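The same check works for poissonpdf; the Poisson formula is a one-liner in Python:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam): lam^k * e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

# Call-centre example from the text: average of 4 calls per minute, exactly 6 calls.
prob = poisson_pmf(6, 4)
print(round(prob, 4))  # 0.1042
```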
The Normal Distribution: The King of Continuous Distributions
The normal distribution is the most important continuous distribution. Its symmetric, bell-shaped curve is defined by its mean, μ (the center), and its standard deviation, σ (the spread). We write X ~ N(μ, σ²).
The total area under the curve is 1. Probabilities are found by calculating the area under the curve between two points. Because the density formula is complex, we rely on the standard normal distribution, Z ~ N(0, 1), and the z-score formula for conversion: z = (x − μ) / σ.
This transforms any normal variable X into a standard normal variable Z, allowing you to use standard normal tables or your GDC's normalcdf function.
Example Application: Test scores are normally distributed with μ = 70 and σ = 10. What percentage scored above 85?
First, find the z-score: z = (85 − 70) / 10 = 1.5.
Find P(Z > 1.5) using normalcdf(1.5, 1E99, 0, 1). This gives approximately 0.0668, or 6.68%.
A crucial skill is the inverse normal calculation: finding the data value (or z-score) corresponding to a given percentile. For instance, "Find the score at the 90th percentile" requires the invNorm function: invNorm(0.90, 70, 10). You must be fluent in using both normalcdf and invNorm on your GDC.
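Both normalcdf and invNorm have close analogues in Python's standard library, which can serve as a check on GDC answers. This sketch reuses the test-score example (μ = 70, σ = 10):

```python
from statistics import NormalDist

scores = NormalDist(mu=70, sigma=10)

# P(X > 85): the GDC equivalent is normalcdf(85, 1E99, 70, 10).
p_above = 1 - scores.cdf(85)
print(round(p_above, 4))  # 0.0668

# 90th percentile: the GDC equivalent is invNorm(0.90, 70, 10).
x90 = scores.inv_cdf(0.90)
print(round(x90, 1))  # 82.8
```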
Distribution Fitting and Hypothesis Testing Applications
Distribution fitting is the process of selecting a probability distribution that best models your observed data. In the IB, this often involves:
- Graphical Analysis: Plotting a histogram of your data and comparing its shape to known distributions (e.g., Is it symmetric and bell-shaped?).
- Parameter Estimation: Using sample statistics to estimate distribution parameters (e.g., using x̄ and s to estimate μ and σ for a normal distribution, or using the sample mean to estimate λ for a Poisson).
- Goodness-of-Fit Tests: Formally testing how well the data fits a hypothesized distribution using a chi-squared goodness-of-fit test. This is a core hypothesis testing application.
Hypothesis Testing Framework with Distributions: Distributions form the backbone of hypothesis tests. For example, you might use a binomial distribution to test if a coin is fair (p = 0.5) or use a normal distribution to test if a sample mean differs from a population mean. The steps are consistent:
- State the null (H₀) and alternative (H₁) hypotheses.
- Choose the appropriate distribution and significance level (α, often 5%).
- Calculate the test statistic (e.g., a z-score or chi-squared value).
- Find the p-value—the probability, assuming H₀ is true, of obtaining a result at least as extreme as the one observed.
- Compare the p-value to α. If p-value < α, reject H₀.
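The steps above can be sketched end to end for the coin-fairness example. The data here (16 heads in 20 flips) is hypothetical, chosen only to illustrate the procedure for a one-tailed binomial test:

```python
from math import comb

# Hypothetical data: a coin lands heads 16 times in 20 flips.
# H0: p = 0.5 (fair),  H1: p > 0.5 (biased towards heads),  alpha = 0.05.
n, observed, p0, alpha = 20, 16, 0.5, 0.05

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# p-value: probability, under H0, of a result at least as extreme as observed,
# i.e. P(X >= 16) for X ~ B(20, 0.5).
p_value = sum(binom_pmf(k, n, p0) for k in range(observed, n + 1))
print(round(p_value, 4))  # 0.0059

if p_value < alpha:
    print("Reject H0: evidence the coin is biased towards heads.")
else:
    print("Insufficient evidence to reject H0.")
```

Since 0.0059 < 0.05, the test rejects H₀ at the 5% level for this illustrative dataset.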
Real-World Modeling: Your choice of distribution depends on the context. Use the binomial for counts of successes/failures from a fixed sample. Use the Poisson for counts of events in continuous time or space. Use the normal for measurements like heights, weights, or errors that tend to cluster around a mean.
Common Pitfalls
- Misapplying Distributions: Using the binomial distribution for draws without replacement (which requires the hypergeometric) or using the Poisson when events are not independent. Correction: Always verbally check the conditions for each distribution before proceeding.
- Confusing Discrete and Continuous Probability Calculations: For a discrete variable like the binomial, P(X = k) has a specific, calculable value. For a continuous variable like the normal, P(X = x) = 0 for any single point x. We always find probabilities for an interval: P(a < X < b) = P(a ≤ X ≤ b). Correction: Remember that for continuous distributions, you are always finding an area. Use normalcdf(a, b, µ, σ).
- Incorrect Use of GDC Functions: A major exam trap is using binompdf when you need binomcdf (cumulative probability), or misordering the bounds in normalcdf. Correction: pdf is for an exact value, cdf is for a range (P(X ≤ k)). For normalcdf, always input: lower bound, upper bound, mean, standard deviation.
- Forgetting to Check Normality for Inference: Many statistical tests (like t-tests) assume the underlying data is approximately normally distributed. Correction: Before performing parametric hypothesis tests on sample data, check for normality using graphs (histogram, normal probability plot) or mention this as a necessary assumption.
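The pdf-versus-cdf distinction in the pitfalls above can be made concrete in Python; the numbers reuse the earlier drug-trial parameters (n = 10, p = 0.8):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k): the "pdf" value for one exact outcome
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.8
exact = binom_pmf(7, n, p)                               # like binompdf: P(X = 7)
cumulative = sum(binom_pmf(k, n, p) for k in range(8))   # like binomcdf: P(X <= 7)
print(round(exact, 4))       # 0.2013
print(round(cumulative, 4))  # 0.3222
```

Confusing the two changes the answer substantially (0.2013 versus 0.3222 here), which is exactly the exam trap described above.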
Summary
- Probability distributions, both discrete (binomial, Poisson) and continuous (normal), provide the models for analyzing random phenomena. Their expected value and variance are key descriptors.
- The binomial distribution models fixed-number success/failure trials, with E(X) = np and Var(X) = np(1 − p).
- The Poisson distribution models events in a fixed interval, with E(X) = Var(X) = λ.
- The normal distribution is fundamental for continuous data; use z-scores and GDC functions (normalcdf, invNorm) for calculations.
- Distribution fitting involves selecting and justifying an appropriate model for data, a critical step before hypothesis testing.
- Always verify the conditions for a distribution and know the precise functions and syntax for your GDC to avoid computational errors in exams.