IB AA: Continuous Random Variables
Continuous random variables form the mathematical backbone for modeling uncountably infinite outcomes, such as time, distance, or temperature. Mastering them is crucial for IB Analysis & Approaches, as they bridge calculus with real-world probability and are a frequent exam topic requiring both conceptual understanding and computational skill.
Probability Density Functions: The Foundation
A probability density function (PDF), denoted $f(x)$, describes the relative likelihood of a continuous random variable $X$ taking on a value near $x$. Unlike discrete probability, where $P(X = x)$ has meaning, for continuous variables the probability at any single point is zero: $P(X = x) = 0$. Instead, probability is defined over intervals as the area under the PDF curve. Every valid PDF must satisfy two key properties. First, it must be non-negative for all $x$: $f(x) \ge 0$. Second, the total area under the curve must equal 1, representing certainty:
$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$
The process of ensuring the second property holds is called normalization. You will often be given a function proportional to a PDF, such as $f(x) = kx$ for $0 \le x \le 2$, and must find the constant $k$ that makes it a valid PDF. You do this by setting the integral equal to 1: $\int_0^2 kx\,dx = 2k = 1$. Solving gives $k = \tfrac{1}{2}$, so $f(x) = \tfrac{x}{2}$. Think of the PDF as a smooth histogram; its height shows density, not probability, and the total area of all bars sums to 1.
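The normalization step can be sanity-checked numerically: integrate the unnormalized function and take the reciprocal. The sketch below is a minimal illustration in Python; the helper `integrate` and the step count are assumptions for this example, not part of the notes.

```python
# Find the normalization constant k for g(x) = x on [0, 2],
# so that f(x) = k * g(x) integrates to 1.

def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b] with n subintervals."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

g = lambda x: x              # function proportional to the PDF
total = integrate(g, 0, 2)   # integral of x over [0, 2] is 2
k = 1 / total                # normalization constant, k = 1/2
print(k)                     # ≈ 0.5
```

The midpoint rule is exact for linear functions, so the numerical answer matches the hand calculation $k = \tfrac{1}{2}$ here.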
Cumulative Distribution Functions: From Density to Probability
The cumulative distribution function (CDF), denoted $F(x)$, gives the probability that $X$ is less than or equal to a specific value: $F(x) = P(X \le x)$. It is computed directly from the PDF by integration:
$$F(x) = \int_{-\infty}^{x} f(t)\,dt.$$
The CDF is a non-decreasing function that ranges from 0 to 1, with $F(x) \to 0$ as $x \to -\infty$ and $F(x) \to 1$ as $x \to \infty$.
To find probabilities for an interval $[a, b]$, you use the CDF: $P(a \le X \le b) = F(b) - F(a)$. The fundamental theorem of calculus shows the inverse relationship: the PDF is the derivative of the CDF, $f(x) = F'(x)$, wherever the derivative exists. For a worked example, consider the PDF $f(x) = \tfrac{x}{2}$ for $0 \le x \le 2$. Its CDF is $F(x) = \int_0^x \tfrac{t}{2}\,dt = \tfrac{x^2}{4}$ for $0 \le x \le 2$. To find $P(1 \le X \le 2)$, compute $F(2) - F(1) = 1 - \tfrac{1}{4} = \tfrac{3}{4}$.
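The CDF-difference rule for interval probabilities translates directly into code. A minimal sketch, hard-coding the example CDF $F(x) = x^2/4$ on $[0, 2]$ (function names are illustrative):

```python
# CDF of the example PDF f(x) = x/2 on [0, 2], clamped outside the support.
def F(x):
    if x <= 0:
        return 0.0
    if x >= 2:
        return 1.0
    return x * x / 4

def interval_prob(a, b):
    """P(a <= X <= b) as a difference of CDF values."""
    return F(b) - F(a)

print(interval_prob(1, 2))   # F(2) - F(1) = 1 - 0.25 = 0.75
```

Clamping $F$ to 0 below the support and 1 above it mirrors the piecewise definition of a CDF and keeps interval probabilities correct for any endpoints.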
Expected Value and Variance: Measuring Center and Spread
The expected value (mean) of a continuous random variable, denoted $E(X)$ or $\mu$, is the long-run average outcome, weighted by probability density. It is calculated by integrating $x$ times the PDF over all possible values:
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$$
Variance, denoted $\operatorname{Var}(X)$ or $\sigma^2$, measures the spread or dispersion around the mean, defined as the expected value of the squared deviation: $\operatorname{Var}(X) = E\big[(X - \mu)^2\big]$. A more computational formula is $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$. The standard deviation is simply the square root of the variance: $\sigma = \sqrt{\operatorname{Var}(X)}$.
Let's compute these for the PDF $f(x) = \tfrac{x}{2}$ on $[0, 2]$. First, the mean:
$$E(X) = \int_0^2 x \cdot \tfrac{x}{2}\,dx = \left[\tfrac{x^3}{6}\right]_0^2 = \tfrac{4}{3}.$$
Next, find $E(X^2)$:
$$E(X^2) = \int_0^2 x^2 \cdot \tfrac{x}{2}\,dx = \left[\tfrac{x^4}{8}\right]_0^2 = 2.$$
Thus, the variance is $\operatorname{Var}(X) = 2 - \left(\tfrac{4}{3}\right)^2 = 2 - \tfrac{16}{9} = \tfrac{2}{9}$.
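These integrals are easy to verify numerically. The sketch below approximates $E(X)$ and $E(X^2)$ for $f(x) = x/2$ on $[0, 2]$ with a midpoint Riemann sum; the helper name and step count are assumptions for illustration.

```python
def integrate(f, a, b, n=100_000):
    """Midpoint Riemann sum of f over [a, b] with n subintervals."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

pdf = lambda x: x / 2                               # f(x) = x/2 on [0, 2]
mean = integrate(lambda x: x * pdf(x), 0, 2)        # E(X)  -> 4/3
ex2  = integrate(lambda x: x**2 * pdf(x), 0, 2)     # E(X^2) -> 2
var  = ex2 - mean**2                                # Var(X) -> 2/9
print(mean, var)
```

Agreement with the exact values $\tfrac{4}{3}$ and $\tfrac{2}{9}$ is a useful habit: a numerical check catches sign and limit errors in the hand integration.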
Median and Mode: Other Measures of Central Tendency
For continuous variables, the median is the value $m$ such that half the probability lies below it and half above, satisfying $\int_{-\infty}^{m} f(x)\,dx = \tfrac{1}{2}$. In terms of the CDF, you solve $F(m) = \tfrac{1}{2}$. Using our example CDF $F(x) = \tfrac{x^2}{4}$, set $\tfrac{m^2}{4} = \tfrac{1}{2}$, so $m^2 = 2$ and $m = \sqrt{2}$. The mode is the value at which the PDF achieves its maximum. It represents the most likely outcome in a density sense. For $f(x) = \tfrac{x}{2}$ on $[0, 2]$, the function increases linearly, so the maximum is at the right endpoint: the mode is $x = 2$. In symmetric distributions like the normal, the mean, median, and mode coincide.
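When $F(m) = \tfrac{1}{2}$ has no neat algebraic solution, bisection works because a CDF is non-decreasing. A minimal sketch, assuming the example CDF $F(x) = x^2/4$ on $[0, 2]$:

```python
def F(x):
    return x * x / 4   # CDF of f(x) = x/2 on [0, 2]

def median_by_bisection(cdf, lo, hi, tol=1e-10):
    """Solve cdf(m) = 0.5 on [lo, hi] by bisection (cdf non-decreasing)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid          # median lies in the upper half
        else:
            hi = mid          # median lies in the lower half
    return (lo + hi) / 2

m = median_by_bisection(F, 0, 2)
print(m)   # ≈ 1.41421, i.e. sqrt(2)
```

Each iteration halves the search interval, so the loop converges quickly to the exact answer $\sqrt{2}$ from the worked example.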
Common Continuous Distributions and Real-World Modeling
The uniform distribution is the simplest continuous model, where all intervals of equal length have the same probability. If $X$ is uniform on $[a, b]$, its PDF is constant: $f(x) = \tfrac{1}{b-a}$ for $a \le x \le b$. Its CDF is a straight line: $F(x) = \tfrac{x-a}{b-a}$ for $a \le x \le b$. The mean is the midpoint $\tfrac{a+b}{2}$, and the variance is $\tfrac{(b-a)^2}{12}$. Uniform distributions model scenarios like random number generation or waiting for a bus that arrives at fixed intervals.
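The uniform mean and variance formulas can be checked against a simulation using Python's standard `random.uniform`. The interval $[3, 9]$, sample size, and seed below are illustrative choices, not from the notes.

```python
import random

random.seed(42)   # fixed seed so the simulation is reproducible
a, b = 3.0, 9.0
samples = [random.uniform(a, b) for _ in range(200_000)]

sample_mean = sum(samples) / len(samples)
sample_var = sum((x - sample_mean) ** 2 for x in samples) / len(samples)

print(sample_mean, (a + b) / 2)        # sample mean vs theoretical 6.0
print(sample_var, (b - a) ** 2 / 12)   # sample variance vs theoretical 3.0
```

With 200,000 draws the sample statistics land very close to the theoretical values $\tfrac{a+b}{2} = 6$ and $\tfrac{(b-a)^2}{12} = 3$, illustrating the long-run-average interpretation of $E(X)$.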
Other essential continuous distributions include the exponential distribution for modeling waiting times or decay processes, and the normal distribution for bell-shaped data like heights or test scores. Connecting these to real-world modeling involves identifying the key characteristics of a phenomenon—such as whether it is memoryless (exponential) or symmetric (normal)—and selecting the appropriate PDF. For instance, the time between customer arrivals at a store might follow an exponential distribution, while errors in a physical measurement often follow a normal distribution. The power of continuous random variables lies in using integration over these PDFs to make precise probabilistic predictions about complex, measurable events.
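As one concrete modeling calculation: if inter-arrival times $T$ are exponential with rate $\lambda$, then $P(T \le t) = 1 - e^{-\lambda t}$. The sketch below evaluates this for an assumed rate of 2 customers per minute (the rate and time are illustrative).

```python
import math

def exponential_cdf(t, lam):
    """P(T <= t) for an exponential waiting time with rate lam, t >= 0."""
    return 1 - math.exp(-lam * t)

lam = 2.0   # assumed: 2 arrivals per minute on average
print(exponential_cdf(1.0, lam))   # P(next arrival within 1 minute) = 1 - e^{-2}
```

The answer, $1 - e^{-2} \approx 0.865$, shows how a closed-form CDF replaces the integration step once the modeling distribution has been identified.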
Common Pitfalls
- Treating PDF value as a probability: A common error is interpreting $f(x)$ as $P(X = x)$. Remember, for continuous variables, $P(X = x) = 0$. Probability is always an area under the PDF curve, so you must integrate over an interval.
- Misapplying integration limits: When computing CDFs or expected values, ensure your limits match the support of the PDF. For a PDF defined only on $[0, 2]$, the integral for $F(x)$ should run from 0 to $x$, not from $-\infty$. Similarly, $E(X)$ integrates from 0 to 2.
- Forgetting to normalize: If given a function like $kx$ on an interval, you must first find $k$ by setting $\int_0^2 kx\,dx = 1$ to obtain the valid PDF $f(x) = \tfrac{x}{2}$. Skipping this step leads to incorrect probabilities and expectations.
- Incorrectly finding the median: The median $m$ satisfies $F(m) = \tfrac{1}{2}$, not $f(m) = \tfrac{1}{2}$. Confusing the PDF and CDF here will yield the wrong value. Always use the CDF equation to solve for the median.
Summary
- A probability density function (PDF) must be non-negative and integrate to 1 over all space; it defines probabilities via area under the curve.
- The cumulative distribution function (CDF) is found by integrating the PDF and gives $F(x) = P(X \le x)$; probabilities for intervals are differences in CDF values.
- Expected value and variance are calculated through integration: $E(X) = \int x f(x)\,dx$ and $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$.
- The median is the solution to $F(m) = \tfrac{1}{2}$, and the mode is the value maximizing $f(x)$.
- The uniform distribution has a constant PDF and is a foundational model; other distributions like exponential and normal extend modeling to various real-world scenarios.
- Always remember that for continuous random variables, probability is area, not height, and proper integration techniques are essential for accurate computation.