AP Statistics: Combining Random Variables
When you analyze real-world data, you rarely look at just one variable in isolation. Whether you're predicting the total production time for two factory machines, calculating the net gain from multiple investments, or estimating the total error in a scientific measurement, you need to understand how multiple random variables behave together. The rules for combining random variables provide the mathematical toolkit to move from analyzing single variables to understanding their sums and differences, a fundamental skill for statistical modeling and inference.
Defining Random Variables and the Core Rules
A random variable is a numerical outcome of a random process. We denote random variables with capital letters like X and Y. Each random variable has its own probability distribution, characterized primarily by its mean (the expected value, μ_X = E(X)) and its variance (the expected squared deviation from the mean, σ_X² = E[(X − μ_X)²]). The standard deviation, σ_X, is simply the square root of the variance and measures the typical spread of the variable.
The foundational rules for combining random variables are beautifully simple. For any two random variables X and Y, and constants a and b, the following always holds for their means:

μ_(aX+bY) = aμ_X + bμ_Y
This rule is intuitive: if you scale a variable, you scale its mean; if you add variables, their means add. The powerful companion rule applies specifically to their variances, but only under the critical condition that the variables are independent. For independent X and Y:

σ²_(aX+bY) = a²σ_X² + b²σ_Y²
Notice the key difference: the constants are squared when they factor into the variance. Most importantly, variances add. This is the engine that drives all analysis of combined variables. A direct and crucial consequence is that standard deviations, being the square roots of variances, do not simply add. You must always find the combined variance first, then take its square root:

σ_(X+Y) = √(σ_X² + σ_Y²)
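These rules can be checked empirically with a quick simulation. The sketch below uses NumPy; the normal distributions, the seed, and the parameter values are illustrative assumptions, not part of the rules themselves:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 1_000_000

# Two independent random variables with known spreads (assumed values).
x = rng.normal(loc=100, scale=10, size=n)  # sigma_X = 10, so Var(X) = 100
y = rng.normal(loc=50, scale=8, size=n)    # sigma_Y = 8,  so Var(Y) = 64

total = x + y

# Variances add for independent variables: expect about 100 + 64 = 164.
print(np.var(total))
# Standard deviations do not: expect about sqrt(164) = 12.81, not 10 + 8 = 18.
print(np.std(total))
```

Running this shows the simulated variance landing near 164 and the simulated standard deviation near 12.81, well below the naive 18.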
Applying the Rules: Sums of Random Variables
The most common application is finding the distribution of a sum, S = X + Y. Here, we are simply using the formulas above with a = 1 and b = 1.
- Mean of a Sum: μ_S = μ_X + μ_Y. The expected total is the sum of the expected parts.
- Variance of a Sum (if independent): σ_S² = σ_X² + σ_Y².
- Standard Deviation of a Sum: σ_S = √(σ_X² + σ_Y²).
Worked Example: Suppose the weight of an apple (X) has a mean of 150 grams with a standard deviation of 10 grams. The weight of an orange (Y) has a mean of 130 grams and a standard deviation of 8 grams. If you place one randomly selected apple and one randomly selected orange into a basket, what are the mean and standard deviation of the total weight T = X + Y?
- Mean: μ_T = μ_X + μ_Y = 150 + 130 = 280 grams.
- Variance: First, square the individual standard deviations to get variances: σ_X² = 10² = 100, σ_Y² = 8² = 64. Then, σ_T² = 100 + 64 = 164.
- Standard Deviation: σ_T = √164 ≈ 12.81 grams.
The total weight is expected to be 280 grams, with a typical variation (standard deviation) of about 12.81 grams. Notice that 12.81 is less than the sum of the individual standard deviations (10 + 8 = 18). This is always the case for independent variables: since √(σ_X² + σ_Y²) < σ_X + σ_Y whenever both spreads are positive, combining independent sources of variability leads to a final spread that is smaller than the naive sum of the spreads.
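The arithmetic above is simple enough to mirror in a few lines of code. This sketch just restates the worked example (apple: mean 150 g, SD 10 g; orange: mean 130 g, SD 8 g) in Python:

```python
import math

mu_x, sigma_x = 150, 10   # apple: mean and SD in grams
mu_y, sigma_y = 130, 8    # orange: mean and SD in grams

mu_total = mu_x + mu_y                   # means add: 280
var_total = sigma_x**2 + sigma_y**2      # variances add: 100 + 64 = 164
sd_total = math.sqrt(var_total)          # take the square root LAST

print(mu_total, var_total, round(sd_total, 2))  # 280 164 12.81
```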
Applying the Rules: Differences of Random Variables
The rules work equally well for differences, which are just a special case of the linear combination with a = 1 and b = −1. For the difference D = X − Y:
- Mean of a Difference: μ_D = μ_X − μ_Y. The expected difference is the difference of the expected values.
- Variance of a Difference (if independent): σ_D² = σ_X² + σ_Y².
- Standard Deviation of a Difference: σ_D = √(σ_X² + σ_Y²).
Crucially, the variance of a difference is also the sum of the individual variances. The minus sign disappears when squared. This means the spread of a difference is calculated the same way as the spread of a sum.
Worked Example: Using the same fruit data, what are the mean and standard deviation for the difference in weight between the apple and the orange, D = X − Y?
- Mean: μ_D = 150 − 130 = 20 grams. On average, the apple is 20 grams heavier.
- Variance: σ_D² = σ_X² + σ_Y² = 100 + 64 = 164.
- Standard Deviation: σ_D = √164 ≈ 12.81 grams.
The difference has the same standard deviation as the sum! This often surprises students, but it makes sense when you consider that variability in either fruit contributes to uncertainty in the difference. If both weights fluctuate, the gap between them fluctuates too.
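The same sketch, changed only in the sign between the means, shows why the difference has exactly the same spread as the sum (fruit parameters as in the worked example):

```python
import math

mu_x, sigma_x = 150, 10   # apple
mu_y, sigma_y = 130, 8    # orange

mu_diff = mu_x - mu_y                   # means subtract: 20
var_diff = sigma_x**2 + sigma_y**2      # variances still ADD: 164
sd_diff = math.sqrt(var_diff)           # identical to the sum's SD

print(mu_diff, round(sd_diff, 2))  # 20 12.81
```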
Extending to Multiple Variables and Linear Combinations
The rules generalize perfectly to any number of independent variables and any linear combination. For independent variables X₁, X₂, …, Xₙ and constants a₁, a₂, …, aₙ, if we define W = a₁X₁ + a₂X₂ + … + aₙXₙ, then:

μ_W = a₁μ₁ + a₂μ₂ + … + aₙμₙ
σ_W² = a₁²σ₁² + a₂²σ₂² + … + aₙ²σₙ²
This is incredibly powerful. For example, if a factory box contains 6 independently produced widgets, each with the same mean weight μ grams and a standard deviation of 2 g, the total box weight W is the sum of 6 identical variables.
- Mean: μ_W = μ + μ + … + μ = 6μ grams.
- Variance: σ_W² = 6 × 2² = 24.
- Standard Deviation: σ_W = √24 ≈ 4.90 g.
Again, note the standard deviation of the total (4.90 g) is far less than simply 6 × 2 = 12 g.
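The general rule lends itself to a small helper function. combine_independent below is a hypothetical utility written for this sketch, not a standard library function, and it assumes the component variables are independent:

```python
import math

def combine_independent(coeffs, means, sds):
    """Mean and SD of W = a1*X1 + ... + an*Xn for independent X_i."""
    mean = sum(a * m for a, m in zip(coeffs, means))
    var = sum(a**2 * s**2 for a, s in zip(coeffs, sds))
    return mean, math.sqrt(var)

# Box of 6 identical, independent widgets with SD 2 g each.
# A placeholder mean of 0 is used here to focus on the spread.
_, sd_box = combine_independent([1] * 6, [0] * 6, [2] * 6)
print(round(sd_box, 2))  # 4.9, far less than the naive 6 * 2 = 12
```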
Common Pitfalls
- Adding Standard Deviations Directly: The most frequent and critical error is to assume σ_(X+Y) = σ_X + σ_Y. This is false. You must always add variances, then take the square root. If you find yourself adding standard deviations, you've almost certainly made a mistake.
- Correction: Remember the sequence: 1) Square SDs to get variances. 2) Add the variances. 3) Square root the result to get the new SD.
- Misapplying the Variance Rule for Differences: Students often want to subtract variances when finding the variance of X − Y. This is incorrect because the negative sign squares to positive: (−1)² = 1.
- Correction: The variance of a difference is σ²_(X−Y) = σ_X² + σ_Y², identical to the formula for a sum.
- Ignoring the Independence Condition: The vital rule σ²_(X+Y) = σ_X² + σ_Y² requires that X and Y are independent. If the variables are correlated, a covariance term must be included: σ²_(X±Y) = σ_X² + σ_Y² ± 2Cov(X, Y). In AP Statistics, you can assume independence unless a problem explicitly states otherwise or provides data suggesting dependence.
- Correction: Always pause and ask, "Does the problem context imply these variables are independent?" If not, the simple variance addition rule does not apply.
- Forgetting to Square Constants in the Variance Formula: When dealing with a combination like aX + bY, the mean is aμ_X + bμ_Y, but the variance is a²σ_X² + b²σ_Y², not aσ_X² + bσ_Y².
- Correction: When moving from a formula for the mean to a formula for the variance, remember to apply the operation to the constant (squaring it) before multiplying by the variance.
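The first two pitfalls can be made concrete in code. The numbers below reuse the fruit standard deviations (10 g and 8 g) purely for illustration:

```python
import math

sigma_x, sigma_y = 10, 8

# Pitfall 1: adding standard deviations directly.
wrong_sd = sigma_x + sigma_y                   # 18 -- NOT the SD of X + Y
# Correct: add variances, then take the square root.
right_sd = math.sqrt(sigma_x**2 + sigma_y**2)  # sqrt(164), about 12.81

# Pitfall 2: subtracting variances for X - Y.
wrong_var_diff = sigma_x**2 - sigma_y**2       # 36 -- incorrect
# Correct: variances add for a difference, too.
right_var_diff = sigma_x**2 + sigma_y**2       # 164

print(wrong_sd, round(right_sd, 2), wrong_var_diff, right_var_diff)
```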
Summary
- The mean of a sum or difference of random variables is always the sum or difference of their individual means: μ_(X±Y) = μ_X ± μ_Y.
- The variance of a linear combination of independent random variables is found by adding the scaled variances: σ²_(aX+bY) = a²σ_X² + b²σ_Y². This rule applies to both sums (a = b = 1) and differences (a = 1, b = −1).
- Standard deviations do not add. To find the standard deviation of a combined variable, you must first compute the combined variance using the rule above, then take its square root.
- The cornerstone of these calculations is the concept of independence. The simple variance addition rule fails if the variables are not independent.
- Mastering these rules allows you to model and understand the behavior of complex systems—from total project costs to measurement errors—by breaking them down into their independent component parts.