AP Statistics: Binomial Distribution

The binomial distribution is the workhorse of probability for modeling scenarios where you count successes in a fixed series of trials, from quality control in manufacturing to clinical trial outcomes. Mastering it is crucial for AP Statistics because it provides the foundation for inference about proportions and connects directly to the normal model.

What Defines a Binomial Experiment?

A binomial setting arises when you perform a fixed number of independent trials and count how many times a well-defined event, called a "success," occurs. Before using any binomial tools, you must verify four conditions, often remembered with the acronym BINS.

B – Binary Outcomes: Each trial must result in one of only two possible outcomes. These are universally labeled "success" and "failure," regardless of their real-world connotation. A success might be a defective part (in quality control) or a patient recovering (in medicine).
I – Independent Trials: The outcome of one trial cannot influence the outcome of any other trial. This is often ensured by random sampling from a very large population or by proper experimental design, like using a random number generator.
N – Fixed Number of Trials: The number of trials, denoted by $n$, must be fixed in advance. You know you will flip the coin 10 times or sample 50 components.
S – Constant Success Probability: The probability of success, denoted by $p$, must remain constant for every single trial.

If these conditions are met, the random variable $X$ = the number of successes in $n$ trials has a binomial distribution. We write $X \sim \text{Binomial}(n, p)$. For example, if you randomly select 15 students from a large school where 30% are left-handed, the count of left-handed students in your sample, $X$, is approximately binomial with $n = 15$ and $p = 0.30$. The independence condition is approximately met because the sample size is small relative to the population.
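To make the left-handed example concrete, here is a minimal simulation sketch. It assumes an idealized setting in which each of the 15 students is independently left-handed with probability 0.30; the function name `simulate_count`, the seed, and the number of repetitions are illustrative choices, not part of the original example.

```python
import random

def simulate_count(n, p, rng):
    """Count successes in n independent trials, each with success probability p."""
    return sum(rng.random() < p for _ in range(n))

# Assumed setup: 10,000 simulated samples of 15 students, p = 0.30.
rng = random.Random(2024)  # fixed seed so the run is repeatable
counts = [simulate_count(15, 0.30, rng) for _ in range(10_000)]

average = sum(counts) / len(counts)
print(f"Average number of left-handed students per sample: {average:.2f}")
```

Every simulated count falls between 0 and 15, and the long-run average should land close to $15 \times 0.30 = 4.5$.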

Calculating Binomial Probabilities

The probability of getting exactly $k$ successes in $n$ trials is given by the binomial probability formula:

$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$

Let's break down this formula with an example. Suppose a multiple-choice quiz has 5 questions, each with 4 choices. You guess randomly on every question. What is the probability you get exactly 3 correct?

Here, a "success" is a correct answer. We have $n = 5$ trials (questions), $p = 0.25$ probability of success per question, and we want $k = 3$.

The Binomial Coefficient: $\binom{n}{k}$ (read as "n choose k") calculates the number of ways to arrange $k$ successes among $n$ trials. It's computed as $\frac{n!}{k!\,(n - k)!}$. Here, $\binom{5}{3} = \frac{5!}{3!\,2!} = 10$. There are 10 different sequences of correct/incorrect answers that yield 3 correct responses.
$p^{k}$: This is the probability of the $k$ successes occurring: $(0.25)^{3} = 0.015625$.
$(1 - p)^{n - k}$: This is the probability of the remaining $n - k$ failures: $(0.75)^{2} = 0.5625$.

Multiplying these together gives the probability for one specific sequence. We then multiply by the number of possible sequences: $P (X = 3) = 10 \times 0.015625 \times 0.5625 = 0.087890625$

So, there's about an 8.8% chance of guessing exactly 3 answers correctly. You will use your calculator's binompdf(n, p, k) function for these exact calculations on the AP exam, but understanding the formula is essential for interpreting your results.
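The quiz calculation can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is an illustrative choice and is not the calculator's binompdf command.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p): (n choose k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Quiz example: 5 questions, 4 choices each, exactly 3 correct by guessing.
prob = binomial_pmf(3, 5, 0.25)
print(prob)  # 0.087890625
```

Here `comb(5, 3)` supplies the 10 possible sequences, and the two powers supply the probability of any one of them.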

Mean, Standard Deviation, and Shape

Like any distribution, a binomial distribution has a center and spread. These are derived directly from $n$ and $p$.

Mean (Expected Value): $\mu_{X} = np$. This is intuitive: if you have 100 trials with a 20% success rate, you expect $100 \times 0.20 = 20$ successes on average.
Standard Deviation: $\sigma_{X} = \sqrt{np(1 - p)}$. This measures the typical variation in the count of successes from one set of $n$ trials to another. For our guessing example, $\mu = 5 \times 0.25 = 1.25$ and $\sigma = \sqrt{5 \times 0.25 \times 0.75} = \sqrt{0.9375} \approx 0.968$.
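As a quick check on the guessing example, the sketch below computes the mean $np$ and the standard deviation, which is the square root of the variance $np(1 - p)$. The helper name `binomial_mean_sd` is a hypothetical one chosen for illustration.

```python
from math import sqrt

def binomial_mean_sd(n, p):
    """Return (mean, standard deviation) of a Binomial(n, p) count."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))  # square root of the variance n*p*(1-p)
    return mean, sd

mean, sd = binomial_mean_sd(5, 0.25)
print(f"mean = {mean}, sd = {sd:.3f}")  # mean = 1.25, sd = 0.968
```

Leaving out the square root would report the variance (0.9375) instead of the standard deviation.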

The shape of a binomial distribution depends on both $n$ and $p$. For a given $n$, it is symmetric when $p = 0.5$, skewed right when $p < 0.5$, and skewed left when $p > 0.5$. As $n$ increases with $p$ held fixed, the distribution becomes more symmetric and bell-shaped, which leads us to a powerful approximation.
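One way to see the effect of $p$ on shape is to print a rough text histogram for a few values of $p$. This is only an illustrative sketch: the choice of $n = 10$, the three values of $p$, and the helper name `binomial_pmf_table` are assumptions made for the demonstration.

```python
from math import comb

def binomial_pmf_table(n, p):
    """List of P(X = k) for k = 0, 1, ..., n when X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for p in (0.2, 0.5, 0.8):
    print(f"p = {p}")
    for k, prob in enumerate(binomial_pmf_table(10, p)):
        # Each '#' stands for roughly 2 percentage points of probability.
        print(f"{k:2d} {'#' * round(prob * 50)}")
```

With $p = 0.2$ the bars pile up at low counts with a tail to the right, with $p = 0.8$ the picture is mirrored, and with $p = 0.5$ the bars are symmetric around 5.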

The Normal Approximation to the Binomial

For large sample sizes, calculating exact binomial probabilities for ranges (e.g., $P(X \leq 40)$) can be tedious. Fortunately, when $n$ is sufficiently large, the binomial distribution can be approximated by a normal distribution with the same mean and standard deviation: $N(np, \sqrt{np(1 - p)})$, where the second parameter is the standard deviation.

The standard rule of thumb for when this approximation is appropriate is: $np \geq 10$ and $n(1 - p) \geq 10$. This ensures the distribution is not too skewed.

Critical Step: Continuity Correction. Because we are approximating a discrete distribution (binomial) with a continuous one (normal), we must apply a continuity correction. We adjust the discrete value by 0.5 to find the corresponding area under the normal curve.

For $P(X \leq k)$, use the normal area to the left of $k + 0.5$.
For $P(X \geq k)$, use the normal area to the right of $k - 0.5$.

Example: Suppose a factory produces chips where 10% are defective. In a batch of 200 chips ($n = 200$, $p = 0.10$), what is the approximate probability of finding 15 or fewer defective chips? First, check conditions: $np = 20$ and $n(1 - p) = 180$, both $\geq 10$. The approximating normal distribution has $\mu = 20$ and $\sigma = \sqrt{200 \times 0.10 \times 0.90} = \sqrt{18} \approx 4.24$. We want $P(X \leq 15)$. With continuity correction, we find $P(X_{\text{normal}} \leq 15.5)$. Calculate the z-score: $z = (15.5 - 20)/4.24 \approx -1.06$. Using the standard normal table, $P(Z \leq -1.06) \approx 0.1446$. There is about a 14.5% chance of having 15 or fewer defective chips.
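The chip example can be reproduced, and compared against the exact binomial sum, with the standard library alone. This is a sketch under the example's stated values; the function names `binomial_cdf` and `normal_approx_cdf` are hypothetical helpers, and `NormalDist` comes from Python's `statistics` module (Python 3.8 or later).

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summing the probability formula."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using a normal model with continuity correction."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)  # area to the left of k + 0.5

n, p, k = 200, 0.10, 15
assert n * p >= 10 and n * (1 - p) >= 10  # rule-of-thumb check

approx = normal_approx_cdf(k, n, p)
exact = binomial_cdf(k, n, p)
print(f"normal approximation: {approx:.4f}")
print(f"exact binomial:       {exact:.4f}")
```

The approximation printed here may differ from 0.1446 in the fourth decimal place because the code does not round the z-score to two decimals before looking up the area.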

Common Pitfalls

Forgetting to Check Independence: This condition is commonly violated. If you sample 20 people without replacement from a small class of 25, the trials are not independent. The binomial distribution does not apply; you need the hypergeometric distribution. When sampling without replacement, the binomial model is a reasonable approximation only when the population is at least 10 times the sample size (the 10% condition).
Misidentifying n and p: Ensure $p$ is the probability of success for a single, well-defined trial. For example, if 70% of voters support a candidate and you poll 100 voters, $p = 0.70$ (support is a success). Do not confuse $p$ with the probability you are solving for.
Misusing the Normal Approximation: Applying the normal model when $np$ or $n(1 - p)$ is less than 10 leads to inaccurate results. Also, omitting the continuity correction will introduce a systematic error, especially with smaller $n$ or probabilities near boundaries.
Confusing binompdf and binomcdf: On your calculator, binompdf(n, p, k) computes $P(X = k)$ (exactly $k$). binomcdf(n, p, k) computes $P(X \leq k)$ (cumulative, $k$ or fewer). Using the wrong command will yield an answer to a different question.
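The difference between the two calculator commands can be illustrated with Python stand-ins. The functions below are hypothetical look-alikes that mirror the calculator's argument order (n, p, k); they are not the calculator's own implementation.

```python
from math import comb

def binompdf(n, p, k):
    """Stand-in for the calculator command: P(X = k), exactly k successes."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomcdf(n, p, k):
    """Stand-in for the calculator command: P(X <= k), k or fewer successes."""
    return sum(binompdf(n, p, i) for i in range(k + 1))

# Guessing on a 5-question quiz with p = 0.25:
exactly_3 = binompdf(5, 0.25, 3)       # P(X = 3)
at_most_3 = binomcdf(5, 0.25, 3)       # P(X <= 3)
at_least_3 = 1 - binomcdf(5, 0.25, 2)  # P(X >= 3) = 1 - P(X <= 2)

print(exactly_3, at_most_3, at_least_3)
```

The three results answer three different questions, so it helps to write the probability statement ($=$, $\leq$, or $\geq$) before choosing a command. Note that "at least 3" uses the complement of "2 or fewer," not of "3 or fewer."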

Summary

The binomial distribution $X \sim \text{Binomial}(n, p)$ models the count of successes in a fixed number $n$ of independent trials, each with constant success probability $p$.
Always verify the BINS conditions (Binary, Independent, Fixed Number, Constant Success) before using the binomial model.
Calculate exact probabilities using the formula $P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$ or your calculator's binompdf/binomcdf functions.
The distribution has mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1 - p)}$.
For large samples where $np \geq 10$ and $n(1 - p) \geq 10$, you can approximate binomial probabilities with a normal distribution $N(np, \sqrt{np(1 - p)})$, remembering to apply a continuity correction of 0.5 when finding probabilities for a range of counts.
