IB AI: Probability Distributions in AI

Probability is the language of uncertainty, and artificial intelligence thrives on quantifying uncertainty. Whether assessing the reliability of a machine learning model's predictions, evaluating the quality of a manufactured component, or simulating random events within an algorithm, AI systems rely on formal probability distributions to make sense of the world. In IB AI, mastering two foundational distributions—the discrete binomial and the continuous normal—provides you with the mathematical toolkit to model, analyze, and make informed decisions from data.

The Binomial Distribution: Modeling Fixed-Trial Success

The binomial distribution is your go-to model for scenarios involving a fixed number of independent trials, where each trial has only two possible outcomes: success or failure. Imagine you are testing a new image recognition algorithm by showing it 20 pictures. For each picture, the algorithm either correctly identifies the object (success, with a constant probability $p$ ) or fails to do so (failure, with probability $1 - p$ ). The outcome of one test does not influence the next.

Formally, if $X$ is the random variable representing the number of successes in $n$ trials, then $X$ follows a binomial distribution, denoted $X \sim B (n, p)$ . The probability of getting exactly $k$ successes is given by:

$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$

Here, $\binom{n}{k}$ is the binomial coefficient, calculated as $\frac{n!}{k! (n - k)!}$.
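As a minimal sketch, the formula can be evaluated directly with Python's standard library. The function name `binomial_pmf` and the success probability $p = 0.9$ for the 20-picture test are illustrative assumptions, not values given in the text.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), computed straight from the formula."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: 20 pictures, assumed accuracy p = 0.9.
n, p = 20, 0.9
prob_exactly_18 = binomial_pmf(18, n, p)

# The probabilities over all possible counts 0..n should add up to 1.
total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

print(f"P(X = 18) = {prob_exactly_18:.4f}")
print(f"Sum over all k = {total:.4f}")
```

Summing over every possible count is a quick sanity check that the formula was typed in correctly.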

Two key summary statistics describe the distribution. The expected value, or mean, is the average number of successes you can anticipate: the number of trials multiplied by the probability of success, $E(X) = np$. The standard deviation measures the spread around that mean and is the square root of the variance $np(1 - p)$, so $σ = \sqrt{np(1 - p)}$. If your algorithm has a 95% accuracy rate ($p = 0.95$) and you test it on 100 images, you expect $E(X) = 100 \times 0.95 = 95$ correct identifications, with a typical variation of $σ = \sqrt{100 \times 0.95 \times 0.05} = \sqrt{4.75} \approx 2.18$ images.
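The worked numbers for the 100-image test can be reproduced with a short sketch. The helper names `binomial_mean` and `binomial_sd` are hypothetical; note that the standard deviation is the square root of the variance $np(1 - p)$.

```python
from math import sqrt


def binomial_mean(n: int, p: float) -> float:
    """Expected number of successes, E(X) = n * p."""
    return n * p


def binomial_sd(n: int, p: float) -> float:
    """Standard deviation: square root of the variance n * p * (1 - p)."""
    return sqrt(n * p * (1 - p))


# Values from the example in the text: 100 images, 95% accuracy.
n, p = 100, 0.95
mean = binomial_mean(n, p)
sd = binomial_sd(n, p)

print(f"E(X) = {mean:.0f}, sd = {sd:.2f}")
```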

The Normal Distribution: The Ubiquitous Bell Curve

While the binomial distribution counts discrete successes, many phenomena in AI and nature—like measurement errors, human heights, or the average scores from many trials—are modeled by the continuous normal distribution. Its signature bell curve is symmetric and defined entirely by its mean ( $μ$ ) and standard deviation ( $σ$ ), denoted $X \sim N (μ, σ^{2})$ .

The curve's shape follows a specific probability density function. Crucially, approximately 68% of data lies within one standard deviation of the mean ( $μ \pm σ$ ), 95% within two standard deviations ( $μ \pm 2 σ$ ), and 99.7% within three ( $μ \pm 3 σ$ ). This makes it powerful for identifying outliers. For instance, if a sensor's readings are normally distributed, a reading more than three standard deviations from the mean might signal a malfunction.
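The 68-95-99.7 figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `within_k_sd` is an assumption made for illustration.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)


def within_k_sd(k: float) -> float:
    """Share of a normal population lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


coverage = {k: within_k_sd(k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within {k} sd: {share:.4f}")
```

Because every normal distribution standardizes to the same curve, these shares hold for any $μ$ and $σ$.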

To use standard probability tables or your calculator, you often need to standardize a normal variable. This process converts any normal distribution $X \sim N (μ, σ^{2})$ into the standard normal distribution $Z \sim N (0, 1)$ using the z-score formula:

$Z = \frac{X - μ}{σ}$

A z-score tells you how many standard deviations a particular value $x$ is from the mean. If an AI model's processing time is normally distributed with $μ = 50$ ms and $σ = 5$ ms, a time of 60 ms corresponds to $Z = (60 - 50) /5 = 2$ . This value is two standard deviations above the mean.
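The processing-time example can be sketched in a few lines. The extra step of turning the z-score into a tail probability is an illustration added here, using the standard normal from Python's `statistics` module.

```python
from statistics import NormalDist

# Values from the example in the text (milliseconds).
mu, sigma = 50.0, 5.0
x = 60.0

# Standardize: how many standard deviations is x above the mean?
z = (x - mu) / sigma

# Probability that a run takes longer than x, read from the standard normal.
prob_slower = 1 - NormalDist().cdf(z)

print(f"z = {z:.1f}")
print(f"P(X > {x:.0f}) = {prob_slower:.4f}")
```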

Using Your GDC for Probability Calculations

Your graphic display calculator (GDC) is indispensable for efficient probability calculations in exams and projects. You must know which menu functions to use for each distribution.

For the binomial distribution $B (n, p)$ :

To find $P(X = k)$: Use the binompdf command. The calculator labels it a probability density function, although for a discrete variable it returns the probability of one exact value.
To find $P(X \leq k)$: Use the binomcdf (binomial cumulative distribution function) command. For probabilities like $P(X > k)$ or $P(a \leq X \leq b)$, combine this cumulative function with complement and subtraction rules: $P(X > k) = 1 - P(X \leq k)$ and $P(a \leq X \leq b) = P(X \leq b) - P(X \leq a - 1)$.
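To make the two commands concrete, here is a rough Python equivalent. The function names and the argument order (trials, probability, value) are modelled on the calculator commands as an assumption; the exact menus and order on your GDC may differ.

```python
from math import comb


def binompdf(n: int, p: float, k: int) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomcdf(n: int, p: float, k: int) -> float:
    """P(X <= k): add up the exact probabilities for 0, 1, ..., k."""
    return sum(binompdf(n, p, i) for i in range(k + 1))


# The 100-image example: n = 100, p = 0.95.
n, p = 100, 0.95
p_exactly_95 = binompdf(n, p, 95)
p_at_most_95 = binomcdf(n, p, 95)
p_more_than_95 = 1 - p_at_most_95  # complement rule
p_between_90_and_97 = binomcdf(n, p, 97) - binomcdf(n, p, 89)  # P(90 <= X <= 97)

print(f"P(X = 95)        = {p_exactly_95:.4f}")
print(f"P(X <= 95)       = {p_at_most_95:.4f}")
print(f"P(X > 95)        = {p_more_than_95:.4f}")
print(f"P(90 <= X <= 97) = {p_between_90_and_97:.4f}")
```

Note that the lower bound 90 becomes 89 in the subtraction, because the count 90 itself must stay inside the interval.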

For the normal distribution $N(μ, σ^{2})$ (the GDC asks for the standard deviation $σ$, not the variance $σ^{2}$):

To find $P (a < X < b)$ : Use the normalcdf (normal cumulative distribution function) command. You input the lower bound, upper bound, mean, and standard deviation.
To find the value $x$ such that $P (X < x) = p$ : Use the invNorm (inverse normal) command. You input the area (probability) to the left, the mean, and the standard deviation.
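The same two calculations can be sketched in Python with `statistics.NormalDist`. The wrapper names `normalcdf` and `invnorm` imitate the calculator commands and are assumptions for illustration; the mean and standard deviation are taken from the processing-time example, while the bounds 45 and 60 and the 0.95 area are hypothetical.

```python
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mu: float, sigma: float) -> float:
    """P(lower < X < upper) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)


def invnorm(area: float, mu: float, sigma: float) -> float:
    """Value x such that P(X < x) equals the given left-tail area."""
    return NormalDist(mu, sigma).inv_cdf(area)


# Processing times with mean 50 ms and standard deviation 5 ms.
mu, sigma = 50.0, 5.0
p_45_to_60 = normalcdf(45, 60, mu, sigma)
x_95 = invnorm(0.95, mu, sigma)  # time that 95% of runs stay below

print(f"P(45 < X < 60) = {p_45_to_60:.4f}")
print(f"95th percentile = {x_95:.2f} ms")
```

Both wrappers take the standard deviation, not the variance, which matches what the calculator expects.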

Always sketch a quick diagram of the distribution, shade the area representing the probability you want, and double-check that your bounds make sense. This prevents simple input errors.

Applying Distributions to AI: Quality Control and Testing

These theoretical models find direct application in core AI development and maintenance tasks, particularly in quality control and testing.

Binomial Applications: Consider A/B testing a user interface. You show the new design (Version B) to 100 randomly selected users, and your success metric is "user completed the task." If the true success probability for Version B is $p$, the number of observed successes follows $B(100, p)$. To judge whether Version B is better than Version A, assume B's success probability equals A's known rate and calculate the probability of a result at least as extreme as the one you observed; a very small probability is evidence that B really performs better. Similarly, in algorithmic testing, you can model the number of passed test cases out of a suite of $n$ independent tests to quantify confidence in a software release.
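A rough sketch of the A/B calculation follows. The baseline completion rate of 0.5 for Version A and the 60 observed successes for Version B are made-up numbers for illustration, and `binomial_upper_tail` is a hypothetical helper name.

```python
from math import comb


def binomial_upper_tail(k_observed: int, n: int, p0: float) -> float:
    """P(X >= k_observed) for X ~ B(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(k_observed, n + 1)
    )


n_users = 100
baseline_rate = 0.5      # hypothetical completion rate for Version A
observed_successes = 60  # hypothetical result for Version B

# Chance of at least 60 successes if Version B were no better than Version A.
p_value = binomial_upper_tail(observed_successes, n_users, baseline_rate)
print(f"P(X >= {observed_successes}) under the baseline = {p_value:.4f}")
```

A small tail probability means the observed result would be unusual if Version B were no better than Version A; how small counts as convincing is a threshold you choose before running the test.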

Normal Applications: In manufacturing AI hardware, component dimensions (like chip thickness) often follow a normal distribution due to natural process variation. Quality control uses the empirical rule to set tolerance limits. If a component's width in mm follows $N(10.0, 0.1^{2})$, then widths outside $10.0 \pm 3 \times 0.1$ (below 9.7 mm or above 10.3 mm) occur less than 0.3% of the time. Finding several such outliers could trigger a machine calibration check. Furthermore, when you take large samples, the sampling distribution of means (like the average accuracy of your model across multiple test runs) tends toward normality. This result, the central limit theorem, is central to statistical inference in AI performance evaluation.
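The tolerance-limit figure can be checked with a short sketch using the mean of 10.0 mm and standard deviation of 0.1 mm from the example. The variable names are illustrative.

```python
from statistics import NormalDist

# Component width in mm: mean 10.0, standard deviation 0.1.
width = NormalDist(mu=10.0, sigma=0.1)

# Three-sigma tolerance limits.
lower = 10.0 - 3 * 0.1
upper = 10.0 + 3 * 0.1

# Probability that a component falls outside the limits (both tails).
p_outside = width.cdf(lower) + (1 - width.cdf(upper))

print(f"limits: {lower:.1f} mm to {upper:.1f} mm")
print(f"P(outside limits) = {p_outside:.4%}")
```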

Common Pitfalls

Misapplying the Binomial Conditions: The binomial distribution requires two outcomes per trial, independence, a fixed number of trials ($n$), and a constant probability of success ($p$). A common mistake is using it when $p$ changes. For example, if you are testing a model that learns and improves after each trial, the success probability rises from trial to trial and the trials are not independent, so the binomial model is invalid.

Correction: Always verify the four BINS conditions: Binary outcomes, Independent trials, fixed Number of trials, and constant probability of Success.

Confusing pdf and cdf Commands: Using binompdf when you need a cumulative probability (e.g., $P (X \leq 5)$ ) will give you only $P (X = 5)$ , a much smaller number.

Correction: Remember: pdf is for an exact number (=), cdf is for a range (≤). For normal distributions, you almost always use normalcdf for finding probabilities.

Forgetting to Standardize or Using Wrong Parameters: When using invNorm or normalcdf on a non-standard normal distribution, you must input the correct $μ$ and $σ$ . Simply using $N (0, 1)$ parameters will yield an incorrect answer.

Correction: Write down $X \sim N(μ, σ^{2})$ at the start of the problem and note the value of $σ$ separately, since the GDC takes the standard deviation, not the variance. Carefully enter these values into your GDC. When calculating a raw score $x$ from a z-score, remember to rearrange the formula: $x = μ + zσ$.

Treating Discrete as Continuous (and Vice Versa): The binomial distribution is discrete—it only takes whole number values. Sometimes a question will ask for $P (X < 10)$ for a binomial variable. Using a normal approximation without a continuity correction, or incorrectly interpreting the inequality, can cause errors.

Correction: For discrete distributions, $P (X < 10) = P (X \leq 9)$ . Always check whether the inequality is strict (<, >) or inclusive (≤, ≥) and adjust your calculation accordingly.
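The effect of a strict inequality, and of the continuity correction mentioned above, can be seen in a small sketch. The choice of $n = 20$ and $p = 0.5$ is a hypothetical example, and `binomcdf` is an illustrative helper name.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomcdf(n: int, p: float, k: int) -> float:
    """P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


n, p = 20, 0.5  # hypothetical binomial variable

# Strict versus inclusive: P(X < 10) is P(X <= 9), not P(X <= 10).
p_less_than_10 = binomcdf(n, p, 9)
p_at_most_10 = binomcdf(n, p, 10)

# Normal approximation with the same mean and standard deviation.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
p_with_correction = approx.cdf(9.5)    # continuity correction: X <= 9 becomes Y < 9.5
p_without_correction = approx.cdf(10)  # no correction

print(f"exact P(X < 10)       = {p_less_than_10:.4f}")
print(f"exact P(X <= 10)      = {p_at_most_10:.4f}")
print(f"normal, corrected     = {p_with_correction:.4f}")
print(f"normal, not corrected = {p_without_correction:.4f}")
```

In this example the corrected approximation lands much closer to the exact binomial value than the uncorrected one.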

Summary

The binomial distribution $B(n, p)$ models the count of successes in a fixed number of independent trials with constant success probability. Its mean is $np$ and its standard deviation is $\sqrt{np(1 - p)}$.
The normal distribution $N (μ, σ^{2})$ is a continuous, symmetric bell-shaped model for many natural and measurement phenomena, characterized by its mean and standard deviation. The empirical rule (68-95-99.7) is a quick way to estimate probabilities.
Standardization via the z-score formula $Z = (X - μ) / σ$ converts any normal distribution to the standard normal $N (0, 1)$ , enabling the use of probability tables or calculator functions.
Master your GDC's binompdf/binomcdf and normalcdf/invNorm functions. Knowing when to use each is crucial for solving probability problems efficiently.
In AI applications, these distributions are fundamental for quality control (e.g., setting tolerance limits using the normal distribution) and algorithm testing (e.g., modeling pass/fail rates with the binomial distribution).
