Mar 1

Math AI: Probability and Decision Trees

Mindli Team

AI-Generated Content


Probability is not just about rolling dice; it’s the formal language of uncertainty, used to diagnose diseases, price insurance, and guide billion-dollar business decisions. In the IB Mathematics AI course, you move beyond simple chance to master tools for modeling complex, conditional scenarios and making optimal choices, skills essential for navigating a world full of risk and incomplete information.

The Foundation: Probability Trees and Conditional Probability

The simplest way to visualize a sequence of probabilistic events is with a probability tree. This diagram maps out all possible outcomes of a multi-stage process. Each branch represents a possible outcome at a stage, and it is labeled with the probability of that outcome occurring. Crucially, the probabilities on branches emanating from a single node (decision point) must always sum to 1.

The power of a probability tree becomes clear when events depend on each other. This introduces conditional probability, denoted P(A | B), which is read as "the probability of event A given that event B has occurred." It fundamentally changes the sample space. The probability of following a specific path on a tree is found by multiplying the probabilities along the branches, which is an application of the multiplication rule: P(A ∩ B) = P(A) × P(B | A).

Example: A medical test for a rare disease (affecting 1% of a population) has a 95% true positive rate (P(positive | disease) = 0.95) and a 90% true negative rate (P(negative | no disease) = 0.90). To find the probability a person has the disease and tests positive, you multiply: P(disease ∩ positive) = 0.01 × 0.95 = 0.0095. The tree organizes this calculation and all other possible outcomes (e.g., false positives, false negatives) systematically.
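The branch arithmetic above can be sketched in a few lines of Python (the variable names are my own; the rates are the ones quoted in the example):

```python
# Probability tree for the medical test: figures from the example above.
p_disease = 0.01              # base rate: disease affects 1% of the population
p_pos_given_disease = 0.95    # true positive rate
p_neg_given_healthy = 0.90    # true negative rate

# Multiplication rule: multiply along each branch of the tree.
p_true_positive = p_disease * p_pos_given_disease                # disease and +
p_false_negative = p_disease * (1 - p_pos_given_disease)         # disease and -
p_false_positive = (1 - p_disease) * (1 - p_neg_given_healthy)   # healthy and +
p_true_negative = (1 - p_disease) * p_neg_given_healthy          # healthy and -

# The four leaves cover every possible outcome, so they sum to 1.
total = p_true_positive + p_false_negative + p_false_positive + p_true_negative
print(round(p_true_positive, 4))  # 0.0095
```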

Bayes' Theorem: Reversing Conditional Probability

Often, we know P(B | A) but need to find P(A | B). This is the essence of Bayes' Theorem, a revolutionary formula for updating beliefs with new evidence. It is derived from the definition of conditional probability and the multiplication rule. The theorem states:

P(A | B) = P(B | A) × P(A) / P(B)

Where P(B) can be found by summing all the ways B can occur: P(B) = P(B | A) × P(A) + P(B | A′) × P(A′).

Let’s apply it to the medical test example. We know P(disease) = 0.01 and P(positive | disease) = 0.95, and the 90% true negative rate gives a false positive rate of P(positive | no disease) = 0.10. To find the total probability of a positive test, P(positive), we consider both the true positives and false positives: P(positive) = 0.95 × 0.01 + 0.10 × 0.99 = 0.0095 + 0.099 = 0.1085.

Now, the most critical question: Given a positive test result, what is the actual probability the patient has the disease? Bayes' Theorem gives the answer:

P(disease | positive) = (0.95 × 0.01) / 0.1085 = 0.0095 / 0.1085 ≈ 0.0876

Despite the "95% accurate" test, a positive result only implies about an 8.76% chance of having the disease because the disease itself is so rare. This counterintuitive result underscores the importance of Bayesian reasoning in fields like medicine, spam filtering, and legal analysis, where base rates matter immensely.
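The whole Bayesian update is a direct transcription of the numbers in this section, which a short script can verify:

```python
# Bayes' Theorem applied to the medical test (values from the text above).
p_d = 0.01                # prior / base rate: P(disease)
p_pos_given_d = 0.95      # P(positive | disease)
p_pos_given_not_d = 0.10  # P(positive | no disease) = 1 - true negative rate

# Total probability of a positive test (true positives + false positives).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' Theorem: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_pos, 4))         # 0.1085
print(round(p_d_given_pos, 4)) # 0.0876
```

Raising the base rate (say, testing only symptomatic patients with p_d = 0.20) pushes the posterior up dramatically, which is exactly the base-rate effect the text describes.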

Expected Value: Quantifying Risk and Reward

When a decision has probabilistic outcomes with different values or costs, we use expected value (EV) to summarize the average outcome per decision if it were repeated many times. It is a weighted average of all possible outcomes, where the weights are their probabilities.

The formula for the expected value of a discrete random variable X is E(X) = Σ xᵢpᵢ, where the xᵢ are the possible values and the pᵢ their probabilities.

Example (Business Investment): A company considers a project with a 70% chance of a \$200,000 profit and a 30% chance of a \$50,000 loss. The expected value is EV = 0.70 × \$200,000 + 0.30 × (−\$50,000) = \$140,000 − \$15,000 = \$125,000. A positive EV suggests a favorable risk-reward profile on average. However, EV does not account for the variability of risk or a company's specific risk tolerance—a startup might avoid a single project with a high potential loss despite a positive EV.
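The weighted-average formula translates directly into code; this sketch uses the investment figures from the example:

```python
# Expected value of the investment example: 70% chance of +$200,000,
# 30% chance of -$50,000.
outcomes = [200_000, -50_000]   # possible values x_i (dollars)
probs = [0.70, 0.30]            # their probabilities p_i

# E(X) = sum of x_i * p_i
ev = sum(x * p for x, p in zip(outcomes, probs))
print(ev)  # 125000.0
```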

Decision Trees: Structuring Complex Choices

A decision tree combines probability trees with expected value calculations to model and solve multi-stage decision problems under uncertainty. It includes two types of nodes: decision nodes (squares, where you choose an action) and chance nodes (circles, where probabilistic events occur). The analysis is performed using rollback analysis or backward induction.

You start at the final outcomes on the right and work backwards:

  1. At each chance node, calculate the EV of all branches emanating from it.
  2. At each decision node, choose the branch (decision) that leads to the highest EV. This becomes the EV for that decision node.
  3. Continue rolling back to the initial decision.

Example (Product Launch): A company must decide whether to launch a product now or conduct more market research first (cost: \$20,000). Research can predict "Favorable" or "Unfavorable" demand with certain reliabilities. Based on this prediction, the company can then decide to launch or abandon. By assigning probabilities to research results and subsequent market outcomes (High/Moderate/Low demand) with associated profits/losses, you can build a tree. Rolling back calculates the EV of doing research versus launching immediately. The optimal strategy is the initial decision path with the highest rolled-back EV. This formalizes intuition, forcing you to quantify assumptions about probabilities and payoffs.
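The rollback procedure can be sketched as code. The probabilities and payoffs below are illustrative assumptions of my own (the text does not specify numbers), but the structure—chance nodes as weighted averages, decision nodes as a max over options—is the backward induction described above:

```python
# Rollback (backward induction) on a small launch-vs-research tree.
# All figures below are hypothetical, chosen only to illustrate the method.

def chance(branches):
    """EV at a chance node: probability-weighted average of branch values."""
    return sum(p * v for p, v in branches)

def decide(options):
    """At a decision node, pick the action with the highest EV."""
    return max(options, key=lambda kv: kv[1])

# Option A: launch now. Assumed 60% high demand (+$100k), 40% low (-$30k).
ev_launch_now = chance([(0.60, 100_000), (0.40, -30_000)])

# Option B: research first (assumed cost $20k). The prediction shifts the
# demand probabilities; after each prediction we may launch or abandon ($0).
ev_launch_if_favorable = chance([(0.80, 100_000), (0.20, -30_000)])
ev_favorable = max(ev_launch_if_favorable, 0)      # launch vs abandon
ev_launch_if_unfavorable = chance([(0.25, 100_000), (0.75, -30_000)])
ev_unfavorable = max(ev_launch_if_unfavorable, 0)  # launch vs abandon
ev_research = chance([(0.55, ev_favorable),
                      (0.45, ev_unfavorable)]) - 20_000

best_action, best_ev = decide([("launch now", ev_launch_now),
                               ("research first", ev_research)])
print(best_action, round(best_ev))  # launch now 48000
```

With these made-up numbers, researching first is not worth its \$20,000 cost; changing the assumed reliabilities or payoffs can flip the decision, which is exactly why the tree forces you to quantify them.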

Common Pitfalls

  1. Misapplying the Multiplication Rule: The most frequent error is using P(A ∩ B) = P(A) × P(B) when events are not independent. Always ask if the probability of the second event changes knowing the first occurred. If it does, you must use conditional probability: P(A ∩ B) = P(A) × P(B | A).
  2. Confusing P(A | B) and P(B | A): This is the prosecutor's fallacy. Knowing the probability of the evidence given guilt (P(evidence | guilt)) is high does not mean the probability of guilt given the evidence (P(guilt | evidence)) is high. Bayes' Theorem explicitly addresses this reversal.
  3. Ignoring Base Rates (Prior Probabilities): As shown in the medical test example, neglecting how common or rare the initial condition (P(A)) is will lead to wildly incorrect interpretations of test results or data.
  4. Misinterpreting Expected Value: EV is a long-run average, not a prediction for a single trial. A decision with a positive EV can still result in a loss, and one with a negative EV can sometimes win. Good decision-making uses EV as a guide while also considering the range and volatility of possible outcomes (risk assessment).
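The last pitfall is easy to see with a quick simulation of the investment example: the long-run average converges to the EV of \$125,000, yet roughly 30% of individual trials still lose money.

```python
# Simulating the $125,000-EV investment: EV is a long-run average,
# not a prediction for any single trial.
import random

random.seed(42)  # fixed seed so the run is reproducible

def one_trial():
    # 70% chance of +$200,000, 30% chance of -$50,000
    return 200_000 if random.random() < 0.70 else -50_000

trials = [one_trial() for _ in range(100_000)]
average = sum(trials) / len(trials)
loss_rate = sum(t < 0 for t in trials) / len(trials)

print(round(average))       # close to the EV of 125000
print(round(loss_rate, 2))  # yet about 0.30 of single trials are losses
```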

Summary

  • Probability trees are indispensable visual tools for mapping out multi-stage random processes and correctly applying the multiplication rule for dependent events via conditional probability (P(A ∩ B) = P(A) × P(B | A)).
  • Bayes' Theorem allows you to update the probability of a hypothesis (like having a disease) based on new evidence (a test result), crucially incorporating the base rate or prior probability.
  • The expected value of a random variable is the probability-weighted average of all possible outcomes, providing a single metric to summarize the "average" result of a risky decision.
  • Decision trees combine probabilistic chance nodes with decision nodes, and rollback analysis uses expected value to identify the optimal sequence of choices under uncertainty.
  • Always be vigilant for the common confusions between joint and conditional probability, and remember that expected value is a guide for repeated decisions, not a guarantee for a single event.
