Probability Theory (Graduate)
Graduate probability theory is the study of randomness built on a rigorous mathematical foundation. Instead of treating probabilities as intuitive fractions or long-run frequencies, the graduate approach formalizes random phenomena using measure theory, then develops powerful tools to analyze convergence, dependence, and limiting behavior. This framework underpins modern statistics, stochastic processes, information theory, and much of quantitative economics and finance.
What follows is a structured tour of core topics: measure-theoretic probability, modes of convergence, characteristic functions, martingales, and central limit theorems. The goal is not only to state theorems, but to explain what they accomplish and why they are indispensable.
Measure-theoretic foundations
At the graduate level, probability starts with a probability space \((\Omega, \mathcal{F}, \mathbb{P})\):
- \(\Omega\) is the sample space of outcomes.
- \(\mathcal{F}\) is a \(\sigma\)-algebra of events (the measurable subsets of \(\Omega\)).
- \(\mathbb{P}\) is a probability measure with \(\mathbb{P}(\Omega) = 1\).
This setup matters because many random variables of interest live on complicated spaces (paths of a process, function spaces, infinite sequences). Measure theory provides the correct language for defining probability in these settings without contradictions.
A random variable is a measurable map \(X\colon (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B}(\mathbb{R}))\), where \(\mathcal{B}(\mathbb{R})\) is the Borel \(\sigma\)-algebra. Measurability ensures events like \(\{X \le x\}\) belong to \(\mathcal{F}\), so probabilities like \(\mathbb{P}(X \le x)\) are well-defined.
Expectation as an integral
Expectation is defined as a Lebesgue integral: \[ \mathbb{E}[X] = \int_{\Omega} X(\omega)\, d\mathbb{P}(\omega), \] when integrability conditions are satisfied. This view unifies discrete and continuous cases and provides precise tools for interchanging limits and integrals, which is central to convergence and limit theorems.
Two indispensable results are:
- Monotone Convergence Theorem (MCT): for \(0 \le X_n \uparrow X\), \(\mathbb{E}[X_n] \uparrow \mathbb{E}[X]\).
- Dominated Convergence Theorem (DCT): if \(X_n \to X\) a.s. and \(|X_n| \le Y\) with \(\mathbb{E}[Y] < \infty\), then \(\mathbb{E}[X_n] \to \mathbb{E}[X]\).
These are not technical luxuries. They are what make it possible to justify passing limits through expectations when proving laws of large numbers or central limit theorems.
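A minimal numerical sketch of why the domination hypothesis matters: take \(U\) uniform on \((0,1)\) and \(X_n = n\,\mathbf{1}\{U < 1/n\}\). Then \(X_n \to 0\) almost surely, yet \(\mathbb{E}[X_n] = 1\) for every \(n\), so the limit cannot be passed through the expectation without an integrable dominating bound. The simulation below is purely illustrative; the sample size is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)       # one draw of U per sample point

for n in [10, 100, 1_000, 10_000]:
    x_n = n * (u < 1 / n)             # X_n = n on {U < 1/n}, else 0
    # P(X_n != 0) = 1/n shrinks to 0, but E[X_n] stays pinned at 1.
    print(n, (x_n != 0).mean(), x_n.mean())
```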
Conditional expectation
Conditional expectation is treated as a random variable rather than a number. For an integrable \(X\) and a sub-\(\sigma\)-algebra \(\mathcal{G} \subseteq \mathcal{F}\), the conditional expectation \(\mathbb{E}[X\mid\mathcal{G}]\) is the \(\mathcal{G}\)-measurable random variable satisfying \[ \int_G \mathbb{E}[X\mid \mathcal{G}]\, d\mathbb{P} = \int_G X\, d\mathbb{P} \quad \text{for all } G\in\mathcal{G}. \] This definition is the gateway to martingales and modern stochastic analysis, because it formalizes “best prediction given current information.”
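When \(\mathcal{G}\) is generated by a finite partition, \(\mathbb{E}[X\mid\mathcal{G}]\) is simply the within-cell average of \(X\), and the defining integral identity can be checked by Monte Carlo. The sketch below uses an arbitrary four-cell partition and an arbitrary distribution for \(X\); it is an illustration of the definition, not part of the formal development.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
cell = rng.integers(0, 4, size=n)        # partition-cell label; G = sigma(cell)
x = cell + rng.normal(size=n)            # X depends on the cell plus independent noise

# E[X | G]: replace X by its average within each cell.
cond_exp = np.empty(n)
for c in range(4):
    mask = cell == c
    cond_exp[mask] = x[mask].mean()

# Defining property: integrals over any G-measurable event agree, e.g. G = {cell in {0, 2}}.
event = np.isin(cell, [0, 2])
print(np.mean(cond_exp * event), np.mean(x * event))   # approximately equal
```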
Modes of convergence
Probability theory distinguishes several notions of convergence because random variables can converge in different senses, with different implications.
Almost sure, in probability, and in \(L^p\)
Common modes include:
- Almost sure convergence: \(X_n \to X\) a.s. if \(\mathbb{P}\big(\lim_{n\to\infty} X_n = X\big) = 1\).
- Convergence in probability: \(X_n \to X\) in probability if for all \(\varepsilon > 0\), \(\mathbb{P}(|X_n - X| > \varepsilon) \to 0\).
- \(L^p\) convergence: \(X_n \to X\) in \(L^p\) if \(\mathbb{E}[|X_n - X|^p] \to 0\).
A typical hierarchy is: \[ X_n \to X \text{ in } L^p \Rightarrow X_n \to X \text{ in probability}, \] and almost sure convergence implies convergence in probability, but not vice versa in general.
These distinctions are not pedantry. For example, laws of large numbers can often be proved in probability first, then strengthened to almost sure convergence under additional conditions.
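For instance, convergence in probability of the sample mean can be seen empirically by estimating \(\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon)\) for growing \(n\); the uniform distribution and the tolerance \(\varepsilon\) below are illustrative choices, not prescribed by the theory.

```python
import numpy as np

rng = np.random.default_rng(2)
eps, mu = 0.05, 0.5                       # Uniform(0, 1) has mean 0.5
n_paths = 2_000

for n in [10, 100, 1_000, 10_000]:
    sample_means = rng.uniform(size=(n_paths, n)).mean(axis=1)
    # The deviation probability P(|mean - mu| > eps) shrinks toward 0 as n grows.
    print(n, np.mean(np.abs(sample_means - mu) > eps))
```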
Weak convergence and distributional limits
A separate concept is convergence in distribution (weak convergence), written \(X_n \Rightarrow X\). It means the distribution functions \(F_{X_n}(x) \to F_X(x)\) at continuity points \(x\) of the limit \(F_X\). Weak convergence is the natural framework for central limit theorems, because the normalized sums often do not converge pointwise or in \(L^p\), but their distributions stabilize.
A key practical question in graduate probability is: Which convergence is needed for the result I want? In statistics, convergence in distribution supports asymptotic approximations; in stochastic processes, almost sure statements can be crucial for pathwise properties.
Characteristic functions
A characteristic function is the Fourier transform of a probability distribution: \[ \varphi_X(t) = \mathbb{E}[e^{itX}], \quad t\in \mathbb{R}. \] Characteristic functions always exist (unlike moment generating functions) and encode the full distribution. Two cornerstone facts:
- Uniqueness: \(\varphi_X = \varphi_Y\) implies \(X\) and \(Y\) have the same distribution.
- Continuity theorem: \(\varphi_{X_n}(t) \to \varphi(t)\) pointwise with \(\varphi\) continuous at \(0\) implies \(X_n \Rightarrow X\) for some \(X\) with characteristic function \(\varphi\).
Characteristic functions are especially effective for sums of independent random variables since \[ \varphi_{X+Y}(t) = \varphi_X(t)\varphi_Y(t) \] when \(X\) and \(Y\) are independent. This multiplicative structure turns limit theorems into analytic problems about convergence of functions.
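The multiplicative identity can be verified numerically by comparing the empirical characteristic function of \(X+Y\) with the product of those of \(X\) and \(Y\); the exponential and normal distributions below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
x = rng.exponential(size=n)              # X ~ Exp(1)
y = rng.normal(size=n)                   # Y ~ N(0, 1), independent of X

def ecf(sample, t):
    """Empirical characteristic function: average of exp(i * t * sample)."""
    return np.mean(np.exp(1j * t * sample))

for t in [0.5, 1.0, 2.0]:
    # phi_{X+Y}(t) versus phi_X(t) * phi_Y(t): nearly equal under independence.
    print(t, ecf(x + y, t), ecf(x, t) * ecf(y, t))
```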
Martingales and information over time
A martingale formalizes a fair game. Let \((\mathcal{F}_n)_{n \ge 0}\) be a filtration representing information revealed over time. A process \((M_n)_{n \ge 0}\) is a martingale if:
- \(M_n\) is \(\mathcal{F}_n\)-measurable,
- \(\mathbb{E}[|M_n|] < \infty\),
- \(\mathbb{E}[M_{n+1} \mid \mathcal{F}_n] = M_n\).
Martingales show up naturally by taking conditional expectations, constructing partial sums of mean-zero increments, or modeling price processes under “no arbitrage” assumptions in finance.
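A concrete example is the simple symmetric random walk \(M_n = \xi_1 + \cdots + \xi_n\) with i.i.d. \(\pm 1\) steps, a martingale with respect to its natural filtration. The sketch below checks \(\mathbb{E}[M_{11} \mid M_{10}] \approx M_{10}\) by averaging over simulated paths grouped by the current value (grouping on \(M_{10}\) suffices here because the increments are independent); the horizon and grouping values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps = 200_000, 20
steps = rng.choice([-1, 1], size=(n_paths, n_steps))   # i.i.d. +/-1 increments
walk = steps.cumsum(axis=1)                            # M_1, ..., M_20 on each path

m_now, m_next = walk[:, 9], walk[:, 10]                # M_10 and M_11
for value in [-4, -2, 0, 2, 4]:
    mask = m_now == value
    # Average of M_11 over paths with M_10 = value should be close to value.
    print(value, m_next[mask].mean())
```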
Martingale inequalities and stopping
Martingale methods provide robust bounds and convergence results. Doob-type inequalities bound maximal fluctuations, which is crucial for proving almost sure convergence. Optional stopping results clarify when a stopped martingale preserves expectation, and when it fails due to integrability or unbounded stopping times.
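As an illustration, Doob's maximal inequality applied to the non-negative submartingale \(|M_n|\) gives \(\mathbb{P}(\max_{k\le n}|M_k| \ge \lambda) \le \mathbb{E}[|M_n|]/\lambda\). The simulation below checks this bound for the simple random walk; the horizon and thresholds are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n = 100_000, 50
walk = rng.choice([-1, 1], size=(n_paths, n)).cumsum(axis=1)   # simple random walk paths

running_max = np.abs(walk).max(axis=1)                 # max_{k <= n} |M_k| on each path
for lam in [5, 10, 15]:
    lhs = np.mean(running_max >= lam)                  # P(max |M_k| >= lambda)
    rhs = np.mean(np.abs(walk[:, -1])) / lam           # E[|M_n|] / lambda
    print(lam, lhs, rhs)                               # lhs stays below rhs
```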
One of the most useful qualitative conclusions is that martingale structure often yields convergence without independence, making it a powerful generalization of classical tools.
Limit theorems: laws of large numbers and central limit behavior
Limit theorems explain how complex random systems exhibit regular behavior at scale.
Laws of large numbers
A typical law of large numbers states that for i.i.d. \(X_1, X_2, \dots\) with \(\mathbb{E}[|X_1|] < \infty\) and mean \(\mu\), \[ \frac{1}{n}\sum_{k=1}^n X_k \to \mu, \] in an appropriate sense (often almost surely in the strong law, in probability in the weak law). The measure-theoretic toolkit (MCT, DCT, truncation arguments) and concentration inequalities are standard techniques for proving these results under minimal assumptions.
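A single simulated path makes the statement concrete: the running average of i.i.d. draws settles near \(\mu\) as \(n\) grows. The exponential distribution and sample sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=100_000)     # i.i.d. draws with mean mu = 2
running_mean = np.cumsum(x) / np.arange(1, x.size + 1)

for n in [10, 100, 1_000, 10_000, 100_000]:
    # Along this single path, the running average approaches mu = 2 (strong-law behavior).
    print(n, running_mean[n - 1])
```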
Central limit theorems (CLTs)
The central limit theorem explains why normal distributions appear ubiquitously. In its classical form, for i.i.d. variables with mean \(\mu\) and finite variance \(\sigma^2 > 0\), \[ \frac{\sum_{k=1}^n X_k - n\mu}{\sigma\sqrt{n}} \Rightarrow Z, \] where \(Z\) is standard normal.
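A simulation sketch compares a few quantiles of the normalized sum with those of the standard normal (about \(-1.645\), \(0\), \(1.645\)); the exponential summands are an illustrative choice with \(\mu = \sigma = 1\).

```python
import numpy as np

rng = np.random.default_rng(7)
n, n_paths = 1_000, 20_000
mu, sigma = 1.0, 1.0                             # Exp(1) has mean 1 and variance 1

sums = rng.exponential(size=(n_paths, n)).sum(axis=1)
z = (sums - n * mu) / (sigma * np.sqrt(n))       # normalized sums

# Empirical 5%, 50%, 95% quantiles should sit near the normal values -1.645, 0, 1.645.
print(np.quantile(z, [0.05, 0.5, 0.95]))
```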
At the graduate level, the CLT is not a single theorem but a family of results under varying assumptions. One learns to handle:
- Triangular arrays (non-identically distributed summands).
- Lindeberg-type conditions controlling the contribution of large summands (a standard formulation is written out after this list).
- Lyapunov-type conditions formulated via moments.
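For reference, one standard way to write the Lindeberg condition for a triangular array \(\{X_{n,k}\}\) of independent, mean-zero summands with \(s_n^2 = \sum_{k} \operatorname{Var}(X_{n,k})\) is \[ \frac{1}{s_n^2} \sum_{k=1}^{n} \mathbb{E}\!\left[ X_{n,k}^2\, \mathbf{1}\{|X_{n,k}| > \varepsilon s_n\} \right] \to 0 \quad \text{for every } \varepsilon > 0, \] which formalizes the requirement that no single summand contributes a non-negligible share of the total variance.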
Characteristic functions provide a clean route to CLTs, but martingale CLTs and other approaches extend the theory to dependent data, which matters in time series, random walks, and stochastic algorithms.
How the pieces fit together
Measure-theoretic probability supplies the