Further Statistics: Continuous Distributions and Estimation
AI-Generated Content
Moving beyond counts and discrete events, continuous random variables allow us to model phenomena like waiting times, weights, and measurement errors—quantities that can take any value within an interval. Mastering this area is crucial for advanced statistical inference, forming the bedrock for estimating population parameters from sample data, a fundamental skill in data science, economics, and scientific research.
The Foundation: Probability Density Functions and Cumulative Distribution Functions
For a continuous random variable $X$, the probability of it taking any single, exact value is zero. Instead, we describe its behavior using a probability density function (PDF), denoted $f(x)$. The key principle is that probability is represented by area under the curve of the PDF. For any interval $[a, b]$, the probability that $X$ lies in that interval is given by the integral $P(a \le X \le b) = \int_a^b f(x)\,dx$. A valid PDF must satisfy two conditions: it is never negative ($f(x) \ge 0$ for all $x$), and the total area under its curve is 1 ($\int_{-\infty}^{\infty} f(x)\,dx = 1$).
Closely related is the cumulative distribution function (CDF), denoted $F(x)$. This function gives the probability that $X$ is less than or equal to a specific value $x$: $F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$. The CDF is a non-decreasing function that ranges from 0 to 1. You can move between the PDF and CDF using calculus: the PDF is the derivative of the CDF ($f(x) = F'(x)$), and the CDF is the integral of the PDF.
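As a minimal sketch of the PDF–CDF relationship, the Python snippet below uses an exponential distribution with an illustrative rate $\lambda = 2$ (the rate, the interval, and the helper `integrate` are choices made here, not prescribed by the text). It approximates $P(a \le X \le b)$ as the area under the PDF and checks that this matches $F(b) - F(a)$:

```python
import math

LAM = 2.0  # illustrative rate parameter of an exponential distribution

def pdf(x):
    """Exponential PDF: f(x) = lam * exp(-lam * x) for x >= 0, else 0."""
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def cdf(x):
    """Exponential CDF: F(x) = 1 - exp(-lam * x) for x >= 0, else 0."""
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# P(0.5 <= X <= 1.5) computed two ways: area under the PDF, and F(b) - F(a).
area = integrate(pdf, 0.5, 1.5)
diff = cdf(1.5) - cdf(0.5)
print(area, diff)  # both ≈ 0.318; they agree to many decimal places
```

The agreement between `area` and `diff` is exactly the statement $P(a \le X \le b) = F(b) - F(a) = \int_a^b f(x)\,dx$.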
Expectation, Variance, and Percentiles for Continuous Variables
The concepts of average and spread have direct analogs in the continuous world. The expectation (or mean) of a continuous random variable $X$ with PDF $f(x)$ is defined as $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$. Think of this as a continuous weighted average, where each value $x$ is weighted by its density $f(x)$. The variance measures the average squared deviation from the mean and is calculated as $\operatorname{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$, where $\mu = E(X)$. A frequently easier computational formula is $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$, where $E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$.
Percentiles are another vital tool. The $p$th percentile (or the $q$ quantile, where $q = p/100$) is the value $x_p$ such that $F(x_p) = p/100$. You find it by solving the equation $\int_{-\infty}^{x_p} f(t)\,dt = p/100$ for $x_p$. The 50th percentile is the median, a robust measure of center.
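These definitions can be checked numerically. The sketch below (again assuming an illustrative exponential with $\lambda = 2$, for which $E(X) = 1/\lambda = 0.5$, $\operatorname{Var}(X) = 1/\lambda^2 = 0.25$, and the median is $\ln 2/\lambda$) computes the mean and variance by integration and finds the median by solving $F(m) = 0.5$ with bisection:

```python
import math

LAM = 2.0  # illustrative rate; E(X) = 1/lam, Var(X) = 1/lam**2

def pdf(x):
    return LAM * math.exp(-LAM * x) if x >= 0 else 0.0

def cdf(x):
    return 1.0 - math.exp(-LAM * x) if x >= 0 else 0.0

def integrate(f, a, b, n=200_000):
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

UPPER = 20.0  # the exponential tail beyond this point is negligible

mean = integrate(lambda x: x * pdf(x), 0.0, UPPER)        # E(X)
ex2 = integrate(lambda x: x * x * pdf(x), 0.0, UPPER)     # E(X^2)
var = ex2 - mean ** 2                                     # Var(X) = E(X^2) - [E(X)]^2

# 50th percentile: solve F(m) = 0.5 by bisection on [0, UPPER].
lo, hi = 0.0, UPPER
for _ in range(60):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if cdf(mid) < 0.5 else (lo, mid)
median = (lo + hi) / 2

print(mean, var, median)  # ≈ 0.5, 0.25, and ln(2)/2 ≈ 0.3466
```

Note that the variance comes out of the computational formula $E(X^2) - [E(X)]^2$ rather than a second pass over $(x - \mu)^2$.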
The Theory of Estimation: What Makes a Good Estimator?
We rarely know population parameters; we estimate them using sample statistics. A statistic used to estimate a parameter is called an estimator. Not all estimators are created equal, and we judge them by three key properties:
- Unbiasedness: An estimator $\hat{\theta}$ is unbiased for a parameter $\theta$ if its expected value equals the parameter: $E(\hat{\theta}) = \theta$. The sample mean $\bar{X}$ is an unbiased estimator for the population mean $\mu$. Bias is the difference $E(\hat{\theta}) - \theta$.
- Consistency: An estimator is consistent if it converges to the true parameter value as the sample size increases. Formally, as $n \to \infty$, $P(|\hat{\theta} - \theta| > \epsilon) \to 0$ for any small $\epsilon > 0$. A biased estimator can be consistent, but unbiasedness alone does not guarantee consistency.
- Efficiency: Among unbiased estimators, the one with the smallest variance is called the efficient (or minimum variance unbiased) estimator. Efficiency means the estimator's values are more tightly clustered around the true parameter, yielding more reliable estimates.
Finding Estimates: The Method of Maximum Likelihood
Maximum likelihood estimation (MLE) is a powerful and general method for finding parameter estimates from data. The core idea is simple: choose the parameter values that make the observed sample data most probable.
The procedure involves these steps:
- Write the Likelihood Function: For a continuous distribution with PDF $f(x; \theta)$, and an independent sample $x_1, x_2, \ldots, x_n$, the likelihood function is $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$.
- Take the Log to Form the Log-Likelihood: Products are awkward, so we use the natural logarithm: $\ell(\theta) = \ln L(\theta) = \sum_{i=1}^{n} \ln f(x_i; \theta)$.
- Differentiate and Solve: Differentiate $\ell(\theta)$ with respect to $\theta$, set the derivative equal to zero, and solve for $\theta$ to find the maximum likelihood estimate (MLE), denoted $\hat{\theta}$.
Example: For data modeled as $X \sim \operatorname{Exp}(\lambda)$ (exponential distribution), the PDF is $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$. The log-likelihood is $\ell(\lambda) = n \ln \lambda - \lambda \sum_{i=1}^{n} x_i$. Differentiating: $\ell'(\lambda) = n/\lambda - \sum_{i=1}^{n} x_i$. Setting this to zero gives the MLE: $\hat{\lambda} = n / \sum_{i=1}^{n} x_i = 1/\bar{x}$. This is an intuitive result—the rate parameter is estimated by the reciprocal of the sample mean.
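The exponential example above can be sketched in code. The true rate (2.5), the sample size, and the seed below are illustrative choices; the closed-form estimate $\hat{\lambda} = 1/\bar{x}$ is checked against the log-likelihood of nearby candidate values:

```python
import math
import random

random.seed(1)
TRUE_LAM = 2.5  # illustrative true rate, unknown to the estimator
data = [random.expovariate(TRUE_LAM) for _ in range(5000)]

# Closed-form MLE derived above: lambda_hat = 1 / sample mean.
lam_hat = len(data) / sum(data)

def log_likelihood(lam):
    # l(lam) = n * ln(lam) - lam * sum(x_i) for the exponential model
    return len(data) * math.log(lam) - lam * sum(data)

# Sanity check: the closed-form estimate beats nearby candidate rates.
assert all(log_likelihood(lam_hat) >= log_likelihood(lam_hat + d)
           for d in (-0.1, -0.01, 0.01, 0.1))
print(lam_hat)  # close to the true rate 2.5
```

With 5000 observations, $\hat{\lambda}$ lands near the true rate, which is the consistency of the MLE showing through.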
Common Pitfalls
- Treating the PDF as a Probability: A common error is to interpret $f(x)$ as $P(X = x)$. For continuous variables, $P(X = x) = 0$. The PDF's value is a density; only an area under it represents a probability. Always think in terms of integration.
- Confusing Unbiasedness with Consistency: Students often assume an unbiased estimator is automatically good. An estimator can be unbiased but have a huge variance that doesn't decrease with sample size (e.g., using only the first data point to estimate the mean). Consistency is often a more critical long-term property, ensuring improvement with more data.
- Incorrectly Applying the Expectation Formula: When calculating $E[g(X)]$, the formula is $E[g(X)] = \int g(x) f(x)\,dx$, not $g(E[X])$. For example, $E(X^2) = \int x^2 f(x)\,dx$, which in general is not equal to $[E(X)]^2$. Failing to use the correct integral definition is a frequent mistake in variance calculations.
- Algebraic Errors in MLE: The log-likelihood step is crucial. Forgetting to take the log, making errors in differentiating sums of logs, or failing to verify that the critical point is a maximum (e.g., by checking the second derivative) can lead to an incorrect estimate.
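The last pitfall's second-derivative check is quick to carry out for the exponential model from the MLE example. Since $\ell''(\lambda) = -n/\lambda^2 < 0$ for every $\lambda > 0$, the critical point is guaranteed to be a maximum; the data and rate in this sketch are illustrative:

```python
import random

random.seed(2)
data = [random.expovariate(1.8) for _ in range(1000)]  # illustrative sample
lam_hat = len(data) / sum(data)                        # MLE from the text

# Second-derivative check: l''(lam) = -n / lam**2, which is negative for
# every lam > 0, so the critical point is indeed a maximum.
second_deriv = -len(data) / lam_hat ** 2
print(second_deriv < 0)  # True: the critical point maximizes l(lam)
```

Many models are not this tidy; when $\ell''$ changes sign, the second-derivative (or boundary) check is what separates a genuine maximum from a minimum or saddle.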
Summary
- Continuous random variables are modeled using a probability density function (PDF), where probabilities are areas under the curve, and a cumulative distribution function (CDF), which gives $F(x) = P(X \le x)$.
- The expectation and variance extend to continuous variables via integration: $E(X) = \int x f(x)\,dx$ and $\operatorname{Var}(X) = E(X^2) - [E(X)]^2$.
- A good estimator should be unbiased (accurate on average), consistent (improves with more data), and efficient (has minimal variance among unbiased estimators).
- The maximum likelihood estimation (MLE) method finds parameter values that maximize the likelihood (or log-likelihood) of the observed sample, providing a powerful and general framework for estimation.
- Always remember that for a continuous variable, $P(X = x) = 0$; probability is only meaningful over an interval.