AP Statistics: Sampling and Experimentation
Sampling and experimentation sit at the center of AP Statistics because they determine what conclusions you are allowed to draw from data. A beautifully computed confidence interval is not persuasive if the sample is biased. A statistically significant result is not trustworthy if the experiment is riddled with confounding variables. Good study design is the difference between describing a dataset and learning something real about a population or a cause-and-effect relationship.
This article lays out the essential ideas: how to sample, how bias sneaks in, how observational studies differ from experiments, and how randomization, blocking, and careful control protect your conclusions.
Populations, samples, and why we sample
A population is the full group you want to learn about, such as all registered voters in a state or all devices produced on an assembly line this month. A sample is the subset you actually measure.
We sample because measuring everyone is often too expensive, too slow, or impossible. The goal is to use the sample to make inferences about the population, but that only works when the sample represents the population well. Representation is not about matching “common sense” expectations; it is about giving the population a fair chance to appear in the data.
Random sampling and common sampling methods
A random sample gives each individual in the population a known chance of selection. Random does not mean “haphazard.” It means governed by a chance mechanism, like a random number generator.
Simple random sample (SRS)
In a simple random sample of size n, every possible group of n individuals has an equal chance of being selected. SRS is a gold standard because it minimizes systematic favoritism.
Practical example: If a school has a roster of 1,200 students and you want an SRS of 60, label students 1 to 1200 and use a random number generator to select 60 distinct labels.
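The roster example above can be sketched in a few lines of Python. The roster size and sample size come from the example; the student labels themselves are hypothetical.

```python
import random

# Hypothetical roster: students labeled 1 through 1200.
roster = range(1, 1201)

# random.sample draws 60 distinct labels; every possible set of 60
# labels is equally likely -- the defining property of an SRS.
srs = random.sample(roster, k=60)
```

Because `random.sample` draws without replacement, no student can appear twice in the sample.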
Stratified random sample
A stratified sample divides the population into meaningful subgroups (strata) and then takes an SRS from each stratum. This is useful when the strata are known to differ in ways related to the outcome.
Example: Surveying student satisfaction might require separate strata for grade levels. If freshmen and seniors experience school differently, stratification ensures each grade is represented, often improving precision.
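A minimal sketch of stratification, assuming hypothetical enrollment counts for each grade: take a separate SRS within every stratum so each grade is guaranteed representation.

```python
import random

# Hypothetical enrollment by grade level (the strata).
strata = {
    "freshmen":   [f"F{i}" for i in range(300)],
    "sophomores": [f"So{i}" for i in range(310)],
    "juniors":    [f"J{i}" for i in range(295)],
    "seniors":    [f"Se{i}" for i in range(280)],
}

# Take a separate SRS within each stratum -- here 10% of each grade --
# so no grade can be missed by chance.
sample = {
    grade: random.sample(students, k=len(students) // 10)
    for grade, students in strata.items()
}
```

Contrast with a single SRS of the whole school, which could, by bad luck, include very few seniors.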
Cluster sample
A cluster sample divides the population into clusters, randomly selects clusters, and then measures everyone in the selected clusters. Clusters are typically naturally occurring groups.
Example: To study household internet speed in a city, you might randomly select a set of city blocks (clusters) and test every household on those blocks. This is cost-effective, but it can be less precise than an SRS of the same size if households within a block resemble one another, since each additional household in a chosen block adds little new information.
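The two-step logic, randomly choose whole clusters, then measure everyone inside them, can be sketched as follows. The block and household counts are hypothetical.

```python
import random

# Hypothetical city: 50 blocks (clusters), 20 households each.
blocks = {b: [f"block{b}-house{h}" for h in range(20)] for b in range(50)}

# Step 1: randomly select whole clusters...
chosen_blocks = random.sample(list(blocks), k=5)

# Step 2: ...then measure EVERY household in the chosen clusters.
sampled_households = [h for b in chosen_blocks for h in blocks[b]]
```

Note that randomness enters only at the cluster level; within a chosen block, nobody is skipped.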
Systematic sample (with caution)
A systematic sample selects every kth individual after a random start. It can be close to random and easy to implement, but it can fail badly if there is hidden periodicity.
Example: Inspect every 20th bottle off a conveyor belt. If the machine has a repeating defect pattern every 20 bottles, the method can completely miss the problem.
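The conveyor-belt scheme, and its weakness, can be seen in a short sketch (the line length of 1,000 bottles is an assumption):

```python
import random

N = 1000   # bottles on the line (hypothetical)
k = 20     # sampling interval from the example

start = random.randrange(k)           # random start within the first interval
inspected = list(range(start, N, k))  # then every 20th bottle

# Every inspected position shares the same phase mod 20. If defects
# repeat with period 20 at a different phase, this sample misses them all.
```

This is why systematic sampling needs a "with caution" label: the chance mechanism only picks the starting phase, not the individual units.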
Bias: how sampling goes wrong
Bias is a systematic tendency to overestimate or underestimate a true value. The key word is systematic; bias does not “average out” with a larger sample size.
Common sources of bias include:
- Undercoverage: Some groups are not adequately represented in the sampling frame. A poll that reaches only landline phones undercovers younger voters.
- Nonresponse bias: People who refuse or fail to respond differ meaningfully from those who respond. A long online survey about health may attract more health-conscious participants.
- Response bias: The way questions are asked or answered distorts results. Leading questions and social desirability are frequent culprits.
- Voluntary response bias: People choose to participate, often because they have strong opinions. Online comment polls are classic examples.
A crucial AP Statistics idea is that sample size cannot fix a biased design. A large biased sample is still biased; it just produces a very precise wrong answer.
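A small simulation makes the "precise wrong answer" point concrete. Everything here is invented for illustration: a population of uniform "support scores" and an undercoverage frame that can only reach scores above 0.3.

```python
import random
import statistics

random.seed(1)

# Hypothetical population: 100,000 scores; true mean is near 0.5.
population = [random.random() for _ in range(100_000)]
true_mean = statistics.mean(population)

# Biased frame: only scores above 0.3 are reachable (undercoverage).
frame = [x for x in population if x > 0.3]

# Estimates from the biased frame at two sample sizes.
estimates = {n: statistics.mean(random.sample(frame, n)) for n in (100, 10_000)}
# Both estimates hover near the frame mean (about 0.65), not the true
# mean (about 0.5). The larger sample is more precise -- but precisely wrong.
```

Increasing n shrinks the random error around the frame mean; it does nothing to the gap between the frame mean and the population mean, which is the bias.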
Observational studies vs experiments
AP Statistics draws a sharp line between association and causation, and study type is what determines which claims are justified.
Observational studies
In an observational study, researchers observe individuals and measure variables without assigning treatments. You can identify relationships, compare groups, and build predictive models, but you cannot confidently claim that one variable causes another because lurking variables may explain the association.
Example: If students who sleep more tend to have higher GPAs, that is an association. It may be that good time management increases both sleep and GPA, acting as a lurking variable.
Experiments
In an experiment, researchers impose a treatment on individuals and observe the response. Well-designed experiments can support cause-and-effect conclusions because they actively control and randomize treatment assignment, reducing the impact of confounding variables.
Example: Randomly assign students to use one of two study apps for four weeks, then compare exam scores. If the design is solid, differences in scores can be attributed to the apps more credibly.
Elements of a well-designed experiment
The strongest experiments are not complicated; they are disciplined. AP Statistics emphasizes three core principles: control, randomization, and replication.
Treatments, experimental units, and response variables
- Experimental units are the individuals or objects being studied. If units are people, they are often called subjects.
- Treatments are the specific conditions applied, such as “caffeine” vs “no caffeine,” or different dosage levels.
- The response variable is the outcome measured, like reaction time or blood pressure.
Clear operational definitions matter. “Stress” must be defined as a measurable variable, not a vague idea.
Random assignment and why it matters
Random assignment places experimental units into treatment groups using a chance process. This is different from random sampling. Random sampling helps you generalize to a population; random assignment helps you make causal claims within the experiment.
Random assignment works because, in the long run, it balances both known and unknown variables across groups. That balance reduces confounding.
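In practice, random assignment is just a shuffle followed by a split. A minimal sketch with hypothetical subjects:

```python
import random

# Hypothetical subjects for a two-treatment experiment.
subjects = [f"student{i}" for i in range(40)]

# Shuffle with a chance mechanism, then split. Every division into two
# groups of 20 is equally likely, which balances known AND unknown
# variables across groups on average.
random.shuffle(subjects)
app_a, app_b = subjects[:20], subjects[20:]
```

Note the contrast with random sampling: nothing here requires the 40 students to be a random sample of anything; the randomness is in who gets which treatment.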
Control and placebos
A control group provides a baseline for comparison. In many settings, a placebo is used so that participants in the control group undergo a similar experience without receiving the active treatment.
Placebos matter because expectations can change outcomes. If participants know they are receiving a new treatment, improved results may reflect belief rather than the treatment itself.
Blinding
- Single-blind: subjects do not know which treatment they receive.
- Double-blind: neither subjects nor the researchers interacting with them know, reducing bias in measurement and behavior.
Blinding is especially important when outcomes involve judgment, self-reporting, or subtle measurement decisions.
Replication
Replication means applying treatments to many experimental units. More replication reduces random variation and makes effects easier to detect. Replication is not repeating the entire experiment once; it is having sufficient sample size within each group.
Blocking, confounding, and reducing variability
Confounding variables
A confounding variable is associated with both the explanatory variable (treatment) and the response, making it difficult to separate the treatment’s effect from the confounder’s effect.
Example: If one teacher uses Method A in the morning and another teacher uses Method B in the afternoon, any performance difference could be due to method, teacher, time of day, or student composition. The effects are confounded.
Random assignment helps prevent confounding by balancing these influences, but design choices can also introduce confounding when treatments are tangled with other factors.
Blocking
Blocking is a design technique that groups similar experimental units together (blocks) and then randomizes treatments within each block. Blocking reduces variability and improves the precision of comparisons.
Example: In a study of a new fertilizer, you might block by field section if soil quality varies across the farm. Within each section, randomly assign fertilizer types. This ensures comparisons are made among plots with similar soil conditions.
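The fertilizer example can be sketched as "randomize within each block." The section names, plot counts, and fertilizer labels are all hypothetical.

```python
import random

# Hypothetical farm: four field sections (blocks), four plots each.
blocks = {section: [f"{section}-plot{p}" for p in range(4)]
          for section in ("north", "south", "east", "west")}
treatments = ["fertA", "fertA", "fertB", "fertB"]

# Randomize treatments WITHIN each block, so every section receives
# both fertilizers and comparisons happen on similar soil.
assignment = {}
for section, plots in blocks.items():
    shuffled = treatments[:]
    random.shuffle(shuffled)
    assignment.update(zip(plots, shuffled))
```

The key guarantee is structural: no matter how the shuffles come out, each section contains two plots of each fertilizer, so soil differences between sections cannot masquerade as a fertilizer effect.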
Blocking is not the same as stratified sampling, though the logic is similar. Stratification is for sampling; blocking is for experiments.
Matched pairs
A matched pairs design is a special form of blocking with pairs. Common versions include:
- Each subject receives both treatments in random order (a crossover design), with appropriate time for washout when needed.
- Subjects are paired based on similarity (like twins), and treatments are randomized within each pair.
Matched pairs can be powerful because each comparison is made within a pair, controlling for pair-level differences.
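The twin version of matched pairs amounts to a coin flip inside each pair. A sketch with hypothetical twin labels:

```python
import random

# Hypothetical twin pairs; randomize which twin gets which condition.
pairs = [("twinA1", "twinA2"), ("twinB1", "twinB2"), ("twinC1", "twinC2")]

assignment = {}
for first, second in pairs:
    conditions = ["treatment", "control"]
    random.shuffle(conditions)  # the coin flip within the pair
    assignment[first], assignment[second] = conditions
```

Because both conditions always appear within each pair, the analysis can compare twins to each other, removing pair-level differences (genetics, household) from the comparison.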
What your conclusions can and cannot say
Two separate questions should guide your interpretation:
- Can I generalize to a population?
You need random sampling from a well-defined population, with minimal bias.
- Can I claim causation?
You need an experiment with random assignment, good control, and minimal confounding.
An experiment conducted on a convenience sample may allow causal claims about those participants but not broad generalizations. A random sample in an observational study may allow generalization about the population but not causal statements.
Practical checklist for AP-style study design
- Define the population and sampling frame clearly.
- Choose a sampling method that fits logistics while protecting representativeness.
- Identify likely sources of bias and address them directly.
- If causation is the goal, use random assignment and a control group.
- Consider blocking on major known sources of variation.
- Plan for replication and consistent measurement of the response.
Sampling and experimentation are not separate topics; they are the foundation that makes statistical inference meaningful. When you can diagnose bias, distinguish association from causation, and match the scope of your conclusions to the design that produced the data, you are doing the most important work in statistics: deciding what the numbers are actually allowed to mean.