Sampling Distributions and the Central Limit Theorem

Imagine you work in quality control at a factory that fills cereal boxes. Each box is supposed to contain 500 g of cereal. You can’t weigh every single box — there are thousands per hour. So you pull a random sample of 40 boxes, weigh them, and compute the average.

Your sample mean comes out to 497 g. Should you sound the alarm? Is the machine underfilling? Or is 497 g just normal random variation you’d expect to see even when the machine is working perfectly?

To answer that question, you need to understand something that isn’t obvious at first: the average of a random sample is itself random. If you took a different sample of 40 boxes, you’d get a slightly different average. Take 1,000 different samples, and you’d get 1,000 different averages — and those averages would follow their own distribution.

That distribution — the distribution of all possible sample means — is called the sampling distribution. And one of the most remarkable theorems in all of statistics, the Central Limit Theorem, tells us exactly what shape that distribution has, and how to use it to make decisions.

After this lesson, you will be able to:

Explain what a sampling distribution is and how it differs from a population distribution
State the parameters of the sampling distribution of x̄: its mean (μ) and its standard error (σ/√n)
Apply the Central Limit Theorem and state the conditions under which it holds
Explain how increasing sample size affects the shape and spread of the sampling distribution
Compute probabilities for sample means by standardizing: z = (x̄ − μ) / (σ/√n)

This is Lesson INF-1 — the first lesson in the Statistical Inference module. Everything that follows (confidence intervals, hypothesis tests, regression inference) builds directly on what you learn here. Take your time with it.

Statistical inference is built on the patterns of variability you learned in Descriptive Statistics (Module 1).

From Module 1: Mean (x̄) and Standard Deviation (s). These summaries of a single sample are the raw ingredients for inference.
From Module 1: Normal Distribution and Z-Scores. The “bell curve” shape and the concept of “distance from the mean in standard deviations” are the primary tools for the CLT.
Probability Basics (Module 2): Specifically, the idea that a probability is a long-run frequency.
Calculations: You will need to divide by √n. Remember that as n increases, √n also increases, which makes the standard error smaller.

Retrieval Checkpoint

A population has a mean and a standard deviation . You take a sample and find a sample mean . What is the z-score of this sample mean relative to the population?

Success Factor:

Important note on σ vs. s in this lesson: Throughout INF-1, we always use σ (population standard deviation); that is, we assume the population SD is known. In real life this is rare, but it keeps the focus on understanding the CLT itself. The case where σ is unknown (and we must use s) is covered in INF-3, where the t-distribution enters.

Retrieval Warm-up — from earlier lessons

A dataset has values 4, 7, 7, 10, 12. The population standard deviation for this group is σ ≈ 2.76. A classmate says “the standard deviation tells us how far a typical sample mean is from the population mean.” Is this statement correct?

C1 — The Sampling Distribution of x̄

Here’s the key idea, stated plainly: a sample mean is a random variable.

When you take a random sample of size n from a population and compute x̄, the value you get depends on which observations happened to land in your sample. A different random sample would give a different x̄. Because x̄ varies from sample to sample, it has a distribution.

Sampling Distribution of the Sample Mean

The sampling distribution of x̄ is the probability distribution of all possible values of x̄ that could arise from drawing all possible random samples of size n from a population.

Think of it as: if we could repeat the sampling process infinitely many times, each time computing the sample mean, what would the histogram of those means look like?

This is different from the population distribution (which describes individual observations) and from the sample distribution (the data in one specific sample). The sampling distribution lives at a higher level — it describes the behaviour of the statistic across many repetitions.

The sampling distribution is not the distribution of the data in your sample. It is the distribution of values you would get if you repeated your study many times. You never actually observe the sampling distribution directly — it is a theoretical construct that tells you how reliable your estimate is.
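The idea of "repeating the study many times" can be imitated in a few lines of code. The sketch below is only an illustration: it assumes a normal population with μ = 50 and σ = 15 and samples of size n = 30, and the helper name `draw_sample_mean` is made up for this example.

```python
import random
import statistics

def draw_sample_mean(rng, mu, sigma, n):
    """Draw one random sample of size n and return its mean."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    return statistics.fmean(sample)

rng = random.Random(42)      # fixed seed so the run is repeatable
mu, sigma, n = 50, 15, 30    # assumed population parameters and sample size

# Each call is a different random sample, so each mean is a little different.
five_means = [draw_sample_mean(rng, mu, sigma, n) for _ in range(5)]
print([round(m, 2) for m in five_means])

# Repeating the study many times approximates the sampling distribution.
many_means = [draw_sample_mean(rng, mu, sigma, n) for _ in range(10_000)]
print("mean of sample means:", round(statistics.fmean(many_means), 2))
print("SD of sample means:  ", round(statistics.stdev(many_means), 2))
```

The first line of output is five "studies", each with its own x̄. The last two lines summarize a histogram you would never see in practice, because in a real study you only get one sample.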

Figure: Three distinct objects (interactive; sample size n = 30).
① Population Distribution (fixed, theoretical): all possible individual values. This is what you're sampling from. μ = 50, σ = 15.
② Your Sample (n = 30; changes each draw): one specific dataset you collected. It is a snapshot of the population, not the population itself.
③ Sampling Distribution of x̄ (fixed, theoretical): the distribution of all possible x̄ values if the study were repeated infinitely. It is not your data, and you never observe it directly. Theoretical: x̄ ~ N(50, SE²).
Each new sample changes Panel ② while Panels ① and ③ remain fixed. Notice how much narrower Panel ③ is than the population.

Use the visualization above to see all three objects side by side. Click ↺ Draw New Sample repeatedly — Panel ② changes every time while Panels ① and ③ stay fixed. Try switching to a Right-Skewed population and notice Panel ③ is still a smooth, narrow bell — that is the CLT at work.

C2 — Mean and Standard Error

The sampling distribution has two key parameters. The first is its center — where does the distribution of sample means land on average?

Mean of the Sampling Distribution

μ_x̄ = μ

The mean of the sampling distribution equals the population mean μ. This says that x̄ is an unbiased estimator of μ: on average, sample means hit the target.

This makes intuitive sense — if you average many sample averages, you should get close to the true population average. The remarkable thing is that this is exactly true, not approximately.

The second parameter is the spread: how much do sample means vary around μ?

Standard Error of the Mean

SE = σ/√n

The standard error (SE) of the mean measures the typical distance between a sample mean x̄ and the population mean μ. It is not the same as the population standard deviation σ: it shrinks as sample size increases.

Notice the √n in the denominator. A sample of n = 100 has a standard error one-tenth that of a sample of n = 1. Larger samples give sample means that cluster more tightly around μ.

The most common error in this lesson: Using σ (population SD) instead of σ/√n (standard error) when computing probabilities for sample means. The formula z = (x − μ)/σ applies to a single observation. For a sample mean, you must use z = (x̄ − μ)/(σ/√n). Using σ where σ/√n belongs makes your z-score √n times too small in magnitude, which is a significant error.
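To see the size of that mistake, the sketch below computes both z-scores for the same sample mean. The numbers (μ = 50, σ = 15, n = 25, x̄ = 56) are hypothetical and chosen only to make the arithmetic easy; the function names are likewise made up for illustration.

```python
import math

def z_individual(x, mu, sigma):
    """z-score for a single observation: divide by sigma."""
    return (x - mu) / sigma

def z_sample_mean(xbar, mu, sigma, n):
    """z-score for a sample mean: divide by the standard error sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu) / standard_error

mu, sigma, n = 50, 15, 25    # hypothetical values chosen for illustration
xbar = 56

wrong = z_individual(xbar, mu, sigma)       # treats the mean like one observation
right = z_sample_mean(xbar, mu, sigma, n)   # uses the standard error

print(f"wrong z = {wrong:.2f}, correct z = {right:.2f}")
print(f"ratio   = {right / wrong:.2f}, sqrt(n) = {math.sqrt(n):.2f}")
```

The ratio of the two z-scores is always √n, so the error gets worse as the sample gets larger.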

Figure: Individual observations versus sample means (σ = 15). The blue curve (spread = σ = 15) never changes; it describes individual observations. The orange curve (spread = SE = σ/√n) narrows as n increases, showing how sample means cluster more tightly. At n = 1, SE = 15/√1 = 15.00 and both curves are identical. Drag the slider to n = 100 to see the full contrast. To halve the SE, you must quadruple n.

Drag the slider to n = 9, 25, 100 and watch the orange curve (sample means) compress while the blue curve (individuals) stays fixed. At n = 100, a sample mean that deviates from μ by 5 units would be extremely unusual — but for an individual, a 5-unit deviation is ordinary.

C3 — The Central Limit Theorem

We’ve established that x̄ has mean μ and standard error σ/√n. But what shape does its distribution have? This is where the Central Limit Theorem comes in, and it’s one of the most powerful results in all of statistics.

The Central Limit Theorem (CLT)

Let X₁, X₂, …, Xₙ be a random sample from a population with mean μ and finite standard deviation σ. Then, for sufficiently large n, the sampling distribution of the sample mean x̄ is approximately normal:

x̄ ~ N(μ, σ²/n), approximately

Three conditions must hold:

Independence: Observations must be independent of one another — satisfied by random sampling from a large population, or sampling with replacement.
Sample size: n ≥ 30 for non-normal populations (rule of thumb). If the population is itself normal, any n produces an exactly normal sampling distribution; if it is only approximately normal, the sampling distribution is approximately normal even for small n.
Finite variance: The population must have a finite standard deviation σ.

What makes the CLT remarkable is what it does not require: the population does not need to be normally distributed. The population could be skewed, bimodal, uniform, or anything else. As long as the sample is large enough (n ≥ 30 as a rule of thumb), the distribution of x̄ converges to normal.

This is why the normal distribution appears everywhere in statistics: even when individual observations are non-normal, averages tend to be normal.

The CLT says the sampling distribution of x̄ is approximately normal. It says nothing about the distribution of individual observations becoming more normal as n grows. The data in your one sample still has whatever shape the population has. It is only the distribution of the statistic (across many hypothetical samples) that becomes normal.

The n ≥ 30 rule guards against non-normality of the population; it does not rescue a violated independence condition. If your data are a time series, a cluster sample, or repeated measures on the same individuals, the CLT may not apply regardless of how large n is. Independence must be verified separately, and no amount of additional data can substitute for it.
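A small simulation makes the CLT's claim concrete. The sketch below assumes an Exponential(1) population (strongly right-skewed) as a stand-in for "any non-normal population", and measures how lopsided the distribution of sample means is for several sample sizes. The helper names `skewness` and `sample_means` are made up for this illustration.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: near 0 for a symmetric shape, positive for a right skew."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(rng, n, reps):
    """Means of `reps` independent samples of size n from an Exponential(1) population."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

rng = random.Random(0)       # fixed seed so the run is repeatable
skew_by_n = {}
for n in (1, 2, 30, 100):
    means = sample_means(rng, n, reps=5_000)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>3}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness should fall toward zero as n grows. Note that every draw here is independent; the simulation says nothing about what happens when that condition fails.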

Figure 1: Sampling Distribution Builder — draw samples from the population (left) and watch the distribution of sample means grow on the right. Notice how the histogram becomes bell-shaped as more samples are added, regardless of the population shape.

Use the visualization above to build intuition: select a right-skewed population and a small sample size n. The histogram of sample means is also skewed. Now increase n (to 30 or more, say) and draw 100 samples: the histogram takes on a bell shape even though the population is still skewed. That’s the CLT in action.

See it happen. Use the simulation below to watch the sampling distribution of x̄ form. Switch from a right-skewed population to a normal one, and notice that the sampling distribution looks normal in both cases when n is large. Try reducing n to 2 and increasing it to 100 to see how the spread changes.

Figure: CLT Simulation (interactive; controls for population shape and sample size, default n = 30). The left panel shows the population distribution with its μ and σ; the right panel shows the sampling distribution of x̄ as samples are drawn, with the running average of x̄ and SE = σ/√n. Draw samples from a non-normal population and watch the sampling distribution of x̄ converge to a bell curve. The dashed orange curve is the theoretical normal approximation (appears after 30 samples). Adjust n to see how standard error changes.

C4 — Effect of Sample Size on the Sampling Distribution

Two things happen as you increase sample size n:

The standard error shrinks: SE = σ/√n decreases. Sample means cluster more tightly around μ.
The normal approximation improves: Even for very non-normal populations, larger samples produce a more bell-shaped distribution of x̄.

A concrete example: suppose σ = 12.

Sample Size	Standard Error	Interpretation
n = 1	12/√1 = 12	No averaging benefit — same spread as population
n = 9	12/√9 = 4	SE is one-third of σ
n = 36	12/√36 = 2	SE is one-sixth of σ
n = 144	12/√144 = 1	SE is one-twelfth of σ

Figure: SE = σ/√n as a function of sample size (slider shown at n = 1, SE = 12.00). Drag the slider to see how SE falls quickly at first, then levels off: doubling n does not halve SE; to halve it you must quadruple n.

Notice the diminishing returns: doubling n only reduces SE by a factor of √2 ≈ 1.41. To halve the SE, you need to quadruple n. This is why there’s a cost-benefit tradeoff in sampling design.
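The diminishing returns are easy to check numerically. The sketch below uses σ = 12 as an illustrative value and compares the standard error at a base sample size with the standard error after doubling and after quadrupling n. The base size of 9 is arbitrary.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 12      # illustrative population SD
base_n = 9      # arbitrary starting sample size

se_base = standard_error(sigma, base_n)
se_double = standard_error(sigma, 2 * base_n)
se_quad = standard_error(sigma, 4 * base_n)

print(f"n = {base_n:>2}: SE = {se_base:.3f}")
print(f"n = {2 * base_n:>2}: SE = {se_double:.3f}  (reduced by a factor of {se_base / se_double:.3f})")
print(f"n = {4 * base_n:>2}: SE = {se_quad:.3f}  (reduced by a factor of {se_base / se_quad:.3f})")
```

Whatever base size you pick, doubling divides SE by √2 and quadrupling divides it by 2, because SE depends on n only through √n.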

C5 — Computing Probabilities for x̄

Once the CLT tells us the sampling distribution is approximately normal, we can use the z-table to compute probabilities. The only adjustment from PR-6 is the denominator in the z-score formula.

Z-Score for a Sample Mean

z = (x̄ − μ) / (σ/√n)

This converts a sample mean into a standard normal z-score. The denominator is the standard error σ/√n, not σ.

Once you have z, use the standard normal table exactly as in PR-6: Φ(z) = P(Z < z) is the left-tail area.

The workflow for any probability question about :

Compute the standard error: SE = σ/√n
Compute the z-score: z = (x̄ − μ)/SE
Look up the probability in the z-table.
If the question asks for a right-tail probability, use the complement: P(Z > z) = 1 − Φ(z).
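The four steps above can be sketched as a single function. This is a minimal illustration, not a replacement for the z-table: it uses `statistics.NormalDist` from the Python standard library for Φ(z), and the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist

def sample_mean_probabilities(mu, sigma, n, xbar):
    """Follow the workflow: SE, z-score, then left- and right-tail areas."""
    se = sigma / math.sqrt(n)        # Step 1: standard error
    z = (xbar - mu) / se             # Step 2: z-score of the sample mean
    left = NormalDist().cdf(z)       # Step 3: Phi(z), the left-tail area
    right = 1 - left                 # Step 4: complement for the right tail
    return se, z, left, right

# Hypothetical inputs, chosen only to exercise the function.
se, z, left, right = sample_mean_probabilities(mu=50, sigma=15, n=36, xbar=54)
print(f"SE = {se:.2f}, z = {z:.2f}")
print(f"P(sample mean < 54) = {left:.4f}")
print(f"P(sample mean > 54) = {right:.4f}")
```

The function assumes the normal approximation is justified; checking the CLT conditions is still your job before trusting the output.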

Figure: A live four-step pipeline. Inputs: population μ, standard deviation σ, sample size n, and observed x̄. Step 1: SE = σ / √n. Step 2: z = (x̄ − μ) / SE. Step 3: Φ(z), shown as the shaded left-tail area. Step 4: the answers P(x̄ < x̄₀), P(x̄ > x̄₀), and the two-tail probability. Edit any input field to see all four steps update instantly. Default values are from Example 2.

Retrieval Bridge — Individual Z-Scores

You now know the full workflow for sample means. Before moving to worked examples, confirm you can still apply z-scores to individual observations — the contrast between the two cases is central to Section 4. A standard normal z-table shows . A student scores 82 on an exam where and . What is the probability that a randomly chosen student scores lower than this student?

Example 1 — Parameters of the Sampling Distribution (Fully Worked)

A population has mean and standard deviation . We draw random samples of size .

(a) What are the mean and standard error of the sampling distribution of x̄?
(b) Is it reasonable to use the normal distribution for probability calculations? Why?

Part (a):

So the sampling distribution of x̄ has mean 80 and standard error 3.

Part (b): We need to check the CLT conditions. We’re not told the population shape, so we apply the rule of thumb: is n ≥ 30? Here n < 30.

This means the CLT rule of thumb is not met on its own. However, if the population is normal (a common assumption in many real contexts), the sampling distribution is exactly normal for any n, and if the population is only approximately normal, the sampling distribution is still approximately normal for small n. In that case, we can proceed. If the population is strongly skewed, we should be cautious with a sample this small.

Key takeaway: Always check CLT conditions before computing probabilities. State your assumption about the population shape if n < 30.

Example 2 — Computing a Probability for (Partially Scaffolded)

Cereal boxes are filled by a machine with μ = 500 g and σ = 8 g. The fill weights are normally distributed. A quality inspector randomly selects n = 64 boxes.

Find P(x̄ < 498), the probability the sample mean is below 498 g.

Before seeing the solution: which formula for the denominator, σ or σ/√n? Why?

Step 1 — Standard error: SE = σ/√n = 8/√64 = 1.00 g

Step 2 — Z-score: z = (x̄ − μ)/SE = (498 − 500)/1.00 = −2.00

Step 3 — Probability: P(x̄ < 498) = P(Z < −2.00) = Φ(−2.00) ≈ 0.0228

Interpretation: If the machine is working correctly (μ = 500), there is only about a 2.3% chance of observing a sample mean below 498 g. If you actually observe x̄ = 498 g or below, that would be strong evidence the machine is underfilling.

Example 3 — Effect of Sample Size on Probability (Minimally Scaffolded)

Using the same cereal machine (μ = 500 g, σ = 8 g), compute P(x̄ < 498) for samples of size n = 16 and n = 100. Compare the three results (include n = 64 from Example 2).

Hint: the process is identical to Example 2 — just compute a new SE for each n, then find the new z-score.

Show Solution

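For n = 16: SE = 8/√16 = 2.00, z = (498 − 500)/2.00 = −1.00, P(x̄ < 498) = Φ(−1.00) ≈ 0.1587

For n = 64 (from Example 2): SE = 8/√64 = 1.00, z = −2.00, P(x̄ < 498) ≈ 0.0228

For n = 100: SE = 8/√100 = 0.80, z = (498 − 500)/0.80 = −2.50, P(x̄ < 498) = Φ(−2.50) ≈ 0.0062

Summary:

n	SE	z	P(x̄ < 498)
16	2.00	−1.00	0.1587
64	1.00	−2.00	0.0228
100	0.80	−2.50	0.0062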

As n increases, the SE shrinks, the z-score magnitude grows, and the probability of being far from μ drops sharply. Larger samples make it much easier to detect a real deviation from target.
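The summary table can be reproduced with a short loop. The sketch below assumes μ = 500 g, σ = 8 g, and a cutoff of 498 g, which are the inputs consistent with the SE and z columns of the table (for example, 8/√16 = 2.00). It uses `statistics.NormalDist` in place of the z-table.

```python
import math
from statistics import NormalDist

# Assumed inputs: these values reproduce the SE and z columns of the summary table.
mu, sigma, cutoff = 500, 8, 498

rows = []
for n in (16, 64, 100):
    se = sigma / math.sqrt(n)        # standard error for this sample size
    z = (cutoff - mu) / se           # z-score of the cutoff
    p = NormalDist().cdf(z)          # left-tail probability
    rows.append((n, se, z, p))
    print(f"n = {n:>3}  SE = {se:.2f}  z = {z:.2f}  P(mean < {cutoff}) = {p:.4f}")
```

The same 2 g shortfall moves from unremarkable at n = 16 to very unusual at n = 100, purely because the standard error changes.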

Example 4 — Individual Observation vs. Sample Mean (Application Twist)

This problem looks like the previous ones, but there’s a twist. Read carefully.

A student exam score in a large course is normally distributed with μ = 72 and σ = 10.

(a) What is the probability that a single randomly selected student scores above 78?
(b) What is the probability that the average score of a randomly selected class of 25 students is above 78?

Show Solution

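Part (a) — single observation: Use σ (not SE).

z = (78 − 72)/10 = 0.60, so P(X > 78) = 1 − Φ(0.60) = 1 − 0.7257 = 0.2743

Part (b) — sample mean of 25: Use SE.

SE = 10/√25 = 2, z = (78 − 72)/2 = 3.00, so P(x̄ > 78) = 1 − Φ(3.00) = 1 − 0.9987 = 0.0013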

Contrast: A single student scoring above 78 is moderately likely (27%). But a class average above 78 is extremely unlikely (0.13%) — because the average of 25 scores is far less variable than any individual score. This is the power (and the purpose) of the SE formula.
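The contrast can be checked directly in code. The sketch below uses μ = 72, σ = 10, a cutoff of 78, and a class of 25, and differs between the two parts only in which spread is passed to the normal distribution. Variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma = 72, 10       # exam-score population
cutoff, n = 78, 25       # cutoff score and class size

# (a) One student: the spread is sigma itself.
p_individual = 1 - NormalDist(mu, sigma).cdf(cutoff)

# (b) Class average of n students: the spread is the standard error.
se = sigma / math.sqrt(n)
p_class_mean = 1 - NormalDist(mu, se).cdf(cutoff)

print(f"P(X > {cutoff})            = {p_individual:.4f}")
print(f"P(mean of {n} > {cutoff})   = {p_class_mean:.4f}")
```

Only one line changes between parts (a) and (b), which is exactly why the mix-up is so easy to make.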

Figure: Individual score X versus sample mean x̄ of n scores (interactive; slider starts at n = 1). Same cutoff (78), same μ = 72, σ = 10. The blue curve (individual) never changes: P(X > 78) ≈ 27%. The orange curve (sample mean) narrows as n grows, making P(x̄ > 78) collapse toward zero. Drag to n = 25 to see the contrast from Example 4.

Problem 1 — Sample-Mean Probability (C2, C5)

Problem 3 — Central Limit Theorem (CLT) Compliance Conditions (C3)

For each scenario below, decide whether the CLT justifies using a normal distribution for the sampling distribution of x̄.

Scenario A: The population of annual incomes in a city is strongly right-skewed (a few very high earners). You draw a sample of .

Scenario B: Test scores in a class of 15 students follow a roughly normal distribution. You record all 15 scores.

Scenario C: A population of daily wait times has an extreme bimodal shape (clustered near 2 min or 20 min, nothing in between). Sample size is .

Problem 4 — Comparative Analysis of Sample Size Effects (C4, C5)

Problem 1 — When Does the CLT Apply? (C1 + C3)

A heavily right-skewed population is sampled randomly with . What follows for ?

A normal population is sampled with . What follows for ?

A right-skewed population is sampled with . Is the normal approximation reliable?

A bimodal population is sampled randomly with . What follows for ?

An extremely right-skewed population is sampled with . What is the best conclusion?

Problem 2 — Sampling Distribution Probability (C2 + C5)

Problem 3 — Comparing Sample Sizes (C2 + C4 + C5)

Problem 4 — Probability Across Population Shapes (C3 + C5)

Problem 5 — Individual Observation vs. Sample Mean (C1 + C5)

Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Variability and Spread (DS-4)

Review Problem 2 — Normal Distribution Probability (PR-6)

No hints, no dropdowns — these questions measure genuine understanding.

Question 1 — Feynman Test

A classmate who missed this lesson asks you: “What is the Central Limit Theorem and why does it matter? What conditions does it need?”

Write your explanation below as if you were actually talking to them (complete sentences, plain language). Address all three prompts in your response:

What is the sampling distribution of x̄? (1–3 sentences)
What does the CLT guarantee, and what three conditions must hold? (2–4 sentences)
Why does this matter for real statistical work? (1–2 sentences)

See a model answer

The CLT says that if you take many random samples of a sufficiently large size n from any population (with finite mean and variance) and calculate the sample mean each time, those means will form an approximately normal distribution, regardless of what the original population looks like.

Three conditions must hold: (1) observations are independent of one another (satisfied by random sampling); (2) the sample size is large enough (n ≥ 30 for non-normal populations, any n if the population is already normal); (3) the population has a finite standard deviation σ. When all three hold, x̄ is approximately N(μ, σ²/n).

This matters because it lets us compute probabilities for sample means using the normal table — even when we have no idea what shape the original data follows.

Question 2 — CLT Applicability

A population has and . A sample of is drawn from a right-skewed population. Can you use the normal distribution for ?

Question 3 — Sampling Probability

Question 4 — Error Analysis

Question 5 — Sampling Distribution Claim (C1)

Question 6 — Sample Size and Probability (C4)

Self-Assessment

How confident are you with the concepts from this lesson?

Rate yourself on a scale from “Still confused” to “Ready for the Boss Fight”.

A quality-control team has one decision to make. Choose a recommendation before opening any diagnostic steps; if you need them, the steps will help you trace the reasoning after your first attempt.

Optional stretch. These go beyond the lesson objectives. Attempt them when you feel confident with the core material.

Challenge 1 — Finding Minimum Sample Size

A population has σ = 24. You want the standard error of the mean to be at most 3. What is the minimum sample size required? Also verify whether the CLT applies.

Show Solution

We need σ/√n ≤ 3. Solving: √n ≥ 24/3 = 8, so n ≥ 64.

Minimum: n = 64. Since 64 ≥ 30, the CLT sample-size rule of thumb is met (independence must still be checked separately).
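The same calculation generalizes to any σ and target SE. The sketch below is a hypothetical helper, not part of the lesson's materials; the value σ = 24 in the usage line is an assumption, chosen because it is consistent with a target SE of 3 and a minimum sample size of 64.

```python
import math

def min_sample_size(sigma, max_se):
    """Smallest integer n with sigma / sqrt(n) <= max_se."""
    n = max(1, math.ceil((sigma / max_se) ** 2))
    # Guard against floating-point rounding on either side of the boundary.
    while sigma / math.sqrt(n) > max_se:
        n += 1
    while n > 1 and sigma / math.sqrt(n - 1) <= max_se:
        n -= 1
    return n

n_required = min_sample_size(sigma=24, max_se=3)   # sigma = 24 is an assumed value
print("minimum n:", n_required)
print("meets the n >= 30 rule of thumb:", n_required >= 30)
```

Rounding up matters: a fractional result such as 11.1 means 11 observations are not quite enough, so the answer is 12.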

Worked Enrichment — The Finite Population Correction (FPC)

The standard-error formula SE = σ/√n assumes an effectively infinite population. When a sample of size n covers a meaningful share of a small population of size N, the standard error is smaller:

SE = (σ/√n) · √((N − n)/(N − 1))

For , , and , the sampling fraction is . Without the correction, ; with it, .
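For readers who want to experiment, here is a minimal sketch of the standard finite population correction, SE = (σ/√n) · √((N − n)/(N − 1)), where N is the population size. The function name and the inputs (σ = 10, n = 100, N = 500) are hypothetical and are not the values of the worked example above.

```python
import math

def standard_error_fpc(sigma, n, population_size):
    """Standard error of the mean with the finite population correction."""
    if population_size < 2 or not 1 <= n <= population_size:
        raise ValueError("need population_size >= 2 and 1 <= n <= population_size")
    uncorrected = sigma / math.sqrt(n)
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return uncorrected * correction

# Hypothetical inputs: the sample is 20% of the population.
sigma, n, N = 10, 100, 500
print(f"sampling fraction = {n / N:.2f}")
print(f"SE without FPC    = {sigma / math.sqrt(n):.4f}")
print(f"SE with FPC       = {standard_error_fpc(sigma, n, N):.4f}")
```

When n is a tiny fraction of N the correction factor is close to 1 and can be ignored; when n equals N (a census) the standard error is zero, because the sample mean is the population mean.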

Challenge 3 — Why Averaging Reduces Variability: A Proof Sketch

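The formula SE = σ/√n follows from basic properties of variance. Here’s the argument; answer the two questions below.

Let X₁, X₂, …, Xₙ be independent draws from a population with variance σ². Then:

Var(x̄) = Var((1/n)(X₁ + ⋯ + Xₙ)). Which property lets you bring the 1/n outside the variance?

After extracting 1/n², we have Var(x̄) = (1/n²) · Var(X₁ + ⋯ + Xₙ). What allows us to write Var(X₁ + ⋯ + Xₙ) = Var(X₁) + ⋯ + Var(Xₙ)?

See the complete derivation

Because the Xᵢ are independent:

Var(x̄) = (1/n²) · [Var(X₁) + ⋯ + Var(Xₙ)] = (1/n²) · nσ² = σ²/n

Taking the square root: SD(x̄) = √(σ²/n) = σ/√n.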

This is the exact formula for the standard error — derived from first principles, not memorized. The key ingredients are (1) the variance scaling rule and (2) independence of the observations.
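The result Var(x̄) = σ²/n can also be checked empirically. The sketch below assumes a Uniform(0, 1) population, whose variance is exactly 1/12, and compares the observed variance of many sample means with the predicted σ²/n. The sample size and number of repetitions are arbitrary choices.

```python
import random
import statistics

rng = random.Random(7)       # fixed seed so the run is repeatable
n, reps = 16, 20_000         # arbitrary sample size and number of repetitions

# Population: Uniform(0, 1), whose variance is exactly 1/12.
population_variance = 1 / 12

# Independent draws, averaged n at a time.
means = [statistics.fmean([rng.random() for _ in range(n)]) for _ in range(reps)]

observed = statistics.pvariance(means)
predicted = population_variance / n

print(f"observed  Var(sample mean) = {observed:.6f}")
print(f"predicted sigma^2 / n      = {predicted:.6f}")
```

The two numbers should agree closely. If the draws were made dependent (for example, by reusing the same values within a sample), the second step of the derivation would fail and so would the match.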

Complete, step-by-step solutions for all problems in Sections 5–9 are available on the solutions page. Solutions include worked arithmetic, common mistakes to watch for, and interpretation guidance.

If you’re stuck: Re-read the relevant Core Concept in Section 3, then find the Worked Example that maps to that concept (e.g., Example 1 maps to Concept 1). The solutions page shows the reasoning behind every step, not just the final answer.

Quick-Reference Formulas

Mean of the Sampling Distribution: μ_x̄ = μ

Standard Error of the Mean: SE = σ/√n

Z-Score for a Sample Mean: z = (x̄ − μ) / (σ/√n)

Population Shape	When is the sampling distribution normal?
Normal	Always (any sample size n)
Unknown or Skewed	Approximately, when n ≥ 30 (by the Central Limit Theorem)

INF-1: Sampling Distributions and the Central Limit Theorem

Section 1: Introduction

Section 2: Prerequisites

Section 3: Core Concepts

Section 4: Worked Examples

Section 5: Guided Practice

Section 6: Independent Practice

Section 7: Mastery Check

Section 8: Boss Fight

Section 9: Challenge Problems

Section 10: Solutions Reference