INF-1: Sampling Distributions and the Central Limit Theorem
Module 3 · Statistical Inference
Section 1: Introduction
Imagine you work in quality control at a factory that fills cereal boxes. Each box is supposed to contain 500 g of cereal. You can’t weigh every single box — there are thousands per hour. So you pull a random sample of 40 boxes, weigh them, and compute the average.
Your sample mean comes out to 497 g. Should you sound the alarm? Is the machine underfilling? Or is 497 g just normal random variation you’d expect to see even when the machine is working perfectly?
To answer that question, you need to understand something that isn’t obvious at first: the average of a random sample is itself random. If you took a different sample of 40 boxes, you’d get a slightly different average. Take 1,000 different samples, and you’d get 1,000 different averages — and those averages would follow their own distribution.
That distribution — the distribution of all possible sample means — is called the sampling distribution. And one of the most remarkable theorems in all of statistics, the Central Limit Theorem, tells us exactly what shape that distribution has, and how to use it to make decisions.
After this lesson, you will be able to:
Explain what a sampling distribution is and how it differs from a population distribution
State the parameters of the sampling distribution of x̄: its mean (μ) and its standard error (σ/√n)
Apply the Central Limit Theorem and state the conditions under which it holds
Explain how increasing sample size affects the shape and spread of the sampling distribution
Compute probabilities for sample means by standardizing: z = (x̄ − μ) / (σ/√n)
This is Lesson INF-1 — the first lesson in the Statistical Inference module. Everything that follows (confidence intervals, hypothesis tests, regression inference) builds directly on what you learn here. Take your time with it.
Section 2: Prerequisites
Statistical inference is built on the patterns of variability you learned in Descriptive Statistics (Module 1).
From Module 1: Mean (x̄) and Standard Deviation (s). These summaries of a single sample are the raw ingredients for inference.
From Module 1: Normal Distribution and Z-Scores. The “bell curve” shape and the concept of “distance from the mean in standard deviations” are the primary tools for the CLT.
Probability Basics (Module 2): Specifically, the idea that a probability is a long-run frequency.
Calculations: You will need to divide by √n. Remember that as n increases, √n also increases, which makes the standard error smaller.
Retrieval Checkpoint
A population has a mean and a standard deviation . You take a sample and find a sample mean . What is the z-score of this sample mean relative to the population?
Success Factor:
Important note on σ vs. s in this lesson: Throughout INF-1, we always use σ (population standard deviation) — we assume the population SD is known. In real life this is rare, but it keeps the focus on understanding the CLT itself. The case where σ is unknown (and we must use s) is covered in INF-3, where the t-distribution enters.
Retrieval Warm-up — from earlier lessons
A dataset has values 4, 7, 7, 10, 12. The population standard deviation for this group is σ ≈ 2.76. A classmate says “the standard deviation tells us how far a typical sample mean is from the population mean.” Is this statement correct?
Section 3: Core Concepts
C1 — The Sampling Distribution of x̄
Here’s the key idea, stated plainly: a sample mean is a random variable.
When you take a random sample of size n from a population and compute x̄, the value you get depends on which observations happened to land in your sample. A different random sample would give a different x̄. Because x̄ varies from sample to sample, it has a distribution.
Sampling Distribution of the Sample Mean
The sampling distribution of x̄ is the probability distribution of all possible values of x̄ that could arise from drawing all possible random samples of size n from a population.
Think of it as: if we could repeat the sampling process infinitely many times, each time computing the sample mean, what would the histogram of those means look like?
This is different from the population distribution (which describes individual observations) and from the sample distribution (the data in one specific sample). The sampling distribution lives at a higher level — it describes the behaviour of the statistic across many repetitions.
The sampling distribution is not the distribution of the data in your sample. It is the distribution of values you would get if you repeated your study many times. You never actually observe the sampling distribution directly — it is a theoretical construct that tells you how reliable your estimate is.
Interactive figure (population shape is selectable), three panels:
① Population Distribution (fixed, theoretical): all possible individual values. This is what you're sampling from. μ = 50, σ = 15.
② Your Sample, n = 30 (changes each draw): one specific dataset you collected. Not the population, just a snapshot of it.
③ Sampling Distribution of x̄ (fixed, theoretical): not your data. The theoretical distribution of all possible x̄ values if the study were repeated infinitely. You never observe this directly. Theoretical: x̄ ~ N(50, SE²).
Figure: Three distinct objects. Click ↺ Draw New Sample repeatedly — Panel ② changes every time while Panels ① and ③ remain fixed. The sampling distribution (Panel ③) is a theoretical construct, not data. Notice how much narrower it is than the population.
Use the visualization above to see all three objects side by side. Click ↺ Draw New Sample repeatedly — Panel ② changes every time while Panels ① and ③ stay fixed. Try switching to a Right-Skewed population and notice Panel ③ is still a smooth, narrow bell — that is the CLT at work.
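To make "repeat the study many times" concrete, here is a minimal simulation sketch. It assumes a normal population with the figure's values (μ = 50, σ = 15, n = 30). The function name `simulate_sample_means`, the fixed seed, and the choice of 2,000 repetitions are illustrative assumptions, not part of the lesson.

```python
import random
import statistics

def simulate_sample_means(population_mu, population_sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return their sample means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    means = []
    for _ in range(num_samples):
        sample = [rng.gauss(population_mu, population_sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

# Assumed setup mirroring the figure: mu = 50, sigma = 15, n = 30.
sample_means = simulate_sample_means(50, 15, n=30, num_samples=2000)

print("first five sample means:", [round(m, 2) for m in sample_means[:5]])
print("average of the sample means:", round(statistics.fmean(sample_means), 2))
print("SD of the sample means:", round(statistics.stdev(sample_means), 2))
```

Each printed sample mean is different, their average sits near μ, and their spread is far smaller than σ. The list `sample_means` is a finite stand-in for the theoretical sampling distribution.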
C2 — Mean and Standard Error
The sampling distribution has two key parameters. The first is its center — where does the distribution of sample means land on average?
Mean of the Sampling Distribution
The mean of the sampling distribution equals the population mean: μ_x̄ = μ. This says that x̄ is an unbiased estimator of μ: on average, sample means hit the target.
This makes intuitive sense — if you average many sample averages, you should get close to the true population average. The remarkable thing is that this is exactly true, not approximately.
The second parameter is the spread: how much do sample means vary around μ?
Standard Error of the Mean
The standard error (SE) of the mean, SE = σ/√n, measures the typical distance between a sample mean x̄ and the population mean μ. It is not the same as the population standard deviation σ: it shrinks as the sample size n increases.
Notice the √n in the denominator. A sample 100 times larger has a standard error one-tenth as large. Larger samples give sample means that cluster more tightly around μ.
The most common error in this lesson: Using σ (population SD) instead of σ/√n (standard error) when computing probabilities for sample means. The formula z = (x − μ)/σ applies to a single observation. For a sample mean, you must use z = (x̄ − μ)/(σ/√n). Using σ where σ/√n belongs shrinks the magnitude of your z-score by a factor of √n, a significant error.
Interactive figure (slider at n = 1): individual observations have spread σ = 15; the sample mean x̄ has spread SE = σ/√n = 15/√1 = 15.00. At n = 1, SE = σ and both curves are identical.
Figure: The blue curve (σ = 15) never changes — it describes individual observations. The orange curve narrows as n increases, showing how sample means cluster more tightly. Drag the slider to n = 100 to see the full contrast. To halve the SE, you must quadruple n.
Drag the slider to n = 9, 25, 100 and watch the orange curve (sample means) compress while the blue curve (individuals) stays fixed. At n = 100, a sample mean that deviates from μ by 5 units would be extremely unusual — but for an individual, a 5-unit deviation is ordinary.
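The denominator mistake is easy to see numerically. The sketch below computes a z-score both ways for the same cutoff. The helper names and the numbers (μ = 50, σ = 15, n = 25, observed mean 56) are hypothetical, chosen only for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def z_for_individual(x, mu, sigma):
    """z-score for a single observation (denominator is sigma)."""
    return (x - mu) / sigma

def z_for_sample_mean(x_bar, mu, sigma, n):
    """z-score for a sample mean (denominator is the standard error)."""
    return (x_bar - mu) / standard_error(sigma, n)

# Hypothetical numbers, not taken from the lesson.
mu, sigma, n, x_bar = 50, 15, 25, 56

wrong_z = z_for_individual(x_bar, mu, sigma)      # mistakenly uses sigma
right_z = z_for_sample_mean(x_bar, mu, sigma, n)  # uses sigma / sqrt(n)

print("z using sigma:", round(wrong_z, 3))
print("z using SE:   ", round(right_z, 3))
print("ratio right/wrong:", round(right_z / wrong_z, 3), "vs sqrt(n):", math.sqrt(n))
```

The ratio of the two z-scores equals √n, which is why the error grows with the sample size.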
C3 — The Central Limit Theorem
We’ve established that x̄ has mean μ and standard error σ/√n. But what shape does its distribution have? This is where the Central Limit Theorem comes in, and it’s one of the most powerful results in all of statistics.
The Central Limit Theorem (CLT)
Let X₁, X₂, …, Xₙ be a random sample from a population with mean μ and finite standard deviation σ. Then, for sufficiently large n, the sampling distribution of the sample mean x̄ is approximately normal: x̄ ~ N(μ, σ²/n), approximately.
Three conditions must hold:
Independence: Observations must be independent of one another — satisfied by random sampling from a large population, or sampling with replacement.
Sample size: n ≥ 30 for non-normal populations (rule of thumb). If the population is itself normal, any n produces an exactly normal sampling distribution; if it is only approximately normal, the sampling distribution is approximately normal even for small n.
Finite variance: The population must have a finite standard deviation .
What makes the CLT remarkable is what it does not require: the population does not need to be normally distributed. The population could be skewed, bimodal, or uniform. As long as its variance is finite and the sample is large enough (n ≥ 30 as a rule of thumb), the distribution of x̄ is approximately normal.
This is why the normal distribution appears everywhere in statistics: even when individual observations are non-normal, averages tend to be normal.
The CLT says the sampling distribution of x̄ is approximately normal. It says nothing about the distribution of individual observations becoming more normal as n grows. The data in your one sample still has whatever shape the population has. It is only the distribution of the statistic (across many hypothetical samples) that becomes normal.
The n ≥ 30 rule guards against non-normality of the population; it does not rescue a violated independence condition. If your data are a time series, a cluster sample, or repeated measures on the same individuals, the CLT may not apply regardless of how large n is. Independence must be verified separately, and no amount of additional data can substitute for it.
Figure 1: Sampling Distribution Builder — draw samples from the population (left) and watch the distribution of sample means grow on the right. Notice how the histogram becomes bell-shaped as more samples are added, regardless of the population shape.
Use the visualization above to build intuition: select a right-skewed population and a small sample size n. The histogram of sample means is also skewed. Now increase n and draw 100 samples. The histogram takes on a bell shape even though the population is still skewed. That’s the CLT in action.
See it happen. Use the simulation below to watch the sampling distribution of x̄ form. Switch from a right-skewed population to a normal one, and notice that the sampling distribution looks normal in both cases when n is large. Try reducing n to 2 and increasing it to 100 to see how the spread changes.
Interactive figure, two panels: Population Distribution (selectable population shape, with μ and σ displayed) and Sampling Distribution of x̄ (count of samples drawn, running average of x̄, and SE = σ/√n displayed).
Figure: CLT Simulation — draw samples from a non-normal population and watch the sampling distribution of x̄ converge to a bell curve. The dashed orange curve is the theoretical normal approximation (appears after 30 samples). Adjust n to see how standard error changes.
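The same experiment can be sketched in a few lines of code. This is a minimal illustration, assuming an Exponential(1) population (picked only because it is strongly right-skewed) and using skewness as a rough measure of how far the distribution of sample means is from a symmetric bell. The function names, seed, and sample sizes are assumptions for the sketch.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score; close to 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def sample_means_from_exponential(n, num_samples, rng):
    """Means of num_samples samples of size n from an Exponential(1) population."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

rng = random.Random(42)  # fixed seed so the sketch is reproducible
skew_by_n = {}
for n in (2, 10, 40):
    means = sample_means_from_exponential(n, num_samples=5000, rng=rng)
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>2}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness of the sample means falls toward zero as n grows, even though every individual draw still comes from the same skewed population.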
C4 — Effect of Sample Size on the Sampling Distribution
Two things happen as you increase sample size n:
The standard error shrinks: SE = σ/√n decreases. Sample means cluster more tightly around μ.
The normal approximation improves: Even for very non-normal populations, larger samples produce a more bell-shaped distribution of x̄.
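A concrete example, with the standard error expressed as a fraction of σ:
Sample Size | Standard Error | Interpretation
n = 1 | σ/√1 = σ | No averaging benefit, same spread as population
n = 9 | σ/√9 = σ/3 | SE is one-third of σ
n = 36 | σ/√36 = σ/6 | SE is one-sixth of σ
n = 144 | σ/√144 = σ/12 | SE is one-twelfth of σ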
SE = 12.00
Figure: SE = σ/√n as a function of sample size. Drag the slider to see how SE falls quickly at first, then levels off: doubling n cuts SE only by a factor of √2, and halving SE requires quadrupling n.
Notice the diminishing returns: doubling n only reduces SE by a factor of √2 ≈ 1.41. To halve the SE, you need to quadruple n. This is why there’s a cost-benefit tradeoff in sampling design.
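A short sketch makes the diminishing returns visible. It assumes a hypothetical population SD of 12 (any positive value would show the same ratios). The function name `standard_error` and the baseline of n = 25 are illustrative choices.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean for population SD sigma and sample size n."""
    return sigma / math.sqrt(n)

sigma = 12.0  # hypothetical population SD, chosen only for illustration

se_by_n = {n: standard_error(sigma, n) for n in (1, 9, 36, 144)}
for n, se in se_by_n.items():
    print(f"n = {n:>3}: SE = {se:5.2f}")

base_n = 25
doubling_factor = standard_error(sigma, base_n) / standard_error(sigma, 2 * base_n)
quadrupling_factor = standard_error(sigma, base_n) / standard_error(sigma, 4 * base_n)
print(f"doubling n shrinks SE by a factor of {doubling_factor:.3f}")
print(f"quadrupling n shrinks SE by a factor of {quadrupling_factor:.3f}")
```

Because SE depends on √n, each further halving of the SE costs four times as many observations as the previous sample size.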
C5 — Computing Probabilities for x̄
Once the CLT tells us the sampling distribution is approximately normal, we can use the z-table to compute probabilities. The only adjustment from PR-6 is the denominator in the z-score formula.
Z-Score for a Sample Mean
z = (x̄ − μ) / (σ/√n)
This converts a sample mean x̄ into a standard normal z-score. The denominator is the standard error σ/√n, not σ.
Once you have z, use the standard normal table exactly as in PR-6: Φ(z) = P(Z < z) is the left-tail area.
The workflow for any probability question about x̄:
Compute the standard error: SE = σ/√n
Compute the z-score: z = (x̄ − μ)/SE
Look up the probability in the z-table.
If the question asks for a right-tail probability, use the complement: P(x̄ > x̄₀) = 1 − Φ(z).
Interactive pipeline: Step 1: SE = σ/√n · Step 2: z = (x̄ − μ)/SE · Step 3: Φ(z) · Step 4: P(x̄ < x̄₀), P(x̄ > x̄₀), and the two-tail probability.
Figure: A live four-step pipeline. Edit any input field to see all four steps update instantly. Default values are from Example 2. The shaded area in Step 3 represents Φ(z) — the left-tail probability.
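The four-step workflow can be sketched as a single function. This version uses `statistics.NormalDist` from the Python standard library in place of a printed z-table. The function name `sample_mean_probabilities` and the inputs (μ = 100, σ = 20, n = 25, cutoff 94) are hypothetical.

```python
import math
from statistics import NormalDist

def sample_mean_probabilities(x0, mu, sigma, n):
    """Return (SE, z, left-tail, right-tail) for a sample-mean cutoff x0."""
    se = sigma / math.sqrt(n)      # Step 1: standard error
    z = (x0 - mu) / se             # Step 2: z-score of the cutoff
    left = NormalDist().cdf(z)     # Step 3: Phi(z) = P(x_bar < x0)
    right = 1 - left               # Step 4: complement gives the right tail
    return se, z, left, right

# Hypothetical inputs, not taken from the lesson.
se, z, left, right = sample_mean_probabilities(94, mu=100, sigma=20, n=25)

print(f"SE = {se:.2f}, z = {z:.2f}")
print(f"P(x_bar < 94) = {left:.4f}")
print(f"P(x_bar > 94) = {right:.4f}")
```

Passing n = 1 turns the same function into the individual-observation calculation, since σ/√1 = σ.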
Retrieval Bridge — Individual Z-Scores
You now know the full workflow for sample means. Before moving to worked examples, confirm you can still apply z-scores to individual observations — the contrast between the two cases is central to Section 4. A standard normal z-table shows . A student scores 82 on an exam where and . What is the probability that a randomly chosen student scores lower than this student?
Section 4: Worked Examples
Example 1 — Parameters of the Sampling Distribution (Fully Worked)
A population has mean and standard deviation . We draw random samples of size .
(a) What are the mean and standard error of the sampling distribution of x̄? (b) Is it reasonable to use the normal distribution for probability calculations? Why?
Part (a):
So the sampling distribution of x̄ has mean 80 and standard error 3.
Part (b): We need to check the CLT conditions. We’re not told the population shape, so we apply the rule of thumb: is n ≥ 30? Here n < 30.
This means the CLT rule of thumb is not met on its own. However, if the population is normal (a common working assumption in many real contexts), the sampling distribution is exactly normal for any n. In that case, we can proceed. If the population is strongly skewed, we should be cautious with a sample this small.
Key takeaway: Always check CLT conditions before computing probabilities. State your assumption about the population shape if n < 30.
Example 2 — Computing a Probability for (Partially Scaffolded)
Cereal boxes are filled by a machine with μ = 500 g and σ = 8 g. The fill weights are normally distributed. A quality inspector randomly selects n = 64 boxes.
Find P(x̄ < 498), the probability the sample mean is below 498 g.
Before seeing the solution: which formula for the denominator, σ or σ/√n? Why?
Step 1 — Standard error: SE = σ/√n = 8/√64 = 1.00 g
Step 2 — Z-score: z = (x̄ − μ)/SE = (498 − 500)/1.00 = −2.00
Step 3 — Probability: P(x̄ < 498) = Φ(−2.00) = 0.0228
Interpretation: If the machine is working correctly (μ = 500), there is only about a 2.3% chance of observing a sample mean below 498 g. If you actually observe x̄ = 498 or below, that would be strong evidence the machine is underfilling.
Example 3 — Effect of Sample Size on Probability (Minimally Scaffolded)
Using the same cereal machine (μ = 500 g, σ = 8 g), compute P(x̄ < 498) for samples of size n = 16 and n = 100. Compare the three results (include n = 64 from Example 2).
Hint: the process is identical to Example 2 — just compute a new SE for each n, then find the new z-score.
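Solution
For n = 16: SE = 8/√16 = 2.00, z = (498 − 500)/2.00 = −1.00, P(x̄ < 498) = 0.1587
For n = 64 (from Example 2): SE = 8/√64 = 1.00, z = (498 − 500)/1.00 = −2.00, P(x̄ < 498) = 0.0228
For n = 100: SE = 8/√100 = 0.80, z = (498 − 500)/0.80 = −2.50, P(x̄ < 498) = 0.0062
Summary:
n | SE | z | P(x̄ < 498)
16 | 2.00 | −1.00 | 0.1587
64 | 1.00 | −2.00 | 0.0228
100 | 0.80 | −2.50 | 0.0062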
As n increases, the SE shrinks, the z-score magnitude grows, and the probability of x̄ being far from μ drops sharply. Larger samples make it much easier to detect a real deviation from target.
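The three rows of the summary can be reproduced with a short loop. The sketch takes μ = 500 g, σ = 8 g, and a cutoff of 498 g, which are the values consistent with the SE and z columns of the summary; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Values consistent with the summary table: SE = 2.00, 1.00, 0.80 for n = 16, 64, 100.
mu, sigma, cutoff = 500, 8, 498

results = {}
for n in (16, 64, 100):
    se = sigma / math.sqrt(n)
    z = (cutoff - mu) / se
    results[n] = (se, z, NormalDist().cdf(z))
    print(f"n = {n:>3}: SE = {se:.2f}, z = {z:.2f}, P(x_bar < 498) = {results[n][2]:.4f}")
```

Only n changes between the rows, so the whole difference in probability comes from the shrinking standard error.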
Example 4 — Individual Observation vs. Sample Mean (Application Twist)
This problem looks like the previous ones, but there’s a twist. Read carefully.
Student exam scores in a large course are normally distributed with μ = 72 and σ = 10.
(a) What is the probability that a single randomly selected student scores above 78? (b) What is the probability that the average score of a randomly selected class of 25 students is above 78?
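Solution
Part (a) — single observation: Use σ (not SE). z = (78 − 72)/10 = 0.60, so P(X > 78) = 1 − Φ(0.60) = 1 − 0.7257 = 0.2743.
Part (b) — sample mean of 25: Use SE. SE = 10/√25 = 2.00 and z = (78 − 72)/2.00 = 3.00, so P(x̄ > 78) = 1 − Φ(3.00) = 1 − 0.9987 = 0.0013.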
Contrast: A single student scoring above 78 is moderately likely (27%). But a class average above 78 is extremely unlikely (0.13%) — because the average of 25 scores is far less variable than any individual score. This is the power (and the purpose) of the SE formula.
Interactive figure readouts: P(X > 78) for an individual score X, and P(x̄ > 78) for the sample mean x̄ of n scores.
Figure: Same cutoff (78), same μ = 72, σ = 10. The blue curve (individual) never changes — P(X > 78) ≈ 27%. The orange curve (sample mean) narrows as n grows, making P(x̄ > 78) collapse toward zero. Drag to n = 25 to see the contrast from Example 4.
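The contrast in Example 4 can be checked directly. The sketch uses μ = 72, σ = 10, and the cutoff 78 from the figure caption, with n = 25 for the class average. `NormalDist` stands in for the z-table, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, cutoff, n = 72, 10, 78, 25

# Individual score: spread is sigma.
individual = NormalDist(mu, sigma)
p_individual = 1 - individual.cdf(cutoff)

# Class average of n scores: spread is the standard error sigma / sqrt(n).
class_mean = NormalDist(mu, sigma / math.sqrt(n))
p_class_mean = 1 - class_mean.cdf(cutoff)

print(f"P(X > 78)     = {p_individual:.4f}")
print(f"P(x_bar > 78) = {p_class_mean:.4f}")
```

The two calculations differ only in the standard deviation handed to `NormalDist`, which is the whole point of the example.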
Section 5: Guided Practice
Problem 1 — Denominator Selection and Parameter vs. Statistic (C2, C5)
Manufacturing. A machine produces metal rods with length mm and mm (normally distributed). A sample of rods is measured. Find .
Step 1: Which denominator do you use to standardize x̄?
Step 2: The z-score is:
Step 3:
Health. Resting heart rates in adults have bpm and bpm (approximately normal). A clinic samples patients. Find .
Step 1: Standard error?
Step 2: Z-score?
Step 3:
Agriculture. A crop yield has bu/acre and bu/acre. A researcher samples plots. Find .
Step 1: Standard error?
Step 2: Z-score?
Step 3:
Finance. Monthly returns on a fund have and (approximately normal). An analyst reviews months. Find .
Step 1: Standard error?
Step 2: Z-score?
Step 3:
Education. Test scores on a standardized exam have and (normal distribution). A researcher samples students. Find .
Step 1: Standard error?
Step 2: Z-score?
Step 3:
Problem 2 — Computation of Standard Error and Z-Scores (C2, C5)
A population has and . Samples of size are drawn. Find .
What is the standard error?
What is the z-score for ?
, , . Find .
Standard error?
Z-score for ?
, , (population is normal). Find .
Standard error?
Z-score for ?
, , . Find .
Standard error?
Z-score for ?
, , (population is normal). Find .
Standard error?
Z-score?
Problem 3 — Central Limit Theorem (CLT) Compliance Conditions (C3)
For each scenario below, decide whether the CLT justifies using a normal distribution for the sampling distribution of x̄.
Scenario A: The population of annual incomes in a city is strongly right-skewed (a few very high earners). You draw a sample of .
Scenario B: Test scores in a class of 15 students follow a roughly normal distribution. You record all 15 scores.
Scenario C: A population of daily wait times has an extreme bimodal shape (clustered near 2 min or 20 min, nothing in between). Sample size is .
Problem 4 — Comparative Analysis of Sample Size Effects (C4, C5)
A population has and . Compare samples of and .
Which sample has the smaller standard error?
Using , find .
, . Compare vs. .
Which has smaller SE?
Using , find .
, . Compare vs. .
Which has smaller SE?
Using , find .
, . Compare vs. .
Which has smaller SE?
Using , find .
, . Compare vs. .
Which has smaller SE?
Using , find .
Section 6: Independent Practice
Problem 1 — When Does the CLT Apply? (C1 + C3)
A population of household electricity bills is heavily right-skewed: most bills are modest, but a few commercial accounts drive a long right tail. You plan to sample n = 35 households.
Does the CLT apply? What will the sampling distribution of x̄ look like?
Solution
Yes, the CLT applies. Even though the population is strongly right-skewed, n = 35 ≥ 30 meets the rule of thumb. The sampling distribution of x̄ will be approximately normal with mean μ and standard error σ/√35. Individual bills are skewed; the average of 35 bills is approximately bell-shaped.
Heights of adult women in a country are approximately normally distributed. A researcher samples women.
Does the CLT apply? What shape will the sampling distribution have?
Solution
Yes, and no large-sample approximation is needed. Because the population itself is normal, the sampling distribution of x̄ is normal for any sample size (exactly so to the extent that the population is exactly normal). The n ≥ 30 rule is unnecessary here.
The number of cars passing a toll booth per minute follows a Poisson distribution (right-skewed, especially for low traffic periods). Sample size: .
Is the normal approximation for x̄ reliable?
Solution
Caution required. The Poisson is right-skewed, and the sample size is below 30. The rule of thumb suggests the normal approximation may not be reliable. For very low mean counts (highly skewed Poisson), even n = 30 can be marginal. In practice, you’d want a larger sample or a different approach.
Customer satisfaction scores are bimodal: customers tend to rate either very low (1–2) or very high (8–10). Sample size: n = 50.
Does the CLT justify using the normal for x̄?
Solution
Yes, with confidence. Despite the bimodal population, n = 50 comfortably meets the CLT rule of thumb. The sampling distribution of x̄ will be approximately normal. Bimodal populations need a somewhat larger n than mildly skewed ones, but 50 is generally sufficient.
Retirement savings balances have an extremely right-skewed distribution (most people have little saved; a small number have millions). Sample: .
Should you apply the CLT here?
Solution
Proceed with caution. The sample size is modest and the distribution is extremely skewed. The normal approximation for x̄ may be poor. In practice, statisticians often use larger samples (n = 50–100+) for heavily skewed financial data, or use non-parametric methods.
Problem 2 — Sampling Distribution Probability (C2 + C5)
Problem 3 — Comparing Sample Sizes (C2 + C4 + C5)
Problem 4 — Probability Across Population Shapes (C3 + C5)
Uniform population. A population is uniformly distributed on [20, 80], so μ = 50 and σ = 60/√12 ≈ 17.32. Random samples of n = 300 are taken.
The familiar 68% rule from the empirical rule — the sampling distribution is so tight (SE = 1) that 68% of sample means fall within 1 of the target.
Normal population. μ = 165 cm, σ = 6 cm, n = 9 (population is normal). Find .
Solution
Because the population is normal, the sampling distribution is exactly normal for any n. SE = 6/√9 = 2. z = (167 − 165)/2 = 1.00.
Problem 5 — Individual Observation vs. Sample Mean (C1 + C5)
Blood pressure readings at a clinic are normally distributed with mmHg and mmHg.
Part (a): A single patient is selected. Find the probability that their reading exceeds 130 mmHg.
Part (b): A sample of n = 36 patients is selected. Find the probability that the sample mean exceeds 130 mmHg.
Part (c): Explain the large difference between the two probabilities. What does it mean for how we interpret “is this reading unusual?”
Solution
Part (a) — single observation (use σ):
About 25% of individuals have a reading above 130. This is not unusual.
Part (b) — sample mean of 36 (use SE):
Essentially impossible for a correctly functioning system.
Part (c): A single high reading could reflect one unusual patient, since there’s natural variation among individuals. But if the average of 36 patients is unusually high, the individual-level variation has been averaged out. Any remaining deviation from μ is likely a real systematic effect. The SE formula captures this: averages are much less variable than individual observations.
Mixed Review — Retrieval from Earlier Lessons
These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.
Review Problem 1 — Variability and Spread (DS-4)
A sample of five delivery times (in minutes) is: 22, 25, 25, 28, 30. The sample mean is x̄ = 26 min.
(a) Compute the sample standard deviation s. (b) If these five values were the entire population, would you use n or n − 1 in the denominator? Why?
Solution
(a) Deviations from x̄ = 26: −4, −1, −1, 2, 4. Squared deviations: 16, 1, 1, 4, 16. Sum = 38. Then s² = 38/(5 − 1) = 9.5, so s = √9.5 ≈ 3.08 min.
(b) For a population, the denominator is N = 5 (not n − 1 = 4), giving σ² = 38/5 = 7.6, so σ = √7.6 ≈ 2.76 min. The n − 1 denominator (Bessel’s correction) applies only to a sample, where it makes s² an unbiased estimator of σ². When you already have the full population, no correction is needed.
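Both denominators are available in Python's standard library, which gives a quick way to check hand calculations like this one. In the sketch below, `statistics.stdev` divides by n − 1 and `statistics.pstdev` divides by n; the variable names are illustrative.

```python
import statistics

delivery_times = [22, 25, 25, 28, 30]  # minutes, from the review problem

sample_sd = statistics.stdev(delivery_times)       # divides by n - 1 (sample)
population_sd = statistics.pstdev(delivery_times)  # divides by n (population)

print(f"mean          = {statistics.fmean(delivery_times):.2f}")
print(f"s (n - 1)     = {sample_sd:.2f}")
print(f"sigma (n)     = {population_sd:.2f}")
```

The sample version is always the larger of the two, because dividing by n − 1 inflates the variance slightly to offset the use of x̄ in place of μ.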
Review Problem 2 — Normal Distribution Probability (PR-6)
The lifespans of a certain brand of AA battery are approximately normally distributed with μ = 18 hours and σ = 2.5 hours. Using a z-table where Φ(−0.40) = 0.3446 and Φ(1.60) = 0.9452:
What is the probability that a randomly selected battery lasts between 17 and 22 hours?
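Solution
Standardize both endpoints: z₁ = (17 − 18)/2.5 = −0.40 and z₂ = (22 − 18)/2.5 = 1.60, so P(17 < X < 22) = Φ(1.60) − Φ(−0.40) = 0.9452 − 0.3446 = 0.6006.
About 60% of batteries last between 17 and 22 hours. This makes intuitive sense: the interval extends 0.4 SDs below the mean and 1.6 SDs above it, capturing a bit more than half the distribution.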
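An interval that runs from 0.4 SDs below the mean to 1.6 SDs above it, with endpoints 17 and 22, pins down μ = 18 and σ = 2.5. Under those values, a quick check with `statistics.NormalDist` is sketched below; the variable names are illustrative, and the result works out to roughly 0.60.

```python
from statistics import NormalDist

# Values implied by the interval description: 17 = mu - 0.4*sigma, 22 = mu + 1.6*sigma.
mu, sigma = 18, 2.5
lifespan = NormalDist(mu, sigma)

z_low = (17 - mu) / sigma
z_high = (22 - mu) / sigma
p_between = lifespan.cdf(22) - lifespan.cdf(17)

print(f"z-scores: {z_low:.2f} and {z_high:.2f}")
print(f"P(17 < X < 22) = {p_between:.4f}")
```

Subtracting the two left-tail areas is the general recipe for any "between" probability on a normal curve.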
Section 7: Mastery Check
No hints, no dropdowns — these questions measure genuine understanding.
Question 1 — Feynman Test
A classmate who missed this lesson asks you: “What is the Central Limit Theorem and why does it matter? What conditions does it need?”
Write your explanation below as if you were actually talking to them (complete sentences, plain language). Address all three prompts in your response:
What is the sampling distribution of x̄? (1–3 sentences)
What does the CLT guarantee, and what three conditions must hold? (2–4 sentences)
Why does this matter for real statistical work? (1–2 sentences)
Model answer
The CLT says that if you take many random samples from any population (with finite mean and variance) and calculate the sample mean each time, those means will form an approximately normal distribution — regardless of what the original population looks like.
Three conditions must hold: (1) observations are independent of one another (satisfied by random sampling); (2) the sample size is large enough (n ≥ 30 for non-normal populations, any n if the population is already normal); (3) the population has a finite standard deviation. When all three hold, x̄ is approximately N(μ, σ²/n).
This matters because it lets us compute probabilities for sample means using the normal table — even when we have no idea what shape the original data follows.
Question 2 — Sampling Probability
A population has and . A sample of is drawn from a right-skewed population.
Step 1: Can you use the normal distribution for x̄?
Step 2: The standard error is:
Step 3: Find .
Full solution
CLT applies (n = 100 ≥ 30). SE = 60/√100 = 6.
Question 3 — Error Analysis
A student presents this work for the problem “Find given , , ”:
The student got an answer — but there is a conceptual error. What is it?
Analysis
The error is in the denominator of the z-score. The student used σ instead of SE = σ/√n. The z-score for a sample mean must use the standard error, not the population standard deviation.
Correct solution:
The student’s answer of 0.4207 is the probability for an individual observation, not a sample mean. Sample means are much less variable than individual values — that’s the whole point of the SE.
Self-Assessment
How confident are you with the concepts from this lesson?
Still confused · Ready for the Boss Fight
Section 8: Boss Fight
You’ve practiced individual concepts. Now it’s time to use all of them at once — under pressure. Choose your path:
🔬 The Analyst
You are given real data. Your job: compute the sample mean, find the standard error, calculate a probability, and interpret your result in context.
🏗️ The Architect
You don’t have data yet. Your job: design the sampling procedure — choose sample sizes, compare precision, and justify your decisions statistically.
🔬 Path A: The Analyst
A bottling plant fills juice cartons with a target volume of 250 mL. The fill volume is normally distributed with σ = 5 mL (known from long-run process data). A quality engineer pulls a random sample of n = 25 cartons and records their volumes:
Sample Volumes (mL)
248.2
251.1
247.5
249.8
250.3
248.9
246.7
252.1
249.4
251.6
247.8
250.5
249.1
248.6
251.9
250.0
247.3
252.8
249.7
248.4
251.2
250.6
247.9
249.5
248.1
Work through all four steps in the space below, then check the solution.
Task A1: Compute the sample mean x̄ from the data above.
Task A2: Compute the standard error for samples of n = 25 when σ = 5 mL.
Task A3: Find P(x̄ ≤ x̄_obs), the probability of getting a sample mean at least as low as what you observed, assuming the machine is filling correctly at μ = 250 mL.
Task A4: Based on your answer, does this sample provide evidence that the machine is underfilling? Justify your reasoning.
Full Solution
1. Sample mean: Sum all 25 values = 6,239.0 mL. x̄ = 6,239.0/25 = 249.56 mL.
2. Standard error: SE = σ/√n = 5/√25 = 1.00 mL.
3. Probability: z = (249.56 − 250)/1.00 = −0.44, so P(x̄ ≤ 249.56) = Φ(−0.44) ≈ 0.3300.
4. Interpretation: A sample mean of 249.56 mL or lower would occur about 33% of the time by chance when the machine is working correctly. This is not unusual, so the machine is likely fine. A cause for alarm would be a probability below 5% (z ≤ −1.645).
Reflection: What would you conclude if the sample mean had been 246 mL instead? How does the SE affect your interpretation?
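The Analyst path is easy to check by machine. The sketch below sums the 25 volumes as listed, which gives 6,239.0 mL and x̄ = 249.56 mL. It assumes σ = 5 mL, the value that matches the roughly one-in-three probability in the solution; treat that as an assumption of this sketch. Variable names are illustrative.

```python
import math
from statistics import NormalDist

volumes = [
    248.2, 251.1, 247.5, 249.8, 250.3,
    248.9, 246.7, 252.1, 249.4, 251.6,
    247.8, 250.5, 249.1, 248.6, 251.9,
    250.0, 247.3, 252.8, 249.7, 248.4,
    251.2, 250.6, 247.9, 249.5, 248.1,
]

mu = 250.0   # target fill volume from the problem
sigma = 5.0  # ASSUMPTION: process SD, inferred from the solution's probability

n = len(volumes)
total = math.fsum(volumes)          # accurate floating-point sum
x_bar = total / n                   # Task A1: sample mean
se = sigma / math.sqrt(n)           # Task A2: standard error
z = (x_bar - mu) / se               # Task A3: standardize the observed mean
p_at_or_below = NormalDist().cdf(z)

print(f"n = {n}, sum = {total:.1f}, x_bar = {x_bar:.2f}")
print(f"SE = {se:.2f}, z = {z:.2f}, P(x_bar <= observed) = {p_at_or_below:.4f}")
```

Changing `x_bar` to 246 in this sketch is a quick way to explore the reflection question.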
🏗️ Path B: The Architect
You are a quality engineer at a pharmaceutical company. Tablets must contain exactly μ = 500 mg of active ingredient. From historical data, you know the process has σ = 20 mg. You want to design a sampling procedure to monitor the production line.
Task B1: If you use a sample of n = 16 tablets, what is the standard error? What is the probability of observing x̄ < 490 when the process is on target?
Task B2: Suppose you want to reduce that probability to below 1% (making it very easy to detect a 10 mg underfill). What sample size do you need? (Find the SE such that P(x̄ < 490) < 0.01, then solve for n.)
Task B3: Your colleague suggests flagging individual tablets below 490 mg instead of using averages. Compare P(X < 490) for a single tablet vs. your sample mean probability from B1. Which detection method is more sensitive?
Task B4: What are the cost implications of your answer to B2? Is it always worth increasing n?
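Full Solution
1. SE = 20/√16 = 5 mg. z = (490 − 500)/5 = −2.00, so P(x̄ < 490) = Φ(−2.00) = 0.0228.
2. Need z ≤ −2.33: (490 − 500)/SE ≤ −2.33, so SE ≤ 10/2.33 ≈ 4.29 mg. Then √n ≥ 20/4.29 ≈ 4.66, so n ≥ 21.7. Round up: n = 22 tablets.
3. Single tablet: z = (490 − 500)/20 = −0.50, so P(X < 490) = Φ(−0.50) = 0.3085. About 31% of individual tablets fall below 490 mg even when the process is on target, so monitoring individual tablets produces constant false alarms. The sample mean approach (2.3% false alarm rate for n = 16) is far more precise.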
4. Larger n reduces SE and increases detection sensitivity, but each additional tablet costs time and material. The cost of a false alarm (stopping the line) and the cost of a miss (shipping under-dosed tablets) must be weighed. There is no universally “correct” n.
Reflection: What would you do differently if σ were unknown? How does uncertainty in σ affect your sample size calculation?
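The Architect calculations can be sketched as two small helpers. The sketch takes μ = 500 mg and σ = 20 mg, the values consistent with the 31% and 2.3% figures in the solution, and uses `NormalDist().inv_cdf` in place of the rounded table value −2.33. The function names are illustrative assumptions.

```python
import math
from statistics import NormalDist

mu, sigma, cutoff = 500, 20, 490  # values consistent with the solution's percentages

def false_alarm_probability(n):
    """P(x_bar < cutoff) for samples of size n when the process is on target."""
    se = sigma / math.sqrt(n)
    return NormalDist().cdf((cutoff - mu) / se)

def min_sample_size(target_probability):
    """Smallest n whose false-alarm probability is at or below the target."""
    z_needed = NormalDist().inv_cdf(target_probability)  # negative for a small left tail
    se_max = (cutoff - mu) / z_needed                    # largest SE that still qualifies
    return math.ceil((sigma / se_max) ** 2)

print(f"n = 1 (single tablet): {false_alarm_probability(1):.4f}")
print(f"n = 16:                {false_alarm_probability(16):.4f}")
print(f"n needed for < 1%:     {min_sample_size(0.01)}")
```

Rounding up with `math.ceil` matters here: rounding down would leave the false-alarm rate slightly above the 1% target.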
Section 9: Challenge Problems
Optional stretch. These go beyond the lesson objectives. Attempt them when you feel confident with the core material.
Challenge 1 — Finding Minimum Sample Size
A population has σ = 24. You want the standard error of the mean to be at most 3. What is the minimum sample size required? Also verify whether the CLT applies.
Solution
We need σ/√n ≤ 3. Solving: √n ≥ 24/3 = 8, so n ≥ 64.
Minimum: n = 64. Since 64 ≥ 30, the CLT sample-size rule of thumb is met.
Challenge 2 — The Finite Population Correction (FPC)
The standard error formula SE = σ/√n assumes the population is infinitely large (or that sampling is done with replacement). When you sample a significant fraction of a small population, the SE is actually smaller than this formula predicts. The correction is:
SE = (σ/√n) · √((N − n)/(N − 1))
where N is the population size and n is the sample size. The second factor is the finite population correction (FPC). It is close to 1 (essentially no correction needed) when n/N < 0.05 (sampling less than 5% of the population).
Application: A company has employees. Their ages have years. You survey employees.
(a) Is the FPC needed? (Check whether n/N ≥ 0.05.)
(b) Compute the SE without and with the FPC. How much difference does it make?
Solution
(a) n/N = 0.25 ≥ 0.05. Yes, the FPC is needed.
(b) Without FPC:
With FPC:
The corrected SE is about 0.98 vs. 1.13 — a 13% reduction. For this problem, the FPC matters because we’re sampling 25% of the population. The more of the population you survey, the less residual uncertainty remains.
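A small helper makes the size of the correction easy to explore. The inputs below (σ = 8, n = 50, N = 200) are hypothetical: they were picked so that 25% of the population is sampled and the results land near the 1.13 and 0.98 quoted above, but they are not taken from the problem statement. The function name is also an assumption.

```python
import math

def standard_error_with_fpc(sigma, n, population_size):
    """Return (uncorrected SE, FPC factor, corrected SE) for a finite population."""
    uncorrected = sigma / math.sqrt(n)
    fpc = math.sqrt((population_size - n) / (population_size - 1))
    return uncorrected, fpc, uncorrected * fpc

# Hypothetical inputs with n/N = 0.25; not the problem's stated values.
uncorrected, fpc, corrected = standard_error_with_fpc(sigma=8, n=50, population_size=200)

print(f"n/N = {50 / 200:.2f}")
print(f"SE without FPC = {uncorrected:.2f}")
print(f"FPC factor     = {fpc:.3f}")
print(f"SE with FPC    = {corrected:.2f}")
```

With a very large population the factor is close to 1, which is why the plain σ/√n formula is usually good enough.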
Challenge 3 — Why Averaging Reduces Variability: A Proof Sketch
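The formula SE = σ/√n follows from basic properties of variance. Here’s the argument; answer the two questions below.
Let X₁, X₂, …, Xₙ be independent draws from a population with variance σ², and let x̄ = (X₁ + X₂ + … + Xₙ)/n. Then:
Step 1: Var(x̄) = Var((1/n)(X₁ + … + Xₙ)) = (1/n²) · Var(X₁ + … + Xₙ). Which property lets you bring the 1/n² outside the variance?
Step 2: After extracting 1/n², we have Var(x̄) = (1/n²) · Var(X₁ + … + Xₙ). What allows us to write Var(X₁ + … + Xₙ) = Var(X₁) + … + Var(Xₙ)?
Complete derivation
Because the Xᵢ are independent: Var(x̄) = (1/n²) · [Var(X₁) + … + Var(Xₙ)] = (1/n²) · nσ² = σ²/n.
Taking the square root: SE = √(σ²/n) = σ/√n.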
This is the exact formula for the standard error — derived from first principles, not memorized. The key ingredients are (1) the variance scaling rule and (2) independence of the observations.
Section 10: Solutions Reference
Complete, step-by-step solutions for all problems in Sections 5–9 are available on the solutions page. Solutions include worked arithmetic, common mistakes to watch for, and interpretation guidance.
If you’re stuck: Re-read the relevant Core Concept in Section 3, then find the Worked Example that maps to that concept (e.g., Example 1 maps to Concept 1). The solutions page shows the reasoning behind every step, not just the final answer.