A social media analytics company builds a model to predict daily user engagement from post frequency. After collecting data from 5,000 users, they run the regression and receive good news: the relationship is statistically significant. But then they look at one more number: r² = 0.04. The model explains only 4% of the variance in engagement.
Should the company base its strategy on this model?
Statistical significance tells you the relationship is real — not just noise. But it says nothing about whether the model is useful. A massive dataset can make even a trivially weak relationship significant. Before trusting any regression model for prediction, you need to ask: How well does it actually explain the data? Is my prediction inside the range where the model was fit? And are the model’s assumptions even satisfied?
This lesson gives you the tools to answer those questions rigorously.
By the end of this lesson, you will be able to:
Interpret the slope and intercept of a regression equation precisely, using the three required phrases.
Distinguish interpolation from extrapolation and explain why extrapolation is risky.
Read a residual plot to diagnose non-linearity, heteroscedasticity, outliers, and influential points.
Perform a five-step significance test for ρ = 0 using the t-distribution with df = n − 2.
Distinguish statistical significance from practical significance using r².
Section 2: Prerequisites
What you need coming in — and why it matters today:
Regression equation (REG-2): You know how to compute b and a. Today you go deeper: interpreting what those numbers mean in context and knowing when predictions from that equation can be trusted.
Residuals (REG-2): You computed individual residuals in REG-2. Today you will read entire residual plots — patterns in the residuals reveal whether the model’s assumptions hold.
Conditions for regression (REG-2): Linearity, independence, equal variance, and near-normality of residuals. Today’s residual plot diagnostics are the practical tool for checking linearity and equal variance.
Five-step hypothesis test framework (INF-5): H₀, H₁, test statistic, p-value, decision and conclusion. Today’s test for ρ = 0 follows exactly this structure; the only new elements are the t-distribution and a different test statistic formula.
Decision rule (INF-5): Reject H₀ if p < α; fail to reject H₀ if p ≥ α. “Fail to reject” ≠ “accept.” This rule applies identically to the correlation test.
Quick check — can you recall these?
Which of the following is the correct interpretation of the slope in ?
What changes in this lesson: In REG-2, you built the regression equation and computed residuals. Here you ask: Can this equation be trusted? That means reading residual patterns visually, classifying predictions as safe or risky, and formally testing whether the linear relationship is real in the population. The five-step framework from INF-5 carries over exactly — only the test statistic formula and distribution change.
Retrieval Warm-up — from earlier lessons
An environmental scientist fits a regression line to data on river flow rate (x, m³/s) and suspended sediment concentration (y, mg/L) for 18 measurement stations. She gets , with and . She wants to verify her arithmetic before proceeding. Which check should she perform?
A researcher states: “I ran a hypothesis test and got with . I conclude that the null hypothesis is true.” Which error in reasoning is present?
Section 3: Core Concepts
How this section is organized: Ten concepts build the complete toolkit for evaluating and using a regression model.
C1–C2: Interpreting the slope and intercept precisely (what the numbers mean)
C3–C4: Safe vs. risky prediction — interpolation and extrapolation
C5–C6: Residual plot diagnostics — checking model assumptions visually
C7–C8: Outliers and influential points — when one observation changes everything
C9–C10: The significance test for ρ and the statistical-vs.-practical distinction
C1 — Slope Interpretation (Precision)
The slope b in ŷ = a + bx is more than a number: it is a statement about how two variables are related, on average, in the population the data represent.
A complete slope interpretation requires three specific phrases. Each is non-optional.
Slope Interpretation — Required Form
“For each 1-unit increase in x, the predicted y changes by b units, on average.”
If b > 0: replace “changes by” with “increases by.” If b < 0: use “decreases by |b| units.”
Always name the units of both x and y in your sentence.
Mini-example: ŷ = 56.90 + 3.70x, where x = study hours and y = exam score (out of 100).
Correct: “For each additional hour of study, the predicted exam score increases by 3.70 points, on average.”
Three phrases present: ✓ “additional hour of study” (1-unit increase in x) ✓ “predicted exam score” (predicted y) ✓ “on average”
Three traps in slope interpretation:
(1) Missing “on average”: The line predicts the mean response for all students with a given number of hours — not what any specific student will score.
(2) Causation language: “Studying 1 more hour causes the score to increase by 3.70 points” — regression shows association, not causation. Use “predicted” or “is associated with,” not “causes.”
(3) x and y reversed: “For each 3.70-point increase in score, hours increase by 1.” Always describe y responding to x, never the reverse.
C2 — Intercept Interpretation
The intercept a is the predicted value of y when x = 0. Whether that is meaningful depends on whether x = 0 makes sense in context.
Intercept Interpretation
The intercept a gives ŷ when x = 0. It is contextually meaningful only if x = 0 falls within or very near the observed data range.
If x = 0 is far outside the observed range, the intercept is a mathematical anchor that keeps the line positioned correctly; it is not a reliable real-world prediction.
Mini-example: ŷ = 1.60 + 0.45x, where x = fertilizer (g) and y = tomato yield (kg), observed range [0, 20] g.
x = 0 is within the range → meaningful: “The predicted tomato yield with no fertilizer is 1.60 kg.”
Contrast: ŷ = 177.4 + 1.69x, age (years) → reaction time (ms), observed range [20, 65] years. x = 0 represents a newborn, far outside the data. The intercept 177.4 ms is not a meaningful prediction for a newborn.
Do not interpret the intercept as a real-world prediction just because it has a plausible numerical value. The check is whether x = 0 is inside (or very near) the range of the data that was used to fit the model. If not, the intercept is just a placement parameter.
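The range check can be written down in a few lines. This is only a sketch: the function name `intercept_in_range` is made up for illustration, and it tests the strict "inside the range" condition only. Deciding what counts as "very near" is a judgment call that the code does not attempt.

```python
def intercept_in_range(x_min, x_max):
    """Return True when x = 0 lies inside the observed range [x_min, x_max]."""
    return x_min <= 0 <= x_max

# Fertilizer model: observed range [0, 20] g, so x = 0 was actually observed.
fertilizer_ok = intercept_in_range(0, 20)

# Age model: observed range [20, 65] years, so x = 0 is 20 years below the data.
age_ok = intercept_in_range(20, 65)

print("fertilizer intercept meaningful:", fertilizer_ok)
print("age intercept meaningful:", age_ok)
```

A borderline case such as a range starting at x = 1 returns False here, which is a prompt to think about context rather than a final verdict.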
C3 — Interpolation
Interpolation means predicting ŷ for an x value inside the observed data range [x_min, x_max].
Interpolation
A prediction is an interpolation if x_min ≤ x ≤ x_max.
Interpolation is generally reliable: the model was fit to data in this region, so the linear pattern has been empirically verified there.
Mini-example: Model fit on study hours . Predicting for hours: → interpolation. The model can be trusted here.
C4 — Extrapolation
Extrapolation means predicting ŷ for an x value outside the observed range.
Extrapolation
A prediction is an extrapolation if x < x_min or x > x_max.
Extrapolation is risky: the linear relationship observed within [x_min, x_max] may not extend beyond it. Predictions can be implausible or physically impossible.
Mini-example: Same model ( hours). Predicting for hours: → extrapolation. The model assumes linearity continues indefinitely, but real exam scores are capped at 100 — the linear trend cannot hold.
Predicting ŷ for any x is mathematically possible: the arithmetic always works. The danger is in interpreting the result as a reliable estimate. Always check whether x is inside or outside [x_min, x_max] before using a prediction. This is the single most important habit in applied regression.
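One way to build that habit is to never compute a prediction without also labeling it. The sketch below uses the fertilizer model that appears in this lesson (ŷ = 1.60 + 0.45x, observed range 0 to 20 g). The helper name `predict` and the test value x = 10 are illustrative assumptions, not part of the lesson.

```python
def predict(a, b, x, x_min, x_max):
    """Return (y_hat, label) where label says whether x is inside the fitted range."""
    y_hat = a + b * x
    label = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, label

# Fertilizer model: y_hat = 1.60 + 0.45x, fit on x in [0, 20] g.
y_in, kind_in = predict(1.60, 0.45, 10, 0, 20)    # x = 10 is a hypothetical value
y_out, kind_out = predict(1.60, 0.45, 35, 0, 20)  # x = 35 lies beyond x_max = 20

print(f"x = 10 g: y_hat = {y_in:.2f} kg ({kind_in})")
print(f"x = 35 g: y_hat = {y_out:.2f} kg ({kind_out})")
```

Both calls return a number, which is exactly the point: only the label tells you whether the data support it.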
C5 — Residual Plots: Linearity Check
A residual plot graphs the residuals e = y − ŷ against the fitted values ŷ (or against x). It makes systematic patterns visible.
Residual Plot — Linearity Check
Plot residuals on the vertical axis vs. fitted values on the horizontal axis.
Good sign: Residuals bounce randomly above and below e = 0 with no systematic curve → linearity assumption holds.
Bad sign: A curved band (e.g., U-shape, arch) → the relationship is non-linear; a straight-line model is inappropriate.
Key diagnostic: A U-shaped curve in the residual plot means positive residuals at small and large ŷ, with negative residuals in the middle. The model consistently underestimates at the extremes and overestimates in the middle. The fix: transform the data or fit a non-linear model.
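To see where a U-shape comes from, it helps to fit a straight line to data that are deliberately curved. The sketch below uses made-up data (y = x² for x = 0 to 6) and a small hand-written least-squares helper, `fit_line`. Both are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical curved data: the true relationship is quadratic, not linear.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.1f}")
```

The printed residuals are positive at both ends and negative in the middle, which is the U-shape described above.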
C6 — Residual Plots: Homoscedasticity Check
Even if the residuals are random (no curve), their spread should stay constant across all fitted values. Non-constant spread is called heteroscedasticity.
Residual Plot — Homoscedasticity Check
Homoscedasticity (good): The vertical spread of residuals looks similar for small, medium, and large ŷ values.
Heteroscedasticity (bad): A fan shape, with residuals tightly clustered for small ŷ and widely spread for large ŷ (or vice versa). This means the model’s precision varies across the range of predictions, which invalidates standard errors.
A fan-shaped residual plot is not just “a few outliers.” It indicates that the entire variance structure of the model is wrong. Standard errors, confidence intervals, and p-values from a heteroscedastic model are not reliable.
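A fan shape can also be screened for numerically by comparing the typical residual size in the lower and upper halves of the fitted values. The sketch below is a crude check, not a formal test: the data are invented so that the scatter grows with x, and the helper names `fit_line` and `spread_ratio` are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

def spread_ratio(xs, ys):
    """Mean |residual| in the upper half of fitted values divided by the lower half."""
    a, b = fit_line(xs, ys)
    # Pair each fitted value with its residual, then order by fitted value.
    pairs = sorted((a + b * x, y - (a + b * x)) for x, y in zip(xs, ys))
    half = len(pairs) // 2
    low = [abs(e) for _, e in pairs[:half]]
    high = [abs(e) for _, e in pairs[half:]]
    return (sum(high) / len(high)) / (sum(low) / len(low))

# Hypothetical fan-shaped data: the deviation from the trend grows with x.
xs = list(range(1, 11))
ys = [2 * x + 0.5 * x * (1 if i % 2 == 0 else -1) for i, x in enumerate(xs)]

ratio = spread_ratio(xs, ys)
print(f"upper-half spread / lower-half spread = {ratio:.2f}")
```

A ratio near 1 is consistent with constant spread. A ratio well above or below 1 is a reason to look at the residual plot itself, since no single cutoff is standard.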
Residual Plot Explorer (interactive figure): A residual plot is the primary diagnostic tool for regression. The left panel (Scatter Plot) shows the original data with the regression line; the right panel (Residual Plot) shows the corresponding residuals, revealing what the model is missing.
Residuals bounce randomly around e = 0 with consistent spread. Both the linearity and homoscedasticity assumptions appear satisfied.
C7 — Outliers in Regression
A regression outlier is a point with a large residual: it falls far from the regression line in the y-direction.
Regression Outlier
An observation (x_i, y_i) is a regression outlier if |e_i| = |y_i − ŷ_i| is unusually large compared to the typical residual size.
Outliers inflate the sum of squared residuals and can pull the regression line toward them, distorting the slope and intercept.
Mini-example: If every residual is between and but one observation has , that point is a clear regression outlier. It single-handedly increases by .
C8 — Influential Points and Leverage
An influential point is one that, if removed, would substantially change the slope or intercept. Points with extreme x-values have high leverage: they can be influential even without a large residual.
Influential Points and Leverage
An influential point is one whose removal would substantially change b or a.
A point has high leverage when its x-value is far from x̄. High leverage points control the slope; the line is “anchored” to them.
Key distinction: A high-leverage point that happens to fall exactly on the regression line has a residual of 0 but still controls the slope. Leverage ≠ outlier.
Do not confuse outliers and influential points. An outlier has a large residual (far from the line vertically). An influential point changes the slope if removed. A point can be: (a) an outlier only; (b) influential only (high leverage, small residual); or (c) both — the most dangerous case.
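The "influential only" case is easy to miss on a residual plot, so here is a small numeric sketch. The dataset and the added point at x = 20 are invented for illustration, and `fit_line` is a hand-written least-squares helper rather than a library call.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b

# Hypothetical base data with x between 1 and 5.
base_x = [1, 2, 3, 4, 5]
base_y = [2, 4, 5, 4, 5]
a0, b0 = fit_line(base_x, base_y)

# Add one point far to the right of the others (high leverage).
far_x, far_y = 20, 5
a1, b1 = fit_line(base_x + [far_x], base_y + [far_y])
far_residual = far_y - (a1 + b1 * far_x)

print(f"slope without the far point: {b0:.3f}")
print(f"slope with the far point:    {b1:.3f}")
print(f"residual of the far point:   {far_residual:.3f}")
```

In this example the slope falls from 0.6 to under 0.1, yet the far point's own residual is small because the line has been pulled toward it. Refitting without the point is what reveals the influence.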
C9 — Significance Test for ρ (Five-Step)
We can test whether the population correlation ρ (rho) is zero, i.e., whether there is a real linear relationship in the population or whether the observed r could be due to chance alone.
Significance Test for the Population Correlation (ρ = 0)
Step 1 — Hypotheses:
H₀: ρ = 0 (no linear relationship in the population)
H₁: ρ ≠ 0 (two-tailed), or H₁: ρ > 0 / H₁: ρ < 0 (one-tailed)
Step 2 — Check conditions: The (x, y) pairs are approximately bivariate normal; observations are independent.
Step 3 — Test statistic: t = r√(n − 2) / √(1 − r²)
Step 4 — p-value: Use the t-distribution with df = n − 2. For two-tailed: p = 2 · P(T ≥ |t|).
Step 5 — Decision and conclusion: Reject H₀ if p < α. State the conclusion in context.
Why the t-distribution? In INF-5, we used z because we assumed σ was known. Here we are estimating the population correlation from the sample, so there is additional uncertainty. The t-distribution with df = n − 2 accounts for this. This is the same principle as INF-6’s t-test for a mean with unknown σ.
Mini table (critical values t*, α = 0.05, two-tailed):
df   6      8      10     13     18     23     28
t*   2.447  2.306  2.228  2.160  2.101  2.069  2.048
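Steps 3 to 5 can be sketched in code using the mini table instead of an exact p-value. The listed critical values are the standard two-tailed α = 0.05 values for those degrees of freedom. The function name `correlation_t_test` and the inputs r = 0.60 with n = 20 or n = 8 are hypothetical choices for illustration.

```python
import math

# Two-tailed critical values at alpha = 0.05, keyed by df = n - 2 (from the mini table).
T_CRIT = {6: 2.447, 8: 2.306, 10: 2.228, 13: 2.160, 18: 2.101, 23: 2.069, 28: 2.048}

def correlation_t_test(r, n):
    """Return (t, df, reject) for H0: rho = 0 against a two-tailed alternative."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    reject = abs(t) > T_CRIT[df]  # only works for the df values listed in the table
    return t, df, reject

# Same sample correlation, two different sample sizes (hypothetical values).
t_20, df_20, reject_20 = correlation_t_test(0.60, 20)
t_8, df_8, reject_8 = correlation_t_test(0.60, 8)

print(f"n = 20: t = {t_20:.3f}, df = {df_20}, reject H0: {reject_20}")
print(f"n = 8:  t = {t_8:.3f}, df = {df_8}, reject H0: {reject_8}")
```

With n = 20 the statistic is about 3.18 and exceeds 2.101, while with n = 8 it is about 1.84 and falls short of 2.447. The same r can be significant or not depending on the sample size.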
C10 — Statistical vs. Practical Significance
A statistically significant result (p < α) means we have evidence that ρ ≠ 0 in the population. It does not mean the model is useful for prediction.
Statistical vs. Practical Significance
Statistical significance (p < α): The sample provides sufficient evidence that the linear relationship is non-zero in the population.
Practical significance: The model explains enough variance to be useful for prediction. This is measured by r², the proportion of variability in y explained by x.
A large n can make even a very weak relationship statistically significant. Always report r² alongside the p-value.
The key question: After finding p < α, ask: “What is r²?” If r² = 0.04, the model explains only 4% of the variance in y. The remaining 96% is unexplained: it comes from factors the model does not capture. That model is not useful for prediction, even though the relationship is statistically real.
Two traps with practical significance:
(1) “Significant p-value means good model”: With a very large n, even a tiny r produces a significant p-value, yet the model explains essentially nothing.
(2) “r² = 0.85 means predictions are 85% accurate”: r² measures the proportion of variance explained, not prediction accuracy for individual observations.
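The first trap can be made concrete by holding r fixed and changing only n. The sketch below assumes r = 0.20 (so r² = 0.04) and compares n = 30 with n = 5000. These values are illustrative assumptions, and the large-sample cutoff of about 1.96 is the usual normal approximation rather than a value from the mini table.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), the test statistic from C9."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.20                        # assumed weak correlation
r_squared = r ** 2              # proportion of variance explained

t_small = t_statistic(r, 30)    # df = 28, critical value 2.048 from the mini table
t_large = t_statistic(r, 5000)  # df = 4998, critical value close to 1.96

print(f"r^2 = {r_squared:.2f} in both cases")
print(f"n = 30:   t = {t_small:.2f}")
print(f"n = 5000: t = {t_large:.2f}")
```

The t statistic is roughly 1.08 for n = 30 and roughly 14.4 for n = 5000. Only the second is significant, but r² is 0.04 in both cases, so the model is equally weak as a predictor.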
Section 4: Worked Examples
Example 1 — Fully Worked: Interpreting Slope and Intercept
Context: A researcher fits a regression of exam score (y) on study hours (x) using data from 30 students. The observed range is [1, 8] hours. The regression equation is ŷ = 56.90 + 3.70x.
Interpret the slope and intercept.
Full solution with reasoning:
Slope: I notice b = 3.70 and x = study hours, y = exam score.
I need three phrases: “1-unit increase in x” → “additional hour of study”; “predicted y” → “predicted exam score”; and “on average.”
Interpretation: “For each additional hour of study, the predicted exam score increases by 3.70 points, on average.”
Intercept: a = 56.90; x = 0 means zero hours of study. The observed range starts at x = 1, so x = 0 is just below the lower boundary.
I check: is x = 0 inside or very near [1, 8]? It is at the edge: not far outside, but also not clearly within the range where data were collected.
Interpretation: “The intercept 56.90 is borderline: x = 0 is just below the observed range of 1–8 hours. If x = 0 is contextually plausible (a student who did not study), the intercept predicts a baseline score of 56.90 points, but this is at the edge of what the model can reliably support.”
Example 2 — Partial Scaffold: Testing ρ = 0
Context: A researcher collects data on pairs and finds . Test against at .
Critical value:.
Your turn: Before looking at the solution, try substituting the given n and r into the formula t = r√(n − 2) / √(1 − r²).
Predict first: Do you expect this result to be statistically significant? looks strong — but does the sample size matter?
Show Solution
Step 1: vs. , .
Step 2: Conditions assumed met (data are approximately bivariate normal, independent).
Step 3:
Step 4:, so .
Step 5: Reject . There is statistically significant evidence of a linear relationship between the two variables in the population.
Note on C10: r² ≈ 0.56, so the model explains about 56% of the variance in y. This is both statistically significant and moderately practically significant.
Example 3 — Prediction Checkpoint: Interpolation vs. Extrapolation
Context: A researcher uses the model ŷ = 1.60 + 0.45x (fertilizer in grams → tomato yield in kg), fit to data with x ∈ [0, 20] g. Two predictions are requested, one of them at x = 35 g.
Predict the risk level before computing: Which prediction do you expect to be reliable? Which do you expect to be risky? Why?
Show Solution
g: → Interpolation (reliable).
x = 35 g: ŷ = 1.60 + 0.45(35) = 17.35 kg → Extrapolation (risky).
The prediction of 17.35 kg may be unreliable. The linear relationship observed up to 20 g of fertilizer may not continue to 35 g — at high fertilizer levels, yield often plateaus or decreases due to nutrient toxicity. The model has no data to support linearity in this region.
Example 4 — Find the Error
A researcher uses ŷ = 177.4 + 1.69x (age in years → reaction time in ms), fit to data from adults aged 20–65 years (p = 0.001, r² ≈ 0.66).
The researcher reports:
Researcher’s analysis:
“The regression proves that aging causes slower reactions, confirming the biological mechanism.”
“Since the p-value is 0.001, the model is statistically significant, so we can trust all predictions from it.”
“A person aged 80 years will have a predicted reaction time of ŷ = 177.4 + 1.69(80) = 312.6 ms. This is a reliable clinical prediction.”
Identify all errors in the researcher’s analysis.
Show Solution
Error 1 — Causation language: Regression shows association, not causation. Saying the regression “proves aging causes slower reactions” is incorrect. The observed association could be due to confounders (e.g., health conditions correlated with age). Use “is associated with” or “predicts.”
Error 2 — Extrapolation misuse: x = 80 years is outside the observed range [20, 65] years. This is extrapolation. The linear trend observed in adults 20–65 may not hold at age 80: neurological and physical changes at extreme ages may create non-linearities. Reporting 312.6 ms as a “reliable clinical prediction” is incorrect.
Error 3 — Conflating statistical significance with prediction reliability: p = 0.001 means there is strong evidence that the population correlation is not zero. It does not mean the model can be trusted for all predictions, especially extrapolated ones. The test result describes the relationship only over the data range used for fitting.
Note: r² ≈ 0.66, so the model does explain about 66% of the variability within the observed range. But none of that applies to predictions at x = 80.
Section 5: Guided Practice
Problem 1 — Slope and Intercept Interpretation
Context: ŷ = 56.90 + 3.70x, where x = study hours and y = exam score (out of 100). Observed range: [1, 8] hours.
(a) Select the correct slope interpretation:
(b) Is the intercept (a = 56.90) contextually meaningful?
Context: a regression line with slope b = −2.16, where x = temperature (°C) and y = hot beverage sales (units). Observed range: [5, 35] °C.
(a) Select the correct slope interpretation:
(b) Is the intercept contextually meaningful?
Context: a regression line with slope b = −0.53, where x = daily exercise (minutes) and y = resting heart rate (bpm). Observed range: [10, 60] min.
(a) Select the correct slope interpretation:
(b) Is the intercept contextually meaningful?
Context: ŷ = 1.60 + 0.45x, where x = fertilizer amount (g) and y = tomato yield (kg). Observed range: [0, 20] g.
(a) Select the correct slope interpretation:
(b) Is the intercept (a = 1.60) contextually meaningful?
Context: ŷ = 177.4 + 1.69x, where x = age (years) and y = reaction time (ms). Observed range: [20, 65] years.
Classify each as Interpolation or Extrapolation:(a) min:
(b) min:
(c) min:
(d) Compute for :
Context: ŷ = 1.60 + 0.45x (fertilizer grams → tomato yield kg). Observed range: [0, 20] g.
Classify each as Interpolation or Extrapolation:(a) g:
(b) g:
(c) g:
(d) Compute for :
Context: ŷ = 177.4 + 1.69x (age years → reaction time ms). Observed range: [20, 65] years.
Classify each as Interpolation or Extrapolation:(a) years:
(b) years:
(c) years:
(d) Compute for :
Problem 3 — Residual Plot Diagnosis
Three residual plots are described below. For each description, select the correct diagnosis.
(a) “The residuals bounce randomly above and below zero with no discernible pattern. The spread looks roughly the same for all fitted values.”
(b) “For small fitted values the residuals are positive; for middle fitted values they cluster near zero; for large fitted values they become positive again, forming a U-shape.”
(c) “For low fitted values the residuals are tightly clustered within ±2; for high fitted values the residuals range from −15 to +15.”
Problem 4 — Significance Test for ρ (Generator)
Use the following critical values for (two-tailed): , , , , .
(a) Slope: “For each additional hour of study, the predicted exam score increases by 3.70 points, on average.” Three phrases: ✓ “additional hour of study” ✓ “predicted exam score” ✓ “on average.”
(b) x = 0 is just below x_min = 1. Borderline — contextually it could represent a student who did not study, but the model has essentially no data near this value.
(c) x = 11 > 8 = x_max → extrapolation. x = 5 ∈ [1,8] → interpolation. x = 8 is the upper boundary → at the edge of interpolation.
(d)
Context: a regression line with slope b = −2.16 (temperature °C → hot beverage sales). Observed range: [5, 35] °C.
(a) Select the correct slope interpretation:
(b) Is the intercept contextually meaningful?
(c) Which x value is extrapolation?
(d) Compute for °C:
Show Solution
(a) “For each additional degree Celsius, the predicted hot beverage sales decrease by 2.16 units, on average.”
(b) x = 0 is below x_min = 5 → outside the observed range → not meaningful.
(a) “For each additional minute of daily exercise, the predicted resting heart rate decreases by 0.53 bpm, on average.”
(b) x = 0 < 10 = x_min → outside the observed range → not meaningful.
(c) x = 75 > 60 = x_max → extrapolation.
(d)
Context: ŷ = 1.60 + 0.45x (fertilizer grams → tomato yield kg). Observed range: [0, 20] g.
(a) Select the correct slope interpretation:
(b) Is the intercept contextually meaningful?
(c) Which x value is extrapolation?
(d) Compute for g:
Show Solution
(a) “For each additional gram of fertilizer, the predicted tomato yield increases by 0.45 kg, on average.”
(b) x = 0 is within [0, 20] → fully meaningful. “The predicted yield with no fertilizer is 1.60 kg.”
(c) x = 25 > 20 = x_max → extrapolation.
(d)
Context: ŷ = 177.4 + 1.69x (age years → reaction time ms). Observed range: [20, 65] years.
(a) Select the correct slope interpretation:
(b) Is the intercept contextually meaningful?
(c) Which x value is extrapolation?
(d) Compute for years:
Show Solution
(a) “For each additional year of age, the predicted reaction time increases by 1.69 ms, on average.”
(b) x = 0 is 20 years below x_min = 20 → clearly outside the range → not meaningful.
(c) x = 70 > 65 = x_max → extrapolation.
(d)
Problem 2 — Significance Test and r² (Generator)
Use the following critical values for (two-tailed): , , , , .
Problem 3 — Find the Error
Scenario: A researcher fits a regression line to temperature vs. hot beverage sales data (observed range: 5–35°C). They predict sales at x = 50°C and report “Sales are predicted to be 15.2 units.” They do not flag any concern.
What is the error in this researcher’s statement?
Show Solution
Error — Extrapolation without warning: x = 50°C is 15 units beyond x_max = 35°C. The model was fit to data in [5, 35]°C; the linear relationship may flatten, curve, or reverse at extreme temperatures. The correct practice is to: (1) flag that this is extrapolation, and (2) note that the prediction may be unreliable.
The arithmetic behind the 15.2 units is correct, but presenting this without caveats is the error.
Scenario: A researcher uses ŷ = 177.4 + 1.69x (age 20–65 years → reaction time ms) and reports: “The intercept 177.4 means that a newborn has a predicted reaction time of 177.4 ms.”
What is the error in this researcher’s statement?
Show Solution
Error — Meaningless intercept interpretation: The model was fit to adults aged 20–65. x = 0 represents a newborn, 20 years below the lower boundary. The intercept places the regression line at the correct height for adults but says nothing reliable about newborns. The intercept is a mathematical positioning constant, not a prediction for y when x = 0 is outside the data range.
Scenario: A researcher reports: “We found a statistically significant linear relationship between social media usage and productivity (, , , ). Since the p-value is significant, social media usage is a strong predictor of productivity.”
What is the error in this researcher’s statement?
Show Solution
Error — Conflating statistical and practical significance: With a sample this large, even a trivially small r produces a highly significant p-value. But r² = 0.0324: only 3.24% of the variance in productivity is explained by social media usage. The remaining 96.76% comes from other factors. Calling social media usage “a strong predictor” because the p-value is small is a fundamental misinterpretation. Always report and interpret r² alongside the p-value.
Scenario: A researcher notes that adding one data point with an extreme x-value to a dataset changes the slope from b = 0.3 to b = 1.8. They report the regression with b = 1.8 without mentioning the extreme point.
What is the error in this researcher’s statement?
Show Solution
Error — Ignoring an influential point: The added point is 35 units beyond x_max in the original dataset. This point has extremely high leverage: it is a “remote” x-value that anchors the regression line. Its presence changes b from 0.3 to 1.8, a 6-fold change. Best practice: (1) report the regression with and without the influential point, (2) investigate whether the point is a data error or a genuine observation, and (3) discuss its effect on the slope in any published analysis.
Scenario: A researcher fits ŷ = 31.5 − 0.50x to training hours vs. 5K run time (range: 5–30 hours) and says: “If we train an athlete for 70 hours a week, the model predicts their time will drop to −3.5 minutes.”
What is the error in this researcher’s statement?
Show Solution
Error — Extrapolation producing an impossible result: x = 70 is 40 hours beyond x_max = 30. The arithmetic gives ŷ = 31.5 − 0.50(70) = −3.5 minutes, a negative race time, which is physically impossible. This vividly illustrates why extrapolation is dangerous: even when the formula works mechanically, the result can be nonsensical. The linear relationship observed at 5–30 hours/week does not extend to 70 hours; in reality, extreme training volumes lead to overtraining and performance decline (non-linearity).
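A quick way to see how far the line can be pushed is to tabulate predictions and find where they stop making physical sense. The sketch assumes the training model ŷ = 31.5 − 0.50x (intercept 31.5 min, slope −0.50 min per hour, which reproduces the −3.5 minutes at 70 hours). The function name `predicted_time` is made up for illustration.

```python
def predicted_time(hours):
    """Predicted 5K time in minutes from weekly training hours (assumed model)."""
    return 31.5 - 0.50 * hours

X_MIN, X_MAX = 5, 30  # observed range of weekly training hours

for hours in (5, 30, 70):
    label = "interpolation" if X_MIN <= hours <= X_MAX else "extrapolation"
    print(f"x = {hours:2d} h: y_hat = {predicted_time(hours):5.1f} min ({label})")

# The fitted line reaches zero minutes at x = a / |b|.
zero_crossing = 31.5 / 0.50
print(f"the line predicts a 0-minute race at x = {zero_crossing:.0f} h")
```

Any prediction beyond 63 hours is negative, and predictions between 30 and 63 hours are already unsupported by data even though they look plausible.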
Problem 4 — Prediction Risk (Generator)
Use the following critical values for (two-tailed): , , , , .
Problem 5 — Multi-Step Synthesis
Context: A sports science researcher studies the relationship between weekly training hours (, observed range 5–30 hours) and 5K run time (, minutes) for competitive runners.
Pre-computed summary statistics: , , , , , .
(a) Compute the slope b and intercept a.
(b) Interpret the slope in the context of this study.
(c) Is the intercept contextually meaningful? Explain.
(d) Classify these three predictions: hours, hours, hours.
(e) Compute for and . Note any concern with the second prediction.
(f) Perform a formal significance test for at , two-tailed.
Use .
(g) Compute and comment on practical significance.
(h) A coach wants to use the model for a runner training 38 hours/week. Would you recommend it?
Show Solution
(a) Computing b and a:
Regression equation: ŷ = 31.5 − 0.50x
(b) Slope interpretation: “For each additional hour of weekly training, the predicted 5K run time decreases by 0.50 minutes (30 seconds), on average.”
(c) Intercept meaningfulness: means no training — this is below the observed range of 5–30 hours. While it is intuitive that a non-runner would be slower, the model was not fit to data in this region. The intercept (31.5 min) is a mathematical anchor rather than a reliable prediction.
(d) Classification:
hours: → Interpolation ✓
hours: → Extrapolation ✗
hours: → Interpolation ✓
(e) Predictions:
Concern: is extrapolation. The linear trend may not hold at extreme volumes — overtraining effects could create a non-linear plateau or decline.
(f) Significance test:
vs. , .
→ → Reject .
Conclusion: There is statistically significant evidence of a linear relationship between weekly training hours and 5K run time.
(g) Practical significance:
Training hours explain approximately 85% of the variability in 5K times. This is both statistically significant and practically meaningful — the model accounts for most of the performance variability.
(h) Coach’s request (x = 38):
x = 38 > 30 = x_max, so this is extrapolation. The model should not be used confidently here. Moreover, extreme training volumes may violate the linearity assumption (overtraining non-linearity). At minimum, flag the extrapolation risk clearly and recommend collecting data on high-volume athletes before trusting the prediction.
Mixed Review — Retrieval from Earlier Lessons
These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.
Review Problem 1 — Pearson Correlation (REG-1)
A psychologist studies the relationship between sleep duration (hours/night, x) and reaction time (milliseconds, y) in 22 adult participants. She reports r = −0.71.
(a) Interpret the sign and magnitude of r in context.
(b) Compute r² and interpret it.
(c) A newspaper reports: “Getting more sleep makes you react faster.” Identify the statistical reasoning error and give a specific alternative explanation.
Show Solution
(a) r = −0.71 indicates a moderate-to-strong negative linear association between sleep duration and reaction time. Participants who sleep more tend to have lower (faster) reaction times. The association is fairly strong: the points cluster reasonably close to a downward-sloping line, though not perfectly.
(b) r² = (−0.71)² ≈ 0.504
Sleep duration explains approximately 50.4% of the variability in reaction time across participants. The remaining 49.6% is accounted for by other factors (caffeine intake, age, stress, prior sleep debt, etc.).
(c) The error is inferring causation from correlation. The study is observational — no variable was randomly assigned. The association between sleep and reaction time is real, but it does not establish that sleep causes faster reactions.
One specific alternative explanation: confounding by a lurking variable. Some participants may be athletes who follow disciplined sleep schedules. Their athletic training improves reaction time, and their discipline leads to more sleep. Both effects are driven by being an athlete, not by sleep causing faster reactions.
Review Problem 2 — Building a Regression Line (REG-2)
An agricultural researcher studies the effect of nitrogen fertilizer dose (, kg/ha) on wheat yield (, tonnes/ha). From 12 experimental plots she computes:
(a) Compute the least-squares slope and intercept .
(b) Write the regression equation and use it to predict yield at kg/ha.
(c) Compute the residual for a plot where kg/ha and the actual yield was tonnes/ha. Interpret the sign of the residual.
Show Solution
(a)
Verification: ✓
(b)
At kg/ha:
(c) Residual: tonnes/ha
The positive residual means this plot yielded more than the regression line predicted. Its actual point lies above the regression line. The model underpredicted yield for this specific plot — perhaps this plot had unusually favorable soil conditions or microclimate.
Section 7: Mastery Check
Question 1 — Feynman Test
Explain to a classmate who missed class why “statistically significant” does not mean “practically useful.” Use r² in your explanation, and include a concrete example with numbers.
Show Model Answer
“Statistical significance” (p < α) only tells you that the linear relationship is non-zero in the population, that is, that the r you observed is unlikely to be due to chance alone. It says nothing about the size of the relationship.
r² measures practical significance: the proportion of variance in y explained by x. A model with r² = 0.04 explains only 4% of the variability; the other 96% is random noise from the model’s perspective. With a large enough sample, even a correlation that weak produces a significant p-value.
Concrete example: A company finds a statistically significant but weak correlation between post frequency and engagement. But r² ≈ 0.03, so engagement is about 97% unexplained. Reporting “significant predictor” without reporting r² is misleading. A model that explains 3% of variance is practically useless for strategy decisions, even if the relationship is real.
Question 2 — Apply
A nutrition researcher fits a regression of energy intake (, kcal/day) on sleep hours (, hours/night) using adults, observed range hours. Results: , , .
(a) Select the correct slope interpretation:
(b) Perform the significance test for at , two-tailed. , .
(c) Is the prediction kcal/day safe to use?
Question 3 — Error Analysis
A student’s claim:
“I fit a regression of weight (kg) on height (cm) for adults and found r² = 0.68. Since r² is quite high, I can use this model to predict the exact weight of a specific individual with high confidence. For example, for a person 175 cm tall, the model predicts 56.0 kg, so this is their likely weight.”
Identify the specific error in this student’s reasoning.
Show Solution
Error — Confusing with individual prediction accuracy:
r² = 0.68 means the regression explains 68% of the variance in weight across the sample. It is an aggregate statistic about the whole dataset. It does not mean any individual’s predicted weight will be within 32% of their true weight, nor does it give the margin of error for a specific person.
For an individual prediction, the appropriate measure is a prediction interval, which accounts for both:
Uncertainty in the mean response (how well we know the average weight for people 175 cm tall), and
Natural variability among individuals with the same height.
Prediction intervals are substantially wider than confidence intervals for the mean. With $r^2 = 0.68$, 32% of the variance is still unexplained; for real human weight, that translates to a wide range of individual outcomes. A 95% prediction interval for this person might span 15–20 kg above and below 56.0 kg.
The student is treating an aggregate fit statistic as a guarantee of individual accuracy, which is a classic misinterpretation.
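The two intervals can be written side by side. This is the standard simple-regression form rather than a formula quoted from this lesson, and the notation is assumed here: $x_0$ is the height of interest, $s$ the residual standard deviation, $S_{xx} = \sum (x_i - \bar{x})^2$, and $t^*$ the critical value with $df = n - 2$.

```latex
\text{CI for the mean response: } \hat{y}_0 \pm t^* \, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

\text{PI for an individual: } \hat{y}_0 \pm t^* \, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The extra 1 under the square root is the individual-variability term. It does not shrink as $n$ grows, which is why prediction intervals stay wide even in large samples.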
Section 8: Boss Fight
You have learned ten concepts for evaluating and using regression models. Now you apply them in an integrative scenario. Choose a path — both draw on the full lesson, but from different angles.
🔬 Path A — The Diagnostician
You are a statistician reviewing a dataset. Your job: fit the model, test its significance, diagnose its assumptions, and identify any influential observations. Write a full model evaluation.
📊 Path B — The Predictor
You are a data analyst advising a coaching staff. Your job: evaluate a series of prediction requests, interpret the model, and correct a coach’s misunderstanding about what statistical significance means for real-world predictions.
🔬 Path A — The Diagnostician
A researcher is studying whether hours of screen time per day ($x$) predicts self-reported sleep quality score ($y$, scale 0–100) in $n = 10$ adults.
Pre-computed values:, , , ,
The researcher has also noted: one subject reports hours/day, which is well outside the range of the other 9 subjects ( hours).
Task 1. Compute the slope and intercept, then write the regression equation.
Task 2. Test at , two-tailed. Use .
Task 3. The researcher describes the residual plot: “All residuals are randomly scattered near zero with consistent spread — except for the subject with , whose residual is .” Diagnose the residual plot. What does the large residual at tell you?
Task 4. Write a one-paragraph model evaluation that addresses: (a) significance; (b) practical significance (); (c) the outlier/influential point at ; (d) whether the model can be trusted for predictions within the observed range of the other 9 subjects.
Reflection: What was the most challenging part of this analysis? Was it the initial setup or the final interpretation?
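One way to check whether an unusual subject is influential is to refit the line without that subject and compare slopes. The sketch below does this on hypothetical data (not the researcher's values): nine subjects lying exactly on a line, plus one subject with unusually high screen time. The helper `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data (NOT the researcher's values): the first nine points
# follow y = 90 - 5x exactly; the tenth has extreme x and a large residual.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 14]
ys = [85, 80, 75, 70, 65, 60, 55, 50, 45, 60]

slope_all, intercept_all = fit_line(xs, ys)
slope_wo, intercept_wo = fit_line(xs[:-1], ys[:-1])

print(f"All 10 subjects:     slope = {slope_all:.2f}, intercept = {intercept_all:.2f}")
print(f"Without the extreme: slope = {slope_wo:.2f}, intercept = {intercept_wo:.2f}")
```

If the slope moves substantially when a single point is dropped, that point is influential and the evaluation should report the fit both with and without it.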
📊 Path B — The Predictor
You are advising a running club’s coaching staff. The club’s statistician has fit the following model on competitive runners:
where $x$ = weekly training hours (observed range: 5–30 hours) and $y$ = 5K run time (minutes).
Additional model statistics: , , .
The coaching staff has sent you four requests.
Task 1. Coach A asks: “Predict the 5K time for a runner averaging 22 hours/week.” Compute $\hat{y}$ and classify this as interpolation or extrapolation.
Task 2. Coach B asks: “One of our elite athletes trains 40 hours/week. Predict their 5K time.” Compute $\hat{y}$ and write a brief advisory note about whether this prediction can be trusted.
Task 3. Coach C asks you to “interpret what the slope means for training design.” Write the correct slope interpretation in context, and note any limitation of using the slope to design individual training plans.
Task 4. Coach D says: “Since the p-value is less than 0.001, the model is highly significant, so we can trust all of its predictions.” Write a correction to Coach D’s statement that explains: (a) what the p-value actually tells you; (b) why statistical significance does not guarantee reliable predictions outside the observed range; (c) what $r^2$ adds to the assessment.
Reflection: What was the most challenging part of this analysis? How would you apply this design approach to another problem?
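The interpolation-versus-extrapolation check in Tasks 1 and 2 reduces to a range comparison. A minimal sketch, using the observed range of 5–30 hours from the scenario; the function name `classify_prediction` is illustrative, and treating the endpoints as inside the range is an assumption.

```python
def classify_prediction(x_new: float, x_min: float, x_max: float) -> str:
    """Label a request by whether x_new lies inside the range the model was fit on."""
    if x_min <= x_new <= x_max:
        return "interpolation"
    return "extrapolation"

# Observed training range from the scenario: 5 to 30 hours per week.
X_MIN, X_MAX = 5, 30

for hours in (22, 40):
    label = classify_prediction(hours, X_MIN, X_MAX)
    print(f"{hours} hours/week -> {label}")
```

The check says nothing about how good an interpolated prediction is; it only flags requests where the linear pattern has not been observed at all.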
Section 9: Challenge Problems
Ready for more? These problems go beyond the lesson objectives.
Problem 1 — The Effect of an Influential Point
A dataset has points with and . The regression of all 8 points gives , .
A ninth point is added at .
(a) Compute $\hat{y}$ for the ninth point using the original equation. Is the ninth point a regression outlier?
(b) After adding the ninth point, the slope changes to and , with . Has the ninth point shown high leverage and influence?
(c) Is the ninth point’s $x$-value within the observed range of the original 8 points? What does this tell you about the relationship between high leverage and extreme $x$-values?
Show Solution
(a) Outlier check:
Residual:
This residual is very large compared to the typical residuals for the other 8 points. Yes, the ninth point is a regression outlier.
(b) High leverage and influence:
Without the ninth point: , .
With the ninth point: , .
Removing the ninth point would change the slope from 0.75 to 1.1 (a 47% change) and improve the fit from 0.61 to 0.82. This is a substantial change, so the ninth point is highly influential. It is both an outlier (large residual) and an influential point (changes the slope significantly).
(c) Leverage and extreme x:
The ninth point’s $x$-value is 40 units beyond the upper boundary of the original data. High leverage arises because the point is far from $\bar{x}$ of the original data: its $x$-value is extreme. The regression line is “pulled” toward a remote point, giving it enormous control over the slope. This illustrates the general principle: extreme $x$-values always have high leverage, regardless of their residual.
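Leverage can be computed directly from the $x$-values. The sketch below uses the standard simple-regression formula $h_i = 1/n + (x_i - \bar{x})^2 / S_{xx}$ on hypothetical $x$-values (not the problem's data), chosen so that the ninth point sits 40 units beyond the largest of the other eight. The function name `leverages` is illustrative.

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / Sxx for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]

# Hypothetical x-values (NOT the problem's data): eight points from 10 to 80,
# plus a ninth point 40 units beyond the largest of them.
xs = [10, 20, 30, 40, 50, 60, 70, 80, 120]
h = leverages(xs)

for x, h_i in zip(xs, h):
    print(f"x = {x:>3}  leverage = {h_i:.3f}")
```

The formula uses only the $x$-values, so the ninth point has the largest leverage whatever its residual turns out to be.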
Problem 2 — How Sample Size Affects Significance
Using the formula $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $r = 0.40$, complete the following table. Critical values for $\alpha = 0.05$, two-tailed, are provided.
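| $n$ | $df = n - 2$ | $t$ | Critical value ($\alpha = 0.05$) | Significant? |
|---|---|---|---|---|
| 5 | 3 | ? | 3.182 | ? |
| 10 | 8 | ? | 2.306 | ? |
| 20 | 18 | ? | 2.101 | ? |
| 30 | 28 | ? | 2.048 | ? |
| 50 | 48 | ? | ~2.01 | ? |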
After completing the table, answer: “$r = 0.40$ does not become significant until $n$ is somewhere between 20 and 30, yet $r^2 = 0.16$. What lesson does this teach about statistical significance?”
Show Solution
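Computing $t$ for $r = 0.40$:
| $n$ | $df = n - 2$ | $r\sqrt{n-2}$ | $t$ | Critical value ($\alpha = 0.05$) | Significant? |
|---|---|---|---|---|---|
| 5 | 3 | 0.693 | 0.76 | 3.182 | No |
| 10 | 8 | 1.131 | 1.23 | 2.306 | No |
| 20 | 18 | 1.697 | 1.85 | 2.101 | No |
| 30 | 28 | 2.117 | 2.31 | 2.048 | Yes |
| 50 | 48 | 2.771 | 3.02 | ~2.01 | Yes |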
The lesson: with $r = 0.40$, the result does not become statistically significant at $\alpha = 0.05$ until $n$ is somewhere between 20 and 30. Yet $r^2 = 0.16$: the model explains only 16% of the variance in $y$. A “significant” result with $r^2 = 0.16$ is statistically real but practically weak. Large samples make even weak relationships detectable. This is why statistical significance alone is insufficient: always report $r^2$ alongside the $p$-value to convey practical significance.
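The table can be reproduced in a few lines. This sketch uses the sample sizes and critical values given in the problem; `t_statistic` and `ROWS` are illustrative names, and the approximate critical value for $df = 48$ is entered as 2.01.

```python
import math

R = 0.40  # correlation used throughout Problem 2

# (n, two-tailed critical value at alpha = 0.05), as given in the problem.
ROWS = [(5, 3.182), (10, 2.306), (20, 2.101), (30, 2.048), (50, 2.01)]

def t_statistic(r: float, n: int) -> float:
    """Test statistic for a correlation r from n points, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

results = []
for n, t_crit in ROWS:
    t = t_statistic(R, n)
    significant = t > t_crit
    results.append((n, n - 2, round(t, 2), significant))
    print(f"n = {n:>2}  df = {n - 2:>2}  t = {t:.2f}  "
          f"critical = {t_crit}  significant = {significant}")
```

Only $n$ changes from row to row; $r$ and $r^2$ stay fixed, so the switch from “not significant” to “significant” is driven entirely by sample size.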
Problem 3 — What Does $r^2$ Actually Measure?
A regression model with $r^2 = 0.88$ is used to make a prediction. A student argues: “Since 88% of the variance is explained, my prediction will be within ±12% of the true value.”
Identify two errors in this reasoning. What additional information would be needed to correctly quantify prediction uncertainty for a specific individual?
Show Solution
Error 1 — $r^2$ is not a statement about individual prediction accuracy:
$r^2 = 0.88$ means that 88% of the overall variability in $y$ across the dataset is explained by the linear relationship with $x$. It is an aggregate measure of fit quality. It does not tell you how close any specific prediction will be to the true value for a given individual.
Error 2 — “Within ±12%” confuses unexplained variance with an error margin:
$1 - r^2 = 0.12$ is the proportion of unexplained variance, not a percentage error margin. Even if only 12% of variance is unexplained, the actual prediction error for an individual depends on the residual standard error, not on the $r^2$ percentage.
What is needed for individual prediction uncertainty:
To build a formal prediction interval for an individual at a given $x$-value, you need:
The residual standard deviation (how spread out individual observations are around the line)
The leverage of that $x$-value (how far it is from $\bar{x}$), which affects how precisely the mean response is estimated
The critical $t$-value for $df = n - 2$
A 95% prediction interval is substantially wider than a confidence interval for the mean response, because it accounts for both estimation uncertainty and natural individual variability. With $r^2 = 0.88$, the mean response is well estimated, but individual predictions can still vary widely.
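A short sketch shows how those three ingredients combine. All numbers are hypothetical (the problem gives none of them), the critical value is passed in rather than computed, and `interval_half_widths` is an illustrative name. It uses the standard simple-regression formulas for the two half-widths.

```python
import math

def interval_half_widths(s, n, x0, x_bar, sxx, t_star):
    """Return (CI half-width for the mean response, PI half-width for an individual)."""
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci_half = t_star * s * math.sqrt(leverage)
    pi_half = t_star * s * math.sqrt(1 + leverage)
    return ci_half, pi_half

# Hypothetical inputs (NOT from the problem): residual SD s, sample size n,
# the x-value of interest x0, the mean of x, Sxx, and t* for df = 28.
ci_half, pi_half = interval_half_widths(
    s=5.0, n=30, x0=60.0, x_bar=50.0, sxx=2000.0, t_star=2.048
)

print(f"95% CI for the mean response: y_hat +/- {ci_half:.2f}")
print(f"95% PI for an individual:     y_hat +/- {pi_half:.2f}")
```

With these inputs the prediction interval is several times wider than the confidence interval, and none of the inputs is $r^2$ itself.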
Section 10: Solutions Reference
Full worked solutions for all problems in Sections 4–9 are on the solutions page.