A newspaper headline reads: “Study proves social media use lowers grades.” The fine print: the researchers computed a Pearson correlation in a sample of 400 participants and found r² = 0.032. A second research team studying the same question applied a chi-square test to screen time (hours/day) and GPA, two quantitative variables. Their headline: “Screen time and GPA are not independent.”
Both studies made the front page. Both have serious problems.
The first study found a statistically significant correlation, but r² = 0.032, meaning social media use explains only 3.2% of the variance in GPA. With 400 participants, even a negligible association can reach significance. The headline says “proves” and “lowers”; neither is justified.
The second study applied the wrong test entirely. Chi-square requires two qualitative (categorical) variables. Screen time in hours and GPA on a numerical scale are both quantitative — chi-square is invalid here.
By the end of this lesson, you will be equipped to spot exactly these errors — and to perform and communicate bivariate analyses correctly.
By the end of this lesson, you will be able to:
Classify any pair of variables as quantitative or qualitative, and select the correct bivariate analysis method.
Distinguish statistical significance from practical significance, and always pair a p-value with an appropriate effect size (r² or Cramér’s V).
Write a complete, accurate research conclusion using proper association language — no causal claims, no magnitude overstatements.
Identify common method-selection and communication errors in published research summaries.
Section 2: Prerequisites
What you need before starting this lesson:
From REG-3 (Regression Interpretation and Prediction): Pearson correlation coefficient r (direction, strength, range −1 to +1); coefficient of determination r² (proportion of variance explained); five-step test for H₀: ρ = 0 using $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$, $df = n - 2$; correct conclusion language (“sufficient evidence of a [positive/negative] linear relationship”); association vs. causation distinction. You need these because today you will be choosing when to use this test and interpreting its output alongside effect size.
From REG-4 (Chi-Square Test of Independence): five-step chi-square test; expected frequencies $E = \dfrac{(\text{row total})(\text{column total})}{n}$; test statistic $\chi^2 = \sum \dfrac{(O - E)^2}{E}$; degrees of freedom $df = (r-1)(c-1)$; conditions (all $E \ge 5$); Cramér’s V $V = \sqrt{\dfrac{\chi^2}{n \cdot \min(r-1,\, c-1)}}$; correct conclusion language (“sufficient evidence that [Var 1] and [Var 2] are not independent”); never say “independent” after failing to reject H₀. You need these because today you will be deciding when chi-square is appropriate and reading its effect size.
ChiTable reminder — you will need chi-square critical values when evaluating chi-square results in this lesson:
Chi-Square Distribution Table
Critical values (χ²) for given degrees of freedom (df) and upper tail probability (p).
| df \ p | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
| 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
| 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
| 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
| 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
| 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
| 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
| 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
| 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
| 70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
| 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
| 90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
| 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |
The boundary this lesson crosses: In REG-3 and REG-4, you learned how to run each test — the mechanics of computation. Today’s lesson is about when to use each test, how strongly the result matters (effect size), and how to communicate findings without overstating them. The formulas are the same; the judgment layer is new.
Retrieval Warm-up — from earlier lessons
A researcher reports the correlation r between hours of television watched per day (x) and reading comprehension score (y) in a sample of 80 middle-school students. Which statement is the most accurate interpretation?
In REG-3, the significance test for correlation uses $t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $df = n - 2$. Using r and n = 80 from the problem above, which conclusion follows at α = 0.05 (two-tailed, df = 78)?
Section 3: Core Concepts
Ten concepts — here is how they fit together:
C1–C4: Method selection — the decision gate. Classify your variables first; pick your method second.
C5–C6: Effect size, with r² for correlation and Cramér’s V for chi-square. A significant p-value is not enough.
C7: Statistical vs. practical significance — the most commonly misunderstood concept in this module.
C8–C9: Conclusion language — exact wording for each test. One word wrong and the conclusion is misleading.
C10: Communicating findings responsibly — the full checklist for a correct research summary.
C1 — Decision Rule: Variable Types
Before choosing any analysis method, you must classify both variables.
Bivariate Method Decision Rule
Two quantitative variables → Pearson correlation / linear regression
Two qualitative (categorical) variables → Chi-square test of independence
One quantitative + one qualitative → Neither chi-square nor correlation applies directly. A group-comparison method (e.g., ANOVA) is appropriate — but beyond the scope of this course. Flag the mismatch.
This is the primary gate: classify both variables before choosing any method.
Mini-example: A researcher has data on (1) monthly rent paid in dollars and (2) commute time in minutes for 80 renters. Both variables are quantitative — arithmetic differences are meaningful. → Use Pearson correlation.
The same researcher also records each renter’s (1) neighborhood type (urban / suburban / rural) and (2) transport mode (car / transit / bike). Both variables place observations into named categories — no arithmetic is meaningful. → Use chi-square.
C2 — Quantitative Variable Identification
Quantitative Variable
A quantitative variable takes numerical values where arithmetic operations (differences, averages) are meaningful: height in cm, temperature in °C, exam score out of 100, income in dollars, number of injuries. Count variables (number of absences, number of siblings) are quantitative.
Quick test: “Does the average of these values make sense?” If someone averages the exam scores of 30 students to get 74.3 — yes, that’s meaningful. → Quantitative.
A variable stored as a number is not automatically quantitative. Student ID numbers and zip codes are stored numerically — but averaging them produces nonsense. Classification depends on whether arithmetic is meaningful, not whether the data file contains digits.
C3 — Qualitative Variable Identification
Qualitative (Categorical) Variable
A qualitative variable places observations into named categories with no inherent numeric meaning: blood type (A/B/AB/O), grade category (below average/average/above average), political affiliation (Liberal/Conservative/Independent), disease status (diagnosed/not diagnosed). Ordered categories (low/medium/high stress) are still qualitative unless treated numerically.
Quick test: “Do the category labels have a natural numeric difference?” If stress levels are “low”, “medium”, “high” — you cannot compute “high minus low = 2” in any meaningful way. → Qualitative.
Ordinal variables (with a natural order like low/medium/high) are still qualitative unless a researcher explicitly treats the spacing as equal and numerically meaningful. When in doubt in this course, treat ordered categories as qualitative and use chi-square.
C4 — Mixed Variable Types: Out of Scope
Mixed Variable Types
When one variable is quantitative and the other is qualitative, neither chi-square nor Pearson correlation applies directly. Chi-square needs two categorical variables; correlation needs two numerical variables. The appropriate approach is to compare group means (e.g., ANOVA or t-test) — but this is beyond the scope of this course. The correct action here is to flag the mismatch and name the issue.
A common error: a researcher records political affiliation (qualitative) and happiness score on a 1–10 scale (quantitative), then runs a chi-square. This is wrong — chi-square cannot process the numerical happiness scores as categories without losing information and violating the method’s requirements.
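The decision rule in C1 through C4 can be sketched as a small lookup. This is a minimal illustration rather than part of the lesson's materials: the function name `choose_method` and the string labels are assumptions chosen for this example, and classifying each variable is still a judgment the analyst makes first.

```python
def choose_method(x_type: str, y_type: str) -> str:
    """Map a pair of variable types to the method named in the decision rule.

    The labels "quantitative" and "qualitative" are hypothetical inputs;
    deciding which label fits a variable is still the analyst's job.
    """
    valid = {"quantitative", "qualitative"}
    if x_type not in valid or y_type not in valid:
        raise ValueError("each type must be 'quantitative' or 'qualitative'")
    if x_type == "quantitative" and y_type == "quantitative":
        return "Pearson correlation / linear regression"
    if x_type == "qualitative" and y_type == "qualitative":
        return "Chi-square test of independence"
    # One of each: neither test applies directly in this course.
    return "Mixed types: flag the mismatch (group comparison, out of scope)"


# Variable pairs taken from the scenarios in C1 and C4.
rent_vs_commute = choose_method("quantitative", "quantitative")
neighborhood_vs_transport = choose_method("qualitative", "qualitative")
affiliation_vs_happiness = choose_method("qualitative", "quantitative")

for result in (rent_vs_commute, neighborhood_vs_transport, affiliation_vs_happiness):
    print(result)
```

The mixed-type branch deliberately returns a warning instead of a test name, mirroring the instruction to flag the mismatch rather than force one of the two methods.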
C5 — Effect Size for Correlation: r²
Coefficient of Determination r²
r² is the proportion of variance in y explained by the linear relationship with x. A statistically significant r (p < α) provides evidence that ρ ≠ 0 in the population but does not indicate the model is useful. Always report r² alongside p.
Thresholds (guidelines, not sharp rules):
r² < 0.04 → negligible
0.04 ≤ r² < 0.15 → weak
0.15 ≤ r² < 0.35 → moderate
r² ≥ 0.35 → strong
Example: a study of 400 participants finds a statistically significant correlation between social media use and GPA. But r² = 0.032, so social media use explains 3.2% of the variance in GPA. The headline “proves social media lowers grades” vastly overstates a negligible practical effect.
r = 0.72 does NOT mean the predictor explains 72% of variance; that would confuse r with r². The actual explained variance is r² = 0.72² ≈ 0.518, or 51.8%. Always square r to get the proportion of variance explained.
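The squaring step and the threshold lookup can be checked with a few lines of code. This is a sketch under the guideline cut-offs listed above (0.04, 0.15, 0.35); the helper name `r_squared_label` is an assumption introduced for illustration.

```python
def r_squared_label(r: float) -> str:
    """Label practical strength from r using the guideline thresholds on r^2."""
    r2 = r * r  # proportion of variance explained
    if r2 < 0.04:
        return "negligible"
    if r2 < 0.15:
        return "weak"
    if r2 < 0.35:
        return "moderate"
    return "strong"


r = 0.72
explained = r * r
label = r_squared_label(r)
print(f"r = {r}, r^2 = {explained:.3f} ({label})")  # r = 0.72, r^2 = 0.518 (strong)
```

Because the function squares r itself, a caller cannot accidentally classify r = 0.72 as "72% explained".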
C6 — Effect Size for Chi-Square: Cramér’s V
Cramér's V
$V = \sqrt{\dfrac{\chi^2}{n \cdot \min(r-1,\, c-1)}}$, where r = number of rows and c = number of columns. Cramér’s V adjusts for sample size, so it measures association strength alone.
Thresholds:
V < 0.1 → negligible
0.1 ≤ V < 0.3 → small
0.3 ≤ V < 0.5 → medium
V ≥ 0.5 → large
Example: A researcher reports a very large χ² (df = 1, very large n, p < 0.001) and concludes “enormous chi-square confirms a very strong association.” But Cramér’s V is below 0.1, which is negligible. The large χ² is driven by the large n, not a strong relationship.
χ² depends on n: doubling the sample size doubles χ² even if the proportions are unchanged. Never use χ² alone to judge strength of association. Always compute and report Cramér’s V.
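The doubling effect is easy to verify numerically. The sketch below assumes the standard definitions χ² = Σ(O − E)²/E and V = √(χ² / (n · min(r − 1, c − 1))); the two 2×2 tables are hypothetical counts invented for this illustration, with the second table exactly double the first.

```python
import math


def chi_square(table):
    """Return (chi-square statistic, n) for a table of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table):
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table) - 1, len(table[0]) - 1)
    return math.sqrt(chi2 / (n * k))


# Hypothetical counts: same proportions, double the sample size.
small = [[30, 20], [20, 30]]   # n = 100
large = [[60, 40], [40, 60]]   # n = 200

chi2_small, n_small = chi_square(small)
chi2_large, n_large = chi_square(large)
v_small = cramers_v(small)
v_large = cramers_v(large)

print(f"n = {n_small}: chi2 = {chi2_small:.1f}, V = {v_small:.2f}")
print(f"n = {n_large}: chi2 = {chi2_large:.1f}, V = {v_large:.2f}")
```

χ² goes from 4.0 to 8.0 while V stays at 0.20, which is why V rather than χ² is used to judge strength.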
C7 — Statistical vs. Practical Significance
Statistical vs. Practical Significance
Statistical significance (p < α): the observed association is unlikely to have arisen from chance alone — we reject H₀. This is a claim about signal vs. noise.
Practical significance (effect size): the association is large enough to matter in the real world. This is a claim about magnitude.
A large n makes even negligible associations statistically significant. With a sufficiently large sample, r = 0.05 gives p < 0.001, yet r² = 0.0025: the predictor explains 0.25% of variance. The reverse also occurs: a strong effect in a small sample may not reach significance. Always pair p with r² or V.
“The result is significant, therefore it is important.” This is the most common misuse of statistics in published research. Statistical significance tells you that an effect is reliably distinguishable from zero. Effect size tells you whether the effect is worth caring about.
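To see how sample size alone moves a tiny correlation across the significance line, the test statistic from REG-3, t = r√(n − 2) / √(1 − r²), can be evaluated for the same r at two sample sizes. This is an illustrative sketch: the sample sizes 100 and 10,000 are hypothetical, and 1.960 is used as the approximate large-sample two-tailed critical value at α = 0.05 (the exact critical value for n = 100 is slightly larger, which does not change the outcome here).

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


r = 0.05
LARGE_SAMPLE_CRITICAL_T = 1.960  # approximate two-tailed cut-off at alpha = 0.05

for n in (100, 10_000):  # hypothetical sample sizes
    t = t_statistic(r, n)
    significant = abs(t) > LARGE_SAMPLE_CRITICAL_T
    print(f"n = {n}: t = {t:.2f}, significant = {significant}, r^2 = {r * r:.4f}")
```

The effect size r² = 0.0025 is identical in both runs; only the verdict on significance changes.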
C8 — Conclusion Language: Correlation
Correct Conclusion Language for Correlation Tests
Template: “There is [sufficient / insufficient] evidence of a [positive / negative] linear relationship between [X] and [Y] in the population.”
Always also report: r, r², the t-statistic, p, and the decision at α.
Never say “X causes Y” based on correlation alone. Correlation tests association only.
“There is sufficient evidence that X affects Y.” — The word “affects” implies causation. Use “is associated with” or “has a linear relationship with.” Causal language requires experimental design with random assignment, not observational data.
C9 — Conclusion Language: Chi-Square
Correct Conclusion Language for Chi-Square Tests
Template: “There is [sufficient / insufficient] evidence that [Variable 1] and [Variable 2] are not independent in the population.”
Always also report: χ² (with its df), p, V, and the decision at α.
Never say the variables “are independent” after failing to reject H₀ — that would be claiming proof of the null. The correct language is “insufficient evidence to conclude they are not independent.”
“We fail to reject H₀, so the variables are independent.” — Failing to reject H₀ is not the same as proving H₀. We simply lack sufficient evidence of dependence. The variables could be dependent in a way our sample was too small to detect.
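The two conclusion templates can be written as small string builders, which makes it harder to slip into causal wording or to claim independence. This is a sketch only: the function names are hypothetical, and dropping the direction word when H₀ is not rejected is a choice made here to match the "insufficient evidence of a linear relationship" phrasing used in the practice solutions.

```python
def correlation_conclusion(x: str, y: str, r: float, reject_h0: bool) -> str:
    """Build the correlation conclusion; direction is stated only when H0 is rejected."""
    if reject_h0:
        direction = "positive" if r > 0 else "negative"
        return (f"There is sufficient evidence of a {direction} linear relationship "
                f"between {x} and {y} in the population.")
    return (f"There is insufficient evidence of a linear relationship "
            f"between {x} and {y} in the population.")


def chi_square_conclusion(var1: str, var2: str, reject_h0: bool) -> str:
    """Build the chi-square conclusion; it never asserts that the variables are independent."""
    evidence = "sufficient" if reject_h0 else "insufficient"
    return (f"There is {evidence} evidence that {var1} and {var2} "
            f"are not independent in the population.")


print(correlation_conclusion("daily steps", "resting heart rate", -0.61, True))
print(chi_square_conclusion("stress level", "sleep quality", False))
```

Neither function accepts a verb such as "causes" or "affects", so the output stays within association language by construction.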
C10 — Communicating Findings Responsibly
Checklist: Complete Research Summary
A complete research report includes all seven elements:
Method used and why: name the test and state why the variable types justify it.
Test statistic and p-value: report t or χ², df, and p.
Decision: “reject H₀” or “fail to reject H₀” at α = [value].
Effect size with interpretation: r² or V with the appropriate threshold label.
Conclusion in plain language: one sentence accessible to a non-statistician.
Association language only: never “causes”, “proves”, “affects.”
Study limitations: if observational, note that causation cannot be established; if sample size is small, note reduced power.
Section 4: Worked Examples
Example 1 — Fully Worked: Daily Steps and Resting Heart Rate (QQ2)
Scenario: A cardiologist examines whether the number of daily steps is associated with resting heart rate (bpm) in a sample of n adults. Given: r ≈ −0.61 (so that r² = 0.372) and the sample size n.
Step 1 — Classify both variables.
I notice both variables are numerical measurements where arithmetic is meaningful: daily steps is a count (quantitative) and resting heart rate in bpm is a continuous measurement (quantitative). Two quantitative variables → Pearson correlation is the appropriate method. Chi-square would be wrong here.
Step 2 — State hypotheses.
H₀: ρ = 0 (no linear relationship between daily steps and resting heart rate in the population).
Hₐ: ρ ≠ 0 (a linear relationship exists in the population). Two-tailed test at α = 0.05.
Step 3 — Check conditions.
We assume this is a random or representative sample and that the relationship is approximately linear in the population. The test is robust for sufficiently large samples.
Step 4 — Compute the test statistic.
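$t = \dfrac{r\sqrt{n-2}}{\sqrt{1-r^2}}$ with $df = n - 2$. Critical value: the two-tailed t* for df = n − 2 at α = 0.05.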
Step 5 — Make the decision.
I compare |t| to t*. Since |t| > t*, I reject H₀ (p < 0.001).
Step 6 — Interpret effect size.
r² = 0.372: daily steps explain 37.2% of the variance in resting heart rate. Using the thresholds: r² ≥ 0.35 → strong practical significance. The association is not only statistically real but practically meaningful.
Step 7 — Write the conclusion.
“There is sufficient evidence of a negative linear relationship between daily steps and resting heart rate in the population (r ≈ −0.61, p < 0.001, r² = 0.372). As daily steps increase, resting heart rate tends to decrease, and this association is of strong practical magnitude.”
Metacognitive note: I noticed the negative r before computing anything; this tells me the direction is inverse (more steps → lower heart rate). I chose Pearson correlation because both variables are quantitative, and I reported r² because a significant p-value alone doesn’t tell us the association is meaningful.
Example 2 — Partially Scaffolded: Stress Level and Sleep Quality (CC5)
Scenario: A sleep researcher surveys 140 adults, recording stress level (low / high) and sleep quality (good / poor). This is pre-classified for you: both variables are qualitative (categorical). The appropriate method is the chi-square test of independence.
Given pre-computed results: the χ² statistic, its p-value, and Cramér’s V.
Your task — complete Steps 4 and 5:
Before revealing the decision: Based on χ² and df = 1, what is your prediction about the decision at α = 0.05? (Check the ChiTable in Section 2: the critical value for df = 1 at p = 0.05 is 3.841.)
Show Solution
Step 4 — Decision: Since χ² > 3.841, we reject H₀ (p < 0.001).
Step 5 — Write the conclusion:
“There is sufficient evidence that stress level and sleep quality are not independent in the population (χ² with df = 1, p < 0.001, V in the small range). Adults with high stress tend to have poorer sleep quality, but with V between 0.1 and 0.3 this is a small association. Other factors likely explain much of the variation in sleep quality.”
Effect size note: V falls in the “small” range (0.1–0.3). The association is real but modest, not a large or dominant effect.
Example 3 — Solution Behind Details: TV Hours and GPA (QQ5)
Scenario: A student affairs researcher studies whether watching more TV is associated with lower GPA in a sample of students. Given: r ≈ −0.38 and r² = 0.144, together with n, the t-statistic, and the p-value.
Before checking the solution: Is this result practically meaningful? r ≈ −0.38 looks moderate, but pause before deciding. What does r² tell you? Would you recommend this finding to a school board as evidence that TV limits should be enforced?
Show Solution
Variable classification: Both variables are quantitative (hours/week and GPA on a 0–4 scale). → Pearson correlation is correct.
Decision: |t| > t* → reject H₀ (p < α).
Conclusion: “There is sufficient evidence of a negative linear relationship between weekly TV hours and GPA in the population (r ≈ −0.38, r² = 0.144, with t and p as given). More TV hours are associated with lower GPA in this population.”
Effect size interpretation: r² = 0.144, so TV hours explain only 14.4% of the variance in GPA. This falls in the “weak” range (0.04–0.15). The result is statistically significant but of weak practical significance.
Key teaching point: This result is statistically real, but a school board should not implement TV restrictions based on this alone. An r² of 0.144 means 85.6% of GPA variation is left unexplained by TV hours. The effect exists, but it is small relative to the complexity of academic performance.
Example 4 — Find the Error: Chi-Square Applied to Quantitative Variables
A researcher’s report:
“We collected data on hours of sleep per night (ranging from 4 to 10 hours, measured to the nearest half-hour) and GPA (0–4 scale, measured to two decimal places) for 45 university students. We ran a chi-square test of independence and found χ² = 18.4 (df = 12, p = 0.048), concluding: ‘Sleep hours and GPA are not independent in the population — students who sleep more tend to have higher GPAs.’”
Identify the errors. There are at least two.
Show Solution
Error 1 — Wrong method: Both variables are quantitative. Sleep hours (measured to the nearest half-hour) and GPA (measured to two decimal places) are continuous numerical measurements where arithmetic differences are meaningful. Chi-square requires two qualitative (categorical) variables. The appropriate method is Pearson correlation (or linear regression).
Error 2 — The chi-square output is meaningless in this context: When chi-square is applied to quantitative data, the researcher must first have binned or categorized the data (creating artificial categories). This destroys information and introduces arbitrary cut-points. The resulting χ² value does not test anything meaningful about the original continuous relationship.
Error 3 (bonus) — Conclusion language: “Students who sleep more tend to have higher GPAs” is an interpretive statement that goes beyond what the (incorrectly applied) test can establish. Even if correlation were run correctly, association language must be used: “sleep hours are associated with GPA.”
What the researcher should have done: Compute Pearson r between sleep hours and GPA. Report r, r², the t-statistic, df, and p. Use the conclusion template from C8.
Section 5: Guided Practice
Problem 1 — Method Selection
For each scenario, classify both variables and select the appropriate analysis method.
Scenario: “A professor collects data on weekly study hours and final exam scores for 25 students to investigate whether more studying leads to higher scores.”
(a) What type of variable is “weekly study hours”?
(b) What type of variable is “final exam score”?
(c) Which analysis method is appropriate?
Show Solution
(a) Quantitative — weekly study hours is a count/continuous measurement.
(b) Quantitative — exam score is a numerical measurement on a defined scale.
(c) Pearson correlation / linear regression — two quantitative variables.
Scenario: “A health researcher surveys 100 adults, recording whether each is a smoker or non-smoker, and whether each exercises regularly or irregularly. They want to know if smoking status and exercise habits are related.”
(a) What type of variable is “smoking status”?
(b) What type of variable is “exercise frequency”?
(c) Which analysis method is appropriate?
Show Solution
(a) Qualitative — smoking status is a binary category.
(b) Qualitative — exercise frequency is a binary category (regular/irregular).
(c) Chi-square test of independence — two qualitative variables.
Scenario: “A sports scientist records the average weekly training hours and the number of sports injuries per season for 35 athletes.”
(a) What type of variable is “weekly training hours”?
(b) What type of variable is “number of sports injuries per season”?
(c) Which analysis method is appropriate?
Show Solution
(a) Quantitative — continuous measurement.
(b) Quantitative — count variable (discrete but quantitative).
(c) Pearson correlation / linear regression — two quantitative variables.
Scenario: “A marketing team surveys 200 customers, recording their preferred smartphone brand (Brand A / Brand B / Brand C) and their age group (18–30 / 31–50 / 51+). They want to test whether brand preference depends on age.”
(a) What type of variable is “preferred smartphone brand”?
(b) What type of variable is “age group”?
(c) Which analysis method is appropriate?
Show Solution
(a) Qualitative — brand names are named categories.
(b) Qualitative — age is grouped into named brackets (not individual numerical values).
(c) Chi-square test of independence — two qualitative variables.
Scenario: “A nutritionist measures daily caloric intake (kcal) and body weight (kg) for 45 adults to determine if higher caloric intake is associated with greater body weight.”
(a) What type of variable is “daily caloric intake (kcal)”?
(b) What type of variable is “body weight (kg)”?
(c) Which analysis method is appropriate?
Show Solution
(a) Quantitative — continuous measurement in kcal.
(b) Quantitative — continuous measurement in kg.
(c) Pearson correlation / linear regression — two quantitative variables.
Problem 2 — Interpreting Given Results
For each pre-computed result, select the correct conclusion and effect size interpretation.
Results from a study of weekly study hours and final exam scores: r = 0.72, r² = 0.518, reject H₀ (p < 0.001).
(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets the effect size?
Show Solution
(a) “There is sufficient evidence of a positive linear relationship between study hours and exam scores in the population.” — Correct association language, no causal claim, reports the direction.
(b) r² = 0.518 — study hours explains 51.8% of the variance in exam scores. Strong practical effect. — r² ≥ 0.35 → strong. This is both statistically and practically significant.
Results from a study of smoking status and exercise frequency: chi-square test with df = 1, V = 0.204, reject H₀ (p < α).
(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets the effect size?
Show Solution
(a) “There is sufficient evidence that smoking status and exercise frequency are not independent in the population.” — Correct chi-square conclusion language.
(b) V = 0.204 → small association. Statistically significant but modest practical magnitude.
Results from a study of weekly TV hours and GPA: r ≈ −0.38, r² = 0.144, reject H₀ (p < α).
(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets the effect size?
Show Solution
(a) Correct — negative relationship, sufficient evidence, proper association language.
(b) Correct — r² = 0.144 falls in the weak range (0.04–0.15). The result is statistically significant but of weak practical importance.
Results from a study of education level and daily newspaper reading: chi-square test, fail to reject H₀ (p > α).
(a) Which statement correctly concludes this test?
(b) What is the correct interpretation when we fail to reject H₀?
Show Solution
(a) Correct — “insufficient evidence … are not independent” — proper language for failing to reject H₀ in chi-square.
(b) Correct — we do not claim independence; we only report insufficient evidence of dependence.
Results from a study of commute time and work satisfaction: r² = 0.048, fail to reject H₀ (p > α).
(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets r² = 0.048?
Show Solution
(a) Correct — “insufficient evidence of a linear relationship” — proper language for failing to reject H₀ in a correlation test.
(b) Correct — r² = 0.048 is weak (0.04–0.15 range), even if the result were significant.
Problem 3 — Evaluate Two Studies
Read the two research summaries carefully and answer the questions.
Study A: “We measured GPA (0–4 scale) and weekly shoe-shopping frequency for 200 students. We found r = 0.13, which was not statistically significant. We conclude there is insufficient evidence of a relationship between GPA and shoe-shopping, so students who buy more shoes must have the same GPA as others.”
Study B: “We surveyed 500 employees on their department (Sales / Marketing / Engineering / HR) and their self-rated work-life balance (1–10 scale, numerical). We used chi-square and found a statistically significant χ², concluding the variables have a strong association.”
(a) What is the primary error in Study A’s conclusion?
(b) What is the primary error in Study B’s analysis?
Show Solution
(a) Study A’s error: “must have the same GPA as others” claims proof of no relationship — this is the “accept H₀” fallacy. Failing to reject H₀ means only that we lack sufficient evidence of a relationship. The variables could be related in a way the sample was too small to detect (note: n = 200 with r = 0.13 would give a borderline result; a larger sample might find significance).
(b) Study B’s error: Work-life balance on a 1–10 numerical scale is quantitative, and chi-square requires two categorical variables, so chi-square is not valid here. Department is qualitative, which makes this a mixed-type pair: Pearson correlation does not apply either. The appropriate approach is a group-comparison method (e.g., ANOVA comparing mean work-life balance across departments), which is beyond the scope of this course.
Problem 4 — Full Bivariate Scenario
Section 6: Independent Practice
Problem 1 — Full Analysis Chain
Given pre-computed results, work through the full analysis: method selection, decision, conclusion, and effect size classification.
Scenario: A professor studies whether weekly study hours (x) predict exam scores (y) for n = 25 students. Results: r = 0.72, |t| = 4.975, r² = 0.518.
(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (t*(23) = 2.069)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution
(a) Pearson correlation — two quantitative variables.
(b) Reject H₀ — |t| = 4.975 > t*(23) = 2.069.
(c) “There is sufficient evidence of a positive linear relationship between study hours and exam scores in the population.”
(d) r² = 0.518 → strong practical significance.
Scenario: A researcher tests whether study method (self / group) and grade category (below avg / average / above avg) are independent for a sample of students. Results: χ² = 16.333, V = 0.369.
(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (df = 2, critical value 5.991)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution
(a) Chi-square — two qualitative variables.
(b) Reject H₀ — 16.333 > 5.991.
(c) “There is sufficient evidence that study method and grade category are not independent in the population.”
(d) V = 0.369 → medium effect.
Scenario: A financial planner investigates whether higher monthly spending (x) is associated with lower monthly savings (y) for n = 40 clients. Results: r ≈ −0.48, |t| = 3.373, r² = 0.230.
(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (t*(38) = 2.024)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution
(a) Pearson correlation — two quantitative variables.
(b) Reject H₀ — |t| = 3.373 > t*(38) = 2.024.
(c) “There is sufficient evidence of a negative linear relationship between monthly spending and monthly savings in the population.”
(d) r² = 0.230 → moderate practical significance.
Scenario: A public health researcher tests whether age group (under 40 / 40+) and vaccine uptake (vaccinated / not vaccinated) are independent for a sample of adults. Results: χ² = 8.081, V = 0.201.
(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (df = 1, critical value 3.841)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution
(a) Chi-square — two qualitative variables (both are binary categorical).
(b) Reject H₀ — 8.081 > 3.841.
(c) “There is sufficient evidence that age group and vaccine uptake are not independent in the population.”
(d) V = 0.201 → small effect (statistically significant, modest magnitude).
Scenario: A researcher studies whether weekly TV hours (x) is associated with GPA (y) for n = 60 students. Results: r ≈ −0.38, |t| = 3.128, r² = 0.144.
(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (t*(58) = 2.002)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution
(a) Pearson correlation — two quantitative variables.
(b) Reject H₀ — |t| = 3.128 > t*(58) = 2.002.
(c) “There is sufficient evidence of a negative linear relationship between weekly TV hours and GPA in the population.”
(d) r² = 0.144 → statistically significant but weak practical significance. The key teaching point: significant ≠ practically important.
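The chi-square solutions in this problem follow one routine: compare χ² with the critical value for its df, then label V. The sketch below replays that routine using only numbers quoted in the solutions above (16.333 against 5.991, and 8.081 against 3.841); the function names and the two-entry critical-value dictionary are assumptions made for this example.

```python
# Critical values at alpha = 0.05, copied from the ChiTable in Section 2.
CRITICAL_05 = {1: 3.841, 2: 5.991}


def chi_square_decision(chi2: float, df: int) -> str:
    """Reject H0 when the statistic exceeds the tabled critical value."""
    return "reject H0" if chi2 > CRITICAL_05[df] else "fail to reject H0"


def cramers_v_label(v: float) -> str:
    """Label association strength using the C6 thresholds."""
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "small"
    if v < 0.5:
        return "medium"
    return "large"


# (description, chi-square, df, Cramér's V) as reported in the solutions.
results = [
    ("study method vs. grade category", 16.333, 2, 0.369),
    ("age group vs. vaccine uptake", 8.081, 1, 0.201),
]

for name, chi2, df, v in results:
    print(f"{name}: {chi_square_decision(chi2, df)}, V = {v} ({cramers_v_label(v)})")
```

Both tests reject H₀, yet the effect-size labels differ (medium versus small), which is the reason the decision and the effect size are always reported together.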
Problem 2 — Statistical vs. Practical Significance
Problem 3 — Find the Error
Each scenario contains a flawed researcher statement. Identify the specific error.
Scenario: “A researcher encodes study method as numeric: self-study = 1, group study = 2. They compute Pearson r between this encoded variable and GPA (0–4 scale). They find a positive r with p < 0.05, and conclude there is a significant positive linear relationship between study method and GPA.”
What is the error in this analysis?
Show Solution
Study method (self-study/group study) is a nominal qualitative variable. Assigning 1 and 2 does not create numerical meaning: an “average” of self-study and group study is not interpretable, so reporting a “positive linear relationship” is unjustified. Because GPA is quantitative and study method is qualitative, this is a mixed-type pair (C4): the appropriate approach is a group comparison, which is beyond the scope of this course. Chi-square would apply only if GPA were recorded as grade categories.
Scenario: “A researcher finds that ice cream sales and drowning deaths are positively correlated (r > 0, p < 0.001). They conclude: ‘Ice cream consumption causes drowning.’”
What is the error in this conclusion?
Show Solution
A classic lurking variable (confounding) example. Summer heat causes both increased ice cream consumption and increased swimming (hence more drowning). The correlation is real, but it is produced by a third variable, not by ice cream causing drowning. Association never establishes causation: a significant r provides evidence that ρ ≠ 0 in the population; it reveals nothing about mechanism or direction of causation.
Scenario: “A nutritionist has data on daily caloric intake (kcal) and resting metabolic rate (kcal/day) for 80 participants. She runs a chi-square test, reports a significant χ² statistic, and concludes ‘caloric intake and metabolic rate are not independent.’”
What is the error in this analysis?
Show Solution
Caloric intake in kcal and resting metabolic rate in kcal/day are both continuous quantitative measurements. Chi-square requires two qualitative (categorical) variables. The nutritionist has likely binned the data into artificial categories to run chi-square — a practice that discards information and introduces arbitrary cut-points. Pearson correlation is the correct method.
Scenario: “A researcher runs a chi-square test on a 2×2 table from a very large sample and reports a very large χ² statistic (p < 0.001). They conclude: ‘This enormous chi-square value confirms that the two variables have a very strong association.’”
What is the error in this conclusion?
Show Solution
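For a 2×2 table, χ² = n · V², so for a fixed strength of association the statistic grows in proportion to the sample size. With a large enough n, even a negligible V gives an enormous χ². Always compute Cramér’s V to assess strength independently of n. Here V is negligible, so the result is a negligible effect, not a “very strong association.”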
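To see why a large chi-square value says nothing about strength, it can help to scale one table up. This is a sketch with hypothetical counts (not the lesson's data): the cell proportions stay fixed while the sample grows 100-fold. The helper names `chi_square_2x2` and `cramers_v` are illustrative.

```python
import math

def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2

def cramers_v(chi2: float, n: int, rows: int = 2, cols: int = 2) -> float:
    """Cramer's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

# Hypothetical counts: identical proportions, second sample 100 times larger.
small = (26, 24, 24, 26)
large = tuple(100 * count for count in small)

for table in (small, large):
    n = sum(table)
    chi2 = chi_square_2x2(*table)
    print(f"n = {n}: chi2 = {chi2:.2f}, V = {cramers_v(chi2, n):.3f}")
```

The chi-square statistic is multiplied by 100 while V does not move, which is why V, not χ², is the number to read for strength.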
Scenario: “A tech company analyzes data from a very large sample of employees and finds that the correlation between work-from-home days per week and productivity score is r = 0.05, which is statistically significant. The VP concludes: ‘This significant correlation proves that working from home substantially boosts productivity.’”
What is the error in this conclusion?
Show Solution
With a sample this large, the critical t* is approximately 1.960. The test statistic is well above 1.960, so the result is statistically significant. But r² = 0.0025: work-from-home explains 0.25% of variance in productivity. The VP is confusing the presence of an effect with the importance of an effect. “Substantially boosts” requires a large practical effect, which r² = 0.0025 definitively contradicts.
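The gap between "significant" and "substantial" in this scenario can be made concrete. A minimal sketch, assuming r = 0.05 (the positive root of r² = 0.0025) and a sample size of n = 10,000, which is borrowed from the lesson's Mastery Check example and is an assumption here:

```python
import math

r = 0.05      # implied by r^2 = 0.0025 in the solution
n = 10_000    # assumption: sample size taken from the Mastery Check example

t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
r_squared = r ** 2
significant = abs(t) > 1.960  # large-sample two-tailed critical value at alpha = 0.05

print(f"t = {t:.2f}, significant: {significant}")
print(f"variance explained: {100 * r_squared:.2f}%")
```

Under these assumptions the test statistic clears the critical value comfortably, yet the variance explained stays at a quarter of one percent.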
Problem 4 — Communication Task
Problem 5 — Synthesis: Diabetes and Health Outcomes
A public health research team investigates two questions about diabetes in a random sample of 500 adults.
Question A: Is fasting blood glucose (mg/dL) related to BMI (kg/m²)?
Given: r = 0.58, n = 500, t* = 1.965 (df = 498)
Question B: Is diabetes diagnosis (Non-diabetic / Pre-diabetic / Diabetic) related to physical activity level (Active / Sedentary)?
Observed contingency table:
| Diagnosis | Active | Sedentary | Row total |
|---|---|---|---|
| Non-diabetic | 165 | 85 | 250 |
| Pre-diabetic | 55 | 95 | 150 |
| Diabetic | 30 | 70 | 100 |
| Col total | 250 | 250 | 500 |
(a) For Question A, justify why Pearson correlation is the appropriate method.
(b) For Question A, compute t (given r = 0.58, n = 500) and interpret r².
(c) For Question B, justify why chi-square is appropriate.
(d) For Question B, verify all conditions, compute χ², state the decision (α = 0.05), and compute Cramér’s V.
(e) A journalist reports: “The study proves that being sedentary causes diabetes.” Identify two statistical reasoning errors.
(f) A hospital administrator says: “r² is small, so BMI is useless for predicting blood glucose.” Evaluate this claim.
Show Solution
(a) Both variables are quantitative: fasting blood glucose in mg/dL and BMI in kg/m² are continuous numerical measurements where arithmetic (differences, averages) is meaningful. Chi-square requires two qualitative (categorical) variables — it is not applicable here.
(b) t = r√(n − 2) / √(1 − r²) = 0.58 × √498 / √(1 − 0.3364) ≈ 15.89.
Since |t| = 15.89 > t* = 1.965, we reject H₀ (p < 0.001).
r² = 0.336: BMI explains 33.6% of the variance in fasting blood glucose. This is a moderate-to-strong practical effect (moderate, approaching strong).
(c) Both variables are qualitative: diabetes diagnosis has three named categories (Non-diabetic / Pre-diabetic / Diabetic); physical activity level has two named categories (Active / Sedentary). No arithmetic is meaningful on these category names. Two qualitative variables → chi-square is appropriate.
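(d) Expected frequencies (row total × column total / n): Non-diabetic 125 and 125; Pre-diabetic 75 and 75; Diabetic 50 and 50 (Active and Sedentary in each row). All expected frequencies are ≥ 5 ✓. df = (3 − 1)(2 − 1) = 2, χ²* = 5.991.
χ² = (165 − 125)²/125 + (85 − 125)²/125 + (55 − 75)²/75 + (95 − 75)²/75 + (30 − 50)²/50 + (70 − 50)²/50 = 12.80 + 12.80 + 5.33 + 5.33 + 8.00 + 8.00 ≈ 52.27.
Since χ² = 52.27 > χ²* = 5.991, reject H₀ (p < 0.001).
V = √(χ² / (n · min(r − 1, c − 1))) = √(52.27 / (500 × 1)) ≈ 0.32, a medium effect.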
(e) Error 1 — Causation: chi-square tests association, not causation. This is observational data; we cannot conclude that being sedentary causes diabetes. Error 2 — Reverse causation: people with diabetes or pre-diabetes may already be sedentary due to their condition, not the other way around. The data cannot distinguish the direction of any causal relationship.
(f) The administrator is wrong to call it “useless.” r² = 0.336 is a moderate-to-strong effect: BMI explains 33.6% of fasting glucose variance. In medical research involving complex physiological systems, a single variable explaining one-third of the variance is clinically meaningful. The administrator may be applying an overly strict standard, or confusing individual predictive precision (which is affected by all unexplained variance) with the relevance of BMI as a risk indicator. No single measurement explains everything in health research. The 66.4% unexplained variance reflects the complexity of blood glucose regulation, not a failure of BMI.
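The arithmetic in parts (b) and (d) can be reproduced directly from the problem's numbers. This is a sketch rather than a prescribed workflow; it assumes r = 0.58 (the positive root of r² = 0.336), and the variable names are illustrative.

```python
import math

# Question B: observed counts from the contingency table (Active, Sedentary).
observed = {
    "Non-diabetic": (165, 85),
    "Pre-diabetic": (55, 95),
    "Diabetic": (30, 70),
}
rows = list(observed.values())
n = sum(sum(row) for row in rows)
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]

# Expected count for each cell: row total * column total / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(rows, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(rows) - 1) * (len(col_totals) - 1)
v = math.sqrt(chi2 / (n * min(len(rows) - 1, len(col_totals) - 1)))

# Question A: t statistic for H0: rho = 0 (r = 0.58 is an assumed positive root).
r, n_a = 0.58, 500
t = r * math.sqrt(n_a - 2) / math.sqrt(1 - r ** 2)

print(f"expected = {expected}")
print(f"chi2 = {chi2:.2f}, df = {df}, V = {v:.2f}")
print(f"t = {t:.2f}, r^2 = {r ** 2:.3f}")
```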
Mixed Review — Retrieval from Earlier Lessons
These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.
Review Problem 1 — Regression Prediction and Extrapolation (REG-3)
A civil engineer models the relationship between the age of residential water pipes (x, years) and average lead concentration in tap water (y, μg/L). From data on 30 city blocks with pipes aged 5 to 40 years, she fits a least-squares regression line with slope 0.45 μg/L per year.
(a) Interpret the slope in context.
(b) Predict lead concentration for a block with pipes aged 35 years. Is this interpolation or extrapolation?
(c) A city planner asks for a prediction for pipes much older than any in the data (pipes scheduled for replacement). Explain why this prediction is unreliable.
(d) A journalist writes: “Older pipes cause higher lead levels.” Evaluate this claim statistically.
Show Solution
(a) For each additional year of pipe age, the predicted average lead concentration in tap water increases by 0.45 μg/L, on average.
The slope is positive: older pipes are associated with higher lead concentrations. This aligns with the physical mechanism of pipe corrosion, though the regression alone cannot establish causation.
(b) Substituting x = 35 into the fitted equation gives the predicted lead concentration in μg/L.
x = 35 years falls within the observed range of 5 to 40 years, so this is interpolation, a relatively safe prediction. (It is near the upper end, so some caution is warranted, but it does not extend beyond the data.)
(c) The requested pipe age is well beyond the maximum observed value of 40 years, so this is extrapolation. Two reasons this prediction is unreliable:
Linearity may not hold. Beyond 40 years, pipe degradation may accelerate non-linearly, or pipes may have been partially replaced/relined, breaking the linear trend.
No supporting data. The model was fit only on blocks with pipes aged 5–40 years. We have no empirical information about the relationship at older ages, so we would be projecting the trend into uncharted territory.
The model still returns a value in μg/L, but this value should not be reported as a reliable prediction.
(d) The regression is consistent with a positive association, but “older pipes cause higher lead levels” requires caution:
The study is observational (no random assignment of pipe ages). Confounders such as neighbourhood age, water acidity, or pipe material type (not all old pipes are lead pipes) could explain the association.
The causal mechanism is physically plausible, but the regression alone cannot distinguish correlation from causation.
A more precise statement: “There is a statistically significant positive linear association between pipe age and lead concentration. The data are consistent with, but do not prove, a causal relationship.”
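The interpolation versus extrapolation check in parts (b) and (c) can be built into the prediction step itself. A minimal sketch: the slope 0.45 and the 5 to 40 year range come from the solution, but the intercept value and the 60-year query below are hypothetical placeholders, and `predict_lead` is an illustrative name.

```python
def predict_lead(pipe_age: float, intercept: float, slope: float = 0.45,
                 observed_range: tuple = (5.0, 40.0)) -> tuple:
    """Return (predicted lead in ug/L, True if pipe_age is outside the data)."""
    low, high = observed_range
    prediction = intercept + slope * pipe_age
    is_extrapolation = not (low <= pipe_age <= high)
    return prediction, is_extrapolation

# Hypothetical intercept for illustration only; it is NOT the lesson's value.
PLACEHOLDER_INTERCEPT = 2.0

for age in (35, 60):  # 60 is a hypothetical out-of-range pipe age
    predicted, flagged = predict_lead(age, PLACEHOLDER_INTERCEPT)
    label = "extrapolation, do not report" if flagged else "interpolation"
    print(f"age {age}: {predicted:.2f} ug/L ({label})")
```

Returning the flag alongside the number keeps the warning attached to the prediction, so the value is harder to quote without its caveat.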
Review Problem 2 — Chi-Square Test of Independence (REG-4)
An occupational health researcher collects data on 200 office workers and records their seating arrangement (open-plan vs. private office) and reported productivity level (Low, Medium, High). The observed contingency table is:
| Seating | Low | Medium | High | Row Total |
|---|---|---|---|---|
| Open-plan | 40 | 50 | 30 | 120 |
| Private | 20 | 40 | 20 | 80 |
| Col Total | 60 | 90 | 50 | 200 |
(a) Compute the expected frequency for the Open-plan / Medium cell.
(b) Verify the conditions for a chi-square test of independence.
(c) State H₀ and the alternative hypothesis for this test.
(d) The computed test statistic is χ² = 1.85, with df = 2 and critical value χ²* = 5.991. State the decision and conclusion in context.
(e) A manager concludes: “Open-plan offices therefore do not affect productivity.” Identify the statistical reasoning error.
Show Solution
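(a) E = (row total × column total) / n = (120 × 90) / 200 = 54
(b) Computing all expected frequencies:
| Seating | Low | Medium | High |
|---|---|---|---|
| Open-plan | 36 | 54 | 30 |
| Private | 24 | 36 | 20 |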
All six expected frequencies are ≥ 5 ✓. The condition for chi-square is satisfied.
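(c) H₀: seating arrangement and productivity level are independent. Alternative: seating arrangement and productivity level are not independent (they are associated).
(d) Since χ² = 1.85 < χ²* = 5.991, we fail to reject H₀ (p > 0.05).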
Conclusion: “There is insufficient evidence at the 0.05 significance level that seating arrangement and productivity level are associated in this population of office workers.”
(e) The manager’s error is confusing “fail to reject H₀” with “accept H₀.” A non-significant result does not prove the variables are independent; it only means the data did not provide sufficient evidence of dependence at this sample size. The true association may exist but be too small to detect with n = 200, or the test may lack sufficient power. The manager should say “we found no significant evidence of an association” rather than “there is no effect.”
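The expected frequencies, the condition check, and the decision for this review problem can be verified in a few lines. This is a sketch using the observed table from the problem; the function name `chi_square_independence` is illustrative, and 5.991 is the standard table value for df = 2 at α = 0.05.

```python
def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, df) for a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, df

# Rows: Open-plan, Private. Columns: Low, Medium, High.
observed = [[40, 50, 30], [20, 40, 20]]
expected, chi2, df = chi_square_independence(observed)

CRITICAL_DF2 = 5.991  # chi-square critical value, df = 2, alpha = 0.05
conditions_ok = all(e >= 5 for row in expected for e in row)
reject_h0 = chi2 > CRITICAL_DF2

print(f"expected = {expected}")
print(f"chi2 = {chi2:.2f}, df = {df}, conditions ok: {conditions_ok}")
print("reject H0" if reject_h0 else "fail to reject H0")
```

Note that the code reports "fail to reject H0", never "accept H0", which mirrors the wording point in part (e).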
Section 7: Mastery Check
Question 1 — Feynman Test
Explain in plain language — as if to a classmate who has not taken statistics — why a statistically significant result is not always practically important. Use a specific example in your explanation.
Show Model Answer
A statistically significant result means the data are unlikely to have occurred by chance if there were truly no relationship in the population — we have reliable evidence that an effect exists. But “exists” and “matters” are different things.
Imagine you study 10,000 employees and find that working from home one extra day per week is correlated with a small increase in productivity. With such a huge sample, even a tiny effect (say, r = 0.05) will be statistically significant. But r² = 0.0025: work-from-home explains 0.25% of the variation in productivity. The remaining 99.75% of productivity differences come from other sources entirely. The effect is real, but it is so small that no rational company would restructure their entire workplace policy based on it.
Statistical significance tells you the signal is reliably non-zero. Effect size (r² or Cramér’s V) tells you how big the signal is. Both pieces of information are needed before drawing conclusions about importance.
Question 2 — Apply Question: A New Educational Study
A school board is considering a new reading program. A researcher collects data from a sample of students and finds that students who completed the program have significantly higher reading scores than those who did not. The analysis reports r = 0.35, r² = 0.123, and p = 0.002.
(a) What type of test was likely run, and were both variables quantitative?
(b) The school board sees p = 0.002 and says “Proven — the program works.” What is wrong with this statement?
Show Solution
(a) The researcher appears to have treated program completion (yes/no) as a numerical variable (likely coded 0/1) and computed Pearson r. This is technically a point-biserial correlation — valid in some contexts, but program completion is fundamentally qualitative (binary categorical). A better analysis for “do the two groups differ?” would be an independent-samples t-test comparing mean reading scores across the two groups.
(b) Two errors: (1) p = 0.002 does not “prove” the program works — it provides evidence against H₀ (no relationship). Statistics does not prove; it provides evidence. (2) r² = 0.123 is in the weak range (0.04–0.15): the program explains only 12.3% of the variance in reading scores. This may not justify the cost of a new program — other factors account for 87.7% of score variation.
Question 3 — Error Analysis
A researcher’s published conclusion: “Our chi-square test found a significant association between hours of sleep per night and academic performance score. We conclude that insufficient sleep causes poor academic performance.”
Identify the errors in this conclusion.
Show Solution
Error 1 — Wrong method: Hours of sleep per night is a continuous quantitative variable (3.5 hours, 7.0 hours, 8.5 hours — arithmetic differences are meaningful). Academic performance score is likely also quantitative. Chi-square requires two qualitative (categorical) variables. The researcher apparently binned the data into categories to run chi-square — this discards information and produces arbitrary results depending on the binning choice. Pearson correlation is the appropriate method.
Error 2 — Causal language: “Causes poor academic performance” is not justified by any bivariate analysis, including correlation. Even if correlation were correctly applied, it would show association only. Causal claims require controlled experimental design with random assignment. From observational data, we can only say “is associated with.”
Section 8: Boss Fight
Choose your path. Both paths test everything from this lesson — pick the one that matches your strengths.
🔬 Path A — The Analyst
You receive a real dataset and must perform a complete bivariate analysis from scratch: classify variables, choose the method, compute the test statistic, interpret the effect size, and write a research conclusion.
📝 Path B — The Communicator
You receive three flawed research reports. For each one, you must identify the specific error and rewrite the conclusion using correct statistical language.
Path A — The Analyst
A school board wants to know if reading speed (words/minute) is related to reading comprehension score (out of 100) for a sample of students. Given: r = 0.66, along with the sample size n.
Task 1. Classify both variables and justify your choice of analysis method.
Reading speed in words/minute and comprehension score out of 100 are both _____ variables because ___________. The appropriate method is _________.
Show Answer
Both are quantitative — they are numerical measurements where arithmetic differences are meaningful. Reading speed in wpm and comprehension score on a 0–100 scale both have meaningful averages and differences. → Pearson correlation / linear regression is appropriate.
Task 2. Compute the test statistic. Given , . Make the decision at α = 0.05.
Show Answer
|t| exceeds the critical value t* → Reject H₀ (p < 0.001).
Task 3. Interpret r². Is this a weak, moderate, or strong practical effect? Justify using the thresholds from Section 3.
Show Answer
r² = 0.436 → Strong practical significance. Reading speed explains 43.6% of the variance in reading comprehension score, a strong practical effect. This suggests reading speed is an important predictor of comprehension, though 56.4% of variance remains unexplained by speed alone.
Task 4. Write a two-sentence research conclusion using correct association language. Include: the direction, r, the p-value, the decision, and the effect size classification. Do not use causal language.
Show Model Answer
“There is sufficient evidence of a positive linear relationship between reading speed and reading comprehension score in the population (r = 0.66, p < 0.001). Reading speed explains 43.6% of the variance in comprehension scores (r² = 0.436), indicating a strong practical association, but this is observational data, and we cannot conclude that faster reading causes higher comprehension.”
Reflection: What was the most challenging part of this analysis? Was it the initial setup or the final interpretation?
Path B — The Communicator
You are reviewing three research summaries submitted to a school newsletter. Each has a problem. Identify the specific error in each report and write a corrected conclusion.
Report 1 — Nutrition Study:
“We measured diet quality score (0–100 scale) and athletic performance score (0–100 scale) for 60 student athletes. We found a positive r with p < 0.001. Conclusion: A healthy diet causes significantly improved athletic performance. We recommend all athletes immediately overhaul their diets.”
Task 1. Identify the specific error and write a corrected two-sentence conclusion.
Show Answer
Error: “Causes” is a causal claim — unjustified from correlation data, which shows association only. Additionally, “immediately overhaul” overstates what a single observational study supports.
Corrected conclusion: “There is sufficient evidence of a positive linear relationship between diet quality score and athletic performance score in the population (p < 0.001; r² indicates a moderate-to-strong practical effect). These results are from observational data and show an association; they do not establish that diet causes improved performance. Other factors (training hours, sleep, genetics) may contribute to the relationship.”
Report 2 — Social Media Study:
“We surveyed 500 students and found a statistically significant association between social media use (hours/week) and GPA. Conclusion: The research conclusively demonstrates a moderate relationship between social media and academic performance. Schools should limit social media access immediately.”
Task 2. Identify the specific error(s) and write a corrected two-sentence conclusion.
Show Answer
Errors: (1) “Moderate relationship” is wrong: the reported r² (under 2% of variance explained) is negligible, not moderate. The label appears to rest on statistical significance rather than on effect size. (2) “Conclusively demonstrates” overstates what a p-value shows. (3) The policy recommendation ignores effect size.
Corrected conclusion: “There is sufficient evidence of a positive linear relationship between social media hours and GPA in the population; however, r² indicates that social media use explains less than 2% of variance in GPA. This is a statistically significant but negligible practical effect; the data do not support restricting social media access based on this association alone.”
Report 3 — Brand Preference Study:
“We surveyed 300 shoppers on preferred clothing brand (Brand X / Brand Y / Brand Z) and age group (18–25 / 26–40 / 41+). We computed Pearson r and its p-value. Conclusion: There is sufficient evidence of a moderate positive linear relationship between clothing brand preference and age group.”
Task 3. Identify the specific error and write a corrected analysis description.
Show Answer
Error: Both variables are qualitative (categorical) — brand names (Brand X/Y/Z) and age groups (18–25/26–40/41+) are named categories, not quantitative measurements. Pearson correlation requires two quantitative variables. The correct method is chi-square test of independence.
Corrected description: “Both preferred clothing brand and age group are qualitative variables: named categories with no meaningful arithmetic. Pearson correlation is not appropriate. A chi-square test of independence should be run on the contingency table. The reported r is meaningless in this context. The research team should reanalyze using chi-square and report χ², df, the p-value, and Cramér’s V.”
Reflection: What was the most challenging part of this analysis? How would you apply this design approach to another problem?
Section 9: Challenge Problems
Ready for more? These go beyond the lesson objectives.
Challenge 1 — The ANOVA Gap: Mixed Variable Types
A researcher surveys 100 university students on their political affiliation (Liberal / Conservative / Independent) and their self-reported happiness on a 1–10 scale. They want to test whether political affiliation is related to happiness.
(a) Can chi-square be applied? Explain.
(b) Can Pearson correlation be applied? Explain.
(c) What analysis approach would be appropriate? (Name it, even though it is beyond this course.)
(d) What general principle does this illustrate about method selection?
Show Solution
(a) No — happiness on a 1–10 scale is quantitative (numerical ratings where arithmetic differences are meaningful). Chi-square requires both variables to be qualitative (categorical). Applying chi-square here would require artificially binning the happiness scores, discarding information.
(b) No — political affiliation (Liberal / Conservative / Independent) is qualitative (nominal categories). There is no numerical meaning to political affiliation labels; you cannot meaningfully compute the average of “Liberal + Conservative.” Pearson correlation requires both variables to be quantitative.
(c) The appropriate approach is a group-comparison method: specifically, one-way ANOVA (Analysis of Variance) if normality and equal-variance assumptions are met, or the Kruskal-Wallis test (non-parametric alternative). The research question becomes: “Does mean happiness differ across the three political affiliation groups?” This compares group means on the quantitative variable, rather than testing association between two variables of the same type.
(d) Always identify both variable types before choosing a method. When one variable is qualitative and one is quantitative, neither chi-square nor Pearson correlation applies. A group-comparison method is needed — or the question must be reframed. Method selection is driven entirely by variable types, not by the research topic.
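The selection rule in part (d) is mechanical enough to write down as a lookup. This is only a sketch of the lesson's three cases; the function name `choose_method` and the string labels are illustrative.

```python
def choose_method(type_x: str, type_y: str) -> str:
    """Pick a bivariate method from the two variable types."""
    valid = {"quantitative", "qualitative"}
    if type_x not in valid or type_y not in valid:
        raise ValueError("each type must be 'quantitative' or 'qualitative'")
    if type_x == type_y == "quantitative":
        return "Pearson correlation"
    if type_x == type_y == "qualitative":
        return "chi-square test of independence"
    # Mixed types: neither method from this lesson applies.
    return "group-comparison method (e.g., one-way ANOVA or Kruskal-Wallis)"

# Political affiliation (qualitative) vs. happiness on a 1-10 scale (quantitative).
print(choose_method("qualitative", "quantitative"))
```

Nothing about the research topic enters the function, which is the point of the principle: only the variable types decide.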
Challenge 2 — Sample Size and the p-value Trap
A nutritionist runs two separate studies on the relationship between diet quality score (quantitative) and inflammatory marker CRP in mg/L (quantitative).
Study A: r² = 0.27, p = 0.019
Study B: r² = 0.032, p = 0.011
Pre-verified computations:
Study A: |t| > t* → reject H₀ (p = 0.019)
Study B: |t| > t* → reject H₀ (p = 0.011)
(a) Are both studies statistically significant at α = 0.05?
(b) For which study is there stronger practical evidence that diet quality is meaningfully related to CRP?
(c) What does this illustrate about using p-values alone to compare studies?
(d) If a nutritionist wanted to recommend diet coaching based on these results, which study better supports the recommendation? Why?
Show Solution
(a) Yes — both studies reject H₀ at α = 0.05 (|t| > t* in both cases).
(b) Study A provides stronger practical evidence: r² = 0.27 (moderate: diet quality explains 27% of variance in CRP) versus Study B’s r² = 0.032 (negligible: only 3.2% explained). Despite Study B having a smaller p-value, its practical effect is much weaker.
(c) p-values are heavily influenced by sample size. With a large enough sample, even negligible associations produce significant t-statistics. Two studies can have nearly identical p-values (0.019 vs. 0.011) while having wildly different practical significance (r² = 0.27 vs. r² = 0.032). Never use p-values to compare study importance; always report and compare effect sizes.
(d) Study A: r² = 0.27 indicates diet quality explains a meaningful portion of CRP variance. This is a moderate practical effect with clinical plausibility. Study B’s r² = 0.032 (explaining only 3.2% of variance in CRP) provides a very weak basis for individual-level interventions like diet coaching. A nutritionist should always examine both statistical and practical significance before making recommendations.
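The trap in this challenge is easy to reproduce numerically. The sketch below uses |r| = 0.52 and |r| = 0.18 (the magnitudes implied by r² = 0.27 and r² = 0.032) together with hypothetical sample sizes of 20 and 200; those sample sizes are assumptions chosen for illustration, not values taken from the problem. The critical values are standard two-tailed t-table entries at α = 0.05.

```python
import math

def t_statistic(r: float, n: int) -> float:
    """Test statistic for H0: rho = 0, with df = n - 2."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# (label, |r|, hypothetical n, two-tailed critical t for df = n - 2)
studies = [
    ("Study A", 0.52, 20, 2.101),   # df = 18
    ("Study B", 0.18, 200, 1.972),  # df = 198
]

results = {}
for label, r, n, t_crit in studies:
    t = t_statistic(r, n)
    results[label] = (t, t > t_crit, r ** 2)
    print(f"{label}: t = {t:.2f}, significant: {t > t_crit}, r^2 = {r ** 2:.3f}")
```

Under these assumptions the two test statistics come out nearly equal, so the significance results look alike even though one study explains roughly eight times as much variance as the other.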
Challenge 3 — Write a Full Research Report
Using the results from Section 6 Problem 5 (the diabetes study):
Write a two-paragraph research report following the C10 checklist from Section 3:
Paragraph 1: Question A (BMI vs. fasting glucose): method, n, r, t, r², p-value, decision, conclusion, effect size, study limitation
Paragraph 2: Question B (diabetes status vs. activity level): method, χ², df, V, p-value, decision, conclusion, V interpretation, study limitation
Both paragraphs: association language only (no causal claims), note observational design
Show Model Answer
Paragraph 1: A Pearson correlation was computed to assess the linear relationship between BMI (kg/m²) and fasting blood glucose (mg/dL) in a random sample of 500 adults. The analysis revealed a significant positive relationship, r = 0.58, t = 15.89, p < 0.001. BMI explains 33.6% of the variance in fasting blood glucose (r² = 0.336), indicating a moderate-to-strong practical association. Adults with higher BMI tend to show higher fasting blood glucose in this sample. Because this is an observational study, causal conclusions cannot be drawn; other factors (diet, physical activity, genetics) likely contribute to blood glucose levels.
Paragraph 2: A chi-square test of independence was used to determine whether diabetes diagnosis (Non-diabetic / Pre-diabetic / Diabetic) and physical activity level (Active / Sedentary) are independent in the population of adults. The test was significant, χ² = 52.27 (df = 2), p < 0.001. Cramér’s V = 0.32 indicates a medium association: sedentary adults are proportionally more represented among pre-diabetic and diabetic groups. Because this is observational data, the direction of any causal relationship (whether inactivity is associated with diabetes risk, or whether health declines from diabetes reduce physical activity) cannot be determined from these analyses alone.