A public health team collects vaccination data from 200 adults and organizes it into a simple table:
| | Vaccinated | Not vaccinated | Row total |
|---|---|---|---|
| Under 40 | 45 | 55 | 100 |
| 40 and over | 65 | 35 | 100 |
| Col total | 110 | 90 | 200 |
Older adults appear more likely to be vaccinated — 65% versus 45%. But before recommending any policy, the team needs to answer a harder question: Are older adults genuinely more likely to be vaccinated in the population, or could a difference this large arise from chance in a random sample of 200?
The proportions are different. But are they statistically different? This is exactly the question the chi-square test of independence answers.
The test works by comparing what we observed to what we would expect if the two variables were completely unrelated — if vaccination status and age group were independent in the population. A large discrepancy between observed and expected counts produces a large test statistic; a small discrepancy produces a small one. The p-value then tells us whether that discrepancy is surprising enough to reject independence.
By the end of this lesson, you will be able to:
Construct a contingency table and compute expected frequencies under the null hypothesis of independence.
Compute the chi-square test statistic and determine degrees of freedom.
Check the conditions for the chi-square test and explain what to do when they are violated.
Perform the complete five-step test, using the chi-square table to find critical values.
Compute Cramér’s V to quantify the strength of association.
Distinguish statistical association from causation.
Section 2: Prerequisites
This lesson builds directly on the five-step hypothesis-testing framework from INF-5. Make sure you have these tools ready.
The five-step framework (from INF-5):
| Step | What you do |
|---|---|
| 1 | State H₀ and Hₐ |
| 2 | Check conditions |
| 3 | Compute the test statistic |
| 4 | Find the p-value (or compare to critical value) |
| 5 | State the decision and conclusion in context |
P-value decision rule: Reject H₀ if $p \le \alpha$. At $\alpha = 0.05$, this means we need the data to be sufficiently unlikely under the assumption of independence.
Fail to reject ≠ accept H₀. A large p-value means “not enough evidence to conclude the variables are dependent” — it does not prove they are independent. We cannot prove the null hypothesis.
Type I and Type II errors:
Type I: Rejecting H₀ when it is actually true (false positive). Probability = $\alpha$.
Type II: Failing to reject H₀ when it is actually false (false negative). Probability = $\beta$.
Key difference from INF-5: The chi-square test for independence is always right-tailed only. Unlike the t-test for a mean (which can be two-tailed), the chi-square statistic is always ≥ 0, and a large positive value is the only kind of extreme outcome. We only look at the right tail of the chi-square distribution.
Retrieval Warm-up — from earlier lessons
In a study of 40 university students, a researcher tests whether mean daily screen time differs from a national benchmark of 6.5 hours. She collects a sample with h and h. She has no prior hypothesis about the direction of difference. Which null and alternative hypotheses are correct?
A researcher uses INF-6 to test whether the proportion of patients who experience side effects differs from 0.20. She obtains from a sample of . She computes the test statistic as . What error has she made?
Section 3: Core Concepts
C1 — Contingency Table Structure
A contingency table (also called a cross-tabulation or two-way table) organizes data on two categorical variables. Rows represent levels of one variable; columns represent levels of the other. Cells contain observed frequencies (O) — the actual counts from the sample.
Contingency Table
A two-way table with $r$ rows and $c$ columns. Each cell contains the observed frequency O for that combination of categories. The row totals, column totals, and grand total form the margins.
Example — 3×2 contingency table structure (r = 3, c = 2):
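| | Category A | Category B | Row total |
|---|---|---|---|
| Group 1 | $O_{11}$ | $O_{12}$ | $R_1$ |
| Group 2 | $O_{21}$ | $O_{22}$ | $R_2$ |
| Group 3 | $O_{31}$ | $O_{32}$ | $R_3$ |
| Col total | $C_1$ | $C_2$ | $n$ |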
C2 — Expected Frequencies
Expected Frequency
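The count we would predict in a cell if the two variables were completely independent in the population. For any cell in row $i$, column $j$:
$$E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{n}$$
Common error: Some students compute $\frac{(\text{row total}) + (\text{column total})}{n}$, adding instead of multiplying. This is wrong. The formula uses multiplication, not addition, because independence means the joint probability equals the product of marginal probabilities.
Common error: Some students write $\frac{(\text{row total}) \times (\text{column total})}{n^2}$, dividing by $n$ twice. The correct formula divides by $n$ once: dividing twice gives the joint probability of the cell, and multiplying that probability by $n$ to turn it into a count leaves a single $n$ in the denominator.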
Verifying expected frequencies: The row totals and column totals of the expected frequency table must equal the row and column totals of the observed table. This is a useful self-check.
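The formula and the margin self-check can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lesson's own materials: the function name `expected_frequencies` is an assumption, and the data are the vaccination counts from the introduction.

```python
def expected_frequencies(observed):
    """Return the table of expected counts under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # E_ij = (row i total) * (column j total) / n
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows: Under 40, 40 and over. Columns: Vaccinated, Not vaccinated.
observed = [[45, 55], [65, 35]]
expected = expected_frequencies(observed)
print(expected)  # [[55.0, 45.0], [55.0, 45.0]]

# Self-check: the expected table must reproduce the observed margins.
assert [sum(row) for row in expected] == [sum(row) for row in observed]
assert [sum(col) for col in zip(*expected)] == [sum(col) for col in zip(*observed)]
```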
C3 — Chi-Square Test Statistic
Chi-Square Test Statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Summed over all cells. Measures the total discrepancy between observed and expected counts. $\chi^2 \ge 0$ always; equality holds only when O = E in every single cell.
Intuition: Each cell contributes $\frac{(O - E)^2}{E}$ to the sum: the squared deviation from what independence predicts, scaled by the expected count (larger cells shouldn’t dominate just because they’re larger). A large $\chi^2$ means the observed data deviate strongly from what independence would predict.
Common error: Using $\frac{O - E}{E}$ (forgetting to square the numerator) or $\frac{(O - E)^2}{O}$ (dividing by observed instead of expected). The denominator must be $E$, because the expected count is the reference under H₀.
The visualization above shows how $\chi^2$ reflects the visual difference between expected and observed distributions. When the two panels look nearly identical, $\chi^2$ is small; when they differ substantially, $\chi^2$ is large.
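As a rough sketch of the sum over cells, the function below computes the statistic directly from a table of observed counts. The name `chi_square_statistic` is an assumption for illustration; the two tables are the smoking/exercise and vaccination datasets used in this lesson.

```python
def chi_square_statistic(observed):
    """Sum (O - E)^2 / E over every cell of a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e               # divide by E, not by O
    return stat


print(round(chi_square_statistic([[15, 35], [25, 25]]), 3))  # 4.167
print(round(chi_square_statistic([[45, 55], [65, 35]]), 3))  # 8.081
```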
C4 — Degrees of Freedom
Degrees of Freedom for Chi-Square Test of Independence
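$$df = (r - 1)(c - 1)$$
where $r$ = number of rows, $c$ = number of columns.
Examples:
$2 \times 2$ table: $df = (2-1)(2-1) = 1$
$2 \times 3$ table: $df = (2-1)(3-1) = 2$
$3 \times 3$ table: $df = (3-1)(3-1) = 4$
Common error: Using $df = n - 1$ (borrowed from the t-test for a mean) or another formula carried over from a different test. For the chi-square test of independence, the correct formula is always $df = (r - 1)(c - 1)$.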
Higher df means a larger critical value is needed to reject H₀ — larger tables require more evidence of departure from independence.
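One way to see where $(r-1)(c-1)$ comes from: once the margins are fixed, only an $(r-1) \times (c-1)$ block of cells can vary freely, and every remaining cell is forced by subtraction. The sketch below illustrates this; both function names are assumptions made for this example.

```python
def degrees_of_freedom(r, c):
    """df for a chi-square test of independence on an r x c table."""
    return (r - 1) * (c - 1)


def complete_table(free_cells, row_totals, col_totals):
    """Rebuild a full table from its (r-1) x (c-1) free block and its margins."""
    # The last column of each row is whatever is left of that row's total.
    table = [list(row) + [row_totals[i] - sum(row)] for i, row in enumerate(free_cells)]
    # The last row is whatever is left of each column's total.
    last_row = [col_totals[j] - sum(row[j] for row in table) for j in range(len(col_totals))]
    table.append(last_row)
    return table


print(degrees_of_freedom(2, 2), degrees_of_freedom(2, 3), degrees_of_freedom(3, 3))  # 1 2 4

# Vaccination table: one free cell (45) plus the margins fixes everything else.
print(complete_table([[45]], [100, 100], [110, 90]))  # [[45, 55], [65, 35]]
```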
C5 — Conditions for the Test
Before computing $\chi^2$, you must verify:
Random sample. The data must come from a random sample (or randomized experiment).
All expected frequencies ≥ 5. Compute every $E$ value and check that none is below 5. If any $E < 5$, the chi-square approximation is unreliable.
Common error: Checking the observed frequencies instead of the expected frequencies. The condition is on $E$, not $O$. A cell can have O = 3 (small observed count) and still satisfy the condition if $E \ge 5$.
What to do if conditions are violated: If one or more expected frequencies fall below 5, adjacent categories may be combined to create fewer, larger cells. This merges the small-expected-count cell with a neighbouring one, increasing E. Report the combination as part of your analysis.
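The check and the combine-categories remedy might look like the sketch below. The counts are hypothetical (they are not from this lesson), and the helper names `small_expected_cells` and `combine_columns` are assumptions chosen for illustration.

```python
def expected_frequencies(observed):
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def small_expected_cells(observed, minimum=5):
    """List (row, column, E) for every cell whose expected count is below the minimum."""
    expected = expected_frequencies(observed)
    return [(i, j, e) for i, row in enumerate(expected) for j, e in enumerate(row) if e < minimum]


def combine_columns(observed, keep, merge):
    """Add column `merge` into column `keep` and drop column `merge`."""
    combined = []
    for row in observed:
        new_row = list(row)
        new_row[keep] += new_row[merge]
        del new_row[merge]
        combined.append(new_row)
    return combined


# Hypothetical counts: the third category is rare, so its expected counts are small.
table = [[12, 9, 2], [18, 11, 3]]
print(len(small_expected_cells(table)))  # 2

merged = combine_columns(table, keep=1, merge=2)
print(merged)                            # [[12, 11], [18, 14]]
print(small_expected_cells(merged))      # []
```

Note that the check runs on the expected counts, so it has to be repeated after merging because the margins have changed.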
C6 — Five-Step Test for Independence
The structure is identical to the framework from INF-5:
| Step | Action |
|---|---|
| Step 1 | H₀: The two variables are independent in the population. Hₐ: They are not independent. |
| Step 2 | Check conditions: (1) random sample; (2) all expected frequencies ≥ 5. Compute all E values. |
| Step 3 | Compute $\chi^2 = \sum \frac{(O - E)^2}{E}$. Find $df = (r - 1)(c - 1)$. |
| Step 4 | Find the p-value = right-tail area of the $\chi^2$ distribution with $df$ degrees of freedom. Equivalently, compare $\chi^2$ to the critical value $\chi^2_{\alpha,\,df}$. |
| Step 5 | If $p \le \alpha$ (or $\chi^2 \ge \chi^2_{\alpha,\,df}$), reject H₀. State the conclusion in context. |

Note: The chi-square test is always right-tailed. We reject H₀ only for large $\chi^2$ values. There is no two-sided or left-sided chi-square test for independence.
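Steps 2 through 5 can be strung together as in the sketch below. It is one possible arrangement, not a definitive implementation: the function name and the returned dictionary are assumptions, and the critical values are the $\alpha = 0.05$ entries for df 1 to 4 copied from this lesson's chi-square table. Step 1 (stating the hypotheses) and the random-sample condition cannot be checked by code.

```python
# Critical values at alpha = 0.05, copied from the lesson's chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_test(observed, min_expected=5):
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)

    # Step 2: compute every expected count and check the E >= 5 condition.
    expected = [[r * c / n for c in cols] for r in rows]
    if min(min(row) for row in expected) < min_expected:
        raise ValueError("an expected frequency is below 5; combine categories first")

    # Step 3: test statistic and degrees of freedom.
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    df = (len(rows) - 1) * (len(cols) - 1)

    # Step 4: right-tailed comparison against the critical value.
    critical = CRITICAL_05[df]

    # Step 5: decision (the conclusion in context is still up to the analyst).
    return {"chi2": chi2, "df": df, "critical": critical, "reject_h0": chi2 >= critical}


result = chi_square_test([[45, 55], [65, 35]])  # vaccination data
print(result["df"], round(result["chi2"], 3), result["reject_h0"])  # 1 8.081 True
```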
C7 — p-Value and Critical Value
Use the Chi-Square Table to find critical values. The table gives $\chi^2_{\alpha,\,df}$, the value that puts area $\alpha$ in the right tail.
Chi-Square Distribution Table
Critical values (χ²) for given degrees of freedom (df) and upper tail probability (p).
| df \ p | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879 |
| 2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597 |
| 3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838 |
| 4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860 |
| 5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750 |
| 6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548 |
| 7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278 |
| 8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955 |
| 9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589 |
| 10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188 |
| 11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757 |
| 12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300 |
| 13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819 |
| 14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319 |
| 15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801 |
| 16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267 |
| 17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718 |
| 18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156 |
| 19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582 |
| 20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997 |
| 21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401 |
| 22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796 |
| 23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181 |
| 24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559 |
| 25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928 |
| 26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290 |
| 27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645 |
| 28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993 |
| 29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336 |
| 30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672 |
| 40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766 |
| 50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490 |
| 60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952 |
| 70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215 |
| 80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321 |
| 90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299 |
| 100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169 |
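Decision rule:
Reject H₀ if $\chi^2 \ge \chi^2_{\alpha,\,df}$, equivalently if $p \le \alpha$.
Fail to reject H₀ if $\chi^2 < \chi^2_{\alpha,\,df}$.
Common critical values at $\alpha = 0.05$: 3.841 ($df = 1$), 5.991 ($df = 2$), 7.815 ($df = 3$), 9.488 ($df = 4$).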
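Software usually reports an exact p-value instead of a table lookup. The sketch below computes the right-tail area for whole-number df using the standard closed-form series for the chi-square distribution (one form for even df, one for odd df). The function name is an assumption for this example; in practice a statistics library would normally do this job.

```python
import math


def chi_square_p_value(stat, df):
    """Right-tail area of the chi-square distribution for a positive integer df."""
    if stat <= 0:
        return 1.0
    half = stat / 2.0
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k+1: erfc(sqrt(x/2)) + sqrt(2/pi) * exp(-x/2) * sum_{r=1}^{k} x^(r - 1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(stat), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= stat / (2 * r + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


# The table's critical values at the 0.05 column should give p close to 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815), (4, 9.488)]:
    print(df, round(chi_square_p_value(critical, df), 3))  # each line ends in 0.05
```

A p-value computed this way leads to the same decision as the critical-value comparison, because both describe the same right-tail area.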
C8 — Conclusion Language
Standard Conclusion Statements
Reject H₀: “There is sufficient evidence at the $\alpha$ level that [variable 1] and [variable 2] are not independent in the population.”
Fail to reject H₀: “There is insufficient evidence to conclude that [variable 1] and [variable 2] are not independent in the population.”
Critical error: After failing to reject H₀, do NOT write “the variables are independent.” Failing to reject H₀ only means we lack sufficient evidence of dependence — it does not prove independence. The correct language is “we have insufficient evidence to conclude that the variables are not independent.”
Critical error: Do NOT conclude “one variable causes the other” after rejecting H₀. Chi-square tests association, not causation. Even a highly significant result could reflect a confounding variable rather than a direct causal relationship.
C9 — Cramér’s V: Effect Size
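A statistically significant $\chi^2$ does not tell you how strong the association is. With large sample sizes, even a trivially weak association can produce a significant p-value. Cramér’s V quantifies the strength independently of $n$.
Cramér's V
$$V = \sqrt{\frac{\chi^2}{n \cdot \min(r - 1,\ c - 1)}}$$
Range: $0 \le V \le 1$.
Thresholds (approximate guidelines):

| V | Interpretation |
|---|---|
| $V < 0.10$ | Negligible |
| $0.10 \le V < 0.30$ | Small |
| $0.30 \le V < 0.50$ | Medium |
| $V \ge 0.50$ | Large |

Note on $\min(r - 1,\ c - 1)$: For a $2 \times 2$ table, $\min(r - 1,\ c - 1) = 1$, so $V = \sqrt{\chi^2 / n}$. For a $2 \times 3$ table, $\min(r - 1,\ c - 1) = 1$, so the formula is the same. The factor only makes a difference for tables where both $r \ge 3$ and $c \ge 3$.
Common error: Computing $\frac{\chi^2}{n \cdot \min(r - 1,\ c - 1)}$ and forgetting the square root. That quantity is $V^2$, not $V$; it is smaller than $V$ whenever $0 < V < 1$, so it understates the association when read against the thresholds above.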
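A small sketch of the calculation and its interpretation follows. The function names are assumptions, and the cut-offs 0.10, 0.30, and 0.50 are assumed here as the approximate guideline boundaries (consistent with the values this lesson labels small and medium); treat them as rough conventions rather than fixed rules.

```python
import math


def cramers_v(chi2, n, r, c):
    """Cramér's V for an r x c table with sample size n."""
    return math.sqrt(chi2 / (n * min(r - 1, c - 1)))


def interpret_v(v):
    # Assumed guideline cut-offs: 0.10, 0.30, 0.50.
    if v < 0.10:
        return "negligible"
    if v < 0.30:
        return "small"
    if v < 0.50:
        return "medium"
    return "large"


v = cramers_v(11.429, n=120, r=2, c=3)  # caffeine / sleep quality figures from this lesson
print(round(v, 3), interpret_v(v))      # 0.309 medium
```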
C10 — Association vs. Causation
A statistically significant chi-square test tells you the two variables are associated in the population. It does not tell you:
Which variable (if either) causes the other.
Whether any causal relationship exists at all.
How strong the relationship is in practical terms (use V for that).
A lurking variable — an unmeasured third variable correlated with both variables in your table — could explain the association entirely. This is why association studies require careful contextual interpretation and why controlled experiments are needed to establish causation.
Section 4: Worked Examples
Example 1 — Fully Worked (2×2 Test)
Context: A health researcher surveys 100 adults. Are smoking status and exercise frequency independent at $\alpha = 0.05$?
| | Exercises | Does not exercise | Row total |
|---|---|---|---|
| Smoker | 15 | 35 | 50 |
| Non-smoker | 25 | 25 | 50 |
| Col total | 40 | 60 | 100 |
Step 1 — Hypotheses:
H₀: Smoking status and exercise frequency are independent in the population.
Hₐ: Smoking status and exercise frequency are not independent in the population.
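Step 2 — Expected frequencies:
$E_{\text{Smoker, Exercises}} = \frac{50 \times 40}{100} = 20$, $E_{\text{Smoker, Does not exercise}} = \frac{50 \times 60}{100} = 30$
$E_{\text{Non-smoker, Exercises}} = \frac{50 \times 40}{100} = 20$, $E_{\text{Non-smoker, Does not exercise}} = \frac{50 \times 60}{100} = 30$
All $E \ge 5$. Random sample assumed. Conditions satisfied.
Step 3 — Test statistic ($df = 1$):
$$\chi^2 = \frac{(15 - 20)^2}{20} + \frac{(35 - 30)^2}{30} + \frac{(25 - 20)^2}{20} + \frac{(25 - 30)^2}{30} = 1.250 + 0.833 + 1.250 + 0.833 \approx 4.167$$
Step 4 — p-value / critical value:
From the chi-square table: $\chi^2_{0.05,\,1} = 3.841$.
Since $4.167 > 3.841$, the p-value $< 0.05$.
Step 5 — Decision and conclusion:
Reject H₀. There is sufficient evidence at the 0.05 level that smoking status and exercise frequency are not independent in the population.
Cramér’s V:
$$V = \sqrt{\frac{4.167}{100 \times 1}} \approx 0.204$$
Small effect: the association is statistically real but modest.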
Example 2 — Partially Scaffolded (2×2)
Context: The vaccine/age data from the Introduction (Dataset V1):
| | Vaccinated | Not vaccinated | Row total |
|---|---|---|---|
| Under 40 | 45 | 55 | 100 |
| 40 and over | 65 | 35 | 100 |
| Col total | 110 | 90 | 200 |
Given expected frequencies: $E_{11} = 55$, $E_{12} = 45$, $E_{21} = 55$, $E_{22} = 45$
Your task: compute $\chi^2$, check conditions, and state the decision at $\alpha = 0.05$.
Before seeing the calculations below: based on the observed and expected counts, do you expect the chi-square statistic to be larger or smaller than the critical value (3.841)? Take a moment to guess before continuing.
Show solution
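All $E = 55$ or $45$, so every $E \ge 5$. Conditions satisfied. $df = 1$.
$\chi^2 = \frac{(45 - 55)^2}{55} + \frac{(55 - 45)^2}{45} + \frac{(65 - 55)^2}{55} + \frac{(35 - 45)^2}{45} \approx 8.081$. Since $8.081 > 3.841$ → reject H₀ ($p < 0.05$).
There is sufficient evidence that age group and vaccine uptake are not independent in the population.
$V = \sqrt{8.081 / 200} \approx 0.201$, a small effect.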
Example 3 — Cramér’s V Calculation
Given: A chi-square test yields $\chi^2 = 11.429$, $n = 120$, $r = 2$, $c = 3$.
Before seeing the answer: what effect size do you expect — negligible, small, medium, or large?
Show solution
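$V = \sqrt{\frac{11.429}{120 \times \min(1,\ 2)}} = \sqrt{0.0952} \approx 0.309$
This falls in the medium range ($0.30 \le V < 0.50$). The caffeine–sleep association is moderately strong.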
Example 4 — Find the Error (Conditions Violated)
Context: A researcher uses the following table (n = 50):
| | Outcome A | Outcome B | Row total |
|---|---|---|---|
| Group 1 | 3 | 2 | 5 |
| Group 2 | 27 | 18 | 45 |
| Col total | 30 | 20 | 50 |
The researcher computes $\chi^2$ (df = 1) and reports “p = 0.667; fail to reject H₀ — the variables appear independent.”
Identify the errors in this analysis.
Show solution
Error 1 — Conditions violated: Computing the expected frequencies:
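$E_{\text{Group 1, Outcome A}} = \frac{5 \times 30}{50} = 3 < 5$ ✗
$E_{\text{Group 1, Outcome B}} = \frac{5 \times 20}{50} = 2 < 5$ ✗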
Two cells have expected frequencies below 5. The chi-square approximation is unreliable here. The test should not be run on this table as-is. The correct approach is to combine categories or use Fisher’s Exact Test.
Error 2 — “fail to reject = independent”: Even if the test were valid, writing “the variables appear independent” after failing to reject H₀ is incorrect. The correct language is “there is insufficient evidence to conclude the variables are not independent.”
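For readers curious about the Fisher's Exact Test alternative mentioned above, here is a minimal pure-Python sketch for a 2×2 table. It assumes the common two-sided definition (sum the probabilities of every table with the same margins that is no more likely than the observed one); the function name is an assumption, and statistical software should be preferred for real analyses.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small tolerance so that ties with the observed table are counted despite rounding.
    return sum(prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed * (1 + 1e-9))


p = fisher_exact_two_sided([[3, 2], [27, 18]])  # the table from this example
print(round(p, 3))  # 1.0
```

The observed counts in this table equal the expected counts exactly, which is why the exact p-value comes out at 1: the data show no departure from independence at all.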
Section 5: Guided Practice
Problem 1 — Expected Frequencies (Variant Bank)
Dataset V0 — Smoking status vs. Exercise frequency (n = 100)
| | Exercises | Does not exercise | Row total |
|---|---|---|---|
| Smoker | 15 | 35 | 50 |
| Non-smoker | 25 | 25 | 50 |
| Col total | 40 | 60 | 100 |
(a) What is the expected frequency for cell (Smoker, Exercises)?
(b) What is the expected frequency for cell (Non-smoker, Does not exercise)?
Dataset V1 — Age group vs. Vaccine uptake (n = 200)
| | Vaccinated | Not vaccinated | Row total |
|---|---|---|---|
| Under 40 | 45 | 55 | 100 |
| 40 and over | 65 | 35 | 100 |
| Col total | 110 | 90 | 200 |
(a) What is the expected frequency for cell (Under 40, Vaccinated)?
(b) What is the expected frequency for cell (40 and over, Not vaccinated)?
Problem 2 — Full Five-Step Test on a 2×3 Table with Cramér’s V (Generator)
Problem 3 — Find the Error (Variant Bank)
A researcher runs a chi-square test on a 2×3 contingency table (2 rows, 3 columns, n = 90) and reports: “df = n − 1 = 89, χ² = 8.4. Since p < 0.05, I conclude the variables are not independent.”
What is the error in this statement?
Show Solution
Correct Answer: The correct df is $(2 - 1)(3 - 1) = 2$, not $n - 1 = 89$.
Explanation: The degrees of freedom for a chi-square test of independence are calculated as $(r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns. The formula $n - 1$ is for a single-sample quantitative t-test, not a categorical contingency table.
A researcher computes the test statistic using this formula: $\chi^2 = \sum \frac{(O - E)^2}{O}$, dividing by the observed count instead of the expected count.
What is the error?
Show Solution
Correct Answer: The denominator must be $E$ (expected), not $O$ (observed).
Explanation: The chi-square test measures how far the observed counts deviate from what we expect under the null hypothesis of independence. Therefore, the standardized squared deviation is scaled relative to the expected count (i.e. $\frac{(O - E)^2}{E}$).
A researcher reports: “The chi-square test gives χ² = 1.8 (df = 1, p = 0.18). We conclude that the two variables are independent.”
What is the error?
Show Solution
Correct Answer: Failing to reject does not prove independence.
Explanation: In hypothesis testing, we never “accept” or “prove” the null hypothesis. We only fail to reject it. A non-significant p-value ($p > \alpha$) indicates that we lack sufficient evidence of a relationship, not that a relationship definitely does not exist.
A researcher finds a significant association between ice cream sales and drowning rates across summer months (χ² = 12.4, df = 2, p < 0.001) and concludes: “Ice cream consumption causes drowning.”
What is the error?
Show Solution
Correct Answer: Chi-square tests association, not causation.
Explanation: A significant chi-square test only establishes that an association exists between the variables. It does not establish a causal relationship. A lurking variable (summer weather/temperature) is the common cause driving both increased ice cream sales and increased swimming/drowning rates.
A researcher runs a chi-square test on a contingency table where one cell has $E < 5$, and reports: “χ² = 9.7, df = 3, p < 0.05. The result is significant — the variables are not independent.”
What is the error?
Problem 4 — Cramér’s V Standalone (Generator)
Problem 5 — Synthesis: Physical Activity and Stress (Non-regenerable)
A study of 130 randomly selected workers classifies each by physical activity level (None / Moderate / Vigorous) and stress level (Low / High).
| | None | Moderate | Vigorous | Row total |
|---|---|---|---|---|
| Low stress | 15 | 30 | 25 | 70 |
| High stress | 25 | 20 | 15 | 60 |
| Col total | 40 | 50 | 40 | 130 |
(a) State H₀ and Hₐ in context.
(b) Compute all 6 expected frequencies and verify the conditions.
(c) Compute $\chi^2$, showing all 6 cell contributions.
(d) Identify df and compare $\chi^2$ to $\chi^2_{0.05,\,df}$. State the decision at $\alpha = 0.05$.
(e) Compute Cramér’s V and interpret the practical significance.
(f) A corporate wellness director concludes: “Since the test is significant, we should mandate vigorous exercise for all high-stress employees.” Identify two statistical reasoning errors in this statement.
Show full solution
(a) H₀: Physical activity level and stress level are independent in the worker population. Hₐ: They are not independent.
(b) Expected frequencies:
| | None | Moderate | Vigorous |
|---|---|---|---|
| Low stress | 21.538 | 26.923 | 21.538 |
| High stress | 18.462 | 23.077 | 18.462 |
Min E = 18.462 ≥ 5. All conditions satisfied. Random sample stated.
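(c) $\chi^2$ computation:

| Cell | O | E | $(O - E)^2 / E$ |
|---|---|---|---|
| Low, None | 15 | 21.538 | 1.985 |
| Low, Moderate | 30 | 26.923 | 0.352 |
| Low, Vigorous | 25 | 21.538 | 0.556 |
| High, None | 25 | 18.462 | 2.316 |
| High, Moderate | 20 | 23.077 | 0.410 |
| High, Vigorous | 15 | 18.462 | 0.649 |

$\chi^2 \approx 6.268$
(d) $df = (2 - 1)(3 - 1) = 2$. $\chi^2_{0.05,\,2} = 5.991$.
Since $6.268 > 5.991$ → Reject H₀ ($p < 0.05$).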
Conclusion: There is sufficient evidence at the 0.05 level that physical activity level and stress level are not independent in the worker population.
(e) $V = \sqrt{\frac{6.268}{130 \times \min(1,\ 2)}} \approx 0.220$
This is a small effect. The association, while statistically real, is weak. Physical activity explains only a modest portion of the variation in stress levels.
(f) Two errors:
Association ≠ causation. A significant chi-square shows that physical activity and stress are associated — it does not establish that requiring vigorous exercise will reduce stress. Other factors (workload, sleep, social support) may drive both variables.
Effect size is small. $V \approx 0.220$ indicates a weak association. Even if the causal inference were valid, the small effect size suggests that mandating exercise would produce only modest changes in stress outcomes at the population level, and likely not justify a blanket policy.
Mixed Review — Retrieval from Earlier Lessons
These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.
Review Problem 1 — Z-Test for a Mean (INF-5)
A nutritionist claims that the mean daily sodium intake of adults in a city is 2,300 mg (the recommended daily maximum). A health researcher suspects the true mean is higher. She surveys 50 adults and finds mg with known population standard deviation mg. Use .
(a) State $H_0$ and $H_a$.
(b) Check conditions for a z-test.
(c) Compute the test statistic $z$.
(d) The critical value for a one-tailed test at $\alpha = 0.05$ is $z_{0.05} = 1.645$. State the decision and write a conclusion in context.
Show Solution
(a) $H_0: \mu = 2300$ mg; $H_a: \mu > 2300$ mg
(b) Conditions:
Randomness: Assume the 50 adults are a random (or representative) sample. ✓ (stated as a survey)
Independence: 50 adults is plausibly less than 10% of the city’s adult population. ✓
Normality: $n = 50 \ge 30$, so by the Central Limit Theorem the sampling distribution of $\bar{x}$ is approximately normal even without knowing the shape of the population distribution. ✓
$\sigma$ is known, so a z-test (not t-test) is appropriate. ✓
(c)
(d) Since $z > 1.645$, we reject $H_0$ ($p < 0.05$).
Conclusion: “There is sufficient evidence at the 0.05 significance level that the mean daily sodium intake of adults in this city exceeds 2,300 mg.”
Review Problem 2 — Proportion Test (INF-6)
A college claims that 70% of its graduates find employment in their field within six months. A skeptic surveys 120 recent graduates and finds that 75 are employed in their field. Test whether the true proportion differs from 0.70 at $\alpha = 0.05$ (two-tailed).
(a) State $H_0$ and $H_a$.
(b) Compute $\hat{p}$ and verify the success-failure condition.
(c) Compute the test statistic. Use $p_0$, not $\hat{p}$, in the denominator.
(d) The two-tailed critical value is $z_{0.025} = 1.96$. State the decision and conclusion.
Show Solution
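(a) $H_0: p = 0.70$; $H_a: p \ne 0.70$
(b) $\hat{p} = 75 / 120 = 0.625$
Success-failure condition: $np_0 = 120(0.70) = 84 \ge 10$ ✓ and $n(1 - p_0) = 120(0.30) = 36 \ge 10$ ✓
(c) The denominator uses $p_0$ (not $\hat{p}$) because we are testing under the assumption that $H_0$ is true:
$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.625 - 0.70}{\sqrt{0.70 \times 0.30 / 120}} \approx \frac{-0.075}{0.0418} \approx -1.793$$
(d) Since $|z| = 1.793 < 1.96$, we fail to reject $H_0$ ($p > 0.05$).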
Conclusion: “There is insufficient evidence at the 0.05 significance level to conclude that the true employment rate differs from 70%. The data are consistent with the college’s claim, though a 62.5% observed rate warrants continued monitoring.”
Note: Failing to reject $H_0$ does not prove the rate is 70%; it only means the data did not provide sufficient evidence against it at this sample size.
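The arithmetic in Review Problem 2 can be checked with a short sketch like the one below. The function name is an assumption; the two-tailed p-value uses the standard normal tail written in terms of `math.erfc`.

```python
import math


def one_proportion_z_test(successes, n, p0):
    """Two-tailed z-test for a single proportion, using p0 in the standard error."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # built from p0, not p_hat
    z = (p_hat - p0) / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))     # area in both tails
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z_test(75, 120, 0.70)
print(p_hat, round(z, 3))  # 0.625 -1.793
print(abs(z) < 1.96)       # True, so fail to reject H0 at the 0.05 level
```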
Section 7: Mastery Check
Feynman Prompt
A colleague says: “We ran a chi-square test and got a highly significant result. The association between the two variables is therefore very strong.”
In 2–3 sentences, explain what is wrong with this reasoning and what they should report instead. Aim for 150–400 words.
Show reference answer
A significant p-value tells you the association is real (not due to chance in the sample) — it says nothing about the strength of the association. With large sample sizes, even a negligibly weak association produces a very small p-value. Your colleague should compute Cramér’s V: if V < 0.1, the association is negligible despite the significance; if V ≥ 0.3, it is moderate or large. Always report both p-value and V.
Apply
A researcher surveys 300 university students on preferred study location (Library / Home / Café) and year of study (First year / Upper year). A chi-square test yields with .
(a) What is the decision at $\alpha = 0.05$? Use the chi-square table.
(b) What is the correct conclusion?
Analyze the Error
A researcher studying the relationship between pet ownership (Yes/No) and self-reported happiness (Low/High) in a sample of 200 adults reports:
“χ² = 3.2 (df = 1, p = 0.074). Since p > 0.05, we conclude that pet owners are just as happy as non-pet-owners.”
Identify the error in the researcher’s conclusion.
Self-Assessment
How confident do you feel about the chi-square test of independence?
Section 8: Boss Fight
Choose your path. Both paths test everything from this lesson — pick the one that matches your strengths.
🔬 Path A — The Analyst
Perform a complete five-step chi-square test of independence, compute Cramér’s V, and write a professional research summary.
📝 Path B — The Communicator
Evaluate three brief research reports from different studies, identify the specific errors, and rewrite conclusions using proper statistical language.
Path A — The Analyst
A study of 120 adults measures caffeine intake level (Low / High) and sleep quality (Good / Fair / Poor).
| | Good sleep | Fair sleep | Poor sleep | Row total |
|---|---|---|---|---|
| Low caffeine | 30 | 20 | 10 | 60 |
| High caffeine | 15 | 20 | 25 | 60 |
| Col total | 45 | 40 | 35 | 120 |
Complete the full five-step chi-square test at $\alpha = 0.05$. Then compute Cramér’s V. Finally, write a one-paragraph research summary using correct statistical language (avoid causation; state the effect size; include the decision).
Show solution
Step 1 — Hypotheses:
H₀: Caffeine intake and sleep quality are independent in the population.
Hₐ: Caffeine intake and sleep quality are not independent in the population.
Step 2 — Expected frequencies:
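| | Good | Fair | Poor |
|---|---|---|---|
| Low caffeine | 22.5 | 20.0 | 17.5 |
| High caffeine | 22.5 | 20.0 | 17.5 |

All E ≥ 5. Conditions satisfied. $df = (2 - 1)(3 - 1) = 2$.
Step 3 — Test statistic:
$$\chi^2 = \frac{(30 - 22.5)^2}{22.5} + \frac{(20 - 20)^2}{20} + \frac{(10 - 17.5)^2}{17.5} + \frac{(15 - 22.5)^2}{22.5} + \frac{(20 - 20)^2}{20} + \frac{(25 - 17.5)^2}{17.5} = 2.500 + 0 + 3.214 + 2.500 + 0 + 3.214 \approx 11.429$$
Step 4 — Critical value: $\chi^2_{0.05,\,2} = 5.991$.
Step 5 — Decision: Since $11.429 > 5.991$, reject H₀ ($p < 0.05$).
Cramér’s V: $V = \sqrt{\frac{11.429}{120 \times 1}} \approx 0.309$, a medium effect.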
Research summary:
“A chi-square test of independence (χ² = 11.429, df = 2, p = 0.003) found sufficient evidence that caffeine intake level and sleep quality are not independent in the adult population sampled (n = 120). The association is of medium strength (Cramér’s V = 0.309), indicating a practically meaningful — not merely statistically significant — relationship between these variables. Adults with low caffeine intake tended to report better sleep, while those with high caffeine intake were more likely to report poor sleep. As an observational study, causation cannot be established; other factors associated with caffeine intake (e.g., work stress, screen time) may contribute to the observed pattern.”
Reflection: What was the most challenging part of this bivariate chi-square analysis? Was it expected frequency calculations or Cramér’s V interpretation?
Path B — The Communicator
A research team shares three brief reports from different studies. Evaluate each one: identify the error (if any), or confirm the analysis is correct. For each flawed report, rewrite the conclusion using proper statistical language.
Report 1:
“A chi-square test on a 2×2 table (χ² = 6.3, df = 1, n = 180) gives p = 0.012. We conclude there is sufficient evidence at the 5% level that the two categorical variables are not independent in the population. Cramér’s V = 0.19 indicates a small effect.”
Evaluate Report 1
This report is correct. The test conclusion uses proper language (sufficient evidence, not independent, at the 5% level). The Cramér’s V interpretation (small effect) is appropriate for V = 0.19. No causation is claimed. No errors.
Report 2:
“A chi-square test yields χ² = 4.2 (df = 2, p = 0.12). Since p > 0.05, we confirm that education level and political affiliation are independent.”
Evaluate Report 2
Error: “We confirm … are independent” after failing to reject H₀. Failing to reject H₀ does not prove the null hypothesis — it only means insufficient evidence of dependence.
Corrected conclusion: “There is insufficient evidence at the 5% level to conclude that education level and political affiliation are not independent in the population. The test does not confirm independence.”
Report 3:
“Our chi-square analysis shows that ice cream consumption and drowning rates are significantly associated (χ² = 18.7, df = 1, p < 0.001, V = 0.43). Therefore, eating ice cream increases the risk of drowning.”
Evaluate Report 3
Error: Claiming causation from a significant association. Chi-square tests statistical dependence between two variables in a sample — it cannot establish causal direction or rule out confounders. A lurking variable (summer heat) drives both ice cream sales and swimming activity, producing the association.
Corrected conclusion: “There is sufficient evidence at the 0.1% level that ice cream consumption and drowning rates are not independent (χ² = 18.7, df = 1, V = 0.43, medium effect). The association is likely due to a common cause (warm weather driving both behaviours) rather than any direct link between ice cream and drowning.”
Reflection: What was the most challenging part of identifying these errors? Which report was easiest to critique?
Section 9: Challenge Problems
Challenge 1 — Sample Size Effect on χ² vs. Cramér’s V
Dataset V0 (smoking/exercise, n = 100) has the following observed proportions: among smokers, 30% exercise; among non-smokers, 50% exercise (an underlying 20-percentage-point gap).
Fix these proportions and scale the sample size. Complete the table.
| n | Observed counts | χ² | Decision ($\alpha = 0.05$) | V |
|---|---|---|---|---|
| 100 | [[15,35],[25,25]] | 4.167 | Reject H₀ | 0.204 |
| 200 | [[30,70],[50,50]] | ? | ? | ? |
| 400 | [[60,140],[100,100]] | ? | ? | ? |
| 800 | [[120,280],[200,200]] | ? | ? | ? |
After completing the table, answer: What does this tell us about using χ² alone as a measure of association strength?
Show solution
n = 200 (proportions unchanged, counts doubled):
E values: each cell E doubles too. E₁₁ = 40, E₁₂ = 60, E₂₁ = 40, E₂₂ = 60.
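$\chi^2 = \frac{(30 - 40)^2}{40} + \frac{(70 - 60)^2}{60} + \frac{(50 - 40)^2}{40} + \frac{(50 - 60)^2}{60} \approx 8.333$ (exactly double the $n = 100$ value ✓). $8.333 > 3.841$ → Reject H₀. $V = \sqrt{8.333 / 200} \approx 0.204$ (unchanged).
n = 400:
$\chi^2 = 4 \times 4.1667 \approx 16.667$. Reject H₀. $V = \sqrt{16.667 / 400} \approx 0.204$.
n = 800:
$\chi^2 = 8 \times 4.1667 \approx 33.333$. Reject H₀. $V = \sqrt{33.333 / 800} \approx 0.204$.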
Completed table:
| n | χ² | Decision | V |
|---|---|---|---|
| 100 | 4.167 | Reject H₀ | 0.204 |
| 200 | 8.333 | Reject H₀ | 0.204 |
| 400 | 16.667 | Reject H₀ | 0.204 |
| 800 | 33.333 | Reject H₀ | 0.204 |
Interpretation: χ² scales linearly with n — quadrupling the sample quadruples the test statistic. With a large enough n, even a negligibly weak association will produce a statistically significant χ². Cramér’s V, by contrast, stays constant at 0.204 regardless of n — it measures the true strength of the association, independent of sample size. Always report V alongside χ² to communicate both significance and practical importance.
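The scaling behaviour can be reproduced with a short loop. This is an illustrative sketch with assumed names; it multiplies the n = 100 counts by 1, 2, 4, and 8 and recomputes both quantities for a 2×2 table, where $V = \sqrt{\chi^2 / n}$.

```python
import math


def chi_square_statistic(observed):
    rows = [sum(r) for r in observed]
    cols = [sum(c) for c in zip(*observed)]
    n = sum(rows)
    return sum(
        (o - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i, row in enumerate(observed)
        for j, o in enumerate(row)
    )


base = [[15, 35], [25, 25]]  # smoking / exercise counts at n = 100
results = []
for factor in (1, 2, 4, 8):
    table = [[factor * o for o in row] for row in base]
    n = sum(sum(row) for row in table)
    chi2 = chi_square_statistic(table)
    v = math.sqrt(chi2 / n)  # 2x2 table, so min(r - 1, c - 1) = 1
    results.append((n, round(chi2, 3), round(v, 3)))
    print(n, round(chi2, 3), round(v, 3))
# 100 4.167 0.204
# 200 8.333 0.204
# 400 16.667 0.204
# 800 33.333 0.204
```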
Challenge 2 — 3×3 Table
A sociologist examines the relationship between preferred car type (Economy / Standard / Luxury) and income bracket (Low / Medium / High) for 300 randomly selected adults.
| | Low income | Medium income | High income | Row total |
|---|---|---|---|---|
| Economy | 50 | 30 | 20 | 100 |
| Standard | 30 | 40 | 30 | 100 |
| Luxury | 20 | 30 | 50 | 100 |
| Col total | 100 | 100 | 100 | 300 |
(a) State H₀ and Hₐ.
(b) Compute all 9 expected frequencies and check conditions.
(c) Find df. How does the df formula change compared to a 2×2 table?
(d) Compute χ² (all 9 cell contributions).
(e) Decide at $\alpha = 0.05$ using the chi-square table.
(f) Compute Cramér’s V and interpret.
Show solution
(a) H₀: Car type preference and income bracket are independent in the population. Hₐ: They are not independent.
(b) With equal margins (all row and column totals = 100), every expected frequency is:
$E = \frac{100 \times 100}{300} = 33.333$. All 9 cells have E = 33.333 ≥ 5. Conditions satisfied.
(c) $df = (3 - 1)(3 - 1) = 4$. The formula itself does not change; only $r$ and $c$ do. For a 2×2 table, df = 1. For a 3×3 table, df = 4, because there are more independent cell comparisons. A larger df requires a larger critical value to reject H₀.
(d) Each cell contributes $\frac{(O - 33.333)^2}{33.333}$:
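| Cell | O | (O − 33.333)²/33.333 |
|---|---|---|
| Economy, Low | 50 | 8.333 |
| Economy, Medium | 30 | 0.333 |
| Economy, High | 20 | 5.333 |
| Standard, Low | 30 | 0.333 |
| Standard, Medium | 40 | 1.333 |
| Standard, High | 30 | 0.333 |
| Luxury, Low | 20 | 5.333 |
| Luxury, Medium | 30 | 0.333 |
| Luxury, High | 50 | 8.333 |

(e) $\chi^2 \approx 30.00$ and $\chi^2_{0.05,\,4} = 9.488$. Since $30.00 > 9.488$, reject H₀ strongly.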
There is very strong evidence that car preference and income bracket are not independent in the population.
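(f) $V = \sqrt{\dfrac{\chi^2}{n \cdot \min(r - 1,\, c - 1)}} = \sqrt{\dfrac{30.000}{300 \times 2}} = \sqrt{0.05} \approx 0.224$, a small effect (between 0.1 and 0.3).
Despite the extremely small p-value, the association is only small in practical terms. Knowing income bracket provides only limited ability to predict car type preference.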
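As a cross-check on the hand computation, the steps of this solution can be sketched in plain Python. The variable names are illustrative, α = 0.05 is assumed, and 9.488 is the chi-square table value for df = 4 at that level.

```python
import math

# Observed counts from the Challenge 2 table.
# Rows: Economy, Standard, Luxury. Columns: Low, Medium, High income.
observed = [
    [50, 30, 20],
    [30, 40, 30],
    [20, 30, 50],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Condition check: every expected count should be at least 5.
conditions_ok = all(e >= 5 for row in expected for e in row)

# Test statistic: sum of (O - E)^2 / E over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

n_rows, n_cols = len(observed), len(observed[0])
df = (n_rows - 1) * (n_cols - 1)

CRITICAL_VALUE = 9.488  # chi-square table, df = 4, alpha = 0.05 (assumed level)
reject_h0 = chi2 > CRITICAL_VALUE

# Effect size: Cramér's V.
v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

print(f"conditions ok: {conditions_ok}")
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
print(f"Cramér's V = {v:.3f}")
```

Hard-coding the critical value mirrors the table lookup used in this lesson; a statistics library could supply it instead if one is available.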
Challenge 3 — Simpson’s Paradox
A hospital compares recovery rates for two treatments (A vs. B). The overall combined data appears to show Treatment A is superior:
Combined data (n = 200):
| | Recovered | Not recovered | Total |
|---|---|---|---|
| Treatment A | 78 | 22 | 100 |
| Treatment B | 73 | 27 | 100 |
(a) Based on the combined table, which treatment appears better?
After stratifying by condition severity:
Mild cases (n = 130):
| | Recovered | Not recovered | Total |
|---|---|---|---|
| Treatment A | 72 | 8 | 80 |
| Treatment B | 45 | 5 | 50 |
Severe cases (n = 70):
| | Recovered | Not recovered | Total |
|---|---|---|---|
| Treatment A | 6 | 14 | 20 |
| Treatment B | 28 | 22 | 50 |
(b) For mild cases, which treatment is better (or are they tied)?
(c) For severe cases, which treatment is better?
(d) Why does Treatment A appear better overall even though it does not outperform Treatment B in either severity group? What lurking variable drives the discrepancy?
(e) If a hospital administrator used only the combined table chi-square result to recommend a treatment, what error would they be making?
(f) What general lesson does this illustrate about chi-square tests and association analyses?
Solution
(a) Treatment A: 78/100 = 78%. Treatment B: 73/100 = 73%. Treatment A appears better in the combined data.
(b) Mild cases: A = 72/80 = 90.0%; B = 45/50 = 90.0%. Exactly tied.
(c) Severe cases: A = 6/20 = 30.0%; B = 28/50 = 56.0%. Treatment B is better.
(d) Treatment A was assigned mostly to mild cases (80 of its 100 patients), which naturally have higher recovery rates regardless of treatment. Treatment B handled more severe cases (50 of its 100 patients). The overall rate for Treatment A is inflated by its patient mix. Disease severity is a lurking variable (confounder) — it is associated with both treatment assignment and recovery outcome, creating the illusion that A outperforms B.
(e) The administrator would recommend Treatment A based on a confounded combined analysis. The correct conclusion — that B is equally effective for mild cases and superior for severe cases — is only visible after stratifying by severity.
(f) This is Simpson’s Paradox: an association present in the combined data can reverse or disappear after stratifying by a third variable. A chi-square test on a combined table may mask patterns present within subgroups. Whenever a lurking variable might confound the association, the data should be stratified by that variable. Statistical analysis must always account for the study design and potential confounders.
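The stratified comparison in this solution can be sketched in Python as follows. The counts come from the three tables in the problem; the function and variable names are illustrative, and the chi-square helper is a plain Pearson statistic with no continuity correction.

```python
def chi_square(table):
    """Pearson chi-square statistic for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


# Rows: Treatment A, Treatment B. Columns: recovered, not recovered.
tables = {
    "mild": [[72, 8], [45, 5]],
    "severe": [[6, 14], [28, 22]],
}

# Pool the two strata cell by cell to rebuild the combined table.
tables["combined"] = [
    [m + s for m, s in zip(mild_row, severe_row)]
    for mild_row, severe_row in zip(tables["mild"], tables["severe"])
]

summary = {}
for label, table in tables.items():
    rate_a = table[0][0] / sum(table[0])  # recovery rate, Treatment A
    rate_b = table[1][0] / sum(table[1])  # recovery rate, Treatment B
    summary[label] = (rate_a, rate_b, chi_square(table))
    print(f"{label:>8}: A = {rate_a:.1%}, B = {rate_b:.1%}, "
          f"chi-square = {summary[label][2]:.3f}")
```

With these counts the combined-table statistic comes out near 0.68, below the df = 1 critical value of 3.841 at α = 0.05. So the pooled 78% versus 73% gap would not reach significance on its own, which is one more reason not to base a recommendation on the combined table.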
Section 10: Solutions Reference
Full worked solutions for all problems in this lesson (Sections 5–9) are available on the dedicated solutions page. Solutions include every computation step, formula derivation, and interpretation note.
Expected frequencies: Compute every expected frequency E before running the test. All must be ≥ 5; if any are not, combine adjacent categories.
Decision rule: Reject H₀ if p ≤ α (equivalently, if χ² ≥ the critical value from the chi-square table). Fail to reject otherwise.
“Fail to reject”: Does not mean the variables are independent. Write “insufficient evidence to conclude the variables are not independent.”
Conclusion language: Always name both variables and the significance level: “There is sufficient evidence at the 0.05 level that [variable 1] and [variable 2] are not independent in the population.”
Cramér’s V: V < 0.1 negligible · 0.1–0.3 small · 0.3–0.5 medium · ≥ 0.5 large. Report alongside the p-value.
Causation: A significant χ² establishes association only, never causal direction.
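The Cramér’s V bands listed above can be written as a small helper. This is a minimal sketch: the function name is hypothetical, and treating each band’s lower bound as inclusive (so that exactly 0.3 counts as medium) is an assumption, since the ranges as written overlap at their endpoints.

```python
def interpret_cramers_v(v):
    """Label a Cramér's V value using the lesson's bands.

    Assumption: the lower bound of each band is inclusive.
    """
    if not 0.0 <= v <= 1.0:
        raise ValueError("Cramér's V must lie between 0 and 1")
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "small"
    if v < 0.5:
        return "medium"
    return "large"


# Example: the V values that appear in this lesson's challenges.
for value in (0.204, 0.224):
    print(f"V = {value:.3f}: {interpret_cramers_v(value)}")
```

A label like this complements the p-value rather than replacing it: the p-value speaks to evidence, the band to practical size.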
Common Pitfalls
Pitfall
What goes wrong
Correction
P1 — Conditions skipped
Test run without checking that every expected frequency is ≥ 5
Compute all expected frequencies first; combine categories if needed
P2 — O and E swapped
Swapping O and E in the formula, for example dividing by O instead of E
Denominator is always E; numerator is (O − E)²
P3 — Wrong df
Using a count other than (r − 1)(c − 1)
Always df = (r − 1)(c − 1)
P4 — Causation error
“Variable A causes Variable B” after a significant result
Chi-square shows association, not causation
P5 — Independence claimed
“The variables are independent” after failing to reject