REG-4 Solutions: Chi-Square Test of Independence

Section 4 — Worked Examples Solutions

Example 1 — Fully Worked 2×2 Test

See lesson text — the complete five-step solution is presented directly in the lesson.


Example 2 — Partially Scaffolded (Age/Vaccination)

E values: 55.0, 45.0, 55.0, 45.0. All ≥ 5. df = 1.

\[\chi^2 = \frac{(45-55)^2}{55} + \frac{(55-45)^2}{45} + \frac{(65-55)^2}{55} + \frac{(35-45)^2}{45} = 1.818 + 2.222 + 1.818 + 2.222 = 8.081\]

\(\chi^2_{0.05}(1) = 3.841\). Since \(8.081 > 3.841\), reject H₀ (\(p \approx 0.004\)).

Conclusion: There is sufficient evidence at the 0.05 level that age group and vaccine uptake are not independent in the population.

\(V = \sqrt{8.081/200} = 0.201\) — small effect.
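The arithmetic in this example can be checked with a short script. This is a minimal sketch, not the lesson's own tooling: the observed counts are read off the numerators of the χ² terms above, and the function name `chi_square_independence` is made up for illustration.

```python
import math

# Observed counts taken from the chi-square terms above (O = 45, 55, 65, 35).
# Row and column labels are not restated here; see the lesson table.
observed = [[45, 55],
            [65, 35]]

def chi_square_independence(table):
    """Return (chi2, expected, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(table, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, expected, df

chi2, expected, df = chi_square_independence(observed)
n = sum(map(sum, observed))
v = math.sqrt(chi2 / n)  # Cramér's V with k = min(r-1, c-1) = 1 for a 2x2 table

print(expected)
print(round(chi2, 3), df, round(v, 3))
```

The printed values should agree with the E values, χ², df, and V stated above, up to rounding.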


Example 3 — Cramér's V

\(k = \min(1, 2) = 1\)

\(V = \sqrt{11.429 / (120 \times 1)} = \sqrt{0.09524} = 0.309\) — medium effect.
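As a quick check of the formula, here is a small sketch of Cramér's V as a function. The name `cramers_v` and its argument order are illustrative assumptions; the 2×3 table shape is inferred from \(k = \min(1, 2)\).

```python
import math

def cramers_v(chi2, n, rows, cols):
    """Cramér's V = sqrt(chi2 / (n * k)), where k = min(rows - 1, cols - 1)."""
    k = min(rows - 1, cols - 1)
    return math.sqrt(chi2 / (n * k))

# Example 3: chi2 = 11.429, n = 120, 2x3 table so k = 1
v_example3 = cramers_v(11.429, 120, 2, 3)
print(round(v_example3, 3))
```

Passing the table dimensions rather than k keeps the \(\min(r-1, c-1)\) step explicit, which is where mistakes tend to happen for tables larger than 2×2.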


Example 4 — Find the Error (Conditions Violated)

Error 1: \(E_{11} = 3.0 < 5\) and \(E_{12} = 2.0 < 5\), so two cells violate the expected-frequency condition. The chi-square approximation is unreliable.

Error 2: "Variables appear independent" after failing to reject H₀ — the correct statement is "insufficient evidence to conclude non-independence."

Section 5 — Guided Practice Solutions

Problem 1 — Expected Frequencies

Variant 0 (Smoking/Exercise):

(a) \(E = 50 \times 40 / 100 = 20.0\)   (b) \(E = 50 \times 60 / 100 = 30.0\)

Variant 1 (Age/Vaccination):

(a) \(E = 100 \times 110 / 200 = 55.0\)   (b) \(E = 100 \times 90 / 200 = 45.0\)

Variant 2 (Education/Newspaper):

(a) \(E = 100 \times 70 / 200 = 35.0\)   (b) \(E = 100 \times 70 / 200 = 35.0\)

Variant 3 (Stress/Sleep):

(a) \(E = 80 \times 70 / 140 = 40.0\)   (b) \(E = 60 \times 70 / 140 = 30.0\)

Variant 4 (Diet/BMI):

(a) \(E = 50 \times 60 / 100 = 30.0\)   (b) \(E = 50 \times 40 / 100 = 20.0\)
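All ten answers above use the same rule, E = (row total × column total) / n. The sketch below recomputes them; the helper name `expected_count` is hypothetical, and because the product is symmetric it does not matter which of the two totals is the row and which is the column.

```python
def expected_count(row_total, col_total, n):
    """Expected cell count under independence: row total * column total / n."""
    return row_total * col_total / n

# (total, total, n) for parts (a) and (b) of each variant, copied from above
variants = {
    "Smoking/Exercise": [(50, 40, 100), (50, 60, 100)],
    "Age/Vaccination": [(100, 110, 200), (100, 90, 200)],
    "Education/Newspaper": [(100, 70, 200), (100, 70, 200)],
    "Stress/Sleep": [(80, 70, 140), (60, 70, 140)],
    "Diet/BMI": [(50, 60, 100), (50, 40, 100)],
}

results = {name: [expected_count(*cell) for cell in cells]
           for name, cells in variants.items()}
for name, values in results.items():
    print(name, values)
```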


Problem 2 — Conditions and df

| Variant | Min E | Conditions met? | df |
|---|---|---|---|
| 0 (Smoking/Exercise, 2×2) | 20.0 | Yes (all E ≥ 5) | 1 |
| 1 (Conditions violated, 2×2) | 2.0 | No (E₁₂ = 2.0 < 5) | 1 |
| 2 (Gender/Transport, 2×3) | 12.5 | Yes | 2 |
| 3 (Conditions violated, 2×3) | 4.2 | No (E₁₁ = E₁₃ = 4.2 < 5) | 2 |
| 4 (3×2 table) | 15.0 | Yes | 2 |
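The two checks in this problem, the E ≥ 5 condition and df = (r−1)(c−1), can be sketched as follows. The function names are illustrative, and the threshold of 5 is the one used throughout this lesson.

```python
def degrees_of_freedom(rows, cols):
    """df for a test of independence on an r x c table."""
    return (rows - 1) * (cols - 1)

def conditions_met(min_expected, threshold=5.0):
    """True when the smallest expected count reaches the threshold."""
    return min_expected >= threshold

# (variant, rows, cols, smallest expected count) from the solutions above
variants = [
    ("0", 2, 2, 20.0),
    ("1", 2, 2, 2.0),
    ("2", 2, 3, 12.5),
    ("3", 2, 3, 4.2),
    ("4", 3, 2, 15.0),
]

summary = [(label, conditions_met(min_e), degrees_of_freedom(r, c))
           for label, r, c, min_e in variants]
for row in summary:
    print(row)
```

Checking only the smallest expected count is enough here because the condition requires every cell to be at least 5.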

Problem 3 — Reading the Chi-Square Table

(a) \(\chi^2_{0.05}(1) = 3.841\)

(b) \(\chi^2_{0.05}(2) = 5.991\)

(c) \(\chi^2 = 6.8 > 5.991 = \chi^2_{0.05}(2)\) → reject H₀.
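The table values can be sanity-checked without a statistics library, because the chi-square upper-tail probability has a closed form for df = 1 and df = 2. The sketch below covers only those two cases; the function name `chi2_upper_tail` is an assumption for illustration.

```python
import math

def chi2_upper_tail(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, df = 1 or 2."""
    if df == 1:
        # A chi-square with 1 df is a squared standard normal
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        # A chi-square with 2 df is exponential with mean 2
        return math.exp(-x / 2)
    raise ValueError("closed forms here cover df = 1 and df = 2 only")

p_a = chi2_upper_tail(3.841, 1)   # should be close to 0.05
p_b = chi2_upper_tail(5.991, 2)   # should be close to 0.05
p_c = chi2_upper_tail(6.8, 2)     # below 0.05, so reject H0

print(round(p_a, 4), round(p_b, 4), round(p_c, 4))
```

Comparing the p-value with 0.05 gives the same decision as comparing χ² with the critical value; the two are equivalent ways of reading the same tail area.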


Problem 4 — Full Five-Step Test (Generator)

Solutions vary by dataset selected. The generator's "Show Solution" button displays the complete step-by-step answer including all E values, χ² computation, and decision.

Section 6 — Independent Practice Solutions

Problem 1 — Full Analysis Chain

Variant 0 (Smoking/Exercise, reject):

\(\chi^2 = 4.167\), df = 1, reject H₀ (\(4.167 > 3.841\)). There is sufficient evidence that smoking status and exercise frequency are not independent. \(V = 0.204\) (small).

Variant 1 (Age/Vaccination, reject):

\(\chi^2 = 8.081\), df = 1, reject H₀. There is sufficient evidence that age group and vaccine uptake are not independent. \(V = 0.201\) (small).

Variant 2 (Education/Newspaper, fail to reject):

\(\chi^2 = 2.197\), df = 1, fail to reject H₀ (2.197 < 3.841). There is insufficient evidence to conclude that education level and daily newspaper reading are not independent.

Variant 3 (Stress/Sleep, reject):

\(\chi^2 = 11.667\), df = 1, reject H₀. There is sufficient evidence that stress level and sleep quality are not independent. \(V = 0.289\) (small–medium).

Variant 4 (Commute/Satisfaction, fail to reject):

\(\chi^2 = 2.420\), df = 1, fail to reject H₀. There is insufficient evidence to conclude that commute length and job satisfaction are not independent.


Problem 2 — Full 2×3 Test with Cramér's V (Generator)

Solutions vary by dataset. Generator solutions cover V6–V9 (χ² = 3.556, 16.333, 11.429, and 2.052, respectively).


Problem 3 — Find the Error

| Variant | Error | Correct approach |
|---|---|---|
| 0 (Wrong df) | Used df = n−1 = 89 instead of (r−1)(c−1) = 2 | df = (2−1)(3−1) = 2 for a 2×3 table |
| 1 (O vs. E) | Divided by O instead of E in the χ² formula | Denominator must be E, the expected count under H₀ |
| 2 (Fail to reject = independent) | "Variables are independent" after failing to reject H₀ | "Insufficient evidence to conclude non-independence" |
| 3 (Causation) | Concluded ice cream causes drowning from a significant χ² | Lurking variable (summer heat) drives both; chi-square shows association only |
| 4 (Conditions violated) | Ran test with E = 2.3 < 5; conditions violated | Combine categories or use an exact test; the p-value is unreliable |

Problem 4 — Cramér's V (Generator)

Solutions vary by pair selected. Key formula: \(V = \sqrt{\chi^2 / (n \cdot k)}\), where \(k = \min(r-1, c-1)\). For all W0–W6 pairs, k = 1.


Problem 5 — Physical Activity vs. Stress (Synthesis)

(a) H₀: Physical activity and stress level are independent in the worker population. Hₐ: Not independent.

(b) Expected frequencies (all ≥ 5 ✓):

| | None | Moderate | Vigorous |
|---|---|---|---|
| Low stress | 21.538 | 26.923 | 21.538 |
| High stress | 18.462 | 23.077 | 18.462 |

(c) \(\chi^2 = 1.985 + 0.352 + 0.556 + 2.316 + 0.410 + 0.649 = 6.268\)

(d) df = 2. \(\chi^2_{0.05}(2) = 5.991\). Since 6.268 > 5.991 → reject H₀ (\(p \approx 0.043\)).

(e) \(V = \sqrt{6.268/130} = 0.220\) — small effect. Statistically real but weak.

(f) Error 1: Chi-square shows association, not causation — mandating exercise may not reduce stress. Error 2: \(V = 0.22\) is small; even if causal, the effect is modest and unlikely to justify a blanket policy.

Section 7 — Mastery Check Solutions

Feynman Prompt

A significant p-value means that an association as strong as the one observed would be unlikely if the variables were independent in the population; it does not mean the association is strong. With large n, even trivially weak associations become statistically significant. The colleague should compute Cramér's V: if V < 0.1, the association is negligible despite the significance; if V ≥ 0.3, it is at least moderate.

Apply (Study location / Year of study)

(a) \(\chi^2 = 9.2 > 5.991 = \chi^2_{0.05}(2)\) → reject H₀.

(b) There is sufficient evidence at the 0.05 level that preferred study location and year of study are not independent in the population.

Analyze the Error

The researcher wrote "pet owners are just as happy as non-pet-owners" — this confuses failing to reject H₀ with proving independence. Correct: "There is insufficient evidence to conclude that pet ownership and happiness are not independent in the population."

Section 8 — Boss Fight Solutions

Path A — The Analyst (Caffeine/Sleep)

Expected frequencies: every cell has E = 22.5, 20.0, or 17.5, so all E ≥ 5. df = 2.

\[\chi^2 = 2.500 + 0 + 3.214 + 2.500 + 0 + 3.214 = 11.429\]

\(\chi^2_{0.05}(2) = 5.991\). Since 11.429 > 5.991 → reject H₀ (\(p \approx 0.003\)).

\(V = \sqrt{11.429/120} = 0.309\) — medium effect.

Research summary: See lesson — one-paragraph summary using correct language (no causation, includes V, states decision).

Path B — The Communicator

Section 9 — Challenge Solutions

Challenge 1 — Sample Size Effect

| n | χ² | Decision | V |
|---|---|---|---|
| 100 | 4.167 | Reject H₀ | 0.204 |
| 200 | 8.333 | Reject H₀ | 0.204 |
| 400 | 16.667 | Reject H₀ | 0.204 |
| 800 | 33.333 | Reject H₀ | 0.204 |

Lesson: when the cell proportions are held fixed, χ² grows in proportion to n while V stays constant. With large n, even trivial associations become highly significant. Always report V alongside χ².
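The scaling pattern can be reproduced by multiplying every cell of a table by a constant. In the sketch below the base table is hypothetical: it was chosen only because it gives χ² = 4.167 at n = 100, and it is not necessarily the table used in the challenge.

```python
import math

def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum((o - rt * ct / n) ** 2 / (rt * ct / n)
               for row, rt in zip(table, row_totals)
               for o, ct in zip(row, col_totals))

# Hypothetical 2x2 base table (assumption): n = 100, chi2 = 4.167
base = [[15, 35],
        [25, 25]]

results = []
for factor in (1, 2, 4, 8):
    scaled = [[o * factor for o in row] for row in base]
    n = sum(map(sum, scaled))
    chi2 = chi_square(scaled)
    v = math.sqrt(chi2 / n)  # k = 1 for a 2x2 table
    results.append((n, round(chi2, 3), round(v, 3)))

for row in results:
    print(row)
```

Multiplying all counts by a factor multiplies every (O − E)²/E term by that factor, so χ² scales linearly, while χ²/n (and therefore V) is unchanged.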


Challenge 2 — 3×3 Table

All E = 33.333 ≥ 5. df = (3−1)(3−1) = 4.

\(\chi^2 = 30.000\). \(\chi^2_{0.05}(4) = 9.488\). Reject H₀ strongly.

\(k = 2\), \(V = \sqrt{30/(300 \times 2)} = \sqrt{0.050} = 0.224\), a small-to-medium effect despite a very significant p-value.


Challenge 3 — Simpson's Paradox

(a) Treatment A: 78% vs. B: 73% — A appears better overall.

(b) Mild: both 90%. Tied.

(c) Severe: A = 30%, B = 56%. Treatment B is better.

(d) Treatment A was assigned to mostly mild cases (80/100), which have inherently higher recovery. Disease severity is the lurking variable (confounder).

(e) Recommending A based on combined data ignores the confounding by severity — Treatment B is at least as good for mild cases and clearly better for severe cases.

(f) Simpson's Paradox: a combined-table association can mask or reverse stratum-specific patterns. Always stratify by potential confounders before drawing conclusions.
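The percentages in parts (a) to (c) can be checked against one set of counts. In this sketch, Treatment A's split of 80 mild and 20 severe cases comes from part (d); the recovery counts, and Treatment B's 50/50 split, are assumptions chosen to match the quoted percentages, so the lesson's actual table may differ.

```python
# (recovered, total) per treatment and severity stratum.
# A's 80/20 split is from part (d); all other counts are ASSUMED
# so that the rates match the percentages quoted above.
data = {
    "A": {"mild": (72, 80), "severe": (6, 20)},
    "B": {"mild": (45, 50), "severe": (28, 50)},
}

stratum_rates = {}
overall_rates = {}
for treatment, strata in data.items():
    stratum_rates[treatment] = {
        severity: recovered / total
        for severity, (recovered, total) in strata.items()
    }
    total_recovered = sum(recovered for recovered, _ in strata.values())
    total_patients = sum(total for _, total in strata.values())
    overall_rates[treatment] = total_recovered / total_patients

print(overall_rates)   # combined table: A looks better
print(stratum_rates)   # within strata: B ties or wins
```

The overall rate is a weighted average of the stratum rates, and A's average is weighted heavily toward the mild stratum, which is why it looks better in the combined table.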