Chi-Square Test of Independence

A public health team collects vaccination data from 200 adults and organizes it into a simple table:

	Vaccinated	Not vaccinated	Row total
Under 40	45	55	100
40 and over	65	35	100
Col total	110	90	200

Older adults appear more likely to be vaccinated — 65% versus 45%. But before recommending any policy, the team needs to answer a harder question: Are older adults genuinely more likely to be vaccinated in the population, or could a difference this large arise from chance in a random sample of 200?

The sample proportions are different. But is the difference statistically significant? This is exactly the question the chi-square test of independence answers.

The test works by comparing what we observed to what we would expect if the two variables were completely unrelated — if vaccination status and age group were independent in the population. A large discrepancy between observed and expected counts produces a large test statistic; a small discrepancy produces a small one. The p-value then tells us whether that discrepancy is surprising enough to reject independence.

By the end of this lesson, you will be able to:

Construct a contingency table and compute expected frequencies under the null hypothesis of independence.
Compute the chi-square test statistic and determine degrees of freedom.
Check the conditions for the chi-square test and explain what to do when they are violated.
Perform the complete five-step test, using the chi-square table to find critical values.
Compute Cramér’s V to quantify the strength of association.
Distinguish statistical association from causation.

This lesson builds directly on the five-step hypothesis-testing framework from inf-5. Make sure you have these tools ready.

The five-step framework (from inf-5):

Step	What you do
1	State H₀ and Hₐ
2	Check conditions
3	Compute the test statistic
4	Find the p-value (or compare to critical value)
5	State the decision and conclusion in context

P-value decision rule: Reject H₀ if p ≤ α. At α = 0.05, this means rejecting only when data at least as extreme as ours would occur no more than 5% of the time if the variables were independent.

Fail to reject ≠ accept H₀. A large p-value means “not enough evidence to conclude the variables are dependent” — it does not prove they are independent. We cannot prove the null hypothesis.

Type I and Type II errors:

Type I: Rejecting H₀ when it is actually true (false positive). Probability = α.
Type II: Failing to reject H₀ when it is actually false (false negative). Probability = β.

Key difference from inf-5: The chi-square test for independence is always right-tailed. Unlike the t statistic for a mean, which can be extreme in either direction, the chi-square statistic is always ≥ 0, and a large positive value is the only kind of extreme outcome. We only look at the right tail of the chi-square distribution.

Retrieval Warm-up — from earlier lessons

In a study of 40 university students, a researcher tests whether mean daily screen time differs from a national benchmark of 6.5 hours. She collects a sample and records its mean and standard deviation, both in hours. She has no prior hypothesis about the direction of difference. Which null and alternative hypotheses are correct?

A researcher uses inf-6 to test whether the proportion of patients who experience side effects differs from 0.20. She obtains from a sample of . She computes the test statistic as . What error has she made?

C1 — Contingency Table Structure

A contingency table (also called a cross-tabulation or two-way table) organizes data on two categorical variables. Rows represent levels of one variable; columns represent levels of the other. Cells contain observed frequencies (O) — the actual counts from the sample.

Contingency Table

A two-way table with r rows and c columns. Each cell contains the observed frequency O for that combination of categories. The row totals, column totals, and grand total form the margins.

Example — 3×2 contingency table structure (r = 3, c = 2):

	Category A	Category B	Row total
Group 1	O₁₁	O₁₂	R₁
Group 2	O₂₁	O₂₂	R₂
Group 3	O₃₁	O₃₂	R₃
Col total	C₁	C₂	n

A mosaic plot turns a contingency table into a single picture: column widths show how common each column category is, and the tile heights within each column show the conditional proportions of the rows. When the two variables are independent, every column splits at the same height — the tiles line up with the dashed reference lines. The further the tiles drift apart, the stronger the association. (The χ² and Cramér’s V values shown are defined later in this section and in C9 — for now, just watch the tiles.)

Mosaic plot of a contingency table

Figure: A mosaic plot turns the whole table into one picture. Column widths show how common each column category is; tile heights show the conditional row proportions within each column. Under independence every column splits at the same height — so the tiles line up with the dashed independence reference lines. The further the tiles drift from those lines, the stronger the association (and the larger χ² and Cramér's V). Step from Education (weak) to Stress (strong) to watch the tiles pull apart.
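
The geometry of the mosaic plot comes straight from the table. As a minimal sketch (plain Python, with variable names chosen here for illustration), the column widths and tile heights for the vaccination table from the introduction could be computed like this:

```python
# Mosaic-plot geometry for the vaccination table from the introduction.
# Rows: Under 40, 40 and over. Columns: Vaccinated, Not vaccinated.
observed = [[45, 55],
            [65, 35]]

n = sum(sum(row) for row in observed)
col_totals = [sum(col) for col in zip(*observed)]

# Column widths: the share of the whole sample in each column category.
col_widths = [total / n for total in col_totals]

# Tile heights: conditional row proportions within each column.
tile_heights = [[observed[i][j] / col_totals[j] for j in range(len(col_totals))]
                for i in range(len(observed))]

# Independence reference lines: the overall row shares, identical in every column.
row_shares = [sum(row) / n for row in observed]

print(col_widths)    # [0.55, 0.45]
print(tile_heights)
print(row_shares)    # [0.5, 0.5]
```

Here the Vaccinated column splits at about 0.41 / 0.59 and the Not vaccinated column at about 0.61 / 0.39, while independence would put every split at 0.50.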

C2 — Expected Frequencies

Expected Frequency

The count we would predict in a cell if the two variables were completely independent in the population. For any cell in row i, column j:

E = (row i total × column j total) / n

where n is the grand total.

Common error: Some students compute (row total + column total) / n, adding instead of multiplying. This is wrong. The formula uses multiplication, not addition, because independence means the joint probability equals the product of the marginal probabilities.

Common error: Some students write (row total × column total) / n², dividing by n twice. The correct formula divides by n once.

The area model below shows why the formula multiplies. The grand total is a square; the column marginals slice it one way and the row marginals slice it the other way. Because the two slices are perpendicular, each cell’s expected count is the product of the two shares — there is no way to “add” two perpendicular cuts. Select any cell to see its computation, and toggle the observed counts to see how the real data departs from this independence model.
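
The same argument can be written as a short derivation. This is a sketch using notation assumed here: R_i for the total of row i, C_j for the total of column j, and n for the grand total. Under independence the joint probability is the product of the marginal probabilities, and the expected count is n times that probability:

```latex
\begin{aligned}
E_{ij} &= n \cdot P(\text{row } i) \cdot P(\text{column } j) \\
       &= n \cdot \frac{R_i}{n} \cdot \frac{C_j}{n} \\
       &= \frac{R_i \, C_j}{n}
\end{aligned}
```

One factor of n cancels, so a single n survives in the denominator. For the vaccination table, the Under 40 / Vaccinated cell gives E = (100 × 110) / 200 = 55.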

Expected-frequency area builder

Figure: Each rectangle's area is its expected frequency under independence. The two marginal splits are perpendicular, so they multiply — that is why E = (row total × column total) ÷ n, and why adding the marginals (a common error) is geometrically meaningless. Click any cell to see its computation. Toggle Show observed counts to tint each cell by how far the observed count departs from what independence predicts — the raw material of the χ² statistic.

Verifying expected frequencies: The row totals and column totals of the expected frequency table must equal the row and column totals of the observed table. This is a useful self-check.
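
As an illustration, the expected-frequency formula and the margin self-check could be coded as follows. This is a minimal sketch in plain Python; the function names expected_counts and margins_match are hypothetical helpers invented for this example.

```python
import math

def expected_counts(observed):
    """Expected counts under independence: E = (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def margins_match(observed, expected):
    """Self-check: the expected table must reproduce the observed margins."""
    rows_ok = all(math.isclose(sum(o), sum(e))
                  for o, e in zip(observed, expected))
    cols_ok = all(math.isclose(sum(o), sum(e))
                  for o, e in zip(zip(*observed), zip(*expected)))
    return rows_ok and cols_ok

observed = [[45, 55], [65, 35]]   # vaccination table from the introduction
expected = expected_counts(observed)
print(expected)                            # [[55.0, 45.0], [55.0, 45.0]]
print(margins_match(observed, expected))   # True
```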

C3 — Chi-Square Test Statistic

Chi-Square Test Statistic

χ² = Σ (O − E)² / E

Summed over all cells. Measures the total discrepancy between observed and expected counts. χ² ≥ 0 always; equality holds only when O = E in every single cell.

Intuition: Each cell contributes (O − E)² / E to the sum: the squared deviation from what independence predicts, scaled by the expected count (larger cells shouldn’t dominate just because they’re larger). A large χ² means the observed data deviate strongly from what independence would predict.

Common error: Using (O − E) / E (forgetting to square the numerator) or (O − E)² / O (dividing by observed instead of expected). The denominator must be E, because it is the reference under H₀.
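
To make the formula concrete, here is a minimal sketch in plain Python that computes the statistic and keeps each cell's contribution. The function name chi_square_statistic is a hypothetical helper; the data are the vaccination table from the introduction.

```python
def chi_square_statistic(observed):
    """Return (chi2, contributions), where contributions[i][j] = (O - E)^2 / E."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    contributions = []
    for i, r in enumerate(row_totals):
        row = []
        for j, c in enumerate(col_totals):
            e = r * c / n   # expected count under H0
            # Squared numerator, expected count (not observed) in the denominator.
            row.append((observed[i][j] - e) ** 2 / e)
        contributions.append(row)
    chi2 = sum(sum(row) for row in contributions)
    return chi2, contributions

vaccination = [[45, 55], [65, 35]]
chi2, contributions = chi_square_statistic(vaccination)
print(round(chi2, 3))   # 8.081
```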

You may see a slightly different value elsewhere. For 2×2 tables (df = 1), some textbooks and software apply Yates’ continuity correction, which subtracts 0.5 from each |O − E| before squaring. It makes the test more conservative (a smaller χ²). Modern practice often considers it overly cautious, and this lesson uses the uncorrected formula throughout, but don’t be surprised if a calculator reports a marginally different χ² for the same table.
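
The size of the difference is easy to check. The sketch below is plain Python with a hypothetical helper chi_square_2x2; it computes both versions for the vaccination table. The guard that stops the corrected gap from going negative is an assumption of this sketch, not something the lesson specifies.

```python
def chi_square_2x2(observed, yates=False):
    """Chi-square for a 2x2 table, optionally with Yates' continuity correction."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            gap = abs(observed[i][j] - e)
            if yates:
                # Subtract 0.5 from |O - E| before squaring (never below zero:
                # that guard is an assumption of this sketch).
                gap = max(gap - 0.5, 0.0)
            chi2 += gap ** 2 / e
    return chi2

vaccination = [[45, 55], [65, 35]]
plain = chi_square_2x2(vaccination)
corrected = chi_square_2x2(vaccination, yates=True)
print(round(plain, 3), round(corrected, 3))   # 8.081 7.293
```

The corrected value is smaller, which is what "more conservative" means in practice: it is harder to clear the critical value.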

Drag the association strength slider. At zero, every observed bar sits exactly on the dashed expected line (O = E in all four cells) and χ² = 0. As you increase the strength, the bars pull away from the expected line (the shaded gap on each bar is O − E) and χ² climbs with the square of those gaps. The decision flips to reject the moment χ² passes the critical value. (This symmetric table is built so the slider equals Cramér’s V exactly.)

The heatmap below breaks the same statistic apart cell by cell. Each cell is shaded by how much it contributes, (O − E)² / E, and the stacked bar adds those contributions up against the critical value. Click cells to see where the association actually lives: notice that a cell can be large in raw count yet contribute almost nothing if its observed count is close to what independence predicts.

Chi-square contribution heatmap

Figure: Colour intensity = each cell's contribution (O − E)² / E to χ². The stacked bar on the right adds those contributions up; when the stack clears the dashed critical value, the test is significant. Click a cell to see its computation and its share of the total. Notice that a large cell where O is close to E stays pale and adds almost nothing — size alone never drives χ². In the symmetric 3×3 example, the association lives entirely in the four corners.

C4 — Degrees of Freedom

Degrees of Freedom for Chi-Square Test of Independence

df = (r − 1)(c − 1)

where r = number of rows, c = number of columns.

Examples:

2×2 table: df = (2 − 1)(2 − 1) = 1
2×3 table: df = (2 − 1)(3 − 1) = 2
3×3 table: df = (3 − 1)(3 − 1) = 4

Common error: Using n − 1 (borrowed from the t-test for a mean) or r × c (the number of cells). For the chi-square test of independence, the correct formula is always df = (r − 1)(c − 1).

Higher df means a larger critical value is needed to reject H₀ — larger tables require more evidence of departure from independence.
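
A short sketch shows how df, and with it the critical value, grows with the shape of the table. The function name is a hypothetical helper, and the critical values are copied from the α = 0.05 column of the lesson's chi-square table.

```python
def degrees_of_freedom(rows, cols):
    """df for a chi-square test of independence on an r x c table."""
    return (rows - 1) * (cols - 1)

# Critical values at alpha = 0.05, copied from the lesson's chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 4: 9.488, 6: 12.592}

for rows, cols in [(2, 2), (2, 3), (3, 3), (3, 4)]:
    df = degrees_of_freedom(rows, cols)
    print(f"{rows}x{cols}: df = {df}, critical value = {CRITICAL_05[df]}")
```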

C5 — Conditions for the Test

Before computing χ², you must verify:

Random sample. The data must come from a random sample (or randomized experiment).
All expected frequencies ≥ 5. Compute every E value and check that none is below 5. If any E < 5, the chi-square approximation is unreliable.

Common error: Checking the observed frequencies instead of the expected frequencies. The condition is on E, not O. A cell can have O = 3 (small observed count) and still satisfy the condition if E ≥ 5.

What to do if conditions are violated: You have two main options. (1) Combine adjacent categories to create fewer, larger cells. This merges a small-expected-count cell with a neighbour, increasing E. Only do this when the categories are genuinely combinable (e.g., ordinal levels), since merging changes the hypothesis being tested; report the combination as part of your analysis. (2) Use an exact test, most commonly Fisher’s Exact Test for a 2×2 table, which computes the p-value directly without relying on the chi-square approximation.

Beyond the strict rule: “all E ≥ 5” is the conservative rule taught here. Cochran’s more permissive guideline allows the test when no cell has E < 1 and at most 20% of cells have E < 5. We use the strict version as a safe default.
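
Both rules can be checked mechanically once the expected counts are known. The sketch below is plain Python with a hypothetical helper name; the first table is made up for illustration and the second is the table from Example 4. The random-sample condition is a design question and cannot be checked from the counts.

```python
def check_expected_counts(observed):
    """Return (strict_ok, cochran_ok) for the expected-count condition."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    cells = [r * c / n for r in row_totals for c in col_totals]

    # Strict rule used in this lesson: every expected count is at least 5.
    strict_ok = all(e >= 5 for e in cells)

    # Cochran's guideline: no E below 1, and at most 20% of cells with E below 5.
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    cochran_ok = min(cells) >= 1 and share_below_5 <= 0.20
    return strict_ok, cochran_ok

# Hypothetical table: one observed count is 3, but its expected count is 10.
small_observed = [[3, 17], [27, 13]]
print(check_expected_counts(small_observed))   # (True, True)

# Table from Example 4: expected counts are 3, 2, 27 and 18.
example_4 = [[3, 2], [27, 18]]
print(check_expected_counts(example_4))        # (False, False)
```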

C6 — Five-Step Test for Independence

The structure is identical to the framework from inf-5:

Step	Action
Step 1	H₀: The two variables are independent in the population. Hₐ: They are not independent.
Step 2	Check conditions: (1) random sample; (2) all expected frequencies ≥ 5. Compute all E values.
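Step 3	Compute χ² = Σ (O − E)² / E. Find df = (r − 1)(c − 1).
Step 4	Find the p-value = right-tail area of the χ² distribution with df degrees of freedom. Equivalently, compare χ² to the critical value χ²(α, df).
Step 5	If p ≤ α (or χ² ≥ the critical value), reject H₀. State the conclusion in context.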

Note: The chi-square test is always right-tailed. We reject H₀ only for large χ² values. There is no two-sided or left-sided chi-square test for independence.

Independence vs. homogeneity — same arithmetic, different framing. When both classifications are random — you draw one sample and cross-classify it on two variables — this is a test of independence (“are the two variables independent in the population?”). When the group sizes are fixed by design — you deliberately sample, say, 100 adults under 40 and 100 aged 40+ — it is technically a test of homogeneity (“is the distribution of vaccine uptake the same across age groups?”). The expected-frequency formula, the statistic, the df, and the decision rule are identical; only the wording of the hypotheses and conclusion changes. Several tables in this lesson have equal, round group totals (a hint that the rows were fixed by design) — read those as homogeneity tests if you want to be precise.
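
The five steps can be strung together in a few lines. This is a minimal sketch in plain Python, assuming the critical value is read from the chi-square table and passed in by hand; the function name and the keys of the returned dictionary are hypothetical.

```python
def chi_square_independence_test(observed, critical_value):
    """Steps 2 to 5 of the five-step test, using a critical value from the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Step 2: expected counts and the E >= 5 condition.
    # (Random sampling is a design question; it cannot be checked from counts.)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    conditions_ok = all(e >= 5 for row in expected for e in row)

    # Step 3: test statistic and degrees of freedom.
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)

    # Steps 4 and 5: right-tailed comparison; no decision if conditions fail.
    reject = (chi2 >= critical_value) if conditions_ok else None
    return {"expected": expected, "conditions_ok": conditions_ok,
            "chi2": chi2, "df": df, "reject": reject}

# Step 1 is stated in words. H0: age group and vaccination status are
# independent in the population. Critical value for df = 1, alpha = 0.05: 3.841.
result = chi_square_independence_test([[45, 55], [65, 35]], critical_value=3.841)
print(result["df"], round(result["chi2"], 3), result["reject"])   # 1 8.081 True
```

Returning None instead of a decision when the expected-count condition fails is a design choice of this sketch: it keeps an unreliable χ² from being read as a verdict.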

C7 — p-Value and Critical Value

Use the Chi-Square Table to find critical values. The table gives the critical value χ²(α, df): the value that puts area α in the right tail of the chi-square distribution with df degrees of freedom.

Chi-Square Distribution Table

Critical values (χ²) for given degrees of freedom (df) and upper tail probability (p).

df \ p	0.995	0.99	0.975	0.95	0.90	0.10	0.05	0.025	0.01	0.005
1	0.000	0.000	0.001	0.004	0.016	2.706	3.841	5.024	6.635	7.879
2	0.010	0.020	0.051	0.103	0.211	4.605	5.991	7.378	9.210	10.597
3	0.072	0.115	0.216	0.352	0.584	6.251	7.815	9.348	11.345	12.838
4	0.207	0.297	0.484	0.711	1.064	7.779	9.488	11.143	13.277	14.860
5	0.412	0.554	0.831	1.145	1.610	9.236	11.070	12.833	15.086	16.750
6	0.676	0.872	1.237	1.635	2.204	10.645	12.592	14.449	16.812	18.548
7	0.989	1.239	1.690	2.167	2.833	12.017	14.067	16.013	18.475	20.278
8	1.344	1.646	2.180	2.733	3.490	13.362	15.507	17.535	20.090	21.955
9	1.735	2.088	2.700	3.325	4.168	14.684	16.919	19.023	21.666	23.589
10	2.156	2.558	3.247	3.940	4.865	15.987	18.307	20.483	23.209	25.188
11	2.603	3.053	3.816	4.575	5.578	17.275	19.675	21.920	24.725	26.757
12	3.074	3.571	4.404	5.226	6.304	18.549	21.026	23.337	26.217	28.300
13	3.565	4.107	5.009	5.892	7.042	19.812	22.362	24.736	27.688	29.819
14	4.075	4.660	5.629	6.571	7.790	21.064	23.685	26.119	29.141	31.319
15	4.601	5.229	6.262	7.261	8.547	22.307	24.996	27.488	30.578	32.801
16	5.142	5.812	6.908	7.962	9.312	23.542	26.296	28.845	32.000	34.267
17	5.697	6.408	7.564	8.672	10.085	24.769	27.587	30.191	33.409	35.718
18	6.265	7.015	8.231	9.390	10.865	25.989	28.869	31.526	34.805	37.156
19	6.844	7.633	8.907	10.117	11.651	27.204	30.144	32.852	36.191	38.582
20	7.434	8.260	9.591	10.851	12.443	28.412	31.410	34.170	37.566	39.997
21	8.034	8.897	10.283	11.591	13.240	29.615	32.671	35.479	38.932	41.401
22	8.643	9.542	10.982	12.338	14.041	30.813	33.924	36.781	40.289	42.796
23	9.260	10.196	11.689	13.091	14.848	32.007	35.172	38.076	41.638	44.181
24	9.886	10.856	12.401	13.848	15.659	33.196	36.415	39.364	42.980	45.559
25	10.520	11.524	13.120	14.611	16.473	34.382	37.652	40.646	44.314	46.928
26	11.160	12.198	13.844	15.379	17.292	35.563	38.885	41.923	45.642	48.290
27	11.808	12.879	14.573	16.151	18.114	36.741	40.113	43.195	46.963	49.645
28	12.461	13.565	15.308	16.928	18.939	37.916	41.337	44.461	48.278	50.993
29	13.121	14.256	16.047	17.708	19.768	39.087	42.557	45.722	49.588	52.336
30	13.787	14.953	16.791	18.493	20.599	40.256	43.773	46.979	50.892	53.672
40	20.707	22.164	24.433	26.509	29.051	51.805	55.758	59.342	63.691	66.766
50	27.991	29.707	32.357	34.764	37.689	63.167	67.505	71.420	76.154	79.490
60	35.534	37.485	40.482	43.188	46.459	74.397	79.082	83.298	88.379	91.952
70	43.275	45.442	48.758	51.739	55.329	85.527	90.531	95.023	100.425	104.215
80	51.172	53.540	57.153	60.391	64.278	96.578	101.879	106.629	112.329	116.321
90	59.196	61.754	65.647	69.126	73.291	107.565	113.145	118.136	124.116	128.299
100	67.328	70.065	74.222	77.929	82.358	118.498	124.342	129.561	135.807	140.169

Decision rule:

Reject H₀ if χ² ≥ the critical value χ²(α, df), equivalently if p ≤ α.
Fail to reject H₀ if χ² < the critical value, equivalently if p > α.
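
The two forms of the decision rule are equivalent because the critical value is, by definition, the point whose right-tail area equals α. The sketch below checks this with the standard library only. The function chi2_right_tail is a hypothetical helper that uses closed forms valid for whole-number df; statistical software would normally supply this area.

```python
import math

def chi2_right_tail(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed-form starting points: df = 2 (even case) or df = 1 (odd case).
    if df % 2 == 0:
        tail, k = math.exp(-half), 2
    else:
        tail, k = math.erfc(math.sqrt(half)), 1
    # Raise df two at a time:
    # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        tail += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return tail

# Critical values from the 0.05 column of the table: each has tail area 0.05.
for df, critical in [(1, 3.841), (2, 5.991), (3, 7.815), (4, 9.488)]:
    print(df, critical, round(chi2_right_tail(critical, df), 3))

# A statistic of 4.167 with df = 1 clears 3.841, so its p-value is below 0.05.
p = chi2_right_tail(4.167, 1)
print(p <= 0.05, 4.167 >= 3.841)   # True True
```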

Common critical values at α = 0.05 (from the table above):

df = 1: 3.841
df = 2: 5.991
df = 3: 7.815
df = 4: 9.488

The explorer below puts those numbers on the curve they come from. Drag the df slider to watch the distribution reshape and the critical value grow; drag the observed χ² marker to see the p-value as the shaded right-tail area. Notice there is only ever one tail — the chi-square test is right-tailed by construction.

Chi-square distribution explorer

Figure: The chi-square distribution is right-skewed and lives only on positive values, so the test is always right-tailed — large χ² is the only kind of extreme outcome. Drag the df slider to watch the curve reshape and the dashed critical value move (it grows with df — larger tables demand more evidence). Drag the observed χ² marker: the shaded area to its right is the p-value, and the decision flips the moment it crosses the dashed critical line.

C8 — Conclusion Language

Standard Conclusion Statements

Reject H₀: “There is sufficient evidence at the α level that [variable 1] and [variable 2] are not independent in the population.”

Fail to reject H₀: “There is insufficient evidence to conclude that [variable 1] and [variable 2] are not independent in the population.”

Critical error: After failing to reject H₀, do NOT write “the variables are independent.” Failing to reject H₀ only means we lack sufficient evidence of dependence — it does not prove independence. The correct language is “we have insufficient evidence to conclude that the variables are not independent.”

Critical error: Do NOT conclude “one variable causes the other” after rejecting H₀. Chi-square tests association, not causation. Even a highly significant result could reflect a confounding variable rather than a direct causal relationship.

C9 — Cramér’s V: Effect Size

A statistically significant χ² does not tell you how strong the association is. With large sample sizes, even a trivially weak association can produce a significant p-value. Cramér’s V quantifies the strength independently of n.

Cramér's V

V = √( χ² / (n × min(r − 1, c − 1)) )

Range: 0 ≤ V ≤ 1.

Thresholds (approximate guidelines):

V	Interpretation
V < 0.10	Negligible
0.10 ≤ V < 0.30	Small
0.30 ≤ V < 0.50	Medium
V ≥ 0.50	Large

Note on min(r − 1, c − 1): For a 2×2 table, min(r − 1, c − 1) = 1, so V = √(χ² / n). For a 2×3 table, min(r − 1, c − 1) = 1 as well, so the formula is the same. The factor only makes a difference for tables where both r ≥ 3 and c ≥ 3.

A caveat on the thresholds: the cutoffs above (Cohen’s guidelines) are calibrated for tables where min(r − 1, c − 1) = 1, that is, 2×2 and 2×c (or r×2) tables. For larger tables the boundaries shift downward by a factor of √min(r − 1, c − 1). For a 3×3 table (min(r − 1, c − 1) = 2), for example, the small/medium/large boundaries become roughly 0.07 / 0.21 / 0.35. In this lesson we use the fixed table as the default; just remember it is approximate and shape-dependent.

Common error: Computing χ² / (n × min(r − 1, c − 1)) and forgetting the square root. That quantity is V², which also lies in [0, 1] but is smaller than V and does not match the thresholds above.
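
A small sketch makes the formula and the bands concrete. It is plain Python with hypothetical helper names, and it assumes the 0.10 / 0.30 / 0.50 cutoffs, optionally divided by √min(r − 1, c − 1) for larger tables.

```python
import math

def cramers_v(chi2, n, rows, cols):
    """Cramér's V = sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

def effect_label(v, rows=2, cols=2, shape_adjusted=False):
    """Label V using the 0.10 / 0.30 / 0.50 bands (assumed cutoffs).

    With shape_adjusted=True the cutoffs are divided by sqrt(min(r - 1, c - 1)).
    """
    scale = math.sqrt(min(rows - 1, cols - 1)) if shape_adjusted else 1.0
    if v < 0.10 / scale:
        return "negligible"
    if v < 0.30 / scale:
        return "small"
    if v < 0.50 / scale:
        return "medium"
    return "large"

# Values quoted in Report 1 later in this lesson: chi2 = 6.3, n = 180, 2x2 table.
v = cramers_v(6.3, 180, 2, 2)
print(round(v, 2), effect_label(v))   # 0.19 small

# A hypothetical V of 0.25 on a 3x3 table changes band once the shape is considered.
print(effect_label(0.25, 3, 3), effect_label(0.25, 3, 3, shape_adjusted=True))
# small medium
```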

C10 — Association vs. Causation

A statistically significant chi-square test gives evidence that the two variables are associated in the population. It does not tell you:

Which variable (if either) causes the other.
Whether any causal relationship exists at all.
How strong the relationship is in practical terms (use V for that).

A lurking variable — an unmeasured third variable correlated with both variables in your table — could explain the association entirely. This is why association studies require careful contextual interpretation and why controlled experiments are needed to establish causation.

Example 1 — Fully Worked (2×2 Test)

Context: A health researcher surveys 100 adults. Are smoking status and exercise frequency independent at α = 0.05?

	Exercises	Does not exercise	Row total
Smoker	15	35	50
Non-smoker	25	25	50
Col total	40	60	100

Step 1 — Hypotheses:

H₀: Smoking status and exercise frequency are independent in the population.
Hₐ: Smoking status and exercise frequency are not independent in the population.

Step 2 — Expected frequencies:

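E₁₁ = (50 × 40) / 100 = 20, E₁₂ = (50 × 60) / 100 = 30, E₂₁ = (50 × 40) / 100 = 20, E₂₂ = (50 × 60) / 100 = 30.

All E ≥ 5. Random sample assumed. Conditions satisfied.

Step 3 — Test statistic (df = 1):

χ² = (15 − 20)²/20 + (35 − 30)²/30 + (25 − 20)²/20 + (25 − 30)²/30 = 25/20 + 25/30 + 25/20 + 25/30 ≈ 4.167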

Step 4 — p-value / critical value:

From the chi-square table: the critical value for α = 0.05 and df = 1 is 3.841.

Since 4.167 > 3.841, the p-value is less than 0.05.

Step 5 — Decision and conclusion:

Reject H₀. There is sufficient evidence at the 0.05 level that smoking status and exercise frequency are not independent in the population.

Cramér’s V:

V = √(4.167 / (100 × 1)) ≈ 0.204

Small effect: the association is statistically detectable but modest.

Example 2 — Partially Scaffolded (2×2)

Context: The vaccine/age data from the Introduction (Dataset V1):

	Vaccinated	Not vaccinated	Row total
Under 40	45	55	100
40 and over	65	35	100
Col total	110	90	200

Given expected frequencies: E₁₁ = 55, E₁₂ = 45, E₂₁ = 55, E₂₂ = 45.

Your task: compute χ², check conditions, and state the decision at α = 0.05.

Before seeing the calculations below: based on the observed and expected counts, do you expect the chi-square statistic to be larger or smaller than the critical value (3.841)? Take a moment to guess before continuing.

Solution

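All E = 55 or 45, so every E ≥ 5. Conditions satisfied. df = 1.

χ² = (45 − 55)²/55 + (55 − 45)²/45 + (65 − 55)²/55 + (35 − 45)²/45 = 100/55 + 100/45 + 100/55 + 100/45 ≈ 8.081. Since 8.081 > 3.841 → reject H₀ (p < 0.05).

There is sufficient evidence that age group and vaccine uptake are not independent in the population.

V = √(8.081 / 200) ≈ 0.201, a small effect.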

Example 3 — Cramér’s V Calculation

Given: A chi-square test yields , , , .

Before seeing the answer: what effect size do you expect — negligible, small, medium, or large?

Solution

This falls in the medium range (0.30 ≤ V < 0.50). The association is moderately strong.

Example 4 — Find the Error (Conditions Violated)

Context: A researcher uses the following table (n = 50):

	Outcome A	Outcome B	Row total
Group 1	3	2	5
Group 2	27	18	45
Col total	30	20	50

The researcher computes χ² (df = 1) and reports “p = 0.667; fail to reject H₀ — the variables appear independent.”

Identify the errors in this analysis.

Solution

Error 1 — Conditions violated: Computing the expected frequencies:

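E₁₁ = (5 × 30) / 50 = 3, E₁₂ = (5 × 20) / 50 = 2, E₂₁ = (45 × 30) / 50 = 27, E₂₂ = (45 × 20) / 50 = 18.

Two cells have expected frequencies below 5 (E₁₁ = 3 and E₁₂ = 2). The chi-square approximation is unreliable here, so the test should not be run on this table as-is. The correct approach is to combine categories or use Fisher’s Exact Test.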

Error 2 — “fail to reject = independent”: Even if the test were valid, writing “the variables appear independent” after failing to reject H₀ is incorrect. The correct language is “there is insufficient evidence to conclude the variables are not independent.”
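
For a 2×2 table like this one, Fisher's Exact Test can be sketched with the standard library alone. The function below is a hypothetical helper, not a reference implementation. It uses one common two-sided convention: with the margins held fixed, add up the hypergeometric probabilities of every table that is no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)    # smallest feasible top-left count
    highest = min(row1, col1)       # largest feasible top-left count
    p_observed = prob(a)
    # Sum tables that are no more likely than the observed one
    # (the small tolerance absorbs floating-point ties).
    total = sum(prob(x) for x in range(lowest, highest + 1)
                if prob(x) <= p_observed * (1 + 1e-9))
    return min(total, 1.0)

example_4 = [[3, 2], [27, 18]]
p_value = fisher_exact_two_sided(example_4)
print(round(p_value, 3))   # 1.0
```

The p-value is 1 because the observed counts equal the expected counts in every cell, so no other table with these margins is more likely. That is still no proof of independence, only an absence of evidence against it.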

Problem 1 — Expected Frequencies

Problem 2 — Conditions Check and Degrees of Freedom

Problem 3 — Reading the Chi-Square Table

Use the chi-square table in C7 to answer each question.

(a) What is ?

(b) What is ?

(c) A chi-square test gives with . At , what is the correct decision?

Problem 4 — Full Five-Step Test

Problem 1 — Full Analysis Chain

Problem 2 — Full Five-Step Test on a 2×3 Table with Cramér’s V (Generator)

Problem 3 — Find the Error

Problem 4 — Cramér’s V Standalone (Generator)

Problem 5 — Synthesis: Physical Activity and Stress

Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Z-Test for a Mean (inf-5)

Review Problem 2 — Proportion Test (inf-6)

Feynman Prompt

A colleague says: “We ran a chi-square test and got a highly significant result. The association between the two variables is therefore very strong.”

In a short paragraph, explain what is wrong with this reasoning and what they should report instead. Aim for 150–400 words.

Reference answer

A significant p-value tells you there is evidence that an association exists in the population: the observed pattern would be unlikely if the variables were independent. It says nothing about the strength of the association. With large sample sizes, even a negligibly weak association can produce a very small p-value. Your colleague should compute Cramér’s V: if V < 0.1, the association is negligible despite the significance; if V ≥ 0.3, it is medium or large. Always report both the p-value and V.

Apply

A researcher surveys 300 university students on preferred study location (Library / Home / Café) and year of study (First year / Upper year). A chi-square test yields with .

(a) What is the decision at ? Use the chi-square table.

(b) What is the correct conclusion?

Analyze the Error

A student tests two categorical variables for independence and writes up their conclusion. Find the flawed claim.

Self-Assessment

How confident do you feel about the chi-square test of independence?

Still confused / Ready for the Boss Fight

Choose your path. Both paths test everything from this lesson — pick the one that matches your strengths.

🔬 Path A — The Analyst

Perform a complete five-step chi-square test of independence, compute Cramér’s V, and write a professional research summary.

📝 Path B — The Communicator

Evaluate three brief research reports from different studies, identify the specific errors, and rewrite conclusions using proper statistical language.

Path A — The Analyst

A study of 120 adults measures caffeine intake level (Low / High) and sleep quality (Good / Fair / Poor).

	Good sleep	Fair sleep	Poor sleep	Row total
Low caffeine	30	20	10	60
High caffeine	15	20	25	60
Col total	45	40	35	120

Complete the full five-step chi-square test at α = 0.05. Then compute Cramér’s V. Finally, write a one-paragraph research summary using correct statistical language (avoid causation; state the effect size; include the decision).

Solution

Step 1 — Hypotheses:

H₀: Caffeine intake and sleep quality are independent in the population.
Hₐ: Caffeine intake and sleep quality are not independent in the population.

Step 2 — Expected frequencies:

	Good	Fair	Poor
Low caffeine	22.5	20.0	17.5
High caffeine	22.5	20.0	17.5

All E ≥ 5. Conditions satisfied. df = (2 − 1)(3 − 1) = 2.

Step 3 — Test statistic:

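χ² = (30 − 22.5)²/22.5 + (20 − 20)²/20 + (10 − 17.5)²/17.5 + (15 − 22.5)²/22.5 + (20 − 20)²/20 + (25 − 17.5)²/17.5 = 2.500 + 0 + 3.214 + 2.500 + 0 + 3.214 ≈ 11.429

Step 4 — Critical value: the critical value for α = 0.05 and df = 2 is 5.991.

Step 5 — Decision: Since 11.429 > 5.991, reject H₀ (p ≈ 0.003).

Cramér’s V: V = √(11.429 / (120 × 1)) ≈ 0.309, a medium effect.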

Research summary:

“A chi-square test of independence (χ² = 11.429, df = 2, p = 0.003) found sufficient evidence that caffeine intake level and sleep quality are not independent in the adult population sampled (n = 120). The association is of medium strength (Cramér’s V = 0.309), indicating a practically meaningful — not merely statistically significant — relationship between these variables. Adults with low caffeine intake tended to report better sleep, while those with high caffeine intake were more likely to report poor sleep. As an observational study, causation cannot be established; other factors associated with caffeine intake (e.g., work stress, screen time) may contribute to the observed pattern.”

Reflection: What was the most challenging part of this bivariate chi-square analysis? Was it expected frequency calculations or Cramér’s V interpretation?

Path B — The Communicator

A research team shares three brief reports from different studies. Evaluate each one: identify the error (if any), or confirm the analysis is correct. For each flawed report, rewrite the conclusion using proper statistical language.

Report 1:

“A chi-square test on a 2×2 table (χ² = 6.3, df = 1, n = 180) gives p = 0.012. We conclude there is sufficient evidence at the 5% level that the two categorical variables are not independent in the population. Cramér’s V = 0.19 indicates a small effect.”

Evaluate Report 1

This report is correct. The test conclusion uses proper language (sufficient evidence, not independent, at the 5% level). The Cramér’s V interpretation (small effect) is appropriate for V = 0.19. No causation is claimed. No errors.

Report 2:

“A chi-square test yields χ² = 4.2 (df = 2, p = 0.12). Since p > 0.05, we confirm that education level and political affiliation are independent.”

Evaluate Report 2

Error: “We confirm … are independent” after failing to reject H₀. Failing to reject H₀ does not prove the null hypothesis — it only means insufficient evidence of dependence.

Corrected conclusion: “There is insufficient evidence at the 5% level to conclude that education level and political affiliation are not independent in the population. The test does not confirm independence.”

Report 3:

“Our chi-square analysis shows that ice cream consumption and drowning rates are significantly associated (χ² = 18.7, df = 1, p < 0.001, V = 0.43). Therefore, eating ice cream increases the risk of drowning.”

Evaluate Report 3

Error: Claiming causation from a significant association. Chi-square tests statistical dependence between two variables in a sample — it cannot establish causal direction or rule out confounders. A lurking variable (summer heat) drives both ice cream sales and swimming activity, producing the association.

Corrected conclusion: “There is sufficient evidence at the 0.1% level that ice cream consumption and drowning rates are not independent (χ² = 18.7, df = 1, V = 0.43, medium effect). The association is likely due to a common cause (warm weather driving both behaviours) rather than any direct link between ice cream and drowning.”

Reflection: What was the most challenging part of identifying these errors? Which report was easiest to critique?

Challenge 1 — Sample Size Effect on χ² vs. Cramér’s V

Dataset V0 (smoking/exercise, n = 100) has the following observed proportions: among smokers, 30% exercise; among non-smokers, 50% exercise (an underlying 20-percentage-point gap).

Fix these proportions and scale the sample size. Complete the table.

n	Observed counts	χ²	Decision (α = 0.05)	V
100	[[15,35],[25,25]]	4.167	Reject H₀	0.204
200	[[30,70],[50,50]]	?	?	?
400	[[60,140],[100,100]]	?	?	?
800	[[120,280],[200,200]]	?	?	?

After completing the table, answer: What does this tell us about using χ² alone as a measure of association strength?

Solution

n = 200 (proportions unchanged, counts doubled):

E values: each cell E doubles too. E₁₁ = 40, E₁₂ = 60, E₂₁ = 40, E₂₂ = 60.

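χ² = (30 − 40)²/40 + (70 − 60)²/60 + (50 − 40)²/40 + (50 − 60)²/60 ≈ 8.333 (exactly double the n = 100 value ✓). 8.333 > 3.841 → Reject H₀. V = √(8.333 / 200) ≈ 0.204 (unchanged).

n = 400:

χ² ≈ 16.667 (four times the n = 100 value). Reject H₀. V = √(16.667 / 400) ≈ 0.204.

n = 800:

χ² ≈ 33.333 (eight times the n = 100 value). Reject H₀. V = √(33.333 / 800) ≈ 0.204.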

Completed table:

n	χ²	Decision	V
100	4.167	Reject H₀	0.204
200	8.333	Reject H₀	0.204
400	16.667	Reject H₀	0.204
800	33.333	Reject H₀	0.204

Interpretation: With the proportions held fixed, χ² scales linearly with n: quadrupling the sample quadruples the test statistic. With a large enough n, even a negligibly weak association will produce a statistically significant χ². Cramér’s V, by contrast, stays constant at 0.204 regardless of n, because it depends only on the observed proportions and not on the sample size. Always report V alongside χ² to communicate both significance and practical importance.
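
The completed table can be reproduced in a few lines. This is a minimal sketch in plain Python; the helper name chi_square_and_v and the results dictionary are hypothetical.

```python
import math

def chi_square_and_v(observed):
    """Return (chi2, Cramér's V) for a table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            chi2 += (observed[i][j] - e) ** 2 / e
    k = min(len(row_totals) - 1, len(col_totals) - 1)
    return chi2, math.sqrt(chi2 / (n * k))

base = [[15, 35], [25, 25]]   # Dataset V0, n = 100
results = {}
for scale in (1, 2, 4, 8):
    # Multiply every count by the same factor: proportions stay fixed, n grows.
    scaled = [[scale * cell for cell in row] for row in base]
    chi2, v = chi_square_and_v(scaled)
    results[100 * scale] = (chi2, v)
    print(100 * scale, round(chi2, 3), round(v, 3))
# 100 4.167 0.204
# 200 8.333 0.204
# 400 16.667 0.204
# 800 33.333 0.204
```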

The explorer below makes this dynamic. The underlying association is held fixed while you drag the sample size: watch χ² ramp up and cross the significance threshold while Cramér’s V stays perfectly flat. Switch to the Negligible preset to confirm that even a trivial association becomes “significant” once n is large enough — the whole reason effect size must be reported alongside the p-value.

Significance versus effect size as sample size grows

Figure: The underlying association is held fixed — only n changes. The χ² line climbs linearly with n and eventually clears the critical value (it becomes "significant") at n*, yet Cramér's V never moves — it measures the true strength of the association, independent of sample size. Switch to the Negligible preset to see that even a trivial association becomes statistically significant once n is large enough. This is why a significant p-value must always be reported alongside V.

Challenge 2 — 3×3 Table

A sociologist surveys 300 randomly selected adults and classifies each by highest degree earned (High school / Bachelor’s / Graduate) and job sector (Public / Private / Nonprofit).

	Public	Private	Nonprofit	Row total
High school	60	25	5	90
Bachelor’s	60	50	10	120
Graduate	30	25	35	90
Col total	150	100	50	300

(a) State H₀ and Hₐ.

(b) Compute all 9 expected frequencies and check conditions. Unlike a table with balanced margins, each cell here needs its own computation, because the row totals (90, 120, 90) and column totals (150, 100, 50) are not all equal.

(c) Find df. How does the df formula change compared to a 2×2 table?

(d) Compute χ² (all 9 cell contributions).

(e) Decide at α = 0.05 using the chi-square table.

(f) Compute Cramér’s V and interpret, using both the standard bands and the shifted bands for a table with min(r − 1, c − 1) = 2.

Show solution

(a) H₀: Degree earned and job sector are independent in the population. Hₐ: They are not independent.

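(b) With unbalanced margins, the expected frequencies vary from cell to cell, e.g. E₁₁ = (90 × 150)/300 = 45.0; E₂₂ = (120 × 100)/300 = 40.0. Rows 1 and 3 share a row total of 90, so they share the same expected values: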

	Public	Private	Nonprofit
High school	45.0	30.0	15.0
Bachelor’s	60.0	40.0	20.0
Graduate	45.0	30.0	15.0

The minimum expected frequency is 15.0 ≥ 5. Conditions satisfied.

(c) df = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4. For a 2×2 table, df = 1. For a 3×3 table, df = 4, because there are more independent cell comparisons. A larger df requires a larger critical value to reject H₀.

(d) Each cell contributes (O − E)²/E:

Cell	O	E	(O − E)²/E
HS, Public	60	45.0	5.000
HS, Private	25	30.0	0.833
HS, Nonprofit	5	15.0	6.667
Bachelor’s, Public	60	60.0	0.000
Bachelor’s, Private	50	40.0	2.500
Bachelor’s, Nonprofit	10	20.0	5.000
Graduate, Public	30	45.0	5.000
Graduate, Private	25	30.0	0.833
Graduate, Nonprofit	35	15.0	26.667

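(e) χ² = 5.000 + 0.833 + 6.667 + 0.000 + 2.500 + 5.000 + 5.000 + 0.833 + 26.667 = 52.500. Since 52.500 > 9.488 (the critical value for df = 4 at α = 0.05), reject H₀ strongly.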

There is very strong evidence that degree earned and job sector are not independent in the population — driven largely by the Graduate/Nonprofit cell (35 observed vs. 15.0 expected).

(f) V = √(χ² / (n × min(r − 1, c − 1))) = √(52.5 / (300 × 2)) = √0.0875 ≈ 0.296.

By the standard fixed table, V ≈ 0.296 classifies as a small effect (0.1–0.3), though it sits just below the 0.3 boundary. But recall the caveat from C9: for a 3×3 table (min(r − 1, c − 1) = 2) the boundaries shift down to roughly 0.07, 0.21, and 0.35, under which V ≈ 0.296 falls in the 0.21–0.35 band, which is medium. The two conventions disagree here in a way that actually matters: a report using the standard bands would call this association “small,” while one using the shifted bands appropriate to a 3×3 table would call it “medium.” Same data, two different practical conclusions. The safest practice is to state which convention is being used.
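The arithmetic in parts (b) through (f) can be cross-checked with a short script. This is a hedged sketch rather than the lesson's own method: the variable names are illustrative, and it uses only the Python standard library.

```python
import math

# Observed counts from the Challenge 2 table (rows: degree, columns: sector)
observed = [
    [60, 25, 5],    # High school
    [60, 50, 10],   # Bachelor's
    [30, 25, 35],   # Graduate
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected frequency for each cell: row total x column total / n
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Per-cell contributions (O - E)^2 / E, then their sum
contributions = [
    [(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
    for o_row, e_row in zip(observed, expected)
]
chi2 = sum(sum(row) for row in contributions)

rows, cols = len(observed), len(observed[0])
df = (rows - 1) * (cols - 1)
v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

print("min expected:", min(min(row) for row in expected))
print("chi2 =", round(chi2, 3), "df =", df, "V =", round(v, 3))
```

Printing `contributions` row by row makes it easy to see which cell dominates the statistic.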

Challenge 3 — Simpson’s Paradox

A hospital compares recovery rates for two treatments (A vs. B). The overall combined data appears to show Treatment A is superior:

Combined data (n = 200):

	Recovered	Not recovered	Total
Treatment A	78	22	100
Treatment B	73	27	100

(a) Based on the combined table, which treatment appears better?

After stratifying by condition severity:

Mild cases (n = 130):

	Recovered	Not recovered	Total
Treatment A	72	8	80
Treatment B	45	5	50

Severe cases (n = 70):

	Recovered	Not recovered	Total
Treatment A	6	14	20
Treatment B	28	22	50

(b) For mild cases, which treatment is better (or are they tied)?

(c) For severe cases, which treatment is better?

(d) Why does Treatment A appear better overall even though it is no better (or worse) for each severity group? What lurking variable drives the reversal?

(e) If a hospital administrator used only the combined table chi-square result to recommend a treatment, what error would they be making?

(f) What general lesson does this illustrate about chi-square tests and association analyses?

Show solution

(a) Treatment A: 78/100 = 78%. Treatment B: 73/100 = 73%. Treatment A appears better in the combined data.

(b) Mild cases: A = 72/80 = 90.0%; B = 45/50 = 90.0%. Exactly tied.

(c) Severe cases: A = 6/20 = 30.0%; B = 28/50 = 56.0%. Treatment B is better.

(d) Treatment A was assigned mostly to mild cases (80 of its 100 patients), which naturally have higher recovery rates regardless of treatment. Treatment B handled more severe cases (50 of its 100 patients). The overall rate for Treatment A is inflated by its patient mix. Disease severity is a lurking variable (confounder) — it is associated with both treatment assignment and recovery outcome, creating the illusion that A outperforms B.

(e) The administrator would recommend Treatment A based on a confounded combined analysis. The correct conclusion — that B is equally effective for mild cases and superior for severe cases — is only visible after stratifying by severity.

(f) This is Simpson’s Paradox: an association present in the combined data can reverse or disappear after stratifying by a third variable. A chi-square test on a combined table may mask patterns present within subgroups. Whenever a lurking variable might confound the association, the data should be stratified by that variable. Statistical analysis must always account for the study design and potential confounders.
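The pooling effect can be reproduced directly from the stratified counts. The sketch below is illustrative only (the dictionary layout and names such as `strata` are assumptions, not part of the lesson); it computes recovery rates within each stratum and after pooling, along with each treatment's share of mild cases.

```python
# (recovered, total) for each treatment within each severity stratum
strata = {
    "mild":   {"A": (72, 80), "B": (45, 50)},
    "severe": {"A": (6, 20),  "B": (28, 50)},
}

# Recovery rate within each stratum
stratum_rates = {
    name: {t: rec / tot for t, (rec, tot) in groups.items()}
    for name, groups in strata.items()
}

# Pool the strata: add recovered counts and totals for each treatment
combined = {}
for t in ("A", "B"):
    recovered = sum(strata[s][t][0] for s in strata)
    total = sum(strata[s][t][1] for s in strata)
    combined[t] = (recovered, total)
combined_rates = {t: rec / tot for t, (rec, tot) in combined.items()}

# Patient mix: fraction of each treatment's patients who were mild cases
mild_share = {t: strata["mild"][t][1] / combined[t][1] for t in ("A", "B")}

print("within strata:", stratum_rates)
print("combined:", combined_rates)
print("share of mild cases:", mild_share)
```

Comparing `mild_share` with `combined_rates` shows how the treatment with the larger share of mild patients gains an advantage once the strata are pooled.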

The explorer below lets you switch between the combined view and each severity stratum, with the patient mix shown underneath. Watch how Treatment A “wins” only when the strata are pooled — because A treated mostly mild cases, the easy ones to recover from.

Simpson's paradox explorer

Figure: Treatment A wins the Combined table (78% vs 73%), yet the two treatments are tied in mild cases (90% each) and Treatment B is clearly better in severe cases (56% vs 30%). The reversal is driven by the patient mix: A treated mostly mild cases (80 of 100), which are easier to recover from, while B's patients were split evenly between mild and severe. Case severity is a lurking variable, so a chi-square test on the pooled table would be dangerously misleading. Always check whether a confounder could be driving the association.

Full worked solutions for all problems in this lesson (Sections 5–9) are available on the dedicated solutions page. Solutions include every computation step, formula derivation, and interpretation note.


Quick-Reference Formulas

Formula	Purpose
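E = (row total × column total) / n	Expected frequency under H₀ (independence)
χ² = Σ (O − E)² / E	Chi-square test statistic; always ≥ 0
df = (r − 1)(c − 1)	Degrees of freedom; r = rows, c = columns
V = √(χ² / (n × min(r − 1, c − 1)))	Cramér’s V effect size; range [0, 1]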

Key Interpretation Rules

Expected frequencies: Compute every expected frequency E before running the test. All must be ≥ 5; if any are not, combine adjacent categories.
Decision rule: Reject H₀ if p ≤ α (equivalently, if χ² is at least the critical value for the given df and α). Fail to reject otherwise.
“Fail to reject”: Does not mean the variables are independent. Write “insufficient evidence to conclude the variables are not independent.”
Conclusion language: Always name both variables and the significance level: “There is sufficient evidence at the 0.05 level that [variable 1] and [variable 2] are not independent in the population.”
Cramér’s V: V < 0.1 negligible · 0.1–0.3 small · 0.3–0.5 medium · ≥ 0.5 large. Report alongside the p-value.
Causation: A significant χ² establishes association only, never causal direction.
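The decision rule and conclusion language above can be combined into a small helper. This is a sketch under stated assumptions: the function name `conclude` is hypothetical, and the lookup covers only α = 0.05 and df 1 to 4, using values from a standard chi-square table.

```python
# Critical values at alpha = 0.05 from a standard chi-square table.
# Assumption: only df = 1..4 are covered in this illustration.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}

def conclude(chi2, df, var1, var2):
    """Return a conclusion sentence that names both variables and the level."""
    critical = CRITICAL_05[df]
    if chi2 >= critical:
        return (
            f"Reject H0: there is sufficient evidence at the 0.05 level that "
            f"{var1} and {var2} are not independent in the population."
        )
    # Failing to reject is not proof of independence, so word it carefully
    return (
        f"Fail to reject H0: there is insufficient evidence at the 0.05 level "
        f"to conclude that {var1} and {var2} are not independent."
    )

print(conclude(4.167, 1, "smoking status", "exercise"))
print(conclude(52.5, 4, "degree earned", "job sector"))
```

A significant result from this helper still says nothing about effect size or causation, so Cramér’s V and the study design need to be reported separately.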

Common Pitfalls

Pitfall	What goes wrong	Correction
P1 — Conditions skipped	Test run without checking that every E ≥ 5	Compute all expected frequencies first; combine categories if needed
P2 — O and E swapped	Dividing by O, or otherwise swapping the roles of O and E in the formula	Denominator is always E; numerator is (O − E)²
P3 — Wrong df	Using a count such as r × c or n − 1	Always df = (r − 1)(c − 1)
P4 — Causation error	“Variable A causes Variable B” after a significant result	Chi-square shows association, not causation
P5 — Independence claimed	“The variables are independent” after failing to reject	Failing to reject H₀ is not proof of independence
