REG-5: Applications of Bivariate Analysis

Module 5 · Regression & Association

Section 1: Introduction

A newspaper headline reads: “Study proves social media use lowers grades.” The fine print: the researchers computed a correlation coefficient and a p-value. A second research team studying the same question applied a chi-square test to screen time (hours/day) and GPA, which are two quantitative variables. Their headline: “Screen time and GPA are not independent.”

Both studies made the front page. Both have serious problems.

By the end of this lesson, you will be equipped to spot exactly these errors — and to perform and communicate bivariate analyses correctly.

By the end of this lesson, you will be able to:

  • Classify any pair of variables as quantitative or qualitative, and select the correct bivariate analysis method.
  • Distinguish statistical significance from practical significance, and always pair a p-value with an appropriate effect size (r² or Cramér’s V).
  • Write a complete, accurate research conclusion using proper association language — no causal claims, no magnitude overstatements.
  • Identify common method-selection and communication errors in published research summaries.

Section 2: Prerequisites

What you need before starting this lesson:

  • From REG-3 (Regression Interpretation and Prediction): Pearson correlation coefficient r (direction, strength, range −1 to +1); coefficient of determination r² (proportion of variance explained); five-step test for H₀: ρ = 0 using t = r√(n − 2) / √(1 − r²), df = n − 2; correct conclusion language (“sufficient evidence of a [positive/negative] linear relationship”); association vs. causation distinction. You need these because today you will be choosing when to use this test and interpreting its output alongside effect size.
  • From REG-4 (Chi-Square Test of Independence): Five-step chi-square test; expected frequencies E = (row total × column total) / n; test statistic χ² = Σ (O − E)² / E; degrees of freedom df = (r − 1)(c − 1) for r rows and c columns; conditions (all E ≥ 5); Cramér’s V = √(χ² / (n · min(r − 1, c − 1))); correct conclusion language (“sufficient evidence that [Var 1] and [Var 2] are not independent”); never say “independent” after failing to reject H₀. You need these because today you will be deciding when chi-square is appropriate and reading its effect size.

ChiTable reminder — you will need chi-square critical values when evaluating chi-square results in this lesson:

Chi-Square Distribution Table

Critical values (χ²) for given degrees of freedom (df) and upper tail probability (p).

df \ p    0.995     0.99    0.975     0.95     0.90     0.10     0.05    0.025     0.01    0.005
1         0.000    0.000    0.001    0.004    0.016    2.706    3.841    5.024    6.635    7.879
2         0.010    0.020    0.051    0.103    0.211    4.605    5.991    7.378    9.210   10.597
3         0.072    0.115    0.216    0.352    0.584    6.251    7.815    9.348   11.345   12.838
4         0.207    0.297    0.484    0.711    1.064    7.779    9.488   11.143   13.277   14.860
5         0.412    0.554    0.831    1.145    1.610    9.236   11.070   12.833   15.086   16.750
6         0.676    0.872    1.237    1.635    2.204   10.645   12.592   14.449   16.812   18.548
7         0.989    1.239    1.690    2.167    2.833   12.017   14.067   16.013   18.475   20.278
8         1.344    1.646    2.180    2.733    3.490   13.362   15.507   17.535   20.090   21.955
9         1.735    2.088    2.700    3.325    4.168   14.684   16.919   19.023   21.666   23.589
10        2.156    2.558    3.247    3.940    4.865   15.987   18.307   20.483   23.209   25.188
11        2.603    3.053    3.816    4.575    5.578   17.275   19.675   21.920   24.725   26.757
12        3.074    3.571    4.404    5.226    6.304   18.549   21.026   23.337   26.217   28.300
13        3.565    4.107    5.009    5.892    7.042   19.812   22.362   24.736   27.688   29.819
14        4.075    4.660    5.629    6.571    7.790   21.064   23.685   26.119   29.141   31.319
15        4.601    5.229    6.262    7.261    8.547   22.307   24.996   27.488   30.578   32.801
16        5.142    5.812    6.908    7.962    9.312   23.542   26.296   28.845   32.000   34.267
17        5.697    6.408    7.564    8.672   10.085   24.769   27.587   30.191   33.409   35.718
18        6.265    7.015    8.231    9.390   10.865   25.989   28.869   31.526   34.805   37.156
19        6.844    7.633    8.907   10.117   11.651   27.204   30.144   32.852   36.191   38.582
20        7.434    8.260    9.591   10.851   12.443   28.412   31.410   34.170   37.566   39.997
21        8.034    8.897   10.283   11.591   13.240   29.615   32.671   35.479   38.932   41.401
22        8.643    9.542   10.982   12.338   14.041   30.813   33.924   36.781   40.289   42.796
23        9.260   10.196   11.689   13.091   14.848   32.007   35.172   38.076   41.638   44.181
24        9.886   10.856   12.401   13.848   15.659   33.196   36.415   39.364   42.980   45.559
25       10.520   11.524   13.120   14.611   16.473   34.382   37.652   40.646   44.314   46.928
26       11.160   12.198   13.844   15.379   17.292   35.563   38.885   41.923   45.642   48.290
27       11.808   12.879   14.573   16.151   18.114   36.741   40.113   43.195   46.963   49.645
28       12.461   13.565   15.308   16.928   18.939   37.916   41.337   44.461   48.278   50.993
29       13.121   14.256   16.047   17.708   19.768   39.087   42.557   45.722   49.588   52.336
30       13.787   14.953   16.791   18.493   20.599   40.256   43.773   46.979   50.892   53.672
40       20.707   22.164   24.433   26.509   29.051   51.805   55.758   59.342   63.691   66.766
50       27.991   29.707   32.357   34.764   37.689   63.167   67.505   71.420   76.154   79.490
60       35.534   37.485   40.482   43.188   46.459   74.397   79.082   83.298   88.379   91.952
70       43.275   45.442   48.758   51.739   55.329   85.527   90.531   95.023  100.425  104.215
80       51.172   53.540   57.153   60.391   64.278   96.578  101.879  106.629  112.329  116.321
90       59.196   61.754   65.647   69.126   73.291  107.565  113.145  118.136  124.116  128.299
100      67.328   70.065   74.222   77.929   82.358  118.498  124.342  129.561  135.807  140.169

The boundary this lesson crosses: In REG-3 and REG-4, you learned how to run each test — the mechanics of computation. Today’s lesson is about when to use each test, how strongly the result matters (effect size), and how to communicate findings without overstating them. The formulas are the same; the judgment layer is new.

Retrieval Warm-up — from earlier lessons

A researcher reports a Pearson correlation r between hours of television watched per day (x) and reading comprehension score (y) in a sample of 80 middle-school students. Which statement is the most accurate interpretation?

In REG-3, the significance test for correlation uses t = r√(n − 2) / √(1 − r²) with df = n − 2. Using r and n = 80 from the problem above, which conclusion follows from the two-tailed test (df = 78)?

Section 3: Core Concepts

Ten concepts — here is how they fit together:

  • C1–C4: Method selection — the decision gate. Classify your variables first; pick your method second.
  • C5–C6: Effect size: r² for correlation, Cramér’s V for chi-square. A significant p-value is not enough.
  • C7: Statistical vs. practical significance — the most commonly misunderstood concept in this module.
  • C8–C9: Conclusion language — exact wording for each test. One word wrong and the conclusion is misleading.
  • C10: Communicating findings responsibly — the full checklist for a correct research summary.

C1 — Decision Rule: Variable Types

Before choosing any analysis method, you must classify both variables.

Bivariate Method Decision Rule

  • Two quantitative variables → Pearson correlation / linear regression
  • Two qualitative (categorical) variables → Chi-square test of independence
  • One quantitative + one qualitative → Neither chi-square nor correlation applies directly. A group-comparison method (e.g., ANOVA) is appropriate — but beyond the scope of this course. Flag the mismatch.

This is the primary gate: classify both variables before choosing any method.

Decision tree for selecting a bivariate analysis method. Classify both variables, then follow the matching branch:

  • Quantitative × quantitative → Pearson correlation / linear regression. Report: r, r², t-statistic, p-value, conclusion.
  • Qualitative × qualitative → Chi-square test of independence. Report: χ², df, Cramér’s V, p-value, conclusion.
  • Quantitative × qualitative (in either order) → Neither applies directly. A group-comparison method (e.g., ANOVA) is needed, which is beyond this course.

Mini-example: A researcher has data on (1) monthly rent paid in dollars and (2) commute time in minutes for 80 renters. Both variables are quantitative — arithmetic differences are meaningful. → Use Pearson correlation.

The same researcher also records each renter’s (1) neighborhood type (urban / suburban / rural) and (2) transport mode (car / transit / bike). Both variables place observations into named categories — no arithmetic is meaningful. → Use chi-square.
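The decision rule can be sketched as a small function. This is only an illustration of the gate described above; the names `VarType` and `select_method` are hypothetical and not part of the course materials.

```python
from enum import Enum


class VarType(Enum):
    """Hypothetical labels for the two variable types used in this lesson."""
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"


def select_method(var1: VarType, var2: VarType) -> str:
    """Return the bivariate method suggested by the lesson's decision rule."""
    if var1 is VarType.QUANTITATIVE and var2 is VarType.QUANTITATIVE:
        return "Pearson correlation / linear regression"
    if var1 is VarType.QUALITATIVE and var2 is VarType.QUALITATIVE:
        return "Chi-square test of independence"
    # One quantitative + one qualitative: neither method applies directly.
    return "Mixed types: flag the mismatch (group comparison is out of scope)"


# Rent (dollars) vs. commute time (minutes): both quantitative.
print(select_method(VarType.QUANTITATIVE, VarType.QUANTITATIVE))
# Neighborhood type vs. transport mode: both qualitative.
print(select_method(VarType.QUALITATIVE, VarType.QUALITATIVE))
```

The order of the two arguments does not matter, which mirrors the rule: only the pair of types decides the method.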


C2 — Quantitative Variable Identification

Quantitative Variable

A quantitative variable takes numerical values where arithmetic operations (differences, averages) are meaningful: height in cm, temperature in °C, exam score out of 100, income in dollars, number of injuries. Count variables (number of absences, number of siblings) are quantitative.

Quick test: “Does the average of these values make sense?” If someone averages the exam scores of 30 students to get 74.3 — yes, that’s meaningful. → Quantitative.

A variable stored as a number is not automatically quantitative. Student ID numbers and zip codes are stored numerically — but averaging them produces nonsense. Classification depends on whether arithmetic is meaningful, not whether the data file contains digits.


C3 — Qualitative Variable Identification

Qualitative (Categorical) Variable

A qualitative variable places observations into named categories with no inherent numeric meaning: blood type (A/B/AB/O), grade category (below average/average/above average), political affiliation (Liberal/Conservative/Independent), disease status (diagnosed/not diagnosed). Ordered categories (low/medium/high stress) are still qualitative unless treated numerically.

Quick test: “Do the category labels have a natural numeric difference?” If stress levels are “low”, “medium”, “high” — you cannot compute “high minus low = 2” in any meaningful way. → Qualitative.

Ordinal variables (with a natural order like low/medium/high) are still qualitative unless a researcher explicitly treats the spacing as equal and numerically meaningful. When in doubt in this course, treat ordered categories as qualitative and use chi-square.


C4 — Mixed Variable Types: Out of Scope

Mixed Variable Types

When one variable is quantitative and the other is qualitative, neither chi-square nor Pearson correlation applies directly. Chi-square needs two categorical variables; correlation needs two numerical variables. The appropriate approach is to compare group means (e.g., ANOVA or t-test) — but this is beyond the scope of this course. The correct action here is to flag the mismatch and name the issue.

A common error: a researcher records political affiliation (qualitative) and happiness score on a 1–10 scale (quantitative), then runs a chi-square. This is wrong — chi-square cannot process the numerical happiness scores as categories without losing information and violating the method’s requirements.


C5 — Effect Size for Correlation: r²

Coefficient of Determination r²

r² is the proportion of variance in y explained by the linear relationship with x. A statistically significant r (p < α) provides evidence that ρ ≠ 0 in the population but does not indicate the model is useful. Always report r² alongside p.

Thresholds (guidelines, not sharp rules):

  • r² < 0.04 → negligible
  • 0.04 ≤ r² < 0.15 → weak
  • 0.15 ≤ r² < 0.35 → moderate
  • r² ≥ 0.35 → strong

Example: a study of social media use and GPA reports a statistically significant negative correlation. But r² = 0.032: social media use explains 3.2% of the variance in GPA. The headline “proves social media lowers grades” vastly overstates a negligible practical effect.

r = 0.72 does NOT mean the predictor explains 72% of variance; that would confuse r with r². The actual explained variance is r² = 0.72² = 0.518, or 51.8%. Always square r to get the proportion of variance explained.
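As a minimal sketch, the squaring step and the threshold labels from this concept can be written as one helper. The function name `classify_r2` is hypothetical; the cut-offs are the guideline values listed in C5.

```python
def classify_r2(r: float) -> tuple[float, str]:
    """Square r and attach the lesson's guideline label for r-squared."""
    r2 = r * r
    if r2 < 0.04:
        label = "negligible"
    elif r2 < 0.15:
        label = "weak"
    elif r2 < 0.35:
        label = "moderate"
    else:
        label = "strong"
    return r2, label


r2, label = classify_r2(0.72)
print(f"r = 0.72 -> r^2 = {r2:.3f} ({label})")  # about 51.8% of variance
```

Because the labels are guidelines rather than sharp rules, a value close to a cut-off is best reported with the number itself and not only the label.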


C6 — Effect Size for Chi-Square: Cramér’s V

Cramér's V

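V = √(χ² / (n · min(r − 1, c − 1))), where n = total sample size, r = number of rows, and c = number of columns. Because the formula divides χ² by n, Cramér’s V does not grow with sample size; it measures association strength alone.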

Thresholds:

  • V < 0.1 → negligible
  • 0.1 ≤ V < 0.3 → small
  • 0.3 ≤ V < 0.5 → medium
  • V ≥ 0.5 → large

Example: A researcher reports a very large χ² from a very large sample (df = 1, p < 0.001) and concludes “enormous chi-square confirms a very strong association.” But the corresponding V is negligible. The large χ² is driven by the large n, not a strong relationship.

χ² depends on n: doubling the sample size doubles χ² even if the proportions are unchanged. Never use χ² alone to judge strength of association. Always compute and report Cramér’s V.
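The effect of sample size on χ² can be checked numerically. The sketch below computes χ² and Cramér’s V from a table of observed counts using the standard formulas; the two 2×2 tables are made-up counts chosen only so that the second is exactly double the first.

```python
import math


def chi_square(table: list[list[int]]) -> tuple[float, int]:
    """Return (chi-square statistic, n) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def cramers_v(table: list[list[int]]) -> float:
    """Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1)))."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))


small = [[30, 20], [20, 30]]   # hypothetical counts, n = 100
large = [[60, 40], [40, 60]]   # same proportions, n = 200

for name, table in (("small", small), ("large", large)):
    chi2, n = chi_square(table)
    print(f"{name}: n = {n}, chi2 = {chi2:.3f}, V = {cramers_v(table):.3f}")
```

In this illustration χ² doubles when every count doubles, while V stays the same, which is why V and not χ² is the measure of strength.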


C7 — Statistical vs. Practical Significance

Statistical vs. Practical Significance

  • Statistical significance (p < α): the observed association is unlikely to have arisen from chance alone — we reject H₀. This is a claim about signal vs. noise.
  • Practical significance (effect size): the association is large enough to matter in the real world. This is a claim about magnitude.

Large n makes even negligible associations statistically significant. With a large enough n, r = 0.05 gives p < 0.001, yet r² = 0.0025: the predictor explains 0.25% of variance. The reverse also occurs: a strong effect in a small sample may not reach significance. Always pair p with r² or V.

“The result is significant, therefore it is important.” This is one of the most common misuses of statistics in published research. Statistical significance tells you the effect is reliably distinguishable from zero. Effect size tells you whether the effect is worth caring about.
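A short numerical sketch makes both directions concrete. It uses the standard t-statistic for a correlation, t = r√(n − 2) / √(1 − r²). The sample sizes (10,000 and 8) and the value r = 0.60 are assumptions picked for illustration, and 2.447 is the usual two-tailed critical t for df = 6 at α = 0.05.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Tiny effect, very large (hypothetical) sample: significant but negligible.
t_big_n = t_statistic(0.05, 10_000)
print(f"r = 0.05, n = 10000: t = {t_big_n:.2f}, r^2 = {0.05 ** 2:.4f}")

# Strong effect, very small (hypothetical) sample: not significant.
t_small_n = t_statistic(0.60, 8)
print(f"r = 0.60, n = 8: t = {t_small_n:.2f}, r^2 = {0.60 ** 2:.2f}")
print("below t*(6) = 2.447:", abs(t_small_n) < 2.447)
```

The first case clears the large-sample critical value of about 1.96 comfortably while explaining a quarter of a percent of variance; the second explains 36% of variance yet fails to reach significance.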


C8 — Conclusion Language: Correlation / Regression

Correct Conclusion Language for Correlation Tests

Template: “There is [sufficient / insufficient] evidence of a [positive / negative] linear relationship between [X] and [Y] in the population.”

Always also report: r, r², the t-statistic, p, and the decision at α.

Never say “X causes Y” based on correlation alone. Correlation tests association only.

“There is sufficient evidence that X affects Y.” — The word “affects” implies causation. Use “is associated with” or “has a linear relationship with.” Causal language requires experimental design with random assignment, not observational data.


C9 — Conclusion Language: Chi-Square

Correct Conclusion Language for Chi-Square Tests

Template: “There is [sufficient / insufficient] evidence that [Variable 1] and [Variable 2] are not independent in the population.”

Always also report: χ² (with df), p, V, and the decision at α.

Never say the variables “are independent” after failing to reject H₀ — that would be claiming proof of the null. The correct language is “insufficient evidence to conclude they are not independent.”

“We fail to reject H₀, so the variables are independent.” — Failing to reject H₀ is not the same as proving H₀. We simply lack sufficient evidence of dependence. The variables could be dependent in a way our sample was too small to detect.
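The two conclusion templates can be sketched as string builders, which makes it harder to slip in causal or "are independent" wording. The function names are hypothetical. One design choice here is an assumption: when H₀ is not rejected, the correlation sentence omits the direction, since no relationship has been established.

```python
def correlation_conclusion(reject_h0: bool, direction: str, x: str, y: str) -> str:
    """Fill the C8 template for a test of H0: rho = 0."""
    if direction not in ("positive", "negative"):
        raise ValueError("direction must be 'positive' or 'negative'")
    if reject_h0:
        return (f"There is sufficient evidence of a {direction} linear "
                f"relationship between {x} and {y} in the population.")
    return (f"There is insufficient evidence of a linear relationship "
            f"between {x} and {y} in the population.")


def chi_square_conclusion(reject_h0: bool, var1: str, var2: str) -> str:
    """Fill the C9 template; never claims the variables 'are independent'."""
    strength = "sufficient" if reject_h0 else "insufficient"
    return (f"There is {strength} evidence that {var1} and {var2} "
            f"are not independent in the population.")


print(correlation_conclusion(True, "negative", "daily steps", "resting heart rate"))
print(chi_square_conclusion(False, "education level", "daily newspaper reading"))
```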


C10 — Communicating Findings Responsibly

Checklist: Complete Research Summary

A complete research report includes all seven elements:

  1. Method used and why: name the test and state why the variable types justify it.
  2. Test statistic and p-value: report t or χ², df, and p.
  3. Decision: “reject H₀” or “fail to reject H₀” at α = [value].
  4. Effect size with interpretation: r² or V with the appropriate threshold label.
  5. Conclusion in plain language: one sentence accessible to a non-statistician.
  6. Association language only: never “causes”, “proves”, “affects.”
  7. Study limitations: if observational, note that causation cannot be established; if sample size is small, note reduced power.
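Item 6 of the checklist lends itself to a quick automated screen. The sketch below is a rough word-list check, not a substitute for reading the summary: the term list is an assumption built from the three words named in the checklist plus their base forms, and `flag_causal_language` is a hypothetical name.

```python
import re

# Words the checklist rules out, plus base forms (an assumed extension).
CAUSAL_TERMS = {"causes", "cause", "proves", "prove", "affects", "affect"}


def flag_causal_language(summary: str) -> list[str]:
    """Return the sorted causal or overclaiming words found in a summary."""
    words = re.findall(r"[a-z]+", summary.lower())
    return sorted({word for word in words if word in CAUSAL_TERMS})


headline = "Study proves social media use lowers grades."
careful = "Social media use is associated with lower GPA in this sample."

print(flag_causal_language(headline))
print(flag_causal_language(careful))
```

A clean result only means none of the listed words appear; phrases such as "leads to" or "boosts" would need to be added to catch other causal wording.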

Section 4: Worked Examples

Example 1 — Fully Worked: Daily Steps and Resting Heart Rate (QQ2)

Scenario: A cardiologist examines whether the number of daily steps is associated with resting heart rate (bpm) for a sample of adults. Given: the sample size n and the sample correlation r = −0.61.

Step 1 — Classify both variables.

I notice both variables are numerical measurements where arithmetic is meaningful: daily steps is a count (quantitative) and resting heart rate in bpm is a continuous measurement (quantitative). Two quantitative variables → Pearson correlation is the appropriate method. Chi-square would be wrong here.

Step 2 — State hypotheses.

H₀: ρ = 0 (no linear relationship between daily steps and resting heart rate in the population).
Hₐ: ρ ≠ 0 (a linear relationship exists in the population). Two-tailed test at α = 0.05.

Step 3 — Check conditions.

We assume this is a random or representative sample and that the relationship is approximately linear in the population. The test is robust for sufficiently large n.

Step 4 — Compute the test statistic.

t = r√(n − 2) / √(1 − r²), with df = n − 2. Critical value: t* for df = n − 2 at α = 0.05 (two-tailed).

Step 5 — Make the decision.

I compare |t| to t*. Since |t| > t*, I reject H₀ (p < 0.001).

Step 6 — Interpret effect size.

r² = 0.372: daily steps explain 37.2% of the variance in resting heart rate. Using the thresholds: r² ≥ 0.35 → strong practical significance. The association is not only statistically real but practically meaningful.

Step 7 — Write the conclusion.

“There is sufficient evidence of a negative linear relationship between daily steps and resting heart rate in the population (r = −0.61, p < 0.001, r² = 0.372). As daily steps increase, resting heart rate tends to decrease, and this association is of strong practical magnitude.”

Metacognitive note: I noticed the negative r before computing anything; this tells me the direction is inverse (more steps → lower heart rate). I chose Pearson correlation because both variables are quantitative, and I reported r² because a significant p-value alone doesn’t tell us the association is meaningful.


Example 2 — Partially Scaffolded: Stress Level and Sleep Quality (CC5)

Scenario: A sleep researcher surveys 140 adults, recording stress level (low / high) and sleep quality (good / poor). This is pre-classified for you: both variables are qualitative (categorical). The appropriate method is the chi-square test of independence.

Given pre-computed results: the χ² statistic, df = 1 (a 2×2 table), and Cramér’s V.

Your task — complete Steps 4 and 5:

Before revealing the decision: Based on χ² and df, what is your prediction about the decision at α = 0.05? (Check the ChiTable in Section 2: the critical value for df = 1 is 3.841.)

Show Solution

Step 4: Decision. Since χ² > 3.841, we reject H₀ (p < 0.001).

Step 5: Write the conclusion. “There is sufficient evidence that stress level and sleep quality are not independent in the population (p < 0.001, with V in the small range). Adults with high stress tend to have poorer sleep quality, but this is a small-to-moderate association. Other factors likely explain much of the variation in sleep quality.”

Effect size note: V falls in the “small” range (0.1–0.3). The association is real but modest, not a large or dominant effect.
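The table lookup and the effect-size label used in this example can be sketched together. The critical values are the α = 0.05 column of the ChiTable in Section 2 (df 1 to 5 only), and the V cut-offs are those from C6. The function names are hypothetical. The sample calls reuse χ² and V values that appear in this lesson's Independent Practice solutions.

```python
# Upper-tail 0.05 critical values copied from the lesson's chi-square table.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_decision(chi2: float, rows: int, cols: int) -> tuple[int, float, str]:
    """Return (df, critical value, decision) at alpha = 0.05."""
    df = (rows - 1) * (cols - 1)
    critical = CRITICAL_05[df]  # raises KeyError for df outside 1..5
    decision = "reject H₀" if chi2 > critical else "fail to reject H₀"
    return df, critical, decision


def classify_v(v: float) -> str:
    """Attach the lesson's threshold label to Cramér's V."""
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "small"
    if v < 0.5:
        return "medium"
    return "large"


# Study method (2 levels) x grade category (3 levels): chi2 = 16.333, V = 0.369
print(chi_square_decision(16.333, 2, 3), classify_v(0.369))
# Age group (2 levels) x vaccine uptake (2 levels): chi2 = 8.081, V = 0.201
print(chi_square_decision(8.081, 2, 2), classify_v(0.201))
```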


Example 3 — Solution Behind Details: TV Hours and GPA (QQ5)

Scenario: A student affairs researcher studies whether watching more TV is associated with lower GPA for students. Given: n = 60, r = −0.38, r² = 0.144, t = −3.128, t*(58) = 2.002.

Before checking the solution: Is this result practically meaningful? r = −0.38 looks moderate, but pause before deciding. What does r² tell you? Would you recommend this finding to a school board as evidence that TV limits should be enforced?

Show Solution

Variable classification: Both variables are quantitative (hours/week and GPA on a 0–4 scale). → Pearson correlation is correct.

Decision: |t| = 3.128 > t*(58) = 2.002 → reject H₀.

Conclusion: “There is sufficient evidence of a negative linear relationship between weekly TV hours and GPA in the population (r = −0.38, t = −3.128, r² = 0.144). More TV hours are associated with lower GPA in this population.”

Effect size interpretation: r² = 0.144, so TV hours explain only 14.4% of the variance in GPA. This falls in the “weak” range (0.04–0.15). The result is statistically significant but of weak practical significance.

Key teaching point: This result is statistically real, but a school board should not implement TV restrictions based on this alone. An r² of 0.144 means 85.6% of GPA variation is left unexplained by TV hours. The effect exists, but it is small relative to the complexity of academic performance.


Example 4 — Find the Error: Chi-Square Applied to Quantitative Variables

A researcher’s report:

“We collected data on hours of sleep per night (ranging from 4 to 10 hours, measured to the nearest half-hour) and GPA (0–4 scale, measured to two decimal places) for 45 university students. We ran a chi-square test of independence and found χ² = 18.4 (df = 12, p = 0.048), concluding: ‘Sleep hours and GPA are not independent in the population — students who sleep more tend to have higher GPAs.’”

Identify the errors. There are at least two.

Show Solution

Error 1: Wrong method. Both variables are quantitative. Sleep hours (measured to the nearest half-hour) and GPA (measured to two decimal places) are continuous numerical measurements where arithmetic differences are meaningful. Chi-square requires two qualitative (categorical) variables. The appropriate method is Pearson correlation (or linear regression).

Error 2: The chi-square output is meaningless in this context. When chi-square is applied to quantitative data, the researcher must first have binned or categorized the data (creating artificial categories). This destroys information and introduces arbitrary cut-points. The resulting χ² value does not test anything meaningful about the original continuous relationship.

Error 3 (bonus): Conclusion language. “Students who sleep more tend to have higher GPAs” is an interpretive statement that goes beyond what the (incorrectly applied) test can establish. Even if correlation were run correctly, association language must be used: “sleep hours are associated with GPA.”

What the researcher should have done: Compute Pearson r between sleep hours and GPA. Report r, r², the t-statistic, df, and p. Use the conclusion template from C8.

Section 5: Guided Practice

Problem 1 — Method Selection

For each scenario, classify both variables and select the appropriate analysis method.

Scenario: “A professor collects data on weekly study hours and final exam scores for 25 students to investigate whether more studying leads to higher scores.”

(a) What type of variable is “weekly study hours”?
(b) What type of variable is “final exam score”?
(c) Which analysis method is appropriate?
Show Solution

(a) Quantitative — weekly study hours is a count/continuous measurement. (b) Quantitative — exam score is a numerical measurement on a defined scale. (c) Pearson correlation / linear regression — two quantitative variables.

Scenario: “A health researcher surveys 100 adults, recording whether each is a smoker or non-smoker, and whether each exercises regularly or irregularly. They want to know if smoking status and exercise habits are related.”

(a) What type of variable is “smoking status”?
(b) What type of variable is “exercise frequency”?
(c) Which analysis method is appropriate?
Show Solution

(a) Qualitative — smoking status is a binary category. (b) Qualitative — exercise frequency is a binary category (regular/irregular). (c) Chi-square test of independence — two qualitative variables.

Scenario: “A sports scientist records the average weekly training hours and the number of sports injuries per season for 35 athletes.”

(a) What type of variable is “weekly training hours”?
(b) What type of variable is “number of sports injuries per season”?
(c) Which analysis method is appropriate?
Show Solution

(a) Quantitative — continuous measurement. (b) Quantitative — count variable (discrete but quantitative). (c) Pearson correlation / linear regression — two quantitative variables.

Scenario: “A marketing team surveys 200 customers, recording their preferred smartphone brand (Brand A / Brand B / Brand C) and their age group (18–30 / 31–50 / 51+). They want to test whether brand preference depends on age.”

(a) What type of variable is “preferred smartphone brand”?
(b) What type of variable is “age group”?
(c) Which analysis method is appropriate?
Show Solution

(a) Qualitative — brand names are named categories. (b) Qualitative — age is grouped into named brackets (not individual numerical values). (c) Chi-square test of independence — two qualitative variables.

Scenario: “A nutritionist measures daily caloric intake (kcal) and body weight (kg) for 45 adults to determine if higher caloric intake is associated with greater body weight.”

(a) What type of variable is “daily caloric intake (kcal)”?
(b) What type of variable is “body weight (kg)”?
(c) Which analysis method is appropriate?
Show Solution

(a) Quantitative — continuous measurement in kcal. (b) Quantitative — continuous measurement in kg. (c) Pearson correlation / linear regression — two quantitative variables.


Problem 2 — Interpreting Given Results

For each pre-computed result, select the correct conclusion and effect size interpretation.

Results from a study of weekly study hours and final exam scores (n = 25): r = 0.72, r² = 0.518, t = 4.975, reject H₀ (p < 0.001).

(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets the effect size?
Show Solution

(a) “There is sufficient evidence of a positive linear relationship between study hours and exam scores in the population.” — Correct association language, no causal claim, reports the direction.

(b) r² = 0.518 — study hours explains 51.8% of the variance in exam scores. Strong practical effect. — r² ≥ 0.35 → strong. This is both statistically and practically significant.

Results from a chi-square test of smoking status and exercise frequency (n = 100): V = 0.204, reject H₀.

(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets the effect size?
Show Solution

(a) “There is sufficient evidence that smoking status and exercise frequency are not independent in the population.” — Correct chi-square conclusion language.

(b) V = 0.204 → small association. Statistically significant but modest practical magnitude.

Results from a study of weekly TV hours and GPA (n = 60): r = −0.38, r² = 0.144, t = −3.128, reject H₀.

(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets the effect size?
Show Solution

(a) Correct — negative relationship, sufficient evidence, proper association language.

(b) Correct — r² = 0.144 falls in the weak range (0.04–0.15). The result is statistically significant but of weak practical importance.

Results from a chi-square test of education level and daily newspaper reading: fail to reject H₀.

(a) Which statement correctly concludes this test?
(b) What is the correct interpretation when we fail to reject H₀?
Show Solution

(a) Correct — “insufficient evidence … are not independent” — proper language for failing to reject H₀ in chi-square.

(b) Correct — we do not claim independence; we only report insufficient evidence of dependence.

Results from a correlation test of commute time and work satisfaction: r² = 0.048, fail to reject H₀.

(a) Which statement correctly concludes this test?
(b) Which statement correctly interprets r² = 0.048?
Show Solution

(a) Correct — “insufficient evidence of a linear relationship” — proper language for failing to reject H₀ in a correlation test.

(b) Correct — r² = 0.048 is weak (0.04–0.15 range), even if the result were significant.


Problem 3 — Evaluate Two Studies

Read the two research summaries carefully and answer the questions.

Study A: “We measured GPA (0–4 scale) and weekly shoe-shopping frequency for 200 students. We found r = 0.13, which was not statistically significant. We conclude there is insufficient evidence of a relationship between GPA and shoe-shopping, so students who buy more shoes must have the same GPA as others.”

Study B: “We surveyed 500 employees on their department (Sales / Marketing / Engineering / HR) and their self-rated work-life balance (1–10 scale, numerical). We used chi-square and found a statistically significant result, concluding the variables have a strong association.”

(a) What is the primary error in Study A’s conclusion?

(b) What is the primary error in Study B’s analysis?

Show Solution

(a) Study A’s error: “must have the same GPA as others” claims proof of no relationship — this is the “accept H₀” fallacy. Failing to reject H₀ means only that we lack sufficient evidence of a relationship. The variables could be related in a way the sample was too small to detect (note: n = 200 with r = 0.13 would give a borderline result; a larger sample might find significance).

(b) Study B’s error: Work-life balance on a 1–10 numerical scale is quantitative, while department is qualitative, so this is a mixed-type pair. Chi-square requires two categorical variables and Pearson correlation requires two quantitative variables, so neither applies directly. The appropriate approach is a group-comparison method (e.g., ANOVA comparing mean work-life balance across departments), which is beyond the scope of this course. Simply put: a 1–10 rating scale is quantitative, and chi-square is not valid for it.
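The note in solution (a) that r = 0.13 with n = 200 is borderline can be checked with a short calculation. This sketch uses the standard correlation t-statistic and, as a simplifying assumption, the large-sample critical value 1.96 in place of the exact t* for each n (for df = 198 the exact value is about 1.97).

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def smallest_significant_n(r: float, critical: float = 1.96) -> int:
    """Smallest n at which |t| reaches the (approximate) critical value."""
    n = 3
    while abs(t_statistic(r, n)) < critical:
        n += 1
    return n


t_study_a = t_statistic(0.13, 200)
print(f"Study A: t = {t_study_a:.3f}")  # just under the critical value
print("n needed for r = 0.13 to reach 1.96:", smallest_significant_n(0.13))
```

Under this approximation the same r = 0.13 would cross the threshold with only a few dozen more students, which is why "fail to reject" cannot be read as "no relationship."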


Problem 4 — Full Bivariate Scenario

Section 6: Independent Practice

Problem 1 — Full Analysis Chain

Given pre-computed results, work through the full analysis: method selection, decision, conclusion, and effect size classification.

Scenario: A professor studies whether weekly study hours (x) predict exam scores (y) for n = 25 students. Results: r = 0.72, r² = 0.518, t = 4.975.

(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (t*(23) = 2.069)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution

(a) Pearson correlation — two quantitative variables. (b) Reject H₀ — |t| = 4.975 > t*(23) = 2.069. (c) “There is sufficient evidence of a positive linear relationship between study hours and exam scores in the population.” (d) r² = 0.518 → strong practical significance.

Scenario: A researcher tests whether study method (self / group) and grade category (below avg / average / above avg) are independent for students. Results: χ² = 16.333, V = 0.369.

(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (critical χ² for df = 2: 5.991)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution

(a) Chi-square — two qualitative variables. (b) Reject H₀ — 16.333 > 5.991. (c) “There is sufficient evidence that study method and grade category are not independent in the population.” (d) V = 0.369 → medium effect.

Scenario: A financial planner investigates whether higher monthly spending (x) is associated with lower monthly savings (y) for n = 40 clients. Results: r = −0.48, r² = 0.230, t = −3.373.

(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (t*(38) = 2.024)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution

(a) Pearson correlation — two quantitative variables. (b) Reject H₀ — |t| = 3.373 > t*(38) = 2.024. (c) “There is sufficient evidence of a negative linear relationship between monthly spending and monthly savings in the population.” (d) r² = 0.230 → moderate practical significance.

Scenario: A public health researcher tests whether age group (under 40 / 40+) and vaccine uptake (vaccinated / not vaccinated) are independent for adults. Results: χ² = 8.081, V = 0.201.

(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (critical χ² for df = 1: 3.841)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution

(a) Chi-square — two qualitative variables (both are binary categorical). (b) Reject H₀ — 8.081 > 3.841. (c) “There is sufficient evidence that age group and vaccine uptake are not independent in the population.” (d) V = 0.201 → small effect (statistically significant, modest magnitude).

Scenario: A researcher studies whether weekly TV hours (x) is associated with GPA (y) for n = 60 students. Results: r = −0.38, r² = 0.144, t = −3.128.

(a) Which method is appropriate?
(b) What is the decision at α = 0.05? (t*(58) = 2.002)
(c) Which conclusion is correct?
(d) What is the correct effect size classification?
Show Solution

(a) Pearson correlation — two quantitative variables. (b) Reject H₀ — |t| = 3.128 > t*(58) = 2.002. (c) “There is sufficient evidence of a negative linear relationship between weekly TV hours and GPA in the population.” (d) r² = 0.144 → statistically significant but weak practical significance. The key teaching point: significant ≠ practically important.
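The three correlation scenarios in this problem can be cross-checked against their solutions. In the sketch below, n comes from the degrees of freedom in each solution (df + 2), and each r is an inferred value: the square root of the reported r², signed by the direction stated in the conclusion. Treat those r values as assumptions, not as figures quoted from the studies.

```python
import math


def t_statistic(r: float, n: int) -> float:
    """t = r * sqrt(n - 2) / sqrt(1 - r^2) for testing H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# (label, inferred r, n = df + 2, |t| from the solution, t* from the solution)
scenarios = [
    ("study hours vs. exam score", 0.72, 25, 4.975, 2.069),
    ("monthly spending vs. savings", -0.48, 40, 3.373, 2.024),
    ("weekly TV hours vs. GPA", -0.38, 60, 3.128, 2.002),
]

for label, r, n, reported_t, t_star in scenarios:
    t = t_statistic(r, n)
    decision = "reject H₀" if abs(t) > t_star else "fail to reject H₀"
    print(f"{label}: |t| = {abs(t):.3f} (solution: {reported_t}), "
          f"r^2 = {r * r:.3f}, {decision}")
```

The recomputed |t| values agree with the solutions to within rounding, and all three decisions come out as reject H₀, even though the effect sizes range from weak to strong.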


Problem 2 — Statistical vs. Practical Significance


Problem 3 — Find the Error

Each scenario contains a flawed researcher statement. Identify the specific error.

Scenario: “A researcher encodes study method as numeric: self-study = 1, group study = 2. They compute Pearson r between this encoded variable and GPA (0–4 scale). They find r > 0, p < 0.05, and conclude there is a significant positive linear relationship between study method and GPA.”

What is the error in this analysis?
Show Solution

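Study method (self-study/group study) is a nominal qualitative variable. Assigning 1 and 2 does not create numerical meaning: you cannot meaningfully compute the average “(self-study + group study) / 2.” GPA is quantitative, so this is a mixed-type pair, and under this lesson’s decision rule neither Pearson correlation nor chi-square applies directly. The appropriate approach is a group-comparison method (comparing mean GPA across the two study-method groups), which is beyond the scope of this course. Chi-square would apply only if the outcome were recorded as grade categories. The sign of the reported r depends entirely on the arbitrary coding, so “positive linear relationship” has no meaning here.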

Scenario: “A researcher finds that ice cream sales and drowning deaths are positively correlated (p < 0.001). They conclude: ‘Ice cream consumption causes drowning.’”

What is the error in this conclusion?
Show Solution

A classic lurking variable (confounding) example. Summer heat causes both increased ice cream consumption and increased swimming (hence more drowning). The correlation is real, but it is produced by a third variable, not by ice cream causing drowning. Association never establishes causation: a significant r provides evidence that ρ ≠ 0 in the population; it reveals nothing about mechanism or direction of causation.
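To make the lurking-variable idea concrete, here is a small Python sketch with entirely hypothetical numbers. Temperature drives both series, and the "noise" patterns are chosen so that, once temperature is held fixed, ice cream sales and drownings are unrelated. The helper `pearson_r` and all data values are assumptions for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: four cool days (20 degrees) and four hot days (30 degrees).
temp = [20, 20, 20, 20, 30, 30, 30, 30]
ice_noise = [1, -1, 1, -1, 1, -1, 1, -1]
drown_noise = [1, 1, -1, -1, 1, 1, -1, -1]
ice_cream = [4 * t + 6 * e for t, e in zip(temp, ice_noise)]
drownings = [t / 5 + e / 2 for t, e in zip(temp, drown_noise)]

r_all = pearson_r(ice_cream, drownings)           # all days pooled
r_cool = pearson_r(ice_cream[:4], drownings[:4])  # cool days only
r_hot = pearson_r(ice_cream[4:], drownings[4:])   # hot days only

print(f"pooled r = {r_all:.2f}, cool days r = {r_cool:.2f}, hot days r = {r_hot:.2f}")
```

In this constructed example the pooled correlation is strong (about 0.86), yet it vanishes within each temperature level, which is the pattern a lurking variable produces.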

Scenario: “A nutritionist has data on daily caloric intake (kcal) and resting metabolic rate (kcal/day) for 80 participants. She runs a chi-square test and reports (, ), concluding ‘caloric intake and metabolic rate are not independent.’”

What is the error in this analysis?
Show Solution

Caloric intake in kcal and resting metabolic rate in kcal/day are both continuous quantitative measurements. Chi-square requires two qualitative (categorical) variables. The nutritionist has likely binned the data into artificial categories to run chi-square — a practice that discards information and introduces arbitrary cut-points. Pearson correlation is the correct method.

Scenario: “A researcher runs a chi-square test on a 2×2 table with and reports (, < 0.001). They conclude: ‘This enormous chi-square value confirms that the two variables have a very strong association.’”

What is the error in this conclusion?
Show Solution

(approximately). With , even (negligible association) gives . Always compute Cramér’s V to assess strength independently of . Here — a negligible effect, not a “very strong association.”

Scenario: “A tech company analyzes data from employees and finds that the correlation between work-from-home days per week and productivity score is (). The VP concludes: ‘This significant correlation proves that working from home substantially boosts productivity.’”

What is the error in this conclusion?
Show Solution

With , the critical is approximately 1.960. , well above 1.960 — statistically significant. But : work-from-home explains 0.25% of variance in productivity. The VP is confusing the presence of an effect with the importance of an effect. “Substantially boosts” requires a large practical effect, which r² = 0.0025 definitively contradicts.
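The gap between "significant" and "important" can be seen by holding r fixed and letting n grow. The sketch below uses r = 0.05 (so r² = 0.0025, matching the solution) with sample sizes that are hypothetical choices for illustration.

```python
import math

def correlation_t(r: float, n: int) -> float:
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

r = 0.05  # r^2 = 0.0025, the effect size discussed in the solution
results = {}
for n in (100, 1_000, 10_000):  # hypothetical sample sizes
    results[n] = correlation_t(r, n)
    print(f"n = {n:>6}: t = {results[n]:.2f}, r^2 = {r ** 2:.4f}")
```

Of these three sample sizes, only the largest pushes t past 1.960, yet r² is identical in every row: the sample size changed the test statistic, not the size of the effect.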


Problem 4 — Communication Task


Problem 5 — Synthesis: Diabetes and Health Outcomes

A public health research team investigates two questions about diabetes in a random sample of 500 adults.

Question A: Is fasting blood glucose (mg/dL) related to BMI (kg/m²)? Given: , ,

Question B: Is diabetes diagnosis (Non-diabetic / Pre-diabetic / Diabetic) related to physical activity level (Active / Sedentary)?

Observed contingency table:

Diagnosis | Active | Sedentary | Row total
Non-diabetic | 165 | 85 | 250
Pre-diabetic | 55 | 95 | 150
Diabetic | 30 | 70 | 100
Col total | 250 | 250 | 500

(a) For Question A, justify why Pearson correlation is the appropriate method.

(b) For Question A, compute (given , ) and interpret .

(c) For Question B, justify why chi-square is appropriate.

(d) For Question B, verify all conditions, compute , state the decision (), and compute Cramér’s V.

(e) A journalist reports: “The study proves that being sedentary causes diabetes.” Identify two statistical reasoning errors.

(f) A hospital administrator says: “r² is small, so BMI is useless for predicting blood glucose.” Evaluate this claim.

Show Solution

(a) Both variables are quantitative: fasting blood glucose in mg/dL and BMI in kg/m² are continuous numerical measurements where arithmetic (differences, averages) is meaningful. Chi-square requires two qualitative (categorical) variables — it is not applicable here.

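(b) t = r√(n − 2) / √(1 − r²) = 0.58 × √498 / √(1 − 0.58²) ≈ 15.89. Since 15.89 > t*(498) ≈ 1.965, we reject H₀ (p < 0.001). r² = 0.336: BMI explains 33.6% of the variance in fasting blood glucose. This is a moderate-to-strong practical effect (0.15 ≤ r² < 0.35 is moderate, approaching strong at r² ≥ 0.35).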

(c) Both variables are qualitative: diabetes diagnosis has three named categories (Non-diabetic / Pre-diabetic / Diabetic); physical activity level has two named categories (Active / Sedentary). No arithmetic is meaningful on these category names. Two qualitative variables → chi-square is appropriate.

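(d) Expected frequencies, E = (row total × column total) / n: Non-diabetic row, 250 × 250 / 500 = 125 for both Active and Sedentary; Pre-diabetic row, 150 × 250 / 500 = 75 for both; Diabetic row, 100 × 250 / 500 = 50 for both. All E ≥ 5 ✓. df = (3 − 1)(2 − 1) = 2, critical value χ²* = 5.991 at α = 0.05. χ² = Σ(O − E)²/E = 12.8 + 12.8 + 5.33 + 5.33 + 8 + 8 ≈ 52.27. Since 52.27 > 5.991, reject H₀ (p < 0.001). V = √(χ² / (n × (k − 1))) = √(52.27 / (500 × 1)) ≈ 0.32, where k = 2 is the smaller table dimension: medium effect.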

(e) Error 1 — Causation: chi-square tests association, not causation. This is observational data; we cannot conclude that being sedentary causes diabetes. Error 2 — Reverse causation: people with diabetes or pre-diabetes may already be sedentary due to their condition, not the other way around. The data cannot distinguish the direction of any causal relationship.

(f) The administrator is wrong to call it “useless.” r² = 0.336 is a moderate-to-strong effect: BMI explains 33.6% of fasting glucose variance. In medical research involving complex physiological systems, a single variable explaining one-third of the variance is clinically meaningful. The administrator may be applying an overly strict standard, or confusing individual predictive precision (which is affected by all unexplained variance) with the relevance of BMI as a risk indicator. No single measurement explains everything in health research; the 66.4% unexplained variance reflects the complexity of blood glucose regulation, not a failure of BMI.
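The chi-square work in part (d) can be checked directly from the observed table. This is a minimal Python sketch using only the counts given in the problem; the function name and return layout are illustrative choices, not part of the lesson.

```python
import math

# Observed counts from the problem: columns are Active, Sedentary.
observed = [
    [165, 85],  # Non-diabetic
    [55, 95],   # Pre-diabetic
    [30, 70],   # Diabetic
]

def chi_square_independence(table):
    """Return (chi2, df, expected, cramers_v) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    k = min(len(row_totals), len(col_totals))
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    return chi2, df, expected, cramers_v

chi2, df, expected, v = chi_square_independence(observed)
print(f"chi2 = {chi2:.2f}, df = {df}, V = {v:.3f}")
print("all expected >= 5:", all(e >= 5 for row in expected for e in row))
```

Comparing the resulting statistic with the df = 2 critical value of 5.991 at α = 0.05 gives the decision; V then classifies the strength separately from the sample size.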


Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Regression Prediction and Extrapolation (REG-3)

A civil engineer models the relationship between the age of residential water pipes (x, years) and average lead concentration in tap water (y, μg/L). From data on 30 city blocks she fits:

(a) Interpret the slope in context.

(b) Predict lead concentration for a block with pipes aged 35 years. Is this interpolation or extrapolation?

(c) A city planner asks for a prediction at years (pipes scheduled for replacement). Explain why this prediction is unreliable.

(d) A journalist writes: “Older pipes cause higher lead levels.” Evaluate this claim statistically.

Show Solution

(a) For each additional year of pipe age, the predicted average lead concentration in tap water increases by 0.45 μg/L, on average.

The slope is positive: older pipes are associated with higher lead concentrations. This aligns with the physical mechanism of pipe corrosion, though the regression alone cannot establish causation.

(b) μg/L

x = 35 years falls within the observed range of 5–40 years, so this is interpolation, a relatively safe prediction. (It is near the upper end, so some caution is warranted, but it does not extend beyond the data.)

(c) The requested pipe age is well beyond the maximum observed value of 40 years, so this is extrapolation. Two reasons this prediction is unreliable:

  1. Linearity may not hold. Beyond 40 years, pipe degradation may accelerate non-linearly, or pipes may have been partially replaced/relined, breaking the linear trend.
  2. No supporting data. The model was fit only on blocks with pipes aged 5–40 years. We have no empirical information about the relationship beyond that range, so we would be projecting the trend into uncharted territory.

The model still returns a numerical value at that age, but this value should not be reported as a reliable prediction.

(d) The regression is consistent with a positive association, but “older pipes cause higher lead levels” requires caution:

  • The study is observational (no random assignment of pipe ages). Confounders such as neighbourhood age, water acidity, or pipe material type (not all old pipes are lead pipes) could explain the association.
  • The causal mechanism is physically plausible, but the regression alone cannot distinguish correlation from causation.

A more precise statement: “There is a statistically significant positive linear association (, ) between pipe age and lead concentration — the data are consistent with, but do not prove, a causal relationship.”
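The interpolation/extrapolation distinction in parts (b) and (c) can be built into a prediction helper. The sketch below uses the slope of 0.45 μg/L per year and the observed age range of 5–40 years from the solution. The intercept is a HYPOTHETICAL placeholder, because the fitted equation is not reproduced here, and the age of 60 years is likewise only an illustrative out-of-range input.

```python
SLOPE = 0.45          # μg/L per year of pipe age, from the solution to part (a)
INTERCEPT = 2.0       # HYPOTHETICAL placeholder; the fitted intercept is not shown
X_MIN, X_MAX = 5, 40  # observed range of pipe ages (years)

def predict_lead(age_years: float) -> tuple[float, bool]:
    """Return (predicted lead in μg/L, True if this is an extrapolation)."""
    prediction = INTERCEPT + SLOPE * age_years
    is_extrapolation = not (X_MIN <= age_years <= X_MAX)
    return prediction, is_extrapolation

for age in (35, 60):  # 60 is an illustrative age outside the data
    value, extrapolated = predict_lead(age)
    label = "extrapolation, do not report" if extrapolated else "interpolation"
    print(f"age {age}: {value:.2f} μg/L ({label})")
```

Returning the flag alongside the number keeps the warning attached to the prediction, so an out-of-range value is less likely to be quoted without its caveat.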


Review Problem 2 — Chi-Square Test of Independence (REG-4)

An occupational health researcher collects data on 200 office workers and records their seating arrangement (open-plan vs. private office) and reported productivity level (Low, Medium, High). The observed contingency table is:

Seating | Low | Medium | High | Row Total
Open-plan | 40 | 50 | 30 | 120
Private | 20 | 40 | 20 | 80
Col Total | 60 | 90 | 50 | 200

(a) Compute the expected frequency for the Open-plan / Medium cell.

(b) Verify the conditions for a chi-square test of independence.

(c) State and for this test.

(d) The computed test statistic is χ² ≈ 1.852, with df = 2 and critical value χ²* = 5.991 at α = 0.05. State the decision and conclusion in context.

(e) A manager concludes: “Open-plan offices therefore do not affect productivity.” Identify the statistical reasoning error.

Show Solution

(a) E = (row total × column total) / n = (120 × 90) / 200 = 54

(b) Computing all expected frequencies:

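Seating | Low | Medium | High
Open-plan | 120 × 60 / 200 = 36 | 120 × 90 / 200 = 54 | 120 × 50 / 200 = 30
Private | 80 × 60 / 200 = 24 | 80 × 90 / 200 = 36 | 80 × 50 / 200 = 20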

All six expected frequencies are ≥ 5 ✓. The condition for chi-square is satisfied.

(c)

(d) Since χ² ≈ 1.852 < 5.991, we fail to reject H₀ (p > 0.05).

Conclusion: “There is insufficient evidence at the 0.05 significance level that seating arrangement and productivity level are associated in this population of office workers.”

(e) The manager’s error is confusing “fail to reject H₀” with “accept H₀.” A non-significant result does not prove the variables are independent; it only means the data did not provide sufficient evidence of dependence at this sample size. The true association may exist but be too small to detect with n = 200, or the test may lack sufficient power. The manager should say “we found no significant evidence of an association” rather than “there is no effect.”
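For reference, the chi-square statistic for this table can be worked out from the observed counts and the expected frequencies (row total × column total / n). The derivation below is a sketch computed from the table in the problem, with the standard critical value for df = 2 at α = 0.05.

```latex
\begin{aligned}
E_{\text{Open-plan, Medium}} &= \frac{120 \times 90}{200} = 54 \\
\chi^2 &= \sum \frac{(O - E)^2}{E}
  = \frac{(40-36)^2}{36} + \frac{(50-54)^2}{54} + \frac{(30-30)^2}{30}
  + \frac{(20-24)^2}{24} + \frac{(40-36)^2}{36} + \frac{(20-20)^2}{20}
  \approx 1.852 \\
df &= (2-1)(3-1) = 2, \qquad \chi^2_{0.05,\,2} = 5.991
\end{aligned}
```

Because 1.852 is below 5.991, the decision is to fail to reject H₀, which is evidence of absence of proof, not proof of independence.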

Section 7: Mastery Check

Question 1 — Feynman Test

Explain in plain language — as if to a classmate who has not taken statistics — why a statistically significant result is not always practically important. Use a specific example in your explanation.

Show Model Answer

A statistically significant result means the data are unlikely to have occurred by chance if there were truly no relationship in the population — we have reliable evidence that an effect exists. But “exists” and “matters” are different things.

Imagine you study 10,000 employees and find that working from home one extra day per week is correlated with a small increase in productivity. With such a huge sample, even a tiny effect (say, r = 0.05) will be statistically significant. But r² = 0.0025: work-from-home explains 0.25% of the variation in productivity. The remaining 99.75% of productivity differences come from other sources entirely. The effect is real, but it is so small that no rational company would restructure their entire workplace policy based on it.

Statistical significance tells you the signal is reliably non-zero. Effect size (r² or Cramér’s V) tells you how big the signal is. Both pieces of information are needed before drawing conclusions about importance.


Question 2 — Apply Question: A New Educational Study

A school board is considering a new reading program. A researcher collects data from students and finds that students who completed the program have significantly higher reading scores than those who did not. The analysis reports , , , .

(a) What type of test was likely run, and were both variables quantitative?

(b) The school board sees p = 0.002 and says “Proven — the program works.” What is wrong with this statement?

Show Solution

(a) The researcher appears to have treated program completion (yes/no) as a numerical variable (likely coded 0/1) and computed Pearson r. This is technically a point-biserial correlation — valid in some contexts, but program completion is fundamentally qualitative (binary categorical). A better analysis for “do the two groups differ?” would be an independent-samples t-test comparing mean reading scores across the two groups.

(b) Two errors: (1) p = 0.002 does not “prove” the program works — it provides evidence against H₀ (no relationship). Statistics does not prove; it provides evidence. (2) r² = 0.123 is in the weak range (0.04–0.15): the program explains only 12.3% of the variance in reading scores. This may not justify the cost of a new program — other factors account for 87.7% of score variation.
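The link between the point-biserial correlation and the independent-samples t-test mentioned in part (a) can be checked numerically. In the sketch below the reading scores are hypothetical, and the t-test is the pooled-variance (equal variances assumed) version; both are assumptions for illustration.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sum_sq(values):
    """Sum of squared deviations from the group mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

# Hypothetical reading scores, for illustration only.
program = [72, 75, 78, 81, 84]     # completed the program (coded 1)
no_program = [65, 70, 72, 74, 79]  # did not complete it (coded 0)

scores = program + no_program
coded = [1] * len(program) + [0] * len(no_program)
n = len(scores)

# Route 1: Pearson r on the 0/1 coding, converted to t.
r = pearson_r(coded, scores)
t_from_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Route 2: pooled-variance two-sample t computed from the groups.
pooled_var = (sum_sq(program) + sum_sq(no_program)) / (n - 2)
t_pooled = (mean(program) - mean(no_program)) / math.sqrt(
    pooled_var * (1 / len(program) + 1 / len(no_program))
)

print(f"r = {r:.3f}, t from r = {t_from_r:.3f}, pooled t = {t_pooled:.3f}")
```

The two routes give the same t statistic, which is why the 0/1 coding "works" numerically; framing the question as a comparison of two group means is simply the clearer description of what is being tested.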


Question 3 — Error Analysis

A researcher’s published conclusion: “Our chi-square test found a significant association between hours of sleep per night and academic performance score (, , ). We conclude that insufficient sleep causes poor academic performance.”

Identify the errors in this conclusion.

Show Solution

Error 1 — Wrong method: Hours of sleep per night is a continuous quantitative variable (3.5 hours, 7.0 hours, 8.5 hours — arithmetic differences are meaningful). Academic performance score is likely also quantitative. Chi-square requires two qualitative (categorical) variables. The researcher apparently binned the data into categories to run chi-square — this discards information and produces arbitrary results depending on the binning choice. Pearson correlation is the appropriate method.

Error 2 — Causal language: “Causes poor academic performance” is not justified by any bivariate analysis, including correlation. Even if correlation were correctly applied, it would show association only. Causal claims require controlled experimental design with random assignment. From observational data, we can only say “is associated with.”


Self-Assessment

How confident do you feel about this lesson’s core skills?

Still confused · Ready for the Boss Fight

Section 8: Boss Fight

Choose your path. Both paths test everything from this lesson — pick the one that matches your strengths.

🔬 Path A — The Analyst

You receive a real dataset and must perform a complete bivariate analysis from scratch: classify variables, choose the method, compute the test statistic, interpret the effect size, and write a research conclusion.

📝 Path B — The Communicator

You receive three flawed research reports. For each one, you must identify the specific error and rewrite the conclusion using correct statistical language.

Path A — The Analyst

A school board wants to know if reading speed (words/minute) is related to reading comprehension score (out of 100) for students. Given: , .


Task 1. Classify both variables and justify your choice of analysis method.

Reading speed in words/minute and comprehension score out of 100 are both _____ variables because ___________. The appropriate method is _________.

Show Answer

Both are quantitative — they are numerical measurements where arithmetic differences are meaningful. Reading speed in wpm and comprehension score on a 0–100 scale both have meaningful averages and differences. → Pearson correlation / linear regression is appropriate.


Task 2. Compute the test statistic. Given , . Make the decision at α = 0.05.

Show Answer

Reject H₀ (p < 0.001).


Task 3. Interpret r². Is this a weak, moderate, or strong practical effect? Justify using the thresholds from Section 3.

Show Answer

Strong practical significance. Reading speed explains 43.6% of the variance in reading comprehension score — a strong practical effect. This suggests reading speed is an important predictor of comprehension, though 56.4% of variance remains unexplained by speed alone.


Task 4. Write a two-sentence research conclusion using correct association language. Include: the direction, , , the decision, and the effect size classification. Do not use causal language.

Show Model Answer

“There is sufficient evidence of a positive linear relationship between reading speed and reading comprehension score in the population (, , < 0.001). Reading speed explains 43.6% of the variance in comprehension scores (), indicating a strong practical association — but this is observational data, and we cannot conclude that faster reading causes higher comprehension.”

Reflection: What was the most challenging part of this analysis? Was it the initial setup or the final interpretation?

Path B — The Communicator

You are reviewing three research summaries submitted to a school newsletter. Each has a problem. Identify the specific error in each report and write a corrected conclusion.


Report 1 — Nutrition Study:

“We measured diet quality score (0–100 scale) and athletic performance score (0–100 scale) for 60 student athletes. We found , < 0.001. Conclusion: A healthy diet causes significantly improved athletic performance. We recommend all athletes immediately overhaul their diets.”

Task 1. Identify the specific error and write a corrected two-sentence conclusion.

Show Answer

Error: “Causes” is a causal claim — unjustified from correlation data, which shows association only. Additionally, “immediately overhaul” overstates what a single observational study supports.

Corrected conclusion: “There is sufficient evidence of a positive linear relationship between diet quality score and athletic performance score in the population (, < 0.001, , moderate-to-strong practical effect). These results are from observational data and show an association — they do not establish that diet causes improved performance. Other factors (training hours, sleep, genetics) may contribute to the relationship.”


Report 2 — Social Media Study:

“We surveyed 500 students and found a statistically significant association between social media use (hours/week) and GPA. , . Conclusion: The research conclusively demonstrates a moderate relationship between social media and academic performance. Schools should limit social media access immediately.”

Task 2. Identify the specific error(s) and write a corrected two-sentence conclusion.

Show Answer

Errors: (1) “Moderate relationship” is wrong: the reported r² (under 2% of variance explained) is negligible, not moderate. r was likely confused with r². (2) “Conclusively demonstrates” overstates what a p-value shows. (3) The policy recommendation ignores effect size.

Corrected conclusion: “There is sufficient evidence of a positive linear relationship between social media hours and GPA in the population (, ) — however, indicates that social media use explains less than 2% of variance in GPA. This is a statistically significant but negligible practical effect; the data do not support restricting social media access based on this association alone.”


Report 3 — Brand Preference Study:

“We surveyed 300 shoppers on preferred clothing brand (Brand X / Brand Y / Brand Z) and age group (18–25 / 26–40 / 41+). We computed Pearson , . Conclusion: There is sufficient evidence of a moderate positive linear relationship between clothing brand preference and age group.”

Task 3. Identify the specific error and write a corrected analysis description.

Show Answer

Error: Both variables are qualitative (categorical) — brand names (Brand X/Y/Z) and age groups (18–25/26–40/41+) are named categories, not quantitative measurements. Pearson correlation requires two quantitative variables. The correct method is chi-square test of independence.

Corrected description: “Both preferred clothing brand and age group are qualitative variables: named categories with no meaningful arithmetic. Pearson correlation is not appropriate. A chi-square test of independence should be run on the contingency table. The reported r is meaningless in this context. The research team should reanalyze using chi-square and report χ², df, the p-value, and Cramér’s V.”

Reflection: What was the most challenging part of this analysis? How would you apply this design approach to another problem?

Section 9: Challenge Problems

Ready for more? These go beyond the lesson objectives.

Challenge 1 — The ANOVA Gap: Mixed Variable Types

A researcher surveys 100 university students on their political affiliation (Liberal / Conservative / Independent) and their self-reported happiness on a 1–10 scale. They want to test whether political affiliation is related to happiness.

(a) Can chi-square be applied? Explain.

(b) Can Pearson correlation be applied? Explain.

(c) What analysis approach would be appropriate? (Name it, even though it is beyond this course.)

(d) What general principle does this illustrate about method selection?

Show Solution

(a) No — happiness on a 1–10 scale is quantitative (numerical ratings where arithmetic differences are meaningful). Chi-square requires both variables to be qualitative (categorical). Applying chi-square here would require artificially binning the happiness scores, discarding information.

(b) No — political affiliation (Liberal / Conservative / Independent) is qualitative (nominal categories). There is no numerical meaning to political affiliation labels; you cannot meaningfully compute the average of “Liberal + Conservative.” Pearson correlation requires both variables to be quantitative.

(c) The appropriate approach is a group-comparison method: specifically, one-way ANOVA (Analysis of Variance) if normality and equal-variance assumptions are met, or the Kruskal-Wallis test (non-parametric alternative). The research question becomes: “Does mean happiness differ across the three political affiliation groups?” This compares group means on the quantitative variable, rather than testing association between two variables of the same type.

(d) Always identify both variable types before choosing a method. When one variable is qualitative and one is quantitative, neither chi-square nor Pearson correlation applies. A group-comparison method is needed — or the question must be reframed. Method selection is driven entirely by variable types, not by the research topic.
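For readers curious what the group-comparison approach named in part (c) computes, here is a minimal Python sketch of the one-way ANOVA F statistic. The happiness ratings are hypothetical, the sketch stops at the F statistic (judging it requires an F critical value or p-value, which is beyond this course), and the assumptions of ANOVA are not checked.

```python
from statistics import mean

# Hypothetical happiness ratings (1-10 scale), for illustration only.
groups = {
    "Liberal": [6, 7, 8],
    "Conservative": [5, 6, 7],
    "Independent": [7, 8, 9],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = mean(all_scores)

# Between-group variation: how far each group mean sits from the grand mean.
ss_between = sum(len(s) * (mean(s) - grand_mean) ** 2 for s in groups.values())
# Within-group variation: spread of scores around their own group mean.
ss_within = sum((x - mean(s)) ** 2 for s in groups.values() for x in s)

df_between = len(groups) - 1
df_within = len(all_scores) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The statistic compares group means on the quantitative variable, which is exactly the reframing described above: neither variable is binned and no category is treated as a number.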


Challenge 2 — Sample Size and the p-value Trap

A nutritionist runs two separate studies on the relationship between diet quality score (quantitative) and inflammatory marker CRP in mg/L (quantitative).

Study A: , ,

Study B: , ,

Pre-verified computations:

(a) Are both studies statistically significant at α = 0.05?

(b) For which study is there stronger practical evidence that diet quality is meaningfully related to CRP?

(c) What does this illustrate about using p-values alone to compare studies?

(d) If a nutritionist wanted to recommend diet coaching based on these results, which study better supports the recommendation? Why?

Show Solution

(a) Yes — both studies reject H₀ at α = 0.05 (|t| > t* in both cases).

(b) Study A provides stronger practical evidence: r² = 0.27 (moderate: diet quality explains 27% of variance in CRP) versus Study B’s r² = 0.032 (negligible: only 3.2% explained). Despite Study B having a smaller p-value, its practical effect is much weaker.

(c) p-values are heavily influenced by sample size. With a large enough n, even negligible associations produce significant t-statistics. Two studies can have nearly identical p-values (0.019 vs. 0.011) while having wildly different practical significance (r² = 0.27 vs. r² = 0.032). Never use p-values to compare study importance; always report and compare effect sizes.

(d) Study A: r² = 0.27 indicates diet quality explains a meaningful portion of CRP variance. This is a moderate practical effect with clinical plausibility. Study B’s r² = 0.032 (explaining only 3.2% of variance in CRP) provides a very weak basis for individual-level interventions like diet coaching. A nutritionist should always examine both statistical and practical significance before making recommendations.


Challenge 3 — Write a Full Research Report

Using the results from Section 6 Problem 5 (the diabetes study):

Write a two-paragraph research report following the C10 checklist from Section 3:

  1. Paragraph 1: Question A (BMI vs. fasting glucose) — method, , , , , -value, decision, conclusion, effect size, study limitation
  2. Paragraph 2: Question B (diabetes status vs. activity level) — method, , , V, -value, decision, conclusion, V interpretation, study limitation
  3. Both paragraphs: association language only (no causal claims), note observational design
Show Model Answer

Paragraph 1: A Pearson correlation was computed to assess the linear relationship between BMI (kg/m²) and fasting blood glucose (mg/dL) in a random sample of 500 adults. The analysis revealed a significant positive relationship, r = 0.58, t(498) ≈ 15.89, p < 0.001. BMI explains 33.6% of the variance in fasting blood glucose (r² = 0.336), indicating a moderate-to-strong practical association. Adults with higher BMI tend to show higher fasting blood glucose in this sample. Because this is an observational study, causal conclusions cannot be drawn; other factors (diet, physical activity, genetics) likely contribute to blood glucose levels.

Paragraph 2: A chi-square test of independence was used to determine whether diabetes diagnosis (Non-diabetic / Pre-diabetic / Diabetic) and physical activity level (Active / Sedentary) are independent in the population of adults. The test was significant, χ²(2) = 52.27, p < 0.001. Cramér’s V = 0.32 indicates a medium association: sedentary adults are proportionally more represented among pre-diabetic and diabetic groups. Because this is observational data, the direction of any causal relationship (whether inactivity is associated with diabetes risk, or whether health declines from diabetes reduce physical activity) cannot be determined from these analyses alone.

Section 10: Solutions Reference


The solutions page contains fully worked solutions for all Sections 5–9 problems, including all variant banks and synthesis problems.


Quick-Reference Formulas

Test for H₀: ρ = 0 (Pearson correlation)

Chi-square test of independence

Effect sizes


Method Selection Summary

Variable 1 | Variable 2 | Method
Quantitative | Quantitative | Pearson correlation / linear regression
Qualitative | Qualitative | Chi-square test of independence
Quantitative | Qualitative | Group comparison (ANOVA), beyond this course
Qualitative | Quantitative | Group comparison (ANOVA), beyond this course
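The selection table can be expressed as a small lookup. This is an illustrative Python sketch; the function name and the string labels for variable types are assumptions chosen for the example.

```python
def select_method(var1_type: str, var2_type: str) -> str:
    """Pick a bivariate method from the two variable types."""
    types = {var1_type.lower(), var2_type.lower()}
    if not types <= {"quantitative", "qualitative"}:
        raise ValueError("each type must be 'quantitative' or 'qualitative'")
    if types == {"quantitative"}:
        return "Pearson correlation / linear regression"
    if types == {"qualitative"}:
        return "Chi-square test of independence"
    # One of each: order does not matter.
    return "Group comparison (ANOVA), beyond this course"

print(select_method("quantitative", "quantitative"))
print(select_method("qualitative", "qualitative"))
print(select_method("qualitative", "quantitative"))
```

Note that the function takes variable types only: the research topic never enters the decision.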

Effect Size Thresholds

r² | Practical Significance
< 0.04 | Negligible
0.04 – 0.15 | Weak
0.15 – 0.35 | Moderate
≥ 0.35 | Strong

Cramér’s V | Association Strength
< 0.1 | Negligible
0.1 – 0.3 | Small
0.3 – 0.5 | Medium
≥ 0.5 | Large
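The thresholds translate directly into two classifier functions. In this Python sketch the function names are illustrative, and boundary values are assumed to belong to the higher category (for example, r² = 0.15 counts as moderate), which the table does not state explicitly.

```python
def classify_r2(r2: float) -> str:
    """Practical significance label for a coefficient of determination."""
    if r2 < 0.04:
        return "negligible"
    if r2 < 0.15:
        return "weak"
    if r2 < 0.35:
        return "moderate"
    return "strong"

def classify_cramers_v(v: float) -> str:
    """Association strength label for Cramér's V."""
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "small"
    if v < 0.5:
        return "medium"
    return "large"

# Values taken from worked solutions in this lesson.
for r2 in (0.0025, 0.144, 0.336, 0.436):
    print(f"r^2 = {r2}: {classify_r2(r2)}")
print(f"V = 0.201: {classify_cramers_v(0.201)}")
```

These labels describe magnitude only and say nothing about statistical significance, so each should be reported next to the corresponding test decision.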

Conclusion Language Templates

Correlation (reject H₀): “There is sufficient evidence of a [positive/negative] linear relationship between [X] and [Y] in the population.”

Correlation (fail to reject H₀): “There is insufficient evidence of a linear relationship between [X] and [Y] in the population.”

Chi-square (reject H₀): “There is sufficient evidence that [Variable 1] and [Variable 2] are not independent in the population.”

Chi-square (fail to reject H₀): “There is insufficient evidence to conclude that [Variable 1] and [Variable 2] are not independent in the population.”
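The four templates can be filled in programmatically, which helps keep causal wording out of a report. The Python sketch below reproduces the template text; the function names and parameters are illustrative.

```python
def correlation_conclusion(x: str, y: str, reject_h0: bool,
                           direction: str = "positive") -> str:
    """Conclusion sentence for the test of H0: rho = 0."""
    if reject_h0:
        return (f"There is sufficient evidence of a {direction} linear "
                f"relationship between {x} and {y} in the population.")
    return (f"There is insufficient evidence of a linear relationship "
            f"between {x} and {y} in the population.")

def chi_square_conclusion(var1: str, var2: str, reject_h0: bool) -> str:
    """Conclusion sentence for the chi-square test of independence."""
    if reject_h0:
        return (f"There is sufficient evidence that {var1} and {var2} "
                f"are not independent in the population.")
    return (f"There is insufficient evidence to conclude that {var1} and "
            f"{var2} are not independent in the population.")

print(correlation_conclusion("weekly TV hours", "GPA", True, "negative"))
print(chi_square_conclusion("seating arrangement", "productivity level", False))
```

Neither function can produce the words "causes," "proves," or "independent" as a positive claim, which mirrors the language rules of this lesson.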