GP-1 — Computing \( b \), \( a \), and Identifying the Point of Means (Variants 0–4)
For all variants: \( b = r \times s_y / s_x \), then \( a = \bar{y} - b\bar{x} \). Each variant rounds \( b \) to two decimal places before using it to compute \( a \). The point of means \( (\bar{x}, \bar{y}) \) always lies on the line.
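The claim about the point of means follows from the intercept formula alone. A short derivation, using only the two formulas above:
\[ \hat{y}(\bar{x}) = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y} \]
Because this identity holds for any value of \( b \), the Part C check confirms the arithmetic in Part B but does not detect an error in the slope itself.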
Variant 0 — Study hours vs. Exam score: \( r = 0.86 \), \( s_x = 2.0 \), \( s_y = 8.6 \), \( \bar{x} = 3.0 \), \( \bar{y} = 68.0 \)
- Part A: \( b = 0.86 \times 8.6 / 2.0 = 0.86 \times 4.3 = 3.698 \approx 3.70 \)
- Part B: \( a = 68.0 - 3.70 \times 3.0 = 68.0 - 11.10 = 56.90 \)
- Part C: Point of means: \( (3.0,\; 68.0) \). Verify: \( \hat{y}(3.0) = 56.90 + 3.70 \times 3.0 = 56.90 + 11.10 = 68.0 \) ✓
Variant 1 — Temperature (°C) vs. Hot beverage sales: \( r = -0.72 \), \( s_x = 5.0 \), \( s_y = 15.0 \), \( \bar{x} = 20.0 \), \( \bar{y} = 80.0 \)
- Part A: \( b = -0.72 \times 15.0 / 5.0 = -0.72 \times 3.0 = -2.16 \)
- Part B: \( a = 80.0 - (-2.16)(20.0) = 80.0 + 43.2 = 123.2 \)
- Part C: Point of means: \( (20.0,\; 80.0) \). Verify: \( \hat{y}(20.0) = 123.2 - 2.16 \times 20.0 = 123.2 - 43.2 = 80.0 \) ✓
Variant 2 — Daily exercise (min) vs. Resting heart rate (bpm): \( r = -0.88 \), \( s_x = 10.0 \), \( s_y = 6.0 \), \( \bar{x} = 30.0 \), \( \bar{y} = 72.0 \)
- Part A: \( b = -0.88 \times 6.0 / 10.0 = -0.88 \times 0.6 = -0.528 \approx -0.53 \)
- Part B: \( a = 72.0 - (-0.53)(30.0) = 72.0 + 15.9 = 87.9 \)
- Part C: Point of means: \( (30.0,\; 72.0) \). Verify: \( \hat{y}(30.0) = 87.9 - 0.53 \times 30.0 = 87.9 - 15.9 = 72.0 \) ✓
Variant 3 — Fertilizer applied (g) vs. Tomato yield (kg): \( r = 0.75 \), \( s_x = 4.0 \), \( s_y = 2.4 \), \( \bar{x} = 8.0 \), \( \bar{y} = 5.2 \)
- Part A: \( b = 0.75 \times 2.4 / 4.0 = 0.75 \times 0.6 = 0.45 \)
- Part B: \( a = 5.2 - 0.45 \times 8.0 = 5.2 - 3.6 = 1.60 \)
- Part C: Point of means: \( (8.0,\; 5.2) \). Verify: \( \hat{y}(8.0) = 1.60 + 0.45 \times 8.0 = 1.60 + 3.60 = 5.20 \) ✓
Variant 4 — Age (years) vs. Reaction time (ms): \( r = 0.81 \), \( s_x = 12.0 \), \( s_y = 25.0 \), \( \bar{x} = 40.0 \), \( \bar{y} = 245.0 \)
- Part A: \( b = 0.81 \times 25.0 / 12.0 = 20.25 / 12.0 = 1.6875 \approx 1.69 \)
- Part B: \( a = 245.0 - 1.69 \times 40.0 = 245.0 - 67.6 = 177.4 \)
- Part C: Point of means: \( (40.0,\; 245.0) \). Verify: \( \hat{y}(40.0) = 177.4 + 1.69 \times 40.0 = 177.4 + 67.6 = 245.0 \) ✓
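The five variants above can be reproduced with a short script. This is a minimal sketch, not part of the original key: the function names `round2` and `line_from_summary` are illustrative, and the use of `decimal` with `ROUND_HALF_UP` is an assumption chosen to mimic hand rounding (Python's built-in `round` on binary floats rounds ties to even, which may not match a hand-rounded tie such as 1.6875 in Variant 4).

```python
from decimal import Decimal, ROUND_HALF_UP


def round2(value):
    """Round a Decimal to two places, with ties going away from zero."""
    return value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (b, a) from summary statistics given as decimal strings."""
    r, s_x, s_y, x_bar, y_bar = (Decimal(v) for v in (r, s_x, s_y, x_bar, y_bar))
    b = round2(r * s_y / s_x)  # slope, rounded before it is reused
    a = y_bar - b * x_bar      # intercept computed from the rounded slope
    return b, a


# Summary statistics copied from Variants 0-4: (r, s_x, s_y, x_bar, y_bar).
variants = [
    ("0.86", "2.0", "8.6", "3.0", "68.0"),
    ("-0.72", "5.0", "15.0", "20.0", "80.0"),
    ("-0.88", "10.0", "6.0", "30.0", "72.0"),
    ("0.75", "4.0", "2.4", "8.0", "5.2"),
    ("0.81", "12.0", "25.0", "40.0", "245.0"),
]

results = [line_from_summary(*v) for v in variants]
for i, (b, a) in enumerate(results):
    print(f"Variant {i}: b = {b}, a = {round2(a)}")
```

Passing the statistics as strings keeps every intermediate value exact in decimal arithmetic, so the results match the hand calculations digit for digit.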
Common mistakes in GP-1: (1) Inverting the ratio: \( s_y \) always goes in the numerator, \( b = r \times s_y / s_x \). (2) Wrong sign on the correction: when \( b \) is negative and \( \bar{x} \) is positive, \( -b\bar{x} \) is positive, so \( a \) is larger than \( \bar{y} \). (3) Forgetting to subtract \( b\bar{x} \) from \( \bar{y} \), which reports \( \bar{y} \) itself as the intercept.
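A worked illustration of mistakes (1) and (2), reusing the values from Variants 0 and 1 above. For mistake (1), inverting the ratio in Variant 0 gives
\[ b_{\text{wrong}} = 0.86 \times 2.0 / 8.6 = 1.72 / 8.6 = 0.20 \]
instead of 3.70. A units check catches this: the slope must be in exam points per study hour, whereas \( s_x / s_y \) is in hours per point. For mistake (2), subtracting instead of adding in Variant 1 gives
\[ a_{\text{wrong}} = 80.0 - 43.2 = 36.8, \qquad \hat{y}(20.0) = 36.8 - 2.16 \times 20.0 = 36.8 - 43.2 = -6.4 \neq 80.0 \]
so the Part C check at the point of means fails and flags the error.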
GP-2 — Interpreting Slope and Intercept (Variants 0–4)
Variant 0 — \( \hat{y} = 56.90 + 3.70x \) (study hours → exam score)
- Part A (Slope): "For each additional hour of study, the predicted exam score increases by 3.70 points, on average." The phrase "on average" is mandatory — the slope describes the average predicted change, not a guarantee for any individual student.
- Part B (Intercept): Yes, with caution: \( x = 0 \) hours of study is a realistic boundary case (a student who did not study at all), so the intercept 56.90 is the predicted score for a student who studied 0 hours. Its contextual meaning is borderline because 0 hours sits at the very edge of realistic study times.
Variant 1 — \( \hat{y} = 123.2 - 2.16x \) (temperature °C → hot beverage sales)
- Part A (Slope): "For each additional °C of temperature, the predicted hot beverage sales decrease by 2.16 units, on average."
- Part B (Intercept): Yes — \( x = 0 \)°C (freezing point) is a realistic winter temperature within the plausible range. The intercept 123.2 units is the predicted sales at 0°C and is contextually meaningful.
Variant 2 — \( \hat{y} = 87.9 - 0.53x \) (daily exercise min → resting heart rate bpm)
- Part A (Slope): "For each additional minute of daily exercise, the predicted resting heart rate decreases by 0.53 bpm, on average."
- Part B (Intercept): Yes — \( x = 0 \) minutes of exercise (a completely sedentary person) is a realistic value. The intercept 87.9 bpm represents the predicted resting heart rate for someone with no daily exercise.
Variant 3 — \( \hat{y} = 1.60 + 0.45x \) (fertilizer g → tomato yield kg)
- Part A (Slope): "For each additional gram of fertilizer, the predicted tomato yield increases by 0.45 kg, on average."
- Part B (Intercept): Yes — \( x = 0 \) g (a control plot with no fertilizer) is entirely realistic in agricultural experiments. The intercept 1.60 kg represents the predicted yield with no fertilizer applied.
Variant 4 — \( \hat{y} = 177.4 + 1.69x \) (age years, observed 20–65 → reaction time ms)
- Part A (Slope): "For each additional year of age, the predicted reaction time increases by 1.69 ms, on average."
- Part B (Intercept): No — \( x = 0 \) years (a newborn) is far outside the observed data range of 20–65 years. The intercept 177.4 ms is a mathematical anchor that positions the line; it should not be interpreted as a meaningful prediction of a newborn's reaction time.
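The slope sentences above all follow one template, and the intercept question in Variant 4 reduces to whether 0 lies inside the observed range of \( x \). The sketch below encodes both ideas. The helper names `describe_slope` and `zero_in_observed_range` and the `plural` flag are hypothetical, introduced only for illustration. The range test is also only one part of the judgment: the other variants state no observed range, so their answers rest on whether \( x = 0 \) is realistic in context.

```python
def describe_slope(b, x_step, y_name, y_unit, plural=False):
    """Build the standard slope interpretation sentence (b assumed nonzero)."""
    direction = "increase" if b > 0 else "decrease"
    verb = direction if plural else direction + "s"  # "sales decrease" vs "score increases"
    return (
        f"For each additional {x_step}, the predicted {y_name} "
        f"{verb} by {abs(b):.2f} {y_unit}, on average."
    )


def zero_in_observed_range(x_min, x_max):
    """True when x = 0 falls inside the observed range of the predictor."""
    return x_min <= 0 <= x_max


sentence_v0 = describe_slope(3.70, "hour of study", "exam score", "points")
sentence_v1 = describe_slope(
    -2.16, "°C of temperature", "hot beverage sales", "units", plural=True
)
print(sentence_v0)
print(sentence_v1)

# Variant 4 states an observed age range of 20-65 years.
print(zero_in_observed_range(20, 65))
```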
GP-3 — Residual Scenarios (Non-regenerable)
Scenario 1: \( \hat{y} = 56.90 + 3.70x \). A student studies 4 hours and scores 73.
\[ \hat{y}(4) = 56.90 + 3.70 \times 4 = 56.90 + 14.80 = 71.70 \]
\[ e = y - \hat{y} = 73 - 71.70 = +1.30 \]
A positive residual means the student's actual score (73) is above the regression line — the model underpredicted by 1.30 points.
Scenario 2: \( \hat{y} = 123.2 - 2.16x \). At 25°C, actual sales were 68 units.
\[ \hat{y}(25) = 123.2 - 2.16 \times 25 = 123.2 - 54.0 = 69.2 \]
\[ e = 68 - 69.2 = -1.2 \]
A negative residual means actual sales (68) fell below the predicted value (69.2) — the model overpredicted by 1.2 units.
Scenario 3: \( \hat{y} = 87.9 - 0.53x \). A person exercises 60 minutes and has resting HR 56 bpm.
\[ \hat{y}(60) = 87.9 - 0.53 \times 60 = 87.9 - 31.8 = 56.1 \]
\[ e = 56 - 56.1 = -0.1 \]
A tiny negative residual: this person's heart rate (56 bpm) is just below the predicted value (56.1 bpm). The model overpredicted by 0.1 bpm, so the point lies essentially on the line.
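The three scenarios share one computation: predict, subtract, read the sign. A minimal sketch follows, with the function names `residual` and `direction` chosen here for illustration. It uses ordinary floats and rounds only for display, so the stored values may differ from the hand results by floating-point noise.

```python
def residual(a, b, x, y_actual):
    """Return (y_hat, e) where e = actual - predicted."""
    y_hat = a + b * x
    return y_hat, y_actual - y_hat


def direction(e, tol=1e-9):
    """Positive residual: model underpredicted; negative: overpredicted."""
    if e > tol:
        return "underpredicted"
    if e < -tol:
        return "overpredicted"
    return "exact"


# (a, b, x, actual y) copied from Scenarios 1-3.
scenarios = [
    (56.90, 3.70, 4, 73),
    (123.2, -2.16, 25, 68),
    (87.9, -0.53, 60, 56),
]

rows = []
for a, b, x, y in scenarios:
    y_hat, e = residual(a, b, x, y)
    rows.append((y_hat, e, direction(e)))
    print(f"x = {x}: y_hat = {y_hat:.2f}, e = {e:+.2f}, model {direction(e)}")
```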
GP-4 — Parameterized Generator (Representative Example)
Representative example — your generator will show different values.
A sports analyst finds \( r = 0.86 \), \( s_x = 2.0 \), \( s_y = 8.6 \), \( \bar{x} = 3.0 \), \( \bar{y} = 68.0 \). Target \( x = 5 \).
Step 1 — Compute \( b \):
\[ b = r \times \frac{s_y}{s_x} = 0.86 \times \frac{8.6}{2.0} = 0.86 \times 4.3 = 3.698 \approx 3.70 \]
Step 2 — Compute \( a \):
\[ a = \bar{y} - b\bar{x} = 68.0 - 3.70 \times 3.0 = 68.0 - 11.10 = 56.90 \]
Step 3 — Predict \( \hat{y} \) at \( x = 5 \):
\[ \hat{y}(5) = 56.90 + 3.70 \times 5 = 56.90 + 18.50 = 75.40 \]
Verify via point of means: \( \hat{y}(3.0) = 56.90 + 3.70 \times 3.0 = 68.0 = \bar{y} \) ✓
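Since the generator changes the numbers on each run, a small script can be handy for checking whatever values it shows. The sketch below follows Steps 1 to 3 under the same rounding convention as the example (slope rounded to two decimals before reuse). The function name `solve` and the returned dictionary keys are hypothetical, and `ROUND_HALF_UP` is an assumption about how ties are rounded by hand.

```python
from decimal import Decimal, ROUND_HALF_UP


def solve(r, s_x, s_y, x_bar, y_bar, x_target):
    """Follow Steps 1-3 for inputs given as decimal strings."""
    r, s_x, s_y, x_bar, y_bar, x_target = (
        Decimal(v) for v in (r, s_x, s_y, x_bar, y_bar, x_target)
    )
    # Step 1: slope, rounded to two places before it is reused.
    b = (r * s_y / s_x).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    # Step 2: intercept from the rounded slope.
    a = y_bar - b * x_bar
    # Step 3: prediction at the target x.
    y_hat = a + b * x_target
    # Point-of-means check; true by construction, so it only guards the arithmetic.
    on_line = (a + b * x_bar) == y_bar
    return {"b": b, "a": a, "y_hat": y_hat, "on_line": on_line}


# Values from the representative example above.
example = solve("0.86", "2.0", "8.6", "3.0", "68.0", "5")
print(example)
```

Replacing the six arguments with the values your generator displays should reproduce its expected answers, provided it rounds the slope the same way.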