Data Visualization

In April 2020, the Georgia Department of Public Health posted a bar chart showing COVID-19 cases by county over 15 days. The dates weren’t in chronological order on the x-axis: they were sorted so the bars would appear to go down, giving the visual impression that cases were falling when they weren’t. Public outcry forced the department to repost the chart with the dates in correct order. The bars now visibly climbed.

Same data. Two very different stories.

This is what visualization literacy is really about: not just making a graph, but understanding what a graph communicates — and what it can hide. Every design choice (what goes on each axis, where the axis starts, how wide the bars are, whether to use a pie chart or a histogram) changes the story the viewer perceives. Make the wrong choice by accident and you mislead your audience. Make it deliberately and you deceive them.

By the end of this lesson you’ll have the tools to both create honest, informative graphs and read suspicious ones critically.

After this lesson, you will be able to:

Build a frequency distribution table showing absolute, relative, and cumulative frequencies
Select the correct graph type for a given variable — histogram for quantitative data, bar chart for qualitative data
Construct and interpret histograms, bar charts, stem-and-leaf plots, scatter plots, and time-series graphs
Identify the techniques used to make graphs misleading — truncated axes, inconsistent scales, distorted areas — and explain why they deceive

These skills matter well beyond this course. Every field — medicine, economics, journalism, engineering, social science — presents data graphically. Being able to read and evaluate those graphs is a form of literacy as important as reading text.

Building accurate visual representations requires a clear understanding of the data types you identified in DS-1.

From DS-1: Qualitative vs. Quantitative. Qualitative data are labels/categories; quantitative data are numerical measurements. (Histograms only apply to quantitative data.)
From DS-1: Discrete vs. Continuous. Discrete values are countable (1, 2, 3); continuous values can take any value in an interval (like 1.75 kg).
Frequency vs. Relative Frequency: Frequency is the raw count; relative frequency is the proportion (count / total). Both are used to scale the vertical axis of your charts.
Sorting and Classing: Grouping individual data points into “bins” (e.g., ages 10–19, 20–29) is the first step in creating any distribution plot.

Retrieval Checkpoint

A dataset contains the heights (in cm) of 50 students. You want to visualize the distribution of these heights using a chart. Which category does this data belong to?

Visual Guardrail:

If you aren’t sure whether to use a Bar Chart or a Histogram, check the data type. Bar charts are for categories (qualitative); histograms are for numerical ranges (quantitative). Using the wrong one is a “Level 1” error in data visualization.

Retrieval Warm-up — from DS-1

A researcher surveys 500 people and records the following for each participant: (i) their preferred streaming platform (Netflix, Crave, Disney+, Other), and (ii) their monthly screen time in hours. Which of the following correctly classifies these two variables?

A study reports that the mean resting heart rate for a sample of 80 marathon runners is beats per minute, while the population mean for all adults is bpm. Which statement uses these symbols correctly?

C1 — Frequency Distributions

Before drawing any graph, we organize raw data into a frequency distribution — a table that tallies how often each value (or range of values) appears in the dataset. It’s the scaffolding behind almost every graph you’ll encounter.

Frequency Distribution

A frequency distribution is a table that summarizes data by recording:

Class (or value): the category or interval
Absolute frequency (\(f\)): the raw count of observations in that class
Relative frequency: the proportion of the total in that class, computed as

\[ \text{relative frequency} = \frac{f}{n}, \quad \text{where } n \text{ is the total number of observations} \]

Cumulative frequency: the running total of absolute frequencies up to and including that class
Cumulative relative frequency: the running total of relative frequencies; equivalently, the proportion of observations at or below that class

Here’s what a frequency distribution looks like for a dataset of 20 exam scores:

Class (Score)	Absolute Freq.	Relative Freq.	Cumulative Freq.	Cumul. Rel. Freq.
50–59	2	0.10	2	0.10
60–69	5	0.25	7	0.35
70–79	8	0.40	15	0.75
80–89	4	0.20	19	0.95
90–99	1	0.05	20	1.00
Total	20	1.00	—	—

Read it like this: 8 students scored in the 70–79 range (absolute frequency). That’s 40% of the class (relative frequency). And 75% of students scored 79 or below (cumulative relative frequency at the 70–79 class).

Cumulative relative frequency ≠ relative frequency. The relative frequency for the 70–79 class is 0.40 (40% scored in that range). The cumulative relative frequency is 0.75 (75% scored 79 or below — including everyone in earlier classes too). These are completely different quantities. The cumulative column always ends at 1.00 (100%).
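The arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration rather than part of the lesson's own tooling: the function name `frequency_table` is hypothetical, and the class counts are copied from the exam-score table above.

```python
from itertools import accumulate

def frequency_table(classes, counts):
    """Return one row per class: (class, f, relative, cumulative, cumulative relative)."""
    n = sum(counts)                      # total number of observations
    cum = list(accumulate(counts))       # running total of absolute frequencies
    rows = []
    for label, f, c in zip(classes, counts, cum):
        rows.append((label, f, round(f / n, 2), c, round(c / n, 2)))
    return rows

# Class counts copied from the exam-score table in this section.
classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
counts = [2, 5, 8, 4, 1]

for row in frequency_table(classes, counts):
    print(row)
```

Each cumulative relative frequency is computed as the cumulative count divided by n, rather than by adding up rounded relative frequencies, so rounding error cannot build up down the column.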

Figure 1b: What “cumulative” means. Left: the five classes as separate bars (ordinary frequency). Right: the same classes stacked into one column (colours match). Drag the slider to add classes one at a time and watch the running total climb: the left axis reads it as a count and the right axis as a proportion. The stack always tops out at a count of 20 and a proportion of 1.00, because the cumulative scale ends at 100%.

C2 — Histograms (for Quantitative Data)

A histogram turns a frequency distribution into a picture. Each class becomes a bar; the bar’s height represents frequency. But one feature distinguishes histograms from all other bar-based graphs: the bars touch.

Histogram

A histogram is a graph of a frequency distribution for quantitative (numerical) data. Each bar represents one class interval:

Horizontal axis (x): the numerical values — class boundaries are marked at the edges of each bar
Vertical axis (y): frequency (absolute or relative)
Bars touch — because the x-axis is a continuous numerical scale; there are no gaps between classes
Each bar’s area is proportional to its frequency (when class widths are equal, height alone is sufficient)

Here is the histogram for the exam score data above. Notice how the bars share their edges — the right edge of the 60–69 bar is the same as the left edge of the 70–79 bar.

Figure 1: Histogram of 20 exam scores (class width = 10). Bars touch — the x-axis is continuous.

Histogram shapes tell a story. Where does the data cluster? Are there a few unusually high or low values that extend one side? Practise describing what you see: “most values fall between X and Y, with a few as high as Z.” You will give these patterns formal names and measures in DS-5 (Position and Distribution Shape).

Histogram intervals are not discrete categories. The 70–79 bar represents every score from 70 up to (but not including) 80. There’s no “70 category” and “79 category” — the x-axis is a continuous number line divided into intervals. Reading the bars as isolated discrete bins is a misinterpretation; the bars represent ranges on a continuum.

Try it: type your own numbers and drag the Number of classes slider. Watch the frequency table (absolute, relative, cumulative, and cumulative relative frequency) and the histogram rebuild together: every bar height is a row in the table. Notice how the same data can look smooth with few classes and jagged with many. That is the bin-width effect you’ll revisit in the Challenge Problems.
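The bin-width effect can also be reproduced offline. The sketch below is an illustration under stated assumptions, not the widget's actual code: `bin_counts` is a hypothetical helper, each class is treated as a half-open interval [lower, upper), and the 18 values are the ones listed in the stem-and-leaf plot in C4.

```python
def bin_counts(values, start, width, k):
    """Count values in k half-open classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * k
    for x in values:
        i = int((x - start) // width)    # index of the class that contains x
        if 0 <= i < k:                   # values outside the classes are ignored
            counts[i] += 1
    return counts

# The 18 values shown in this lesson's stem-and-leaf plot (C4).
values = [53, 57, 61, 64, 64, 68, 69, 70, 72, 73, 75, 75, 77, 78, 81, 86, 89, 94]

wide = bin_counts(values, start=50, width=10, k=5)     # few wide classes
narrow = bin_counts(values, start=50, width=5, k=10)   # many narrow classes

for label, counts in (("width 10", wide), ("width 5", narrow)):
    print(label)
    for i, f in enumerate(counts):
        print(f"  class {i + 1:>2}: " + "#" * f)
```

With width 10 the text histogram shows a single clear peak; with width 5 the same values produce a more uneven outline, including one empty class at the top end.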

C3 — Bar Charts (for Qualitative Data)

When the data is categorical rather than numerical, we use a bar chart. Bar charts look similar to histograms at a glance, but there’s one crucial difference that isn’t merely cosmetic — it reflects the nature of the data itself.

Bar Chart

A bar chart displays the frequency or relative frequency of each category in a qualitative variable. Key features:

Horizontal axis (x): category labels
Vertical axis (y): frequency or relative frequency
Bars do NOT touch — there is a gap between bars because categories are distinct (there is no value “between” Bakery and Dairy)
Bar order is arbitrary for nominal data; use the natural order for ordinal data
A bar chart sorted in descending order of frequency — most common category first — is called a Pareto chart, widely used in quality control to quickly identify the largest sources of defects

Figure 2: Bar chart of favourite sports (qualitative nominal data). Bars have gaps — the categories are distinct.

The Key Difference — One Rule to Remember

Histogram bars touch. Bar chart bars don’t.

This isn’t an aesthetic choice. In a histogram, the x-axis is a continuous number line — scores of 69.9 and 70.0 are adjacent, so the bars share an edge. In a bar chart, the x-axis shows distinct categories — “Soccer” and “Basketball” are not adjacent on any meaningful scale, so there’s a gap to reflect that separation.

Use a histogram for quantitative data. Use a bar chart for qualitative data. Using a histogram for categorical data is always wrong.

C4 — Other Graph Types

Histograms and bar charts cover most of what you’ll build in this course. But four other graph types appear regularly in statistics — each exists because a particular question about data cannot be answered with a bar. Here’s what each one is for.

Pie Chart

A pie chart divides a circle into sectors, where each sector’s angle is proportional to its relative frequency (angle = relative frequency × 360°). Best used when:

Data is qualitative (usually nominal)
There are 5 or fewer categories (more segments make the chart unreadable)
Part-to-whole relationships are the main message

Limitation: Humans are poor at estimating angles. When categories are similar in size, a bar chart is usually clearer. Many statisticians prefer bar charts to pie charts in almost every situation.

[Figure 2b panels: pie chart; bar chart of the same data]

Figure 2b: Same data, two graph types. The pie chart encodes frequency as sector angle — difficult to compare when sectors are similar in size (try judging “Basketball vs. Swimming” precisely). The bar chart encodes frequency as bar height — immediately comparable. For most purposes, prefer the bar chart.
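To see why similar sectors are hard to compare, it can help to compute the angles directly. The sketch below assumes the usual conversion (angle = relative frequency × 360°). The counts are HYPOTHETICAL, invented for illustration only, since the data behind Figure 2b is not listed in the text; `sector_angles` is likewise a hypothetical helper name.

```python
def sector_angles(counts):
    """Map each category to its pie-chart sector angle in degrees."""
    n = sum(counts.values())
    return {category: 360 * f / n for category, f in counts.items()}

# HYPOTHETICAL counts, invented for illustration only (not the Figure 2b data).
counts = {"Soccer": 14, "Basketball": 10, "Swimming": 9, "Other": 7}

angles = sector_angles(counts)
for category, angle in angles.items():
    print(f"{category:<10} f = {counts[category]:>2}  angle = {angle:.1f} degrees")
```

In this made-up example Basketball and Swimming differ by one observation, which is only 9 degrees of arc, while the same difference is a visible step in bar height.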

Stem-and-Leaf Plot

A stem-and-leaf plot displays quantitative data by splitting each value into a stem (leading digit(s)) and a leaf (trailing digit). It looks like a sideways histogram but preserves the individual data values:

Stem | Leaf
  5  | 3 7
  6  | 1 4 4 8 9
  7  | 0 2 3 5 5 7 8
  8  | 1 6 9
  9  | 4

Best for small datasets (n ≤ 50) where you want to see the distribution and keep the raw data visible. Each row is a class; leaves within a row are individual observations.
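The splitting rule can be sketched in Python for two-digit values, where the stem is the tens digit and the leaf is the units digit. This is a minimal illustration: `stem_and_leaf` is a hypothetical helper, and the input is the plot's own 18 values in an arbitrary (assumed) unsorted order.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem); units digits are the leaves."""
    rows = defaultdict(list)
    for x in sorted(values):             # sorting keeps leaves in increasing order
        stem, leaf = divmod(x, 10)
        rows[stem].append(leaf)
    return dict(rows)

# The values from the plot above, shuffled to mimic raw data (order is assumed).
values = [78, 64, 53, 94, 70, 61, 89, 75, 57, 72, 64, 86, 68, 75, 81, 73, 69, 77]

plot = stem_and_leaf(values)
print("Stem | Leaf")
for stem in sorted(plot):
    print(f"{stem:>3}  | " + " ".join(str(leaf) for leaf in plot[stem]))
```

One simplification to be aware of: this sketch skips stems that have no values, whereas a hand-drawn plot would normally still print the empty stem so that gaps in the data stay visible.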

Scatter Plot

A scatter plot displays pairs of quantitative measurements as points on a two-dimensional graph. Each point represents one observation; its x-coordinate is one variable and its y-coordinate is another.

Use a scatter plot when you want to explore the relationship between two quantitative variables — does one tend to increase as the other does? Is there a linear pattern? Are there outliers? You’ll use scatter plots extensively in REG-1 (Correlation Analysis).

Figure 2c: Scatter plot of study hours versus exam score (n = 15). Each dot is one student. The upward trend — more hours, higher scores — is visible as a whole-cloud pattern, not a rule for any individual point. This is what “relationship between two quantitative variables” looks like. You will quantify this trend precisely in REG-1.

Time-Series Plot

A time-series plot (line graph) shows how a quantitative variable changes over time. Time goes on the x-axis; the variable of interest on the y-axis. Points are connected by lines to emphasize the change from one time period to the next.

Use a time-series plot when the x-variable specifically represents time (days, months, years) and the trend over time is the story you want to tell.

Figure 2d: Time-series plot of monthly average temperature over one year. The connected line emphasizes the trend — rise through summer, fall through winter. Points are connected because the x-axis is time: consecutive months are adjacent, so tracking the change from one to the next is meaningful. Compare this to a bar chart of the same data: a bar chart would show the same heights but lose the sense of continuous progression.

C5 — Choosing the Right Graph

Every graph-choice decision starts with the same question: What type of variable is this? Use the decision chain below.

Graph Selection Decision Chain

One variable?
- Qualitative: bar chart (always); pie chart (only if ≤5 categories and part-to-whole is the focus)
- Quantitative: histogram (distribution shape); stem-and-leaf (small n, want raw values)
Two variables?
- Both quantitative: scatter plot; if one is time, use a time-series plot
- One qualitative, one quantitative: side-by-side bar charts or grouped histograms (covered in later courses)

Using a histogram for qualitative data is always wrong. A histogram’s x-axis is a continuous number line — it implies that values between bars are possible. For qualitative data (e.g., “Bakery,” “Dairy”), no such “between” exists. Use a bar chart. Similarly, using a bar chart for quantitative data with many distinct values (like a weight distribution) will produce a cluttered mess where a histogram would be clean and clear.
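The decision chain can be written out as a small function. This is a sketch of the chain as stated in this section, not an exhaustive rulebook: the function name `choose_graph` and its parameter names are hypothetical, and cases the chain does not mention (for example, two qualitative variables) raise an error rather than guess.

```python
def choose_graph(var_types, n_categories=None, part_to_whole=False,
                 small_n=False, one_is_time=False):
    """Suggest graph types for a tuple of variable types.

    var_types holds "qualitative" or "quantitative", one entry per variable.
    """
    if any(t not in ("qualitative", "quantitative") for t in var_types):
        raise ValueError("each variable type must be 'qualitative' or 'quantitative'")

    if len(var_types) == 1:
        if var_types[0] == "qualitative":
            options = ["bar chart"]      # always acceptable for categories
            if part_to_whole and n_categories is not None and n_categories <= 5:
                options.append("pie chart")
            return options
        options = ["histogram"]          # quantitative: show distribution shape
        if small_n:
            options.append("stem-and-leaf plot")
        return options

    if len(var_types) == 2:
        if all(t == "quantitative" for t in var_types):
            return ["time-series plot"] if one_is_time else ["scatter plot"]
        if set(var_types) == {"qualitative", "quantitative"}:
            return ["side-by-side bar charts", "grouped histograms"]

    raise ValueError("case not covered by the decision chain")

print(choose_graph(("qualitative",), n_categories=5, part_to_whole=True))
print(choose_graph(("quantitative",), small_n=True))
print(choose_graph(("quantitative", "quantitative"), one_is_time=True))
```

Note that a histogram is never returned for a qualitative variable, which mirrors the rule stated above.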

C6 — Graph Misrepresentation

Not all misleading graphs are accidents. Knowing the common techniques helps you spot them — and avoid creating them yourself.

Truncated Y-Axis

A truncated y-axis starts above zero, making small differences look large. The bars or line still reflect the true data values, but the visual impression is distorted.

When it’s a problem: bar charts and histograms should (almost) always start at 0. With the y-axis starting at 0, the gap between a bar at 97% and a bar at 94% is barely visible, which is fine because the true difference is small. But when the axis starts at 93%, that 3-point difference looks enormous.

When it’s acceptable: time-series plots often legitimately start above zero (e.g., tracking temperature variations around 15°C — starting at 0 would waste most of the chart). The key is whether the starting point is disclosed and whether the visual impression matches the magnitude of the difference.

Here is the same fictional data — approval ratings over four quarters — displayed two ways. The only difference between the two graphs is where the y-axis starts.

✓ Honest: y-axis starts at 0

The change looks small — as it should.

✗ Misleading: y-axis starts at 93

Looks like a collapse — same 3-point drop.

Figure 3: Same data, two y-axes. The left graph starts at 0%; the right graph starts at 93%. The 3-point decline from Q1 to Q4 looks negligible on the left and catastrophic on the right. This is the truncated y-axis trick.

Figure 3b — drag the slider yourself. The data never changes: approval slips from 97% to 94%, a 3-point decline. As you raise where the y-axis starts, watch the red “visual drop” readout balloon from a few percent to “looks 75% shorter” — while the green “actual change” stays fixed at 3 points. The gap between those two numbers is the deception.
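The distortion can be put into numbers. The sketch below is an illustration, not the slider's actual code: it assumes a bar's drawn height is its value minus the axis starting point, and the helper names `drawn_height` and `visual_drop` are hypothetical.

```python
def drawn_height(value, axis_start):
    """Height of a bar as drawn when the y-axis begins at axis_start."""
    return value - axis_start

def visual_drop(before, after, axis_start):
    """Fraction by which the 'after' bar looks shorter than the 'before' bar.

    Assumes axis_start is below 'before'; otherwise the first bar has no height.
    """
    h_before = drawn_height(before, axis_start)
    h_after = drawn_height(after, axis_start)
    return (h_before - h_after) / h_before

# Approval slips from 97% to 94%; only the axis starting point changes.
for start in (0, 50, 90, 93):
    drop = visual_drop(97, 94, start)
    print(f"axis starts at {start:>2}: second bar looks {drop:.0%} shorter")
```

The actual change is 3 points in every row; only the apparent change grows as the axis start approaches the data.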

Other Common Misrepresentation Techniques

Inconsistent class widths in histograms: If classes are different widths, the bar height alone is misleading — a wider class captures more observations. The correct display uses frequency density (frequency ÷ class width) on the y-axis, ensuring area rather than height encodes frequency.
3D effects and distorted areas: 3D pie charts tilt and expand the slice nearest the viewer, making it look larger than its true proportion. Pictograms (where a doubled icon also doubles in width, quadrupling area) distort relative sizes.
Selective date ranges in time-series: Choosing a start date that captures only the upward part of a trend creates the impression of consistent growth when the full history shows volatility.
Omitting the sample size n: “67% of customers prefer our brand!” — based on a survey of 3 customers — is technically correct but meaningless without context.
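The frequency-density correction for unequal class widths is easy to check numerically. The classes below are HYPOTHETICAL, invented for illustration only, and `frequency_density` is a hypothetical helper name.

```python
# HYPOTHETICAL classes with unequal widths, invented for illustration only:
# (lower boundary, upper boundary, frequency)
classes = [(0, 10, 8), (10, 20, 12), (20, 50, 18)]

def frequency_density(lower, upper, f):
    """Frequency per unit of class width; this is the bar height to draw."""
    return f / (upper - lower)

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), d in zip(classes, densities):
    area = d * (hi - lo)                 # bar area recovers the frequency
    print(f"{lo:>2}-{hi:<2}  f = {f:>2}  density = {d:.2f}  area = {area:.0f}")
```

In this made-up example the widest class has the largest raw count but the lowest density, so plotting raw counts as heights would make it look like the peak when it is actually the sparsest region.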

Quick checklist for evaluating any graph you encounter:

Does the y-axis start at 0? If not, why not — and does the distortion matter?
Are all bars/slices drawn to a consistent scale?
Is the graph type appropriate for the variable type?
What is n? Is the sample large enough to support the claim?
Is the time range shown cherry-picked?

Concept Check

Before the worked examples, confirm the two distinctions students most often blur.

A researcher has recorded the eye colour (brown, blue, green, hazel) of 300 students and wants to display how the four colours are distributed. Which graph is appropriate, and why?

In a frequency table, the class 20–24 minutes has relative frequency and cumulative relative frequency . What does the 0.66 tell you?

Let’s walk through four examples — each exercises a different core concept, and the scaffolding gradually fades so you’re doing more of the thinking by Example 4.

Example 1 — Building a Frequency Distribution Table (C1)

Scenario: A quality-control inspector records the number of defective items found in each of 20 production batches:

3, 7, 2, 5, 8, 4, 6, 3, 5, 9, 1, 4, 6, 7, 5, 5, 8, 3, 6, 4

Organize this into a frequency distribution with 5 classes of width 2 (classes: 1–2, 3–4, 5–6, 7–8, 9–10). Compute the absolute frequency, relative frequency, cumulative frequency, and cumulative relative frequency.

Step 1: Tally each class.

Go through the data values one by one and mark which class each belongs to:

1–2: 2, 1 → count = 2
3–4: 3, 4, 3, 4, 3, 4 → count = 6
5–6: 5, 6, 5, 6, 5, 5, 6 → count = 7
7–8: 7, 8, 7, 8 → count = 4
9–10: 9 → count = 1

Step 2: Compute relative frequency. Divide each count by \(n = 20\).

Step 3: Compute cumulative frequencies. Running totals from top to bottom.

Result:

Class	Absolute Freq.	Relative Freq.	Cumulative Freq.	Cumul. Rel. Freq.
1–2	2	0.10	2	0.10
3–4	6	0.30	8	0.40
5–6	7	0.35	15	0.75
7–8	4	0.20	19	0.95
9–10	1	0.05	20	1.00
Total	20	1.00	—	—

Interpretation: 75% of batches had 6 or fewer defects (cumulative relative frequency 0.75 at the 5–6 class). Only 1 batch had 9–10 defects (5% of all batches).

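Sanity checks: the absolute frequency column must sum to \(n\) (here, 20). The relative frequency column must sum to 1.00. The final row of the cumulative frequency column must equal \(n\). The final row of the cumulative relative frequency column must equal 1.00. Run these checks every time.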
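These checks are mechanical enough to automate. The sketch below runs them on the Example 1 result table; the function name `sanity_checks` is hypothetical, and the decimal entries are kept as strings and parsed with `fractions.Fraction` so that sums such as 0.10 + 0.30 + ... are exact rather than subject to floating-point error.

```python
from fractions import Fraction
from itertools import accumulate

# Rows copied from the Example 1 result table:
# (class, absolute f, relative, cumulative, cumulative relative)
table = [
    ("1-2",  2, "0.10",  2, "0.10"),
    ("3-4",  6, "0.30",  8, "0.40"),
    ("5-6",  7, "0.35", 15, "0.75"),
    ("7-8",  4, "0.20", 19, "0.95"),
    ("9-10", 1, "0.05", 20, "1.00"),
]
n = 20

def sanity_checks(table, n):
    """Return a dict mapping each check's description to True or False."""
    f = [row[1] for row in table]
    rel = [Fraction(row[2]) for row in table]       # exact decimal values
    cum = [row[3] for row in table]
    cum_rel = [Fraction(row[4]) for row in table]
    return {
        "absolute frequencies sum to n": sum(f) == n,
        "relative frequencies sum to 1": sum(rel) == 1,
        "cumulative column is the running total": cum == list(accumulate(f)),
        "last cumulative frequency equals n": cum[-1] == n,
        "last cumulative relative frequency equals 1": cum_rel[-1] == 1,
    }

checks = sanity_checks(table, n)
for name, ok in checks.items():
    print(f"{'PASS' if ok else 'FAIL'}  {name}")
```

The running-total check is an extra beyond the four listed above; it catches a mis-added row in the middle of the cumulative column that the end-of-column checks would miss only if two errors cancelled.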

Example 2 — Constructing a Histogram (C2)

Scenario: Using the frequency table from Example 1, sketch the histogram.

Before I show the steps, take a moment to predict: where do you expect the tallest bar to be? Will the bars be roughly equal on both sides of that peak, or will one side drop off more steeply?

Step 1: Draw and label the axes.

Horizontal axis: continuous scale from 0.5 to 10.5 (add half-unit padding so bars don’t cut against the axis ends); mark class boundaries at 0.5, 2.5, 4.5, 6.5, 8.5, 10.5
Vertical axis: frequency; mark from 0 to 8 (one step above the maximum frequency of 7)

Step 2: Draw bars for each class, touching at the class boundaries.

Figure 4: Histogram of defective items per batch (n = 20). The distribution peaks at the 5–6 class; bar heights decrease on both sides of the peak.

Shape: The peak is at the 5–6 class (f = 7). From the peak, bars decrease on both sides — dropping to f = 4 in the 7–8 class and f = 1 in the 9–10 class on the right, and to f = 6 in the 3–4 class and f = 2 in the 1–2 class on the left. The distribution is approximately balanced around its centre. In DS-5 you will give shapes like this formal names and measures; for now, practise describing what you observe: which class is tallest, and how do the bars change on each side of the peak.

Example 3 — Choosing the Right Graph (C3, C5)

For each scenario, which graph is most appropriate? Think through the decision chain (variable type → graph choice) before revealing the answer.

Scenario A: A researcher records the preferred music genre of 200 university students (Pop, Rock, Hip-hop, Classical, Other).

Answer: Bar chart. Music genre is a qualitative nominal variable — categories with no natural order. A bar chart with one bar per genre and frequency on the y-axis is correct. A pie chart would also be acceptable here (5 categories fits the pie chart guideline), but a bar chart makes relative sizes easier to compare. Not a histogram — genre is not a number; there is no meaningful “between Pop and Rock.”

Scenario B: A nurse records the resting heart rate (beats per minute) of 50 patients.

Answer: Histogram (or stem-and-leaf). Heart rate is quantitative continuous. A histogram groups the values into class intervals (e.g., 60–69, 70–79 bpm) and shows the distribution shape. For a small dataset (n = 50), a stem-and-leaf plot would also work and preserves individual values. Not a bar chart — the x-axis is a continuous number line.

Scenario C: An economist tracks the unemployment rate each month for 5 years.

Answer: Time-series plot (line graph). The x-variable is time (months), and showing the trend over time is the point. Connecting the points with a line emphasizes the change from month to month. A histogram would destroy the temporal ordering of the data — it would show the distribution of rates but not how rates evolved.

Example 4 — Spot the Misleading Graph (C6)

This example tests whether you can recognize the misrepresentation techniques from C6 in a new context — not just describe them, but identify them when they appear.

Graph A: A bar chart shows monthly sales for two competing stores. Store A’s bar is drawn as a dollar-sign icon 3 cm tall. Store B’s bar is a dollar-sign icon 6 cm tall — twice as tall and twice as wide, representing $2 million vs. $1 million in sales.

Identify the misrepresentation in Graph A

Misrepresentation: Distorted area in a pictogram. Store B has twice the sales of Store A, so the icon is drawn twice as tall, but it was also made twice as wide. Area = height × width, so an icon that is 2× taller and 2× wider has 4× the area of Store A’s icon. The eye compares areas, so Store B looks four times as large when its sales are only twice as large.

Fix: Use bars of equal width or use a simple bar chart without pictogram icons.
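The area argument can be verified with a short calculation. This is a sketch under one stated assumption: each icon is treated as its bounding rectangle, with width proportional to height. The `aspect` value is hypothetical (the text gives only heights), and it cancels out of the ratio anyway.

```python
def icon_area(height_cm, aspect=1.0):
    """Bounding-box area of an icon whose width is aspect * height (aspect is assumed)."""
    return height_cm * (aspect * height_cm)

store_a = icon_area(3)                   # Store A: $1 million, icon 3 cm tall
store_b = icon_area(6)                   # Store B: $2 million, icon 6 cm tall

sales_ratio = 2_000_000 / 1_000_000      # what the data says
area_ratio = store_b / store_a           # what the eye sees

print(f"sales ratio: {sales_ratio:.0f}x, area ratio: {area_ratio:.0f}x")
```

Scaling only the height while keeping the width fixed would make the area ratio equal the sales ratio, which is what the equal-width fix achieves.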

Graph B: A line graph shows a company’s stock price over 6 months. The y-axis starts at $82 and ends at $90. The line climbs steeply from $83 to $89. The headline reads: “Stock price surges — up 7.2% in six months!”

Identify the misrepresentation in Graph B

Misrepresentation: Truncated y-axis creating visual exaggeration. Starting the y-axis at $82 rather than $0 makes a 7.2% rise look like a near-vertical climb. On an axis from $0 to $90, the same data would appear as a very shallow upward slope.

Note: For a time-series plot of a stock, starting at $0 is sometimes impractical (the line would be squeezed into a nearly flat band at the top of the chart, hiding real variation). The honest fix is to clearly label the axis start and avoid claiming the visual magnitude represents the magnitude of the change. The headline’s word “surges” is the misleading part: 7.2% over 6 months is a moderate increase, not a surge.

Time to try it yourself — with support. Each problem below gives you immediate feedback. If you get something wrong, read the rationale to understand why before moving on.

Problem 1 — Completing a Frequency Distribution Table (C1)

Work through one commute-time table one step at a time — relative frequency, then cumulative frequency, then a tail percentage. Each step is checked before the next unlocks, so you can see exactly which move (if any) trips you up. Click “Try a similar problem” for a fresh table.

Problem 2 — Selecting the Correct Graph Type (C3, C5)

Apply the decision chain: identify the variable type, then select the graph. Click “Try a similar problem” to practice with a different scenario.

A city planner surveys 300 residents about their primary mode of transportation to work (Car, Bus, Bike, Walk, Work from home).

What is the most appropriate graph for displaying these results?

A nurse records the body temperature (°C) of each of 80 patients at admission.

What is the most appropriate graph for displaying the distribution of temperatures?

An online review platform collects star ratings (1 star, 2 stars, 3 stars, 4 stars, 5 stars) from 500 diners for a restaurant.

What is the most appropriate graph for displaying the distribution of ratings?

A meteorologist records the total monthly rainfall (mm) in Montréal over 36 consecutive months.

What is the most appropriate graph for showing how rainfall changed over time?

A sociologist records the number of siblings each student in a class has (0, 1, 2, 3, 4, or 5+).

What is the most appropriate graph for displaying the distribution?

Problem 3 — Reading and Interpreting a Frequency Table (C1)

A fresh frequency table appears each time. Read the value it asks for — sometimes a count, sometimes a proportion or percentage, sometimes the modal class — and type your answer (or select it, for the modal-class question). A wrong entry that matches a common slip returns the reason it is wrong. Click “Try a similar problem” for a new table.

Problem 4 — Identify the Misleading Feature (C6)

A supermarket chain releases a bar chart comparing the annual revenue of its three store formats. Here is a description of the chart:

Y-axis labeled “Revenue (millions)” — starts at $180M, ends at $220M
Three bars: Flagship stores ($215M), Neighbourhood stores ($208M), Express stores ($196M)
The Flagship bar is more than 2× as tall as the Express bar on the chart

What is the primary misleading feature of this chart?

Show full explanation

The actual difference between Flagship ($215M) and Express ($196M) is $19M, which is only about 8.8% of the Flagship revenue. On a y-axis running from $0 to $220M, the Flagship bar would be 97.7% of the chart height, the Express bar 89.1%: visually almost identical.

By starting at $180M, the chart compresses the visible axis range to $40M ($180M to $220M). Now Express ($196M) appears at (196 − 180) / 40 = 40% of the axis height, while Flagship ($215M) appears at (215 − 180) / 40 = 87.5%. This makes the Flagship bar look more than twice as tall as Express, a gross visual distortion of the actual 8.8% difference.
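The bar heights on both versions of the axis can be tabulated directly. This is a minimal sketch using the revenues and axis limits given in the problem; the helper name `axis_fraction` is hypothetical.

```python
# Revenues in $ millions, as given in the problem statement.
revenue = {"Flagship": 215, "Neighbourhood": 208, "Express": 196}

def axis_fraction(value, axis_min, axis_max):
    """Fraction of the axis height that a bar of this value occupies."""
    return (value - axis_min) / (axis_max - axis_min)

print(f"{'Store':<14}{'axis 0-220':>12}{'axis 180-220':>14}")
for name, r in revenue.items():
    honest = axis_fraction(r, 0, 220)
    truncated = axis_fraction(r, 180, 220)
    print(f"{name:<14}{honest:>12.1%}{truncated:>14.1%}")

# How many times taller Flagship looks than Express on each axis.
honest_ratio = axis_fraction(215, 0, 220) / axis_fraction(196, 0, 220)
truncated_ratio = axis_fraction(215, 180, 220) / axis_fraction(196, 180, 220)
print(f"Flagship vs Express: {honest_ratio:.2f}x honest, {truncated_ratio:.2f}x truncated")
```

Comparing the two ratios shows the size of the distortion: the honest axis keeps the bars within about 10% of each other, while the truncated axis more than doubles the apparent gap.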

Problem 5 — Reading a Histogram (C2)

By now the steps should feel familiar, so this one lets you choose your support. Answer the whole problem at once if you can; break it into steps only if you want them. A new histogram is drawn each time.

Before moving to Independent Practice, check your confidence on these key skills:

I can compute relative and cumulative frequencies from a table
I can identify the correct graph type for a given variable
I can recognize a truncated y-axis as a form of misrepresentation

No hints here — these problems are yours to work through. Use scratch paper as needed. Show the solution when you’re ready to check your work.

Interleaving tip: These problems mix concepts intentionally. Don’t expect every problem to test the same skill as the one before — that’s by design. Research shows interleaved practice builds stronger long-term memory than blocked practice. If you get stuck, the concept tag in each problem header — e.g., (C2) — tells you exactly which subsection of Section 3 to revisit.

Problem 1 — Build a Complete Frequency Table (C1)

A new dataset is generated each time you click “Generate new problem.” Build the full frequency table — absolute, relative, cumulative, and cumulative relative — using 5 equal-width classes.

Problem 2 — Interpreting a Histogram (C2)

The histogram below shows the distribution of daily steps walked by 60 office workers over one month.

Figure 5: Histogram of daily step counts for 60 office workers.

Before reading the questions, study the histogram for ten seconds. Make two quick predictions: (1) Which class has the highest frequency? (2) Is most of the data concentrated toward lower step counts, higher step counts, or roughly in the middle? Hold your answers, then continue.

Reading values off a histogram is a cold skill here — no worked support. Each problem below draws a fresh histogram and asks you to read one value from it (class width, a class frequency, or a cumulative percentage). Type your answer; a common misreading returns the reason it is wrong.

Then, in your own words: describe the shape of the Figure 5 distribution above — where does it peak, and how do the bars change as step count increases or decreases from the peak?

Show model answer — shape of Figure 5

The distribution peaks at the 6,000–7,999 class (the tallest bar, f ≈ 22). From the peak, bars decrease on both sides: the 8,000–9,999 class (f ≈ 15) and 10,000–11,999 class (f ≈ 6) step down to the right, while the 4,000–5,999 class (f ≈ 12) and 2,000–3,999 class (f ≈ 5) step down to the left. The right side (higher step counts) retains more workers than the left side (lower step counts), so the distribution sits somewhat higher in the step-count range — not quite balanced around the centre.
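The model answer's comparison of the two sides of the peak can be checked with a few lines of Python. This sketch uses the approximate frequencies quoted in the model answer (they are read off Figure 5, so they are estimates); the variable names are hypothetical.

```python
# Approximate frequencies read from Figure 5, as quoted in the model answer.
classes = ["2,000-3,999", "4,000-5,999", "6,000-7,999", "8,000-9,999", "10,000-11,999"]
freqs = [5, 12, 22, 15, 6]

n = sum(freqs)                                           # should match the 60 workers
peak = max(range(len(freqs)), key=freqs.__getitem__)     # index of the tallest bar
left = sum(freqs[:peak])                                 # workers below the modal class
right = sum(freqs[peak + 1:])                            # workers above the modal class

print(f"n = {n}, modal class = {classes[peak]} (f = {freqs[peak]})")
print(f"below the peak: {left} workers, above the peak: {right} workers")
```

Because more workers sit above the modal class than below it, the counts support the description of a distribution that is not quite balanced around its centre.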

Problem 3 — Select the Best Graph (C5)

Each scenario below describes a dataset — click through several. For each one, commit to the graph you think fits best; a wrong choice explains why the variable type rules it out. Click “Try a similar problem” for a new scenario.

Problem 4 — Two Variables, One Graph (C4, C5)

A public health researcher collects two measurements for each of 85 participants in a study:

Hours of sleep per night (continuous, 4.5 to 9.5 hours)
Score on a cognitive performance test (continuous, 0–100)

She asks: “Does more sleep correlate with better cognitive performance?”

First commit to the graph type:

Then work out the details in writing (or mentally):

a. Explain why that graph fits, using the variable types as your justification.
b. Which variable goes on which axis, and why?
c. What pattern in the graph would suggest that more sleep is associated with better performance?

Show Solution

(a) Graph: Scatter plot. Both variables are quantitative continuous. When both variables are quantitative and the goal is to explore the relationship between them, a scatter plot is the correct choice. Each participant is one point on the plot; the x-coordinate is hours of sleep and the y-coordinate is the test score.

(b) Axes: Hours of sleep (the explanatory variable — the factor we think might influence performance) goes on the x-axis. Cognitive performance score (the response variable — the outcome we’re measuring) goes on the y-axis. Convention: the variable you think explains variation in the other goes on x.

(c) Pattern suggesting positive association: If more sleep correlates with better performance, the points should form an upward pattern from left to right — participants with fewer hours of sleep (x small) tend to have lower scores (y small), and those with more sleep (x large) tend to have higher scores (y large). This is called a positive association and usually appears as points trending upward from the lower-left to the upper-right of the scatter plot.

Problem 5 — Critique a Misleading Graph (C6)

A political party releases the following graph to support its campaign message “Crime has plummeted under our leadership.”

Figure 6: “Crime Rate Over Six Years” — published by a political party. Data: Year 1: 958, Year 2: 955, Year 3: 952, Year 4: 950, Year 5: 947, Year 6: 943 incidents per 100,000. Use the button only after you’ve answered the questions below.

Name the technique — Figure 6 and beyond. First commit to what makes a graph misleading. Each problem below shows a different real-world chart; identify the distortion. (Figure 6 above uses one of these — decide which before you reveal it.)

Now go deeper on Figure 6 in writing:

b. Compute the true percentage decrease in crime rate from Year 1 (958) to Year 6 (943).
c. Describe how the graph should be redrawn to represent the data honestly.

Show Solution — Figure 6

Technique: Truncated y-axis. The y-axis starts at 940 instead of 0 (or at least a much lower value). This compresses the visual scale so that the decline from 958 to 943 — which spans only 15 units — takes up nearly the entire height of the chart. The line appears to drop almost vertically, suggesting a massive collapse in crime.

(b) True percentage decrease: $\frac{958 - 943}{958} \times 100\% = \frac{15}{958} \times 100\% \approx 1.57\%$

(c) Honest redraw: Start the y-axis at 0 (or clearly indicate a broken axis if starting mid-range, with a zigzag break symbol). The line would then appear nearly flat over the 6 years, accurately reflecting the modest change. Include the actual data values next to the points and note the y-axis scale explicitly.
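The arithmetic behind this critique can be checked with a short sketch. It uses only the six values from Figure 6 and the 940 baseline named in the solution; the helper `visual_ratio` is a hypothetical name introduced here for illustration.

```python
# Crime rate per 100,000 for Years 1-6, as given in Figure 6.
rates = [958, 955, 952, 950, 947, 943]

first, last = rates[0], rates[-1]
pct_decrease = (first - last) / first * 100  # true relative change, in percent

def visual_ratio(baseline):
    """Ratio of plotted heights (Year 1 : Year 6) above a given axis baseline."""
    return (first - baseline) / (last - baseline)

print(f"True decrease: {pct_decrease:.2f}%")
print(f"Height ratio with y-axis from 940: {visual_ratio(940):.1f} : 1")
print(f"Height ratio with y-axis from 0:   {visual_ratio(0):.3f} : 1")
```

With the axis starting at 940, the Year 1 point sits six times as high above the baseline as the Year 6 point, even though the underlying values differ by under 2%. Starting the axis at 0 brings the plotted ratio back in line with the data.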

Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from DS-1. Attempting them without re-reading that lesson is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Variable Classification and Graph Selection

A classmate has picked a graph for a variable and written out their reasoning. One step is where the choice goes wrong. Find it — this is the exact DS-1 misconception (using a histogram for a categorical variable) applied to graph selection. A new case appears each time.

Review Problem 2 — Sampling Method and Bias

A university wants to estimate the proportion of its 12,000 students who use the campus food bank. The student affairs office proposes the following approach: place a link to an online survey on the campus homepage for one week and count responses.

Which sampling method is this?

The students who use the food bank most have the strongest reason to respond. Which way does that push the estimated proportion?

Then, in writing: name a second source of bias (and its direction), and suggest a better sampling method, explaining why it reduces bias.

Show Solution

Sampling method: Voluntary response sampling (also called self-selection). Only students who notice the link and choose to respond are included — the sample is not randomly selected from the population.

Source of bias 1 — Voluntary response bias: Students who use the food bank regularly have a stronger personal stake in the question and are more likely to click and respond. This pushes the estimated proportion upward, overstating food-bank use.

Source of bias 2 — Undercoverage bias: Students without reliable internet access, those who rarely visit the homepage (e.g., off-campus or part-time students), and students who feel embarrassed disclosing food insecurity will be systematically underrepresented. The first two groups push the estimate in an uncertain direction; the third group — students who do use the food bank but don’t self-report — pushes the estimate downward.

Better method: Simple random sampling (SRS). Randomly select, say, 600 student ID numbers from the registrar’s complete list and send each selected student a private, confidential survey invitation. SRS gives every student an equal chance of selection, which removes self-selection into the sample and the undercoverage of students who never see the homepage. Selected students can still decline, so some nonresponse bias may remain. The confidential framing reduces social-desirability pressure as well.
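As a rough sketch of how the random selection might be carried out, the snippet below draws 600 IDs without replacement using Python's standard library. The ID scheme (students numbered 1 to 12,000) and the seed are assumptions made for illustration; a real registrar list would supply its own identifiers.

```python
import random

POPULATION_SIZE = 12_000   # students at the university (from the problem)
SAMPLE_SIZE = 600          # sample size suggested in the solution

# Hypothetical ID scheme: students numbered 1..12000.
student_ids = range(1, POPULATION_SIZE + 1)

rng = random.Random(2024)  # fixed seed only so the sketch is reproducible
sample = rng.sample(student_ids, SAMPLE_SIZE)  # sampling without replacement

print(f"Selected {len(sample)} of {POPULATION_SIZE} students")
print("First five selected IDs:", sorted(sample)[:5])
```

Because `sample` draws without replacement, no student is invited twice, and every student has the same chance of selection whether or not they ever visit the campus homepage.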

No hints. No scaffolds. These questions test whether you can recall and apply what you’ve learned without support — the clearest signal that you’ve actually internalized it.

Attempt each question fully before revealing the answer. Peeking early short-circuits the retrieval practice that makes this section effective. Even an imperfect attempt trains your memory more than reading the solution directly.

Question 1 — The Feynman Test (C3)

Imagine a classmate missed the lesson on histograms and bar charts. They’ve been told that “a histogram is just a bar chart without gaps” and don’t understand why that matters.

Explain — in your own words, as if to that classmate — why you cannot use a histogram for categorical data, and what the touching bars actually represent. Don’t just state the rule; explain the reasoning behind it.

Show model answer

A histogram’s x-axis is a continuous number line. The bars touch because the classes are adjacent intervals on that line — there is no gap between “40–49” and “50–59” because numbers don’t suddenly stop at 49 and jump to 50. Every value on the number line belongs to exactly one bar, and the bars cover the line with no holes.

For categorical data (like colour, gender, or city), there is no number line. “Red,” “Blue,” and “Green” are not adjacent on any scale — there is nothing “between” red and blue. Drawing the bars touching would imply that some value exists between “Red” and “Blue,” which is nonsense. The gap in a bar chart signals: these are distinct categories with no in-between.

So the touching vs. gap distinction isn’t cosmetic — it tells the viewer whether the x-axis is a continuous scale (histogram) or a set of unrelated categories (bar chart).

Question 2 — Apply It (C5)

A sports analytics team collects data on professional soccer players. For each player, they record:

Position (Goalkeeper, Defender, Midfielder, Forward)
Distance run per game (km, continuous)
Goals scored per season (count, discrete)

Part A: Which graph would best display the distribution of distance run per game across all players?

Part B: Which graph would best show the relationship between distance run and goals scored across all players?

Part C: Which graph would best display the number of players in each position?

Question 3 — Find the Error (C6, C2)

A student creates a histogram to display the following frequency table for the height (cm) of 30 plants:

Height class (cm)	Frequency
10–19	4
20–29	9
30–49	11
50–59	6

The student draws four bars — all the same width — with heights proportional to the frequencies 4, 9, 11, and 6. They claim the third bar (30–49) is the most frequent class because it is the tallest.

Identify and explain the error.

Show full explanation

The 30–49 class has a width of 20 cm, while the 10–19, 20–29, and 50–59 classes each have a width of 10 cm. The 30–49 class is twice as wide as the others.

In a histogram, a bar’s area — not its height — represents frequency. When class widths are equal, height and area are proportional, so height works fine. But when class widths differ, you must plot frequency density (= frequency ÷ class width) on the y-axis, not raw frequency:

For the 30–49 class: frequency density = 11 ÷ 20 = 0.55 per cm. For the 20–29 class: frequency density = 9 ÷ 10 = 0.90 per cm.

On a frequency density histogram, the 20–29 bar would be taller than the 30–49 bar, correctly reflecting that plants are more densely concentrated in the 20–29 cm range. The student’s equal-width bars with raw frequencies made the wider class look disproportionately dominant.
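The frequency density calculation can be sketched for all four classes at once. The class limits and frequencies come from the table in this question; the width rule (upper minus lower plus one, for whole-number class limits) is chosen to match the widths of 10 and 20 stated in the explanation.

```python
# (lower limit, upper limit, frequency) for each height class, in cm.
classes = [(10, 19, 4), (20, 29, 9), (30, 49, 11), (50, 59, 6)]

def frequency_density(lower, upper, freq):
    # Whole-number class limits: 10-19 covers 10 values, so width = upper - lower + 1.
    width = upper - lower + 1
    return freq / width

densities = {f"{lo}-{hi}": frequency_density(lo, hi, f) for lo, hi, f in classes}
densest = max(densities, key=densities.get)

for label, density in densities.items():
    print(f"{label}: {density:.2f} per cm")
print("Tallest bar on a density histogram:", densest)
```

On the density scale the 20–29 class comes out on top, and the 30–49 class drops below the 50–59 class as well.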

Question 4 — Relative Frequency, Cold (C1)

No table, no worked support — read the scenario and commit to the relative frequency. A new scenario appears each time.

Self-Assessment

How confident are you with the material in this lesson?

Scale: from “Not confident (I need to review)” to “Very confident (I’ve got this)”

You’ve reached the Boss Fight — a substantial challenge that asks you to bring everything together. Two paths, equal in difficulty, different in approach. Choose the one that fits how you like to think.

🔬 The Analyst

You’re handed a pile of raw numbers and a decision to make. Turn it into an honest visual summary the school board can act on — and watch for a trap in how the data is set up.

Best for: students who like working with numbers and computing things step by step

🏗️ The Architect

A company’s annual report looks impressive — but something is off in every chart. Work out what each one is hiding, and how you’d set it right.

Best for: students who like critical thinking, design, and finding what’s wrong with someone else’s work

🔬 Path A: The Analyst

A school board collected data on the number of books read by each of 25 students over a summer reading program. The raw data is:

8, 3, 12, 5, 9, 7, 11, 4, 6, 10, 8, 14, 3, 7, 9, 6, 13, 5, 8, 11, 7, 4, 10, 6, 9

Part 1 — Build the Frequency Distribution

Someone proposes these 4 classes: 2–4, 5–7, 8–10, 11–14. Before you build anything, decide whether that scheme is sound.

Show the fix

The classes 2–4, 5–7, and 8–10 each span 3 values, but 11–14 spans 4 values, so the class widths are unequal. Keeping the first three classes and ending with 11–13 would make the widths equal but leave the value 14 without a class. The cleanest solution is to start at the minimum and use 4 classes of width 3 (min = 3, max = 14): 3–5, 6–8, 9–11, 12–14.

Build the complete frequency table (f, f_r, cf, cf_r) using classes 3–5, 6–8, 9–11, 12–14.

Show Solution — Part 1

Step 1: Tally the data into classes.

3–5: values 3, 5, 4, 3, 5, 4 → f = 6
6–8: values 8, 7, 6, 8, 7, 6, 8, 7, 6 → f = 9
9–11: values 9, 11, 10, 9, 11, 10, 9 → f = 7
12–14: values 12, 14, 13 → f = 3

Check: 6 + 9 + 7 + 3 = 25 ✓

Class	f	f_r	cf	cf_r
3–5	6	0.24	6	0.24
6–8	9	0.36	15	0.60
9–11	7	0.28	22	0.88
12–14	3	0.12	25	1.00
Total	25	1.00	—	—
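The tally and the four columns can be reproduced with a few lines of Python. This is a minimal sketch using the raw data and the corrected classes from this part; the variable names are illustrative only.

```python
data = [8, 3, 12, 5, 9, 7, 11, 4, 6, 10, 8, 14, 3, 7, 9, 6, 13,
        5, 8, 11, 7, 4, 10, 6, 9]
classes = [(3, 5), (6, 8), (9, 11), (12, 14)]

n = len(data)
rows = []
cf = 0  # running cumulative frequency
for lower, upper in classes:
    f = sum(lower <= x <= upper for x in data)  # absolute frequency
    cf += f
    rows.append((f"{lower}-{upper}", f, f / n, cf, cf / n))

assert cf == n  # every observation landed in exactly one class

print("class   f   f_r  cf  cf_r")
for label, f, f_r, cum, cum_r in rows:
    print(f"{label:>5}  {f:>2}  {f_r:.2f}  {cum:>2}  {cum_r:.2f}")
```

The final check that the cumulative frequency equals n mirrors the "6 + 9 + 7 + 3 = 25" check above and catches classes that overlap or leave gaps.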

Part 2 — Describe the Histogram

Based on your frequency table, describe what the histogram would look like without drawing it. Address:

a. Which class is the modal class (most frequent)?
b. Describe the overall shape: where does the distribution peak, and how do the bars change on each side of the peak?

Then commit to one number:

c. What percentage of students read fewer than 9 books? (Use your table: classes 3–5 and 6–8.)

Show Solution — Part 2

(a) Modal class: 6–8 books, with f = 9 (the highest frequency).

(b) Shape: The distribution peaks at the 6–8 class (f = 9, the highest frequency). From the peak, bars decrease in both directions — dropping to f = 7 in the 9–11 class and f = 3 in the 12–14 class on the right, and to f = 6 in the 3–5 class on the left. The right side drops off more gradually than the left, with a thin tail extending to 12–14 books. Most students cluster in the lower-to-middle range, with fewer reading large numbers of books.

(c) Students reading fewer than 9 books: “Fewer than 9” means the 3–5 and 6–8 classes. The cumulative frequency at the 6–8 class is cf = 15, so $cf_r = 15/25 = 0.60$, or 60% of students.

Part 3 — Interpretation

The school board’s goal was to encourage reading. Based on your analysis, write 2–3 sentences summarizing what the data tells the board — what does the distribution suggest about how students engaged with the program?

Show model interpretation

The modal number of books read was in the 6–8 range, and 60% of students read 8 or fewer books over the summer. Most students engaged at a moderate level, while a smaller group of highly motivated readers reached 12–14 books. The board might consider whether the program successfully motivated the middle group (6–8 books) to go further, or whether a different incentive structure could shift the peak toward higher counts.

Reflection: What was the most challenging part of this analysis? Was it the frequency table arithmetic, the shape description, or the interpretation? What would you do differently on the next dataset?

🏗️ Path B: The Architect

You’ve been hired as a data visualization consultant. A mid-sized retail company has just released its annual report, and their marketing team created three graphs to highlight key metrics. Unfortunately, each graph has a design flaw. Your job: identify each flaw, explain why it misleads, and propose a corrected version.

Graph 1 — Holiday Season Revenue

A bar chart shows quarterly revenue for the past year. The bars are:

Q1: $41.2M — bar is 5.5 cm tall
Q2: $39.8M — bar is 4.2 cm tall
Q3: $40.5M — bar is 4.8 cm tall
Q4: $58.7M — bar is 24.0 cm tall (using a 3D dollar-sign icon, twice as wide as the others)

The headline reads: “Q4 Holiday Sales — Our Best Quarter by Far!”

Commit to the primary flaw, then explain it below:

Full critique of Graph 1 (why + fix)

Two compounding flaws:

1. Truncated y-axis. The bars’ heights don’t start at 0 — they’re scaled only across the range ~$39M to ~$59M. This exaggerates the Q4 spike relative to Q1–Q3.

2. Distorted pictogram area. The Q4 icon is about 4.4× as tall as the Q1 bar (24.0 cm vs. 5.5 cm) and twice as wide, so it covers roughly 8.7× the area, while actual revenue is only about 42% higher ($58.7M / $41.2M ≈ 1.42). The visual impression of “dominance” is grossly inflated.

Fix: Use a simple bar chart (equal-width bars) with the y-axis starting at $0. The Q4 bar would still be visibly taller, because it genuinely is the best quarter, but by a proportionate amount rather than an eye-catching visual exaggeration.
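To put numbers on the distortion, the sketch below compares the Q4 and Q1 bars using the dimensions listed for Graph 1. It treats the icon as a simple rectangle, which is a simplifying assumption, and the dictionary keys are hypothetical names.

```python
# Bar dimensions and revenue as listed for Graph 1.
q1 = {"revenue": 41.2, "height_cm": 5.5, "rel_width": 1.0}
q4 = {"revenue": 58.7, "height_cm": 24.0, "rel_width": 2.0}  # icon is twice as wide

data_ratio = q4["revenue"] / q1["revenue"]          # what the data say
height_ratio = q4["height_cm"] / q1["height_cm"]    # what the heights show
area_ratio = height_ratio * q4["rel_width"] / q1["rel_width"]  # what the eye sees

print(f"Revenue ratio Q4/Q1: {data_ratio:.2f}")
print(f"Height ratio Q4/Q1:  {height_ratio:.2f}")
print(f"Area ratio Q4/Q1:    {area_ratio:.2f}")
```

An honest chart keeps the height ratio equal to the revenue ratio and holds the widths constant, so the area ratio matches as well.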

Graph 2 — Customer Satisfaction Distribution

A histogram with 5 classes displays customer satisfaction survey scores (0–100):

Class 0–20: f = 8, bar width = 1 cm
Class 21–40: f = 12, bar width = 1 cm
Class 41–80: f = 30, bar width = 2 cm
Class 81–90: f = 25, bar width = 0.5 cm
Class 91–100: f = 15, bar width = 0.5 cm

The team claims “most customers are highly satisfied” pointing to the 41–80 bar being the tallest.

Commit to the primary flaw, then explain it below:

Full critique of Graph 2 (why + fix)

Flaw: Unequal class widths with raw frequency (not frequency density) on the y-axis.

The 41–80 class spans 40 points, while 81–90 spans only 10 points. The 41–80 bar has f = 30, but the 81–90 bar has f = 25 in a much smaller range. If you compute frequency density:

41–80: 30 ÷ 40 = 0.75 per point
81–90: 25 ÷ 10 = 2.5 per point

The 81–90 class is far more densely packed with customers. “Most customers are highly satisfied” is actually correct when measured properly — but the original histogram hides this because it doesn’t account for class width.

Fix: Redesign with equal class widths (e.g., 0–19, 20–39, 40–59, 60–79, 80–100) or use frequency density on the y-axis. The story changes from “middle range is most common” to “high satisfaction is most densely concentrated.”

Graph 3 — Market Share Pie Chart

A 3D tilted pie chart shows market share across 7 product categories. The nearest slice (Electronics, 18%) appears to take up roughly 30% of the visual area due to the 3D tilt. The far slices (Furniture 17%, Clothing 16%) appear tiny. The chart has no data labels, only a legend.

Commit to the primary flaw, then explain it below:

Full critique of Graph 3 (why + fix)

Two flaws:

1. 3D tilt distorts area. The perspective projection inflates the slices closest to the viewer and compresses those in the back. This makes the Electronics slice look dominant when it is only 1 percentage point larger than Furniture and 2 percentage points larger than Clothing. A 3D pie chart almost always misrepresents proportions.

2. Too many slices without labels. Seven slices sharing a legend (not label-per-slice) forces viewers to cross-reference colours, making accurate reading difficult. With 7 categories, a bar chart would be clearer and more honest.

Fix: Replace with a flat 2D bar chart ordered by market share (highest to lowest), with percentage labels on each bar. Eliminates both the 3D distortion and the legend confusion.

Reflection: Of the three graphs, which flaw do you think is the most common in real-world published reports? Which is the easiest to accidentally create without intending to mislead? How would you communicate these issues to the marketing team without sounding accusatory?

Optional stretch material. These problems go beyond the lesson objectives. They’re here if you’re curious, ambitious, or just enjoy a harder challenge. None of the material below is required for DS-3, but Challenge 1 (the ogive) will reappear in DS-5.

Challenge 1 — The Ogive (C1 + extension)

An ogive (pronounced “OH-jive”) is a graph of the cumulative frequency or cumulative relative frequency. Instead of bars, it connects points with a smooth curve — the x-value of each point is the upper class boundary and the y-value is the cumulative frequency up to that boundary.

Use the frequency table from Example 1 (Section 4) to build a cumulative relative frequency ogive:

Class	Upper boundary	cf_r
1–2	2.5	0.10
3–4	4.5	0.40
5–6	6.5	0.75
7–8	8.5	0.95
9–10	10.5	1.00

The ogive starts at (0.5, 0.00) — the lower boundary of the first class with cumulative frequency 0 — and ends at (10.5, 1.00).

Question: Using the ogive, estimate the value below which approximately 50% of the observations fall (the median). Draw a horizontal line at $cf_r = 0.50$, find where it intersects the ogive, and read off the x-value.

Show Solution

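Between the 3–4 class ($cf_r = 0.40$) and the 5–6 class ($cf_r = 0.75$), the cumulative relative frequency passes through 0.50. Linear interpolation between the two points (4.5, 0.40) and (6.5, 0.75):

$$x = 4.5 + \frac{0.50 - 0.40}{0.75 - 0.40} \times (6.5 - 4.5) = 4.5 + \frac{0.10}{0.35} \times 2 \approx 5.07$$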

So approximately 50% of batches had 5 or fewer defects — the estimated median is about 5.07 defects. You’ll formalize percentile estimation like this in DS-5.
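The same interpolation can be written as a small function that works for any target proportion. This is a sketch built on the ogive points from the table above; `value_at` is a hypothetical helper name, and it assumes the points are joined by straight line segments.

```python
# Ogive points: (upper class boundary, cumulative relative frequency).
ogive = [(0.5, 0.00), (2.5, 0.10), (4.5, 0.40),
         (6.5, 0.75), (8.5, 0.95), (10.5, 1.00)]

def value_at(cf_r_target, points):
    """Linearly interpolate the x-value where the ogive reaches cf_r_target."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= cf_r_target <= y1:
            if y1 == y0:  # flat segment: any x in it works, return the left end
                return x0
            return x0 + (cf_r_target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("target must lie between 0 and 1")

median = value_at(0.50, ogive)
print(f"Estimated median: {median:.2f} defects")
```

Calling `value_at(0.25, ogive)` or `value_at(0.75, ogive)` gives the corresponding quartile estimates in the same way.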

Preview: The ogive is the graphical tool for reading percentiles directly. The 25th percentile (Q1) corresponds to $cf_r = 0.25$, the 75th (Q3) to $cf_r = 0.75$, and the 50th (Q2 = median) to $cf_r = 0.50$. You’ll use this in DS-5 when studying position in a distribution.

Challenge 2 — Does Bin Width Matter? (C2)

Here are two histograms of the same dataset: ages at first employment for 40 recent graduates, ranging from 18 to 35 years. The only difference is the number of classes used.

Histogram A: 3 classes (width = 6)

Shape appears: roughly symmetric peak at 24–29

Histogram B: 6 classes (width = 3)

Shape appears: peak at 24–26, longer tail toward higher ages

Want to feel this for yourself? Scroll up to the Histogram Builder in Section 3 (C2), paste in any dataset, and drag the Number of classes slider from 3 up to 8. The same numbers will shift between a smooth single hill and a jagged multi-peak shape — exactly the effect these two histograms illustrate.

Answer the following:

a. Both histograms display the same 40 data values. Why do they appear to tell different stories about the shape of the distribution?
b. Which histogram would you trust more, and why?
c. What would happen to the histogram if you used 18 classes (width = 1 year each)?

Show Solution

(a) The bin width controls how much detail is visible. With 3 wide classes, the 24–29 class lumps together two patterns that Histogram B separates: a high cluster around 24–26 and a secondary peak around 27–29. Wide bins smooth out the distribution and can make an uneven concentration look balanced. Narrow bins reveal the true pattern of where values cluster.

(b) Histogram B (6 classes) is generally more trustworthy here, as it reveals more detail about where values concentrate. However, there’s a tradeoff: too many bins and each bar represents only 1–2 observations, making random variation (noise) look like a pattern. The choice of bin width requires judgment about the sample size and the question being asked. A common rule of thumb (Sturges’ rule) is $k \approx 1 + \log_2 n$, which gives $k \approx 6.3$ for n = 40, suggesting 6–7 bins is appropriate.

(c) With 18 classes (each 1 year wide), n = 40 gives an average of about 2 observations per bar. The histogram would be jagged and noisy — some bars would be empty, and the overall pattern would be obscured by sampling variability. Too few bins hides detail; too many bins creates false patterns. The right bin count balances signal and noise.
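A quick calculation makes the tradeoff concrete. The sketch below evaluates Sturges' rule in its usual form, k = 1 + log2(n), for the 40 graduates, and shows the average number of observations per class for the three bin counts discussed in this challenge.

```python
import math

n = 40  # graduates in the dataset

sturges = 1 + math.log2(n)           # Sturges' rule of thumb
suggested_bins = math.ceil(sturges)  # round up to a whole number of classes

print(f"Sturges' rule: k = {sturges:.2f}, so about {suggested_bins} classes")

for bins in (3, 6, 18):
    print(f"{bins:>2} classes -> about {n / bins:.1f} observations per class")
```

Sturges' rule is only a starting point. With 18 classes the average falls to roughly two observations per class, which is why that histogram looks jagged.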

Challenge 3 — The Double Y-Axis Debate (C6)

A common graph type in business and journalism is the dual y-axis plot (also called a secondary axis chart). It overlays two different variables on the same graph, with one y-axis on the left and a different y-axis on the right.

Example: A financial news article shows monthly ice cream sales (in thousands of units, left y-axis) and monthly drowning rates per 100,000 (right y-axis) on the same graph. The two lines move almost perfectly together. The article implies this suggests a causal link.

a. Why is this graph potentially misleading, independent of the correlation–causation issue?
b. Under what conditions is a dual y-axis graph a legitimate and useful tool?
c. What is the underlying statistical error in claiming ice cream sales cause drowning rates?

Show Solution

(a) Why the graph is misleading by design: The scales on the two y-axes are chosen independently by the designer. By rescaling either axis, you can make the two lines align perfectly (suggesting correlation) or diverge completely (suggesting no relationship) — with the same data. The visual impression of correlation is entirely a function of the axis scaling choices, not the data. This makes dual y-axis charts inherently subjective and easy to manipulate.

(b) When dual y-axes are legitimate: They can be useful when two variables are measured in genuinely different units and both are relevant to the same story (e.g., overlaying temperature in °C and precipitation in mm on a climate plot, where both axes are clearly labelled and the reader understands they cannot be compared directly). The key conditions: clearly labelled axes, no implication of a direct comparison between the two y-scales, and no manipulation of scale to create a false impression of alignment.

(c) Correlation ≠ causation — the confounding variable: Both ice cream sales and drowning rates are driven by a third variable: summer heat. Hot weather increases both ice cream consumption and the number of people swimming (which creates more opportunities to drown). This is a classic confounding variable (sometimes called a lurking variable). When a third variable causes both variables to change together, a strong correlation can appear even if the two variables have no direct causal link. You’ll study this formally in REG-1 (Correlation Analysis).
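The scaling effect described in part (a) can be illustrated numerically. In the sketch below, all monthly figures are invented purely for illustration, and `plotted_height` is a hypothetical helper that returns how far up the axis a value is drawn.

```python
def plotted_height(value, axis_min, axis_max):
    """Fraction of the axis height at which a value is drawn (0 = bottom, 1 = top)."""
    return (value - axis_min) / (axis_max - axis_min)

# Hypothetical monthly figures, invented for illustration only.
ice_cream = [20, 35, 60, 80]           # thousands of units, left axis
drownings = [0.10, 0.12, 0.15, 0.18]   # per 100,000, right axis

left = [plotted_height(v, 20, 80) for v in ice_cream]

# Right axis fitted tightly to the data: the line spans the full chart height.
tight = [plotted_height(v, 0.10, 0.18) for v in drownings]
# Right axis running from 0 to 1: the same numbers look almost flat.
loose = [plotted_height(v, 0.0, 1.0) for v in drownings]

print("left axis :", [round(h, 2) for h in left])
print("tight axis:", [round(h, 2) for h in tight])
print("loose axis:", [round(h, 2) for h in loose])
```

With the tight right axis the two lines nearly coincide; with the loose one the drowning line barely moves. The data are identical in both cases, so the apparent agreement comes from the designer's choice of axis limits.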

Complete, step-by-step solutions for all problems in Sections 5–9 are available on the solutions page. Solutions include worked arithmetic, common mistakes to watch for, and interpretation guidance.

If you’re stuck: Re-read the relevant Core Concept in Section 3, then find the Worked Example that maps to that concept (e.g., Example 1 maps to Concept 1). The solutions page shows the reasoning behind every step, not just the final answer.

Quick-Reference Formulas

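Class Width (for frequency distributions): $\text{class width} = \dfrac{\text{maximum} - \text{minimum}}{\text{number of classes}}$ (Always round UP to a convenient number)

Relative Frequency: $f_r = \dfrac{f}{n}$, where $f$ is the class frequency and $n$ is the total number of observations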
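Both quick-reference formulas can be sketched in Python. The version below assumes the usual range-over-classes definition of class width and rounds up to a whole number; the function names are hypothetical, and the example values come from the Path A reading data (minimum 3, maximum 14, 4 classes, n = 25).

```python
import math

def class_width(minimum, maximum, num_classes):
    """Range divided by the number of classes, rounded up to a whole number."""
    return math.ceil((maximum - minimum) / num_classes)

def relative_frequency(f, n):
    """Share of all n observations that fall in a class with frequency f."""
    return f / n

width = class_width(3, 14, 4)        # (14 - 3) / 4 = 2.75, rounded up
share = relative_frequency(9, 25)    # the 6-8 class in the Path A table

print(f"Class width: {width}")
print(f"Relative frequency: {share:.2f}")
```

Rounding up rather than to the nearest value guarantees that the classes reach the maximum. If the division already gives a whole number, check that the last class still contains the largest observation and widen the classes if it does not.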

Graph Type	Best Used For	Key Feature
Histogram	Quantitative, continuous data	Bars touch (shows continuity)
Bar Chart	Qualitative, categorical data	Bars do not touch
Pareto Chart	Categorical, finding largest factor	Bars sorted descending by frequency
Pie Chart	Parts of a whole (relative freq)	Angles proportional to frequency

Common Misleading Features	Why it’s a problem
Y-axis not starting at 0	Exaggerates small differences (mostly an issue for bar charts)
3D effects / perspective	Distorts areas and makes values hard to read
Pictograms without proportional area	Changing 1D height usually changes 2D area, overstating differences
