Statistics

Chapter 13 · Mathematics · Class 10

Why This Matters

Imagine your school collects the marks of 200 students, or a city records the daily temperature for a year. That’s a flood of numbers. Nobody learns anything by staring at 200 figures — what we want is one number that stands in for the whole pile: a typical mark, a typical temperature, the size that shows up most often.

In Class 9 you found the mean, median and mode of a short list of numbers you could read one by one. But real data is usually too big to list individually, so it gets squeezed into a table of class intervals — “10–25 marks: 2 students, 25–40 marks: 3 students, …”. Once the exact values are hidden inside intervals, the old formulas don’t apply as-is.

This chapter teaches you how to pull the three measures of central tendency (mean, mode and median) out of grouped data. You’ll learn three ways to find the mean (and why they always give the same answer), a formula that locates the mode inside the busiest interval, and a formula that estimates the middle value. These are the tools that turn a wall of numbers into a single, meaningful summary.

The Big Idea

Grouped data hides the exact values inside class intervals, so we work with each interval’s mid-point (class mark) as its stand-in. The mean is the average of those mid-points weighted by how many values fall in each class — found by the direct, assumed-mean or step-deviation method, which only differ in how tidy the arithmetic is, never in the answer. The mode lives inside the class with the highest frequency (the modal class), and the median lives inside the class where the running total of frequencies first reaches halfway (the median class). Each has a formula that pinpoints the value within that class.

Let’s Break It Down

A quick word on class marks, because every method for the mean in this chapter rests on them. For grouped data we assume every value in a class sits at its centre. The mid-point (class mark) is just the average of the two limits:

class mark = (lower limit + upper limit) / 2

So for the class 10–25 the mark is (10 + 25)/2 = 17.5. This single number represents the whole class.

Mean — the direct method

The mean of grouped data is the sum of (each class mark × its frequency) divided by the total frequency:

x̄ = Σfᵢxᵢ / Σfᵢ

where xᵢ is the class mark and fᵢ the frequency of the i-th class. You build one extra column for the products fᵢxᵢ, add it up, and divide by the total number of observations.

Consider the marks of 30 students:

Class interval | Frequency (fᵢ) | Class mark (xᵢ) | fᵢxᵢ
10–25 | 2 | 17.5 | 35.0
25–40 | 3 | 32.5 | 97.5
40–55 | 7 | 47.5 | 332.5
55–70 | 6 | 62.5 | 375.0
70–85 | 6 | 77.5 | 465.0
85–100 | 6 | 92.5 | 555.0
Total | Σfᵢ = 30 | | Σfᵢxᵢ = 1860
Mean by the direct method

Using the table above (marks of 30 students), find the mean marks by the direct method.
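A short Python sketch can check the arithmetic. It assumes the table is stored as hypothetical (lower limit, upper limit, frequency) tuples, computes each class mark as the mid-point, and then applies x̄ = Σfᵢxᵢ / Σfᵢ:

```python
# Marks of 30 students, copied from the table above.
# Each class is (lower limit, upper limit, frequency).
classes = [(10, 25, 2), (25, 40, 3), (40, 55, 7),
           (55, 70, 6), (70, 85, 6), (85, 100, 6)]

def class_mark(lower, upper):
    """Mid-point of a class: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2

def mean_direct(classes):
    """Direct method: sum of f * x divided by sum of f."""
    total_f = sum(f for _, _, f in classes)
    total_fx = sum(f * class_mark(lower, upper) for lower, upper, f in classes)
    return total_fx / total_f

print(mean_direct(classes))  # 62.0
```

So Σfᵢxᵢ / Σfᵢ = 1860 / 30 = 62 marks.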

Mean — the assumed-mean method

When the class marks are big (say 200, 300, 400…), multiplying each by its frequency is tedious. The trick: guess a mean called the assumed mean a (pick a class mark near the middle), and work with how far each class mark sits from it. That gap is the deviation dᵢ = xᵢ − a. Because we subtracted a from every value, we just add it back at the end:

x̄ = a + (Σfᵢdᵢ / Σfᵢ)

It gives the exact same mean because subtracting a constant from every value lowers the mean by that constant — adding it back restores it. Take a = 47.5 for the marks data:

Class interval | fᵢ | xᵢ | dᵢ = xᵢ − 47.5 | fᵢdᵢ
10–25 | 2 | 17.5 | −30 | −60
25–40 | 3 | 32.5 | −15 | −45
40–55 | 7 | 47.5 | 0 | 0
55–70 | 6 | 62.5 | 15 | 90
70–85 | 6 | 77.5 | 30 | 180
85–100 | 6 | 92.5 | 45 | 270
Total | 30 | | | Σfᵢdᵢ = 435
Mean by the assumed-mean method

For the same marks data, find the mean by the assumed-mean method, taking a = 47.5.
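Using the same hypothetical (lower, upper, frequency) layout, here is a sketch of the assumed-mean method; the function name mean_assumed is illustrative. It also returns Σfᵢdᵢ so you can compare it with the table:

```python
# Marks of 30 students: (lower limit, upper limit, frequency).
classes = [(10, 25, 2), (25, 40, 3), (40, 55, 7),
           (55, 70, 6), (70, 85, 6), (85, 100, 6)]

def mean_assumed(classes, a):
    """Assumed-mean method: x_bar = a + sum(f*d) / sum(f), where d = x - a.

    Returns (mean, sum of f*d).
    """
    total_f = 0
    total_fd = 0.0
    for lower, upper, f in classes:
        x = (lower + upper) / 2  # class mark
        d = x - a                # deviation from the assumed mean
        total_f += f
        total_fd += f * d
    return a + total_fd / total_f, total_fd

mean, sum_fd = mean_assumed(classes, a=47.5)
print(sum_fd, mean)  # 435.0 62.0
```

This gives x̄ = 47.5 + 435/30 = 47.5 + 14.5 = 62, the same as the direct method.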

Mean — the step-deviation method

Look at the deviations −30, −15, 0, 15, 30, 45: they’re all multiples of 15, the class size. So divide each by h (the class size) to get even smaller whole numbers uᵢ = (xᵢ − a) / h. Since we shrank every deviation by a factor of h, we scale back up by multiplying by h at the end:

x̄ = a + h × (Σfᵢuᵢ / Σfᵢ)

Class interval | fᵢ | xᵢ | uᵢ = (xᵢ − 47.5)/15 | fᵢuᵢ
10–25 | 2 | 17.5 | −2 | −4
25–40 | 3 | 32.5 | −1 | −3
40–55 | 7 | 47.5 | 0 | 0
55–70 | 6 | 62.5 | 1 | 6
70–85 | 6 | 77.5 | 2 | 12
85–100 | 6 | 92.5 | 3 | 18
Total | 30 | | | Σfᵢuᵢ = 29
Mean by the step-deviation method

For the same marks data, find the mean by the step-deviation method with a = 47.5 and h = 15.
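A sketch of the step-deviation method in the same style, with an illustrative function name. Each deviation is divided by h before multiplying by fᵢ, and h is multiplied back in at the end:

```python
# Marks of 30 students: (lower limit, upper limit, frequency).
classes = [(10, 25, 2), (25, 40, 3), (40, 55, 7),
           (55, 70, 6), (70, 85, 6), (85, 100, 6)]

def mean_step_deviation(classes, a, h):
    """Step-deviation method: x_bar = a + h * sum(f*u) / sum(f), u = (x - a) / h.

    Returns (mean, sum of f*u).
    """
    total_f = 0
    total_fu = 0.0
    for lower, upper, f in classes:
        x = (lower + upper) / 2  # class mark
        u = (x - a) / h          # deviation scaled down by the class size
        total_f += f
        total_fu += f * u
    return a + h * total_fu / total_f, total_fu

mean, sum_fu = mean_step_deviation(classes, a=47.5, h=15)
print(sum_fu, mean)  # 29.0 62.0
```

This gives x̄ = 47.5 + 15 × (29/30) = 47.5 + 14.5 = 62 once more.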

Why all three agree. They are the same calculation dressed differently. The assumed-mean method shifts every value down by a, then adds a back at the end. The step-deviation method makes the same shift and also divides by h, then multiplies by h and adds a back at the end. No information is lost, so the mean must be identical. Choose whichever keeps the arithmetic lightest:

Picking a method for the mean
Method | Formula | Best when | Extra columns
Direct | x̄ = Σfᵢxᵢ / Σfᵢ | xᵢ and fᵢ are small | fᵢxᵢ
Assumed mean | x̄ = a + (Σfᵢdᵢ / Σfᵢ) | the class marks xᵢ are large | dᵢ, fᵢdᵢ
Step-deviation | x̄ = a + h × (Σfᵢuᵢ / Σfᵢ) | deviations share a common factor h | dᵢ, uᵢ, fᵢuᵢ
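You can also check the agreement algebraically. Substitute uᵢ = (xᵢ − a)/h into the step-deviation formula and use Σ(fᵢ·a) = a·Σfᵢ. The expression first reduces to the assumed-mean formula (since xᵢ − a = dᵢ) and then to the direct one:

```latex
\bar{x} = a + h\cdot\frac{\sum f_i u_i}{\sum f_i}
        = a + \frac{\sum f_i (x_i - a)}{\sum f_i}
        = a + \frac{\sum f_i x_i - a\sum f_i}{\sum f_i}
        = \frac{\sum f_i x_i}{\sum f_i}
```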
Concept check

In the step-deviation method, you computed Σfᵢuᵢ but forgot to multiply by h at the end. Will your mean be too big, too small, or correct?
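If you forget to multiply by h, the correction term h × (Σfᵢuᵢ / Σfᵢ) shrinks by a factor of h. The answer then lands too close to a: too small when Σfᵢuᵢ is positive and too big when it is negative. It is correct only by luck, when h = 1 or Σfᵢuᵢ = 0. For the marks data (Σfᵢuᵢ = 29, Σfᵢ = 30, h = 15):

```latex
\text{with } h:\; 47.5 + 15\times\frac{29}{30} = 47.5 + 14.5 = 62
\qquad
\text{without } h:\; 47.5 + \frac{29}{30} \approx 48.47
```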

Mode of grouped data

The mode is the value that occurs most often. In grouped data you can’t see individual values, but you can spot the class with the highest frequency — the modal class. The mode is some value inside that class, given by:

Mode = l + [(f₁ − f₀) / (2f₁ − f₀ − f₂)] × h

where l = lower limit of the modal class, f₁ = frequency of the modal class, f₀ = frequency of the class just before it, f₂ = frequency of the class just after it, and h = class size. The idea: the mode leans toward whichever neighbour is taller. If the class after the modal class is busier than the one before (f₂ > f₀), the mode lies in the upper half of the modal class. If f₀ > f₂, it lies in the lower half.

Figure: histogram of family size, with classes 1-3, 3-5, 5-7, 7-9, 9-11 and frequencies 7, 8, 2, 2, 1; the 3-5 bar (frequency 8) is highlighted as the modal class.
The modal class is simply the tallest bar — the class with the highest frequency. Here it is 3 to 5, with frequency 8.

Family-size survey of 20 households:

Family size | Number of families
1–3 | 7
3–5 | 8
5–7 | 2
7–9 | 2
9–11 | 1
Mode of grouped data

Find the mode of the family-size data above (the highest frequency is 8, in the class 3-5).
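Here is a sketch that codes the mode formula directly. The function grouped_mode is hypothetical. It takes the lower limits and frequencies of equal-width continuous classes and picks the first class with the highest frequency. As an assumption for edge cases, it treats a missing neighbour class as having frequency 0:

```python
def grouped_mode(lower_limits, freqs, h):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for equal-width continuous classes."""
    k = freqs.index(max(freqs))                      # modal class: highest frequency (first if tied)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # class just before (0 if none)
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # class just after (0 if none)
    lower = lower_limits[k]                          # l in the formula
    return lower + (f1 - f0) / (2 * f1 - f0 - f2) * h

# Family-size survey: classes 1-3, 3-5, 5-7, 7-9, 9-11 (class size h = 2).
mode = grouped_mode([1, 3, 5, 7, 9], [7, 8, 2, 2, 1], h=2)
print(round(mode, 3))  # 3.286
```

Here l = 3, f₁ = 8, f₀ = 7, f₂ = 2, so Mode = 3 + (1/7) × 2 ≈ 3.286. This lies in the lower half of 3–5, as expected since f₀ > f₂.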

Median of grouped data

The median is the middle value: half the data lies below it, half above. To find it in grouped data, first build the cumulative frequency (cf), a running total of frequencies down the table. The cumulative frequency of a class is its own frequency plus all the frequencies before it, so it counts how many observations lie below that class’s upper limit.

Then locate the median class — the first class whose cumulative frequency reaches or passes n/2 (where n = total frequency). The median sits inside it:

Median = l + [(n/2 − cf) / f] × h

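where l = lower limit of the median class, n = total frequency, cf = cumulative frequency of the class before the median class, f = frequency of the median class, h = class size. The formula assumes the f observations in the median class are spread evenly across it. Under that assumption, the fraction (n/2 − cf)/f tells you how far into the class the (n/2)-th observation sits, and multiplying by h turns that fraction into a distance from l.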

Marks of 53 students, with the cumulative-frequency column added:

Marks | Frequency (f) | Cumulative frequency (cf)
0–10 | 5 | 5
10–20 | 3 | 8
20–30 | 4 | 12
30–40 | 3 | 15
40–50 | 3 | 18
50–60 | 4 | 22
60–70 | 7 | 29
70–80 | 9 | 38
80–90 | 7 | 45
90–100 | 8 | 53
Median of grouped data

Find the median of the marks of 53 students using the cumulative-frequency table above.
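Here is a matching sketch for the median, under the same assumptions (hypothetical function name, equal-width continuous classes). itertools.accumulate builds the cumulative-frequency column, and the first running total that reaches n/2 marks the median class:

```python
from itertools import accumulate

def grouped_median(lower_limits, freqs, h):
    """Median = l + (n/2 - cf) / f * h, with cf taken from the class BEFORE the median class."""
    n = sum(freqs)
    half = n / 2
    cum = list(accumulate(freqs))                        # cumulative frequencies
    k = next(i for i, c in enumerate(cum) if c >= half)  # first cf >= n/2: median class
    cf_before = cum[k - 1] if k > 0 else 0               # cf of the previous class
    return lower_limits[k] + (half - cf_before) / freqs[k] * h

# Marks of 53 students: classes 0-10, 10-20, ..., 90-100 (h = 10).
freqs = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]
median = grouped_median(list(range(0, 100, 10)), freqs, h=10)
print(round(median, 2))  # 66.43
```

Here n/2 = 26.5 and the first cf to reach it is 29, so the median class is 60–70. With l = 60, cf = 22, f = 7 and h = 10, the median is 60 + (4.5/7) × 10 ≈ 66.43.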

There’s also a handy empirical relationship linking the three: 3 Median = Mode + 2 Mean. It holds only approximately (most closely for moderately skewed data). Even so, it is useful for a quick sanity check or to estimate the third measure when you know the other two.
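Rearranging the relationship gives each measure in terms of the other two. Because the relationship itself is only approximate, so are these:

```latex
\text{Mode} \approx 3\,\text{Median} - 2\,\text{Mean},\qquad
\text{Mean} \approx \frac{3\,\text{Median} - \text{Mode}}{2},\qquad
\text{Median} \approx \frac{\text{Mode} + 2\,\text{Mean}}{3}
```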

Common Mistakes

⚠️ Common mistake
What students think

In the mean, use the class limits (like 10 and 25) instead of the class mark.

Why it seems right

The limits are right there in the table and look like the obvious numbers to multiply, while the class mark is an extra step you have to compute.

What actually happens

You must use the class MARK — the mid-point (lower + upper)/2 — as xᵢ. The whole class is assumed to be centred at that mid-point, so 17.5 (not 10 or 25) represents the class 10–25.

⚠️ Common mistake
What students think

The modal class is the class with the largest class interval, or the one with the biggest class mark.

Why it seems right

The word 'modal' sounds like it could refer to size or position, and the largest interval or last class is visually prominent.

What actually happens

The modal class is the class with the highest FREQUENCY — the tallest bar. It has nothing to do with how wide the interval is or where it sits; only the f values decide it.

⚠️ Common mistake
What students think

For the median, you can pick the median class straight from n/2 (for example, the class whose interval contains the number n/2), or cf in the formula is the cumulative frequency of the median class itself.

Why it seems right

n/2 lands inside the median class, so it feels like you should read cf from that same row, and 'cumulative frequency' sounds like the running total up to and including the median class.

What actually happens

The median class is the FIRST class whose cf reaches or exceeds n/2. And cf in the formula is the cumulative frequency of the class JUST BEFORE the median class — the count of everything already passed before you enter it. Using the median class's own cf gives the wrong answer.

⚠️ Common mistake
What students think

For mode and median, you can apply the formulas straight to inclusive classes like 118-126, 127-135.

Why it seems right

The table already looks like neat class intervals, so it seems ready to plug in.

What actually happens

The mode and median formulas assume CONTINUOUS classes. Inclusive classes with gaps (e.g. 118-126 then 127-135) must first be converted to continuous form (117.5-126.5, 126.5-135.5, …). To do this, subtract half the gap from every lower limit and add half the gap to every upper limit. Otherwise l and h come out wrong.
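Here is a minimal Python sketch of that conversion. It assumes the classes are (lower, upper) pairs with the same gap between each upper limit and the next lower limit. The function name to_continuous is illustrative:

```python
def to_continuous(classes):
    """Convert inclusive classes such as (118, 126), (127, 135) to continuous form.

    Assumes at least two classes and a constant gap between one class's
    upper limit and the next class's lower limit.
    """
    gap = classes[1][0] - classes[0][1]  # e.g. 127 - 126 = 1
    half = gap / 2
    return [(lower - half, upper + half) for lower, upper in classes]

print(to_continuous([(118, 126), (127, 135)]))
# [(117.5, 126.5), (126.5, 135.5)]
```

After conversion the class size is 126.5 − 117.5 = 9, not 126 − 118 = 8, which is why using the inclusive limits gives the wrong h.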

Quick Check

Which value represents a class interval when finding the mean of grouped data?

The direct, assumed-mean and step-deviation methods are applied to the same grouped data. How do their results compare?

In a grouped distribution, which class is the modal class?

For data with n = 40, the cumulative frequencies down the table are 5, 13, 19, 26, 34, 40. Which is the median class?

Practice Problems

Easy

Find the mean of the following data by the direct method.

Find the modal class and the mode of: 0-10 (f=3), 10-20 (f=9), 20-30 (f=15), 30-40 (f=5), 40-50 (f=2).

Medium

The percentage of female teachers in 35 states is below. Find the mean by the step-deviation method (take a = 50, h = 10).

Find the median of the weights (kg) of 30 students: 40-45 (2), 45-50 (3), 50-55 (8), 55-60 (6), 60-65 (6), 65-70 (3), 70-75 (2).

Challenge

The mean of the data below is ₹18. Find the missing frequency f. Classes (allowance in ₹): 11-13 (7), 13-15 (6), 15-17 (9), 17-19 (13), 19-21 (f), 21-23 (5), 23-25 (4).

The median of the distribution is 28.5 and the total frequency is 60. Find x and y. Classes: 0-10 (5), 10-20 (x), 20-30 (20), 30-40 (15), 40-50 (y), 50-60 (5).

Summary

You should now be able to explain:

  • For grouped data, each class is represented by its class mark = (lower + upper)/2.
  • Mean — direct method: x̄ = Σfᵢxᵢ / Σfᵢ.
  • Mean — assumed-mean method: x̄ = a + (Σfᵢdᵢ / Σfᵢ), where dᵢ = xᵢ − a; handy when class marks are large.
  • Mean — step-deviation method: x̄ = a + h × (Σfᵢuᵢ / Σfᵢ), where uᵢ = (xᵢ − a)/h; handy when deviations share the factor h.
  • All three mean methods give the same answer — they only differ in how light the arithmetic is.
  • Mode: the modal class is the one with the highest frequency, and Mode = l + [(f₁ − f₀)/(2f₁ − f₀ − f₂)] × h.
  • Median: find n/2, locate the median class (first cumulative frequency ≥ n/2), then Median = l + [(n/2 − cf)/f] × h, with cf from the class before the median class.
  • A quick, approximate link between them: 3 Median = Mode + 2 Mean.
  • Mode and median formulas need continuous class intervals — convert inclusive classes first.

What’s Next

You’ve learnt to summarise data that has already happened — the typical mark, the most common family size, the middle weight. Next, in Probability, you flip to the future: instead of describing what did occur, you measure how likely something is to occur. From a single tossed coin to a deck of cards, you’ll put a number between 0 and 1 on uncertainty itself.