Statistics
Why This Matters
Imagine your school collects the marks of 200 students, or a city records the daily temperature for a year. That’s a flood of numbers. Nobody learns anything by staring at 200 figures — what we want is one number that stands in for the whole pile: a typical mark, a typical temperature, the size that shows up most often.
In Class 9 you found the mean, median and mode of a short list of numbers you could read one by one. But real data is usually too big to list individually, so it gets squeezed into a table of class intervals — “10–25 marks: 2 students, 25–40 marks: 3 students, …”. Once the exact values are hidden inside intervals, the old formulas don’t apply as-is.
This chapter teaches you how to pull the three measures of central tendency — mean, mode and median — out of grouped data. You’ll learn three ways to find the mean (and why, magically, they all give the same answer), a formula that finds the mode inside the busiest interval, and a formula that finds the exact middle value. These are the tools that turn a wall of numbers into a single, meaningful summary.
The Big Idea
Grouped data hides the exact values inside class intervals, so we work with each interval’s mid-point (class mark) as its stand-in. The mean is the average of those mid-points weighted by how many values fall in each class — found by the direct, assumed-mean or step-deviation method, which only differ in how tidy the arithmetic is, never in the answer. The mode lives inside the class with the highest frequency (the modal class), and the median lives inside the class where the running total of frequencies first reaches halfway (the median class). Each has a formula that pinpoints the value within that class.
Let’s Break It Down
A quick word on class marks, because everything in this chapter rests on them. For grouped data we assume every value in a class sits at its centre. The mid-point (class mark) is just the average of the two limits:
class mark = (lower limit + upper limit) / 2
So for the class 10–25 the mark is (10 + 25)/2 = 17.5. This single number represents the whole class.
Mean — the direct method
The mean of grouped data is the sum of (each class mark × its frequency) divided by the total frequency:
x̄ = Σfᵢxᵢ / Σfᵢ
where xᵢ is the class mark and fᵢ the frequency of the i-th class. You build one extra column for the products fᵢxᵢ, add it up, and divide by the total number of observations.
Consider the marks of 30 students:
| Class interval | Frequency (fᵢ) | Class mark (xᵢ) | fᵢxᵢ |
|---|---|---|---|
| 10–25 | 2 | 17.5 | 35.0 |
| 25–40 | 3 | 32.5 | 97.5 |
| 40–55 | 7 | 47.5 | 332.5 |
| 55–70 | 6 | 62.5 | 375.0 |
| 70–85 | 6 | 77.5 | 465.0 |
| 85–100 | 6 | 92.5 | 555.0 |
| Total | Σfᵢ = 30 | | Σfᵢxᵢ = 1860 |
Using the table above (marks of 30 students), find the mean marks by the direct method.
- Find each class mark xᵢ = (lower + upper)/2. For 10–25 it’s 17.5, for 25–40 it’s 32.5, and so on — these are already in the table.
- Multiply each class mark by its frequency to fill the fᵢxᵢ column: 2×17.5 = 35, 3×32.5 = 97.5, 7×47.5 = 332.5, and so on.
- Add the columns: Σfᵢ = 30 and Σfᵢxᵢ = 35 + 97.5 + 332.5 + 375 + 465 + 555 = 1860.
- Apply x̄ = Σfᵢxᵢ / Σfᵢ = 1860 / 30 = 62. The mean marks obtained is 62.
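The steps above can be sketched in a few lines of Python. This is only an illustration: the function names `class_mark` and `direct_mean` are made up for this example, and the data is the marks table for 30 students.

```python
def class_mark(lower, upper):
    """Mid-point of a class interval: (lower limit + upper limit) / 2."""
    return (lower + upper) / 2


def direct_mean(classes, freqs):
    """Direct method: sum of f_i * x_i divided by sum of f_i."""
    marks = [class_mark(lo, hi) for lo, hi in classes]
    total_fx = sum(f * x for f, x in zip(freqs, marks))
    return total_fx / sum(freqs)


# Marks of 30 students, copied from the table above.
classes = [(10, 25), (25, 40), (40, 55), (55, 70), (70, 85), (85, 100)]
freqs = [2, 3, 7, 6, 6, 6]

print(class_mark(10, 25))           # 17.5
print(direct_mean(classes, freqs))  # 62.0
```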
Mean — the assumed-mean method
When the class marks are big (say 200, 300, 400…), multiplying each by its frequency is tedious. The trick: guess a mean called the assumed mean a (pick a class mark near the middle), and work with how far each class mark sits from it. That gap is the deviation dᵢ = xᵢ − a. Because we subtracted a from every value, we just add it back at the end:
x̄ = a + (Σfᵢdᵢ / Σfᵢ)
It gives the exact same mean because subtracting a constant from every value lowers the mean by that constant — adding it back restores it. Take a = 47.5 for the marks data:
| Class interval | fᵢ | xᵢ | dᵢ = xᵢ − 47.5 | fᵢdᵢ |
|---|---|---|---|---|
| 10–25 | 2 | 17.5 | −30 | −60 |
| 25–40 | 3 | 32.5 | −15 | −45 |
| 40–55 | 7 | 47.5 | 0 | 0 |
| 55–70 | 6 | 62.5 | 15 | 90 |
| 70–85 | 6 | 77.5 | 30 | 180 |
| 85–100 | 6 | 92.5 | 45 | 270 |
| Total | 30 | | | Σfᵢdᵢ = 435 |
For the same marks data, find the mean by the assumed-mean method, taking a = 47.5.
- Choose the assumed mean a = 47.5 (a class mark sitting near the centre).
- For each class find the deviation dᵢ = xᵢ − a. For 10–25: 17.5 − 47.5 = −30; for 55–70: 62.5 − 47.5 = 15; and so on.
- Multiply each deviation by its frequency for the fᵢdᵢ column, then add: Σfᵢdᵢ = −60 − 45 + 0 + 90 + 180 + 270 = 435. Also Σfᵢ = 30.
- Apply x̄ = a + (Σfᵢdᵢ / Σfᵢ) = 47.5 + (435 / 30) = 47.5 + 14.5 = 62 — exactly the direct-method answer.
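Here is a minimal sketch of the same steps in Python, assuming the class marks have already been worked out. The name `assumed_mean_method` is hypothetical. The second call uses a different assumed mean (62.5) to suggest that the choice of a does not change the answer.

```python
def assumed_mean_method(marks, freqs, a):
    """Assumed-mean method: a + sum(f_i * d_i) / sum(f_i), with d_i = x_i - a."""
    deviations = [x - a for x in marks]
    total_fd = sum(f * d for f, d in zip(freqs, deviations))
    return a + total_fd / sum(freqs)


# Class marks and frequencies from the marks table above.
marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]

print(assumed_mean_method(marks, freqs, a=47.5))  # 62.0
print(assumed_mean_method(marks, freqs, a=62.5))  # 62.0 again
```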
Mean — the step-deviation method
Look at the deviations −30, −15, 0, 15, 30, 45: they’re all multiples of 15, the class size. So divide each by h (the class size) to get even smaller whole numbers uᵢ = (xᵢ − a) / h. Since we shrank every deviation by a factor of h, we scale back up by multiplying by h at the end:
x̄ = a + h × (Σfᵢuᵢ / Σfᵢ)
| Class interval | fᵢ | xᵢ | uᵢ = (xᵢ − 47.5)/15 | fᵢuᵢ |
|---|---|---|---|---|
| 10–25 | 2 | 17.5 | −2 | −4 |
| 25–40 | 3 | 32.5 | −1 | −3 |
| 40–55 | 7 | 47.5 | 0 | 0 |
| 55–70 | 6 | 62.5 | 1 | 6 |
| 70–85 | 6 | 77.5 | 2 | 12 |
| 85–100 | 6 | 92.5 | 3 | 18 |
| Total | 30 | | | Σfᵢuᵢ = 29 |
For the same marks data, find the mean by the step-deviation method with a = 47.5 and h = 15.
- Take a = 47.5 and the class size h = 15.
- For each class compute uᵢ = (xᵢ − a)/h. For 10–25: (17.5 − 47.5)/15 = −30/15 = −2; for 70–85: (77.5 − 47.5)/15 = 2; and so on.
- Build the fᵢuᵢ column and add: Σfᵢuᵢ = −4 − 3 + 0 + 6 + 12 + 18 = 29, with Σfᵢ = 30.
- Apply x̄ = a + h × (Σfᵢuᵢ / Σfᵢ) = 47.5 + 15 × (29/30) = 47.5 + 14.5 = 62 — the same mean a third time.
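The step-deviation steps can be sketched the same way. Again, this is an illustrative sketch rather than a standard routine, and the function name `step_deviation_mean` is an assumption of this example.

```python
def step_deviation_mean(marks, freqs, a, h):
    """Step-deviation method: a + h * sum(f_i * u_i) / sum(f_i), with u_i = (x_i - a) / h."""
    u = [(x - a) / h for x in marks]
    total_fu = sum(f * ui for f, ui in zip(freqs, u))
    return a + h * total_fu / sum(freqs)


# Class marks and frequencies from the marks table above.
marks = [17.5, 32.5, 47.5, 62.5, 77.5, 92.5]
freqs = [2, 3, 7, 6, 6, 6]

print(step_deviation_mean(marks, freqs, a=47.5, h=15))  # 62.0
```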
Why all three agree. They are the same calculation dressed differently. The assumed-mean method just shifts every class mark down by a, then adds a back at the end. The step-deviation method does the same shift and also divides by h, then multiplies by h again at the end. Every step is undone, so the mean must be identical. Choose whichever keeps the arithmetic lightest:
| Method | Formula | Best when | Extra columns |
|---|---|---|---|
| Direct | x̄ = Σfᵢxᵢ / Σfᵢ | xᵢ and fᵢ are small | fᵢxᵢ |
| Assumed mean | x̄ = a + Σfᵢdᵢ / Σfᵢ | xᵢ are large | dᵢ, fᵢdᵢ |
| Step-deviation | x̄ = a + h × (Σfᵢuᵢ / Σfᵢ) | deviations share a common factor h | uᵢ, fᵢuᵢ |
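If you would like to see the agreement in symbols, here is a short derivation sketch. It uses only the definitions already given (uᵢ, a, h) and starts from the direct-method formula.

```latex
\begin{aligned}
u_i = \frac{x_i - a}{h} \;&\Longrightarrow\; x_i = a + h\,u_i \\
\bar{x} = \frac{\sum f_i x_i}{\sum f_i}
  &= \frac{\sum f_i\,(a + h\,u_i)}{\sum f_i}
   = \frac{a \sum f_i + h \sum f_i u_i}{\sum f_i}
   = a + h \cdot \frac{\sum f_i u_i}{\sum f_i}
\end{aligned}
```

Setting h = 1 turns uᵢ into dᵢ, which gives the assumed-mean formula as a special case.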
In the step-deviation method, you computed Σfᵢuᵢ but forgot to multiply by h at the end. Will your mean be too big, too small, or correct?
Mode of grouped data
The mode is the value that occurs most often. In grouped data you can’t see individual values, but you can spot the class with the highest frequency — the modal class. The mode is some value inside that class, given by:
Mode = l + [(f₁ − f₀) / (2f₁ − f₀ − f₂)] × h
where l = lower limit of the modal class, f₁ = frequency of the modal class, f₀ = frequency of the class just before it, f₂ = frequency of the class just after it, and h = class size. The idea: the mode leans toward whichever neighbour is taller. If the class after the modal class is busier than the one before, the mode shifts that way.
Family-size survey of 20 households:
| Family size | Number of families |
|---|---|
| 1–3 | 7 |
| 3–5 | 8 |
| 5–7 | 2 |
| 7–9 | 2 |
| 9–11 | 1 |
Find the mode of the family-size data above (the highest frequency is 8, in the class 3-5).
- The highest frequency is 8, so the modal class is 3–5. Read off l = 3, h = 2, and f₁ = 8.
- The class before (1–3) has f₀ = 7; the class after (5–7) has f₂ = 2.
- Substitute into Mode = l + [(f₁ − f₀)/(2f₁ − f₀ − f₂)] × h = 3 + [(8 − 7)/(2×8 − 7 − 2)] × 2.
- Simplify: 3 + [1/(16 − 9)] × 2 = 3 + (1/7)×2 = 3 + 0.286 = 3.286. The mode family size is about 3.29.
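The same steps can be sketched in Python. The function name `grouped_mode` is hypothetical, and two choices here are assumptions of this sketch rather than part of the method above: if the modal class is the first or last class, the missing neighbour's frequency is taken as 0, and if several classes tie for the highest frequency, the first one is used.

```python
def grouped_mode(classes, freqs):
    """Mode = l + (f1 - f0) / (2*f1 - f0 - f2) * h for continuous classes."""
    i = freqs.index(max(freqs))          # modal class: first class with the highest frequency
    l, upper = classes[i]
    h = upper - l                        # class size
    f1 = freqs[i]
    f0 = freqs[i - 1] if i > 0 else 0    # assumption: no class before means f0 = 0
    f2 = freqs[i + 1] if i + 1 < len(freqs) else 0  # assumption: no class after means f2 = 0
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * h


# Family-size survey of 20 households, from the table above.
classes = [(1, 3), (3, 5), (5, 7), (7, 9), (9, 11)]
freqs = [7, 8, 2, 2, 1]

print(round(grouped_mode(classes, freqs), 3))  # 3.286
```

Note that the sketch would divide by zero if the modal class and both neighbours had equal frequencies; a real program would need to check for that.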
Median of grouped data
The median is the middle value: half the data lies below it, half above. To find it in grouped data, first build the cumulative frequency (cf): a running total of frequencies down the table. The cumulative frequency of a class is the sum of its own frequency and all the frequencies before it, so it tells you how many observations lie below that class’s upper limit.
Then locate the median class — the first class whose cumulative frequency reaches or passes n/2 (where n = total frequency). The median sits inside it:
Median = l + [(n/2 − cf) / f] × h
where l = lower limit of the median class, n = total frequency, cf = cumulative frequency of the class before the median class, f = frequency of the median class, h = class size. The bracket is the fraction of the median class we must cross to reach the (n/2)-th observation; multiplying it by h turns that fraction into a distance past l.
Marks of 53 students, with the cumulative-frequency column added:
| Marks | Frequency (f) | Cumulative frequency (cf) |
|---|---|---|
| 0–10 | 5 | 5 |
| 10–20 | 3 | 8 |
| 20–30 | 4 | 12 |
| 30–40 | 3 | 15 |
| 40–50 | 3 | 18 |
| 50–60 | 4 | 22 |
| 60–70 | 7 | 29 |
| 70–80 | 9 | 38 |
| 80–90 | 7 | 45 |
| 90–100 | 8 | 53 |
Find the median of the marks of 53 students using the cumulative-frequency table above.
- Here n = 53, so n/2 = 26.5. We need the class whose cumulative frequency first reaches or exceeds 26.5.
- Reading down the cf column: …18, 22, then 29. The cf 29 (class 60–70) is the first to pass 26.5, so 60–70 is the median class.
- Read off l = 60, h = 10, f = 7 (frequency of 60–70), and cf = 22 (cumulative frequency of the class before it, 50–60).
- Substitute: Median = l + [(n/2 − cf)/f] × h = 60 + [(26.5 − 22)/7] × 10 = 60 + (4.5/7)×10 ≈ 60 + 6.43 = 66.43. About half the students scored below 66.43 and half above.
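As a sketch, the whole procedure (running total, median class, formula) fits in one small Python function. The name `grouped_median` is made up for this example; `itertools.accumulate` from the standard library builds the cumulative-frequency column.

```python
from itertools import accumulate


def grouped_median(classes, freqs):
    """Median = l + (n/2 - cf) / f * h for continuous classes."""
    n = sum(freqs)
    cum = list(accumulate(freqs))                          # cumulative frequencies
    i = next(k for k, c in enumerate(cum) if c >= n / 2)   # first cf that reaches n/2
    l, upper = classes[i]
    h = upper - l                                          # class size
    cf_before = cum[i - 1] if i > 0 else 0                 # cf of the class BEFORE the median class
    return l + (n / 2 - cf_before) / freqs[i] * h


# Marks of 53 students, from the table above.
classes = [(lo, lo + 10) for lo in range(0, 100, 10)]
freqs = [5, 3, 4, 3, 3, 4, 7, 9, 7, 8]

print(round(grouped_median(classes, freqs), 2))  # 66.43
```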
There’s also a handy empirical relationship linking the three: 3 Median = Mode + 2 Mean. It’s approximate, but useful for a quick sanity check or to find the third measure when you know the other two.
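Rearranging the relationship gives each measure in terms of the other two. The sketch below does exactly that; the function names are hypothetical, and the numbers (mean 30, median 31) are made-up illustrative values, not data from this chapter. Remember the results are estimates, since the relationship is only approximate.

```python
def mode_from(mean, median):
    """Rearranged from 3 Median = Mode + 2 Mean."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    return (mode + 2 * mean) / 3


def mean_from(median, mode):
    return (3 * median - mode) / 2


# Illustrative values only (not taken from any table in this chapter).
mode = mode_from(mean=30, median=31)
print(mode)                             # 33
print(median_from(mean=30, mode=mode))  # 31.0
print(mean_from(median=31, mode=mode))  # 30.0
```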
Common Mistakes
Mistake: Using the class limits (like 10 and 25) instead of the class mark when finding the mean.
Why it’s tempting: The limits are right there in the table and look like the obvious numbers to multiply, while the class mark is an extra step you have to compute.
Correction: You must use the class MARK, the mid-point (lower + upper)/2, as xᵢ. The whole class is assumed to be centred at that mid-point, so 17.5 (not 10 or 25) represents the class 10–25.
Mistake: Taking the modal class to be the class with the largest class interval, or the one with the biggest class mark.
Why it’s tempting: The word “modal” sounds like it could refer to size or position, and the largest interval or last class is visually prominent.
Correction: The modal class is the class with the highest FREQUENCY, the tallest bar. It has nothing to do with how wide the interval is or where it sits; only the f values decide it.
Mistake: For the median, assuming n/2 itself picks the median class, or taking cf in the formula to be the cumulative frequency of the median class.
Why it’s tempting: n/2 lands inside the median class, so it feels like you should read cf from that same row, and “cumulative frequency” sounds like the running total up to and including the median class.
Correction: The median class is the FIRST class whose cf reaches or exceeds n/2. And cf in the formula is the cumulative frequency of the class JUST BEFORE the median class, the count of everything already passed before you enter it. Using the median class’s own cf gives the wrong answer.
Mistake: Applying the mode and median formulas straight to inclusive classes like 118–126, 127–135.
Why it’s tempting: The table already looks like neat class intervals, so it seems ready to plug in.
Correction: The mode and median formulas assume CONTINUOUS classes. Inclusive classes (with gaps, e.g. 118–126 then 127–135) must first be converted to continuous form (117.5–126.5, 126.5–135.5, …) by adjusting each limit by half the gap. Otherwise l and h are wrong.
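The conversion from inclusive to continuous classes can be sketched as below. The helper name `to_continuous` is hypothetical, and the sketch assumes there are at least two classes and that the gap between neighbouring classes is the same throughout.

```python
def to_continuous(classes):
    """Widen each inclusive class by half the gap between neighbouring classes."""
    gap = classes[1][0] - classes[0][1]   # e.g. 127 - 126 = 1 (assumed equal for all classes)
    half = gap / 2
    return [(lo - half, hi + half) for lo, hi in classes]


print(to_continuous([(118, 126), (127, 135)]))
# [(117.5, 126.5), (126.5, 135.5)]
```

After the conversion the class size is h = 126.5 − 117.5 = 9, not 8, which is why skipping this step throws off both l and h.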
Quick Check
Which value represents a class interval when finding the mean of grouped data?
The direct, assumed-mean and step-deviation methods are applied to the same grouped data. How do their results compare?
In a grouped distribution, which class is the modal class?
For data with n = 40, the cumulative frequencies down the table are 5, 13, 19, 26, 34, 40. Which is the median class?
Practice Problems
Easy
Find the mean of the following data by the direct method.
| Class | Frequency (fᵢ) | Class mark (xᵢ) | fᵢxᵢ |
|---|---|---|---|
| 0–10 | 4 | 5 | 20 |
| 10–20 | 6 | 15 | 90 |
| 20–30 | 8 | 25 | 200 |
| 30–40 | 2 | 35 | 70 |
| Total | 20 | | 380 |
x̄ = Σfᵢxᵢ / Σfᵢ = 380 / 20 = 19.
Find the modal class and the mode of: 0-10 (f=3), 10-20 (f=9), 20-30 (f=15), 30-40 (f=5), 40-50 (f=2).
The highest frequency is 15, so the modal class is 20–30. Here l = 20, h = 10, f₁ = 15, f₀ = 9 (class before), f₂ = 5 (class after).
Mode = l + [(f₁ − f₀)/(2f₁ − f₀ − f₂)] × h = 20 + [(15 − 9)/(30 − 9 − 5)] × 10 = 20 + (6/16)×10 = 20 + 3.75 = 23.75.
Medium
The percentage of female teachers in 35 states is below. Find the mean by the step-deviation method (take a = 50, h = 10).
| % female teachers | fᵢ | xᵢ | uᵢ = (xᵢ−50)/10 | fᵢuᵢ |
|---|---|---|---|---|
| 15–25 | 6 | 20 | −3 | −18 |
| 25–35 | 11 | 30 | −2 | −22 |
| 35–45 | 7 | 40 | −1 | −7 |
| 45–55 | 4 | 50 | 0 | 0 |
| 55–65 | 4 | 60 | 1 | 4 |
| 65–75 | 2 | 70 | 2 | 4 |
| 75–85 | 1 | 80 | 3 | 3 |
| Total | 35 | | | −36 |
x̄ = a + h × (Σfᵢuᵢ / Σfᵢ) = 50 + 10 × (−36/35) = 50 − 10.29 = 39.71.
(Check by the assumed-mean method: Σfᵢdᵢ = 10 × (−36) = −360, so x̄ = 50 + (−360)/35 = 50 − 10.29 = 39.71 — same answer.)
Find the median of the weights (kg) of 30 students: 40-45 (2), 45-50 (3), 50-55 (8), 55-60 (6), 60-65 (6), 65-70 (3), 70-75 (2).
Build cumulative frequencies: 2, 5, 13, 19, 25, 28, 30. Here n = 30, so n/2 = 15.
The first cf to reach or exceed 15 is 19 (class 55–60), so the median class is 55–60. Read off l = 55, h = 5, f = 6, cf = 13 (class before, 50–55).
Median = l + [(n/2 − cf)/f] × h = 55 + [(15 − 13)/6] × 5 = 55 + (2/6)×5 = 55 + 1.67 = 56.67 kg.
Challenge
The mean of the data below is ₹18. Find the missing frequency f. Classes (allowance in ₹): 11-13 (7), 13-15 (6), 15-17 (9), 17-19 (13), 19-21 (f), 21-23 (5), 23-25 (4).
Use the assumed-mean method with a = 18 (class mark of 17–19). Class marks are 12, 14, 16, 18, 20, 22, 24, so deviations dᵢ = xᵢ − 18 are −6, −4, −2, 0, 2, 4, 6.
| Class | fᵢ | dᵢ | fᵢdᵢ |
|---|---|---|---|
| 11–13 | 7 | −6 | −42 |
| 13–15 | 6 | −4 | −24 |
| 15–17 | 9 | −2 | −18 |
| 17–19 | 13 | 0 | 0 |
| 19–21 | f | 2 | 2f |
| 21–23 | 5 | 4 | 20 |
| 23–25 | 4 | 6 | 24 |
| Total | 44 + f | | 2f − 40 |
Mean = a + Σfᵢdᵢ / Σfᵢ, so 18 = 18 + (2f − 40)/(44 + f).
That forces (2f − 40)/(44 + f) = 0, i.e. 2f − 40 = 0, so f = 20.
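One way to double-check the answer is to put f = 20 back into the direct-method formula, or to solve the direct-method equation for f. A quick sketch follows; the helper name `mean_with_missing` and the variable names are made up for this check.

```python
marks = [12, 14, 16, 18, 20, 22, 24]   # class marks of 11-13, 13-15, ..., 23-25


def mean_with_missing(f):
    """Direct-method mean with the unknown frequency f in the class 19-21."""
    freqs = [7, 6, 9, 13, f, 5, 4]
    return sum(fi * x for fi, x in zip(freqs, marks)) / sum(freqs)


# Solve target = (known_fx + 20*f) / (known_f + f) for f.
target = 18
known_f = 7 + 6 + 9 + 13 + 5 + 4                            # 44
known_fx = 7*12 + 6*14 + 9*16 + 13*18 + 5*22 + 4*24         # 752
f = (target * known_f - known_fx) / (20 - target)

print(f)                      # 20.0
print(mean_with_missing(20))  # 18.0
```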
The median of the distribution is 28.5 and the total frequency is 60. Find x and y. Classes: 0-10 (5), 10-20 (x), 20-30 (20), 30-40 (15), 40-50 (y), 50-60 (5).
Cumulative frequencies: 5, 5+x, 25+x, 40+x, 40+x+y, 45+x+y. The last equals n = 60, so 45 + x + y = 60 → x + y = 15 … (1).
n/2 = 30. The median is 28.5, which lies in 20–30, so 20–30 is the median class: l = 20, h = 10, f = 20, cf = 5 + x (class before).
Median = l + [(n/2 − cf)/f] × h: 28.5 = 20 + [(30 − (5 + x))/20] × 10 = 20 + (25 − x)/2.
So 28.5 − 20 = (25 − x)/2 → 8.5 × 2 = 25 − x → 17 = 25 − x → x = 8.
From (1): y = 15 − 8 = 7. So x = 8, y = 7.
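As a sanity check, the sketch below works backwards from the median formula to get x and y, then recomputes the median from the completed table. The function name `grouped_median` and the variable names are assumptions of this example.

```python
from itertools import accumulate


def grouped_median(classes, freqs):
    """Median = l + (n/2 - cf) / f * h, with cf taken from the class before the median class."""
    n = sum(freqs)
    cum = list(accumulate(freqs))
    i = next(k for k, c in enumerate(cum) if c >= n / 2)
    l, upper = classes[i]
    cf_before = cum[i - 1] if i > 0 else 0
    return l + (n / 2 - cf_before) / freqs[i] * (upper - l)


# Rearranging 28.5 = 20 + (30 - cf) / 20 * 10 for cf, where cf = 5 + x.
median, l, h, f, n = 28.5, 20, 10, 20, 60
cf = n / 2 - (median - l) * f / h
x = cf - 5
y = n - (5 + x + 20 + 15 + 5)
print(x, y)  # 8.0 7.0

classes = [(lo, lo + 10) for lo in range(0, 60, 10)]
freqs = [5, x, 20, 15, y, 5]
print(round(grouped_median(classes, freqs), 2))  # 28.5
```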
Summary
You should now be able to explain:
- For grouped data, each class is represented by its class mark = (lower + upper)/2.
- Mean — direct method: x̄ = Σfᵢxᵢ / Σfᵢ.
- Mean — assumed-mean method: x̄ = a + (Σfᵢdᵢ / Σfᵢ), where dᵢ = xᵢ − a; handy when class marks are large.
- Mean — step-deviation method: x̄ = a + h × (Σfᵢuᵢ / Σfᵢ), where uᵢ = (xᵢ − a)/h; handy when deviations share the factor h.
- All three mean methods give the same answer — they only differ in how light the arithmetic is.
- Mode: the modal class is the one with the highest frequency, and Mode = l + [(f₁ − f₀)/(2f₁ − f₀ − f₂)] × h.
- Median: find n/2, locate the median class (first cumulative frequency ≥ n/2), then Median = l + [(n/2 − cf)/f] × h, with cf from the class before the median class.
- A quick link between them: 3 Median = Mode + 2 Mean.
- Mode and median formulas need continuous class intervals — convert inclusive classes first.
What’s Next
You’ve learnt to summarise data that has already happened — the typical mark, the most common family size, the middle weight. Next, in Probability, you flip to the future: instead of describing what did occur, you measure how likely something is to occur. From a single tossed coin to a deck of cards, you’ll put a number between 0 and 1 on uncertainty itself.