18.4 Summarise data
Measures of central tendency
Measures of central tendency are summary values that indicate the central position within a data set. The three main measures of central tendency are the mean (also called the average), the median and the mode.
The mean is the sum of a set of values, divided by the number of values in the set.
- mean
- the sum of a set of values, divided by the number of values in the set
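As a minimal sketch, the definition can be written as a short Python function. The function name `mean` and the variable `data` are illustrative choices. The data set is the one used in Worked Example 18.5, so the result can be compared with the hand calculation there.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        # Assumption: an empty data set is treated as an error.
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data set from Worked Example 18.5.
data = [4, 6, 7, 3, 4, 8, 4, 2, 9]

print(sum(data), len(data))   # the sum of the values and the number of values
print(round(mean(data), 1))   # the mean, correct to one decimal place
```

Python writes the decimal separator as a point, so a value written as \(5,2\) in this section appears as `5.2` in the output.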
Worked Example 18.5: Calculating the mean of a data set
Calculate the mean for the given data set (correct to one decimal place).
\[4; 6; 7; 3; 4; 8; 4; 2; 9\]
Find the sum of the data values.
Add all the data values together:
\[4 + 6 + 7 + 3 + 4 + 8 + 4 + 2 + 9 = 47\]
Count how many values are in the data set.
There are \(9\) values in the data set.
Calculate the mean.
We calculate the mean by dividing the sum of the data values by the number of data values.
\[\begin{align} \text{Mean} &= \frac{47}{9} \\ &\approx 5,2 \end{align}\]
The median of a data set is the value in the central position, when the data set has been arranged from the lowest value to the highest value.
- median
- the value in the central position when the data set has been arranged from the lowest value to the highest value
We need to consider two cases when we find the median of a data set:
- When there is an odd number of data values, there is a middle value. For example, consider the data set: \(5; 7; 9; 12; 15\). There are five data values, so the median (middle value) is \(9\). There are two values to the left of \(9\) and two values to the right.
- When there is an even number of data values, there is no single middle value. In this case, the median lies halfway between the two middle values in the data set. For example, consider the data set: \(1; 2; 5; 7; 10; 12\). There are six data values, so the two middle values are \(5\) and \(7\). The median is the value halfway between them: \(\frac{5 + 7}{2} = 6\).
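The two cases can be sketched in Python as follows. The function name `median` is an illustrative choice, and the two data sets are the ones used in the cases above.

```python
def median(values):
    """Return the value in the central position of the ordered data set."""
    if not values:
        # Assumption: an empty data set is treated as an error.
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(values)          # arrange from lowest to highest
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a single middle value.
        return ordered[middle]
    # Even number of values: halfway between the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([5, 7, 9, 12, 15]))      # odd number of values
print(median([1, 2, 5, 7, 10, 12]))   # even number of values
```

The function sorts a copy of the data, so the values can be passed in any order.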
Worked Example 18.6: Finding the median of a data set with an odd number of values
Calculate the median for the given data set.
\[10; 14; 87; 2; 68; 99; 1\]
Order the data set.
Arrange the data values in ascending order.
\[1; 2; 10; 14; 68; 87; 99\]
Count the number of values in the data set.
There are \(7\) values in the data set.
Find the median.
The middle value of the data set is the number in the fourth position of the ordered data set. The median of the data set is \(14\). There are three data values to the left of \(14\) and three data values to the right.
Worked Example 18.7: Finding the median of a data set with an even number of values
Calculate the median for the given data set.
\[12; 20; 14; 85; 3; 62; 89; 16\]
Order the data set.
Arrange the data values in ascending order.
\[3; 12; 14; 16; 20; 62; 85; 89\]
Count the number of values in the data set.
There are \(8\) values in the data set.
Find the median.
Since there is an even number of data values in the data set, there is no middle value.
The median lies halfway between the two middle values in the data set, \(16\) and \(20\).
\[\begin{align} \text{Median} &= \frac{16 + 20}{2} \\ &= \frac{36}{2} \\ &= 18 \end{align}\]
The mode of a data set is the value that occurs most often in the set. The mode can also be described as the most frequent or most common value in the data set.
- mode
- the value that occurs most often in the data set
To find the mode, we count the number of times each value appears in the data set and then identify the value that appears most often.
A data set can have more than one mode if there is more than one value with the highest count. For example, both \(2\) and \(3\) are modes in the data set \(1; 2; 2; 3; 3\). There are two modes, so we call this a bimodal data set.
If all the values in a data set occur with equal frequency, the data set can be described as having many modes or no mode.
- bimodal
- having two modes
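A minimal Python sketch of this counting procedure is shown here. The function name `modes` is an illustrative choice, and the data set is the bimodal example \(1; 2; 2; 3; 3\) from the text.

```python
from collections import Counter


def modes(values):
    """Return every value that appears with the highest count, in ascending order."""
    if not values:
        # Assumption: an empty data set is treated as an error.
        raise ValueError("the mode of an empty data set is undefined")
    counts = Counter(values)            # number of times each value appears
    highest = max(counts.values())      # the highest count
    return sorted(value for value, count in counts.items() if count == highest)


print(modes([1, 2, 2, 3, 3]))   # both 2 and 3 have the highest count
```

When all the values occur equally often, this sketch returns every value, which matches the "many modes" description. Reporting "no mode" instead is an equally valid convention.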
Worked Example 18.8: Finding the mode of a data set
Find the mode for the data set.
\[2; 3; 2; 4; 10; 4; 6; 7; 6; 8; 8; 4; 10\]
Order the data set.
Arrange the data values in ascending order.
\[2; 2; 3; 4; 4; 4; 6; 6; 7; 8; 8; 10; 10\]
Count the number of times each data value appears in the data set.
Use a table to count the number of times each data value appears in the data set.
Data value | Count |
---|---|
\(2\) | \(2\) |
\(3\) | \(1\) |
\(4\) | \(3\) |
\(6\) | \(2\) |
\(7\) | \(1\) |
\(8\) | \(2\) |
\(10\) | \(2\) |
Check that the total count equals the number of values in the data set.
Add up the counts and compare the total with the number of values in the data set.
\[\begin{align} \text{Total count} &= 2 + 1 + 3 + 2 + 1 + 2 + 2 \\ &= 13 \end{align}\]
The number of values in the data set is also \(13\), so no value has been missed or counted twice.
Identify the value(s) that appear most often.
From the table above we can see that \(4\) is the only value that appears \(3\) times. All the other values appear fewer than \(3\) times.
Therefore, the mode of the data set is \(4\).
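The count table in this worked example can be reproduced with a short Python sketch. It uses `collections.Counter` from the standard library. The variable names are illustrative choices.

```python
from collections import Counter

# Data set from Worked Example 18.8.
data = [2, 3, 2, 4, 10, 4, 6, 7, 6, 8, 8, 4, 10]

counts = Counter(data)

# Print the table: each data value with its count, in ascending order.
for value in sorted(counts):
    print(value, counts[value])

# Check that the total count equals the number of values in the data set.
total_count = sum(counts.values())
print(total_count == len(data))

# Identify the value(s) that appear most often.
highest = max(counts.values())
mode_values = [value for value in sorted(counts) if counts[value] == highest]
print(mode_values)
```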
Measures of dispersion
Measures of dispersion are values that describe the spread of a data set. We will look at two measures of dispersion:
- The range of a data set is the difference between the maximum value and the minimum value in the set.
- An extreme value (also called an outlier) is a value in the data set that is not typical of the rest of the set. It is usually much greater or much smaller than all the other values in the data set.
- range
- the difference between the maximum value and the minimum value in the data set
- extreme value
- a number in the data set that is not typical of the rest of the set
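The definition of the range can be sketched in Python as follows. The function is named `data_range`, an illustrative choice that avoids hiding Python's built-in `range`. The example reuses the data set \(5; 7; 9; 12; 15\) from the discussion of the median.

```python
def data_range(values):
    """Return the difference between the maximum and minimum values."""
    if not values:
        # Assumption: an empty data set is treated as an error.
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


print(data_range([5, 7, 9, 12, 15]))   # 15 - 5
```

The data does not need to be ordered first, because `max` and `min` search the whole data set.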
Worked Example 18.9: Finding the range of a data set
Calculate the range of the data set.
\[11; 20; 14; 75; 5; 63; 81; 17\]
Order the data set.
Arrange the data values in ascending order.
\[5; 11; 14; 17; 20; 63; 75; 81\]
Identify the maximum value and minimum value in the data set.
The maximum value is \(81\) and the minimum value is \(5\).
Calculate the range.
\[\begin{align} \text{Range} &= 81 - 5 \\ &= 76 \end{align}\]
Worked Example 18.10: Identifying an extreme value in a data set
Consider the data set.
\[2\ 415; 2\ 847; 2\ 935; 35\ 415; 3\ 049; 3\ 220; 2\ 105\]
Does any value seem to be much higher than the other values in the data set?
Order the data set.
Arrange the data values in ascending order.
\[2\ 105; 2\ 415; 2\ 847; 2\ 935; 3\ 049; 3\ 220; 35\ 415\]
Identify any extreme values in the data set.
The minimum value is \(2\ 105\) and the maximum value is \(35\ 415\). The second largest value in the data set is \(3\ 220\).
The value \(35\ 415\) seems to be an extreme value for this data set.
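The effect of this value on the spread of the data can be illustrated with a short Python sketch. This section gives no fixed rule for deciding whether a value is extreme, so the sketch only compares the range of the data with and without the largest value. The comparison is an illustration, not a formal test, and the variable names are illustrative choices.

```python
# Data set from Worked Example 18.10.
data = [2415, 2847, 2935, 35415, 3049, 3220, 2105]

ordered = sorted(data)        # arrange the values in ascending order
minimum = ordered[0]
second_largest = ordered[-2]
maximum = ordered[-1]

# Range of the full data set, and range when the largest value is left out.
range_with = maximum - minimum
range_without = second_largest - minimum

print(minimum, second_largest, maximum)
print(range_with, range_without)
```

Leaving out \(35\ 415\) reduces the range from \(33\ 310\) to \(1\ 115\), which supports the observation that this value is not typical of the rest of the set.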