In one study on wild bears, researchers measured the head lengths and head widths, in inches, of 143 wild bears. The box plots summarize the data from the study.
A set of two box plots, head length in inches, from 2 to 20 by ones. Top box plot labeled male bears. Whisker from 9 to 12.5. Box from 12.5 to 15.5 with vertical line at 13.5. Whisker from 15.5 to 18.5. Bottom box plot labeled female bears. Whisker from 10 to 12. Box from 12 to 13.5 with vertical line at 12.5. Whisker from 13.5 to 15.5.
A set of two box plots, head width in inches, from 2 to 20 by ones. Top box plot labeled male bears. Whisker from 4 to 5.5. Box from 5.5 to 8 with vertical line at 6.5. Whisker from 8 to 10. Bottom box plot labeled female bears. Whisker from 4.5 to 5. Box from 5 to 6.5 with vertical line at 6. Whisker from 6.5 to 7.5.
Write four statistical questions that could be answered using the box plots: two questions about the head length and two questions about the head width.
Trade questions with your partner.
Decide if each question is a statistical question.
Use the box plots to answer each question.
18.2
Activity
18.3
Activity
18.4
Activity
Scientists studying the yellow perch, a species of fish, believe that the length of a fish is related to its age. This means that the longer the fish, the older it is. Adult yellow perch vary in size, but they are usually between 10 and 25 centimeters.
Scientists at the Great Lakes Water Institute caught, measured, and released yellow perch at several locations in Lake Michigan. This summary is based on a sample of yellow perch from one of these locations.
length of fish in centimeters    number of fish
0 to less than 5                 5
5 to less than 10                7
10 to less than 15               14
15 to less than 20               20
20 to less than 25               24
25 to less than 30               30
Use the data to make a histogram that shows the lengths of the captured yellow perch. Each bar should contain the lengths shown in each row in the table.
How many fish were measured? How do you know?
Use the histogram to answer these questions.
How would you describe the shape of the distribution?
Estimate the median length for this sample. Describe how you made this estimate.
Predict whether the mean length of this sample is greater than, less than, or nearly equal to the median length for this sample of fish. Explain your prediction.
Would you use the mean or the median to describe a typical length of the fish being studied? Explain your reasoning.
Based on your work so far:
Would you describe a typical age for the yellow perch in this sample as: young, adult, or old? Explain your reasoning.
Some researchers are concerned about the survival of the yellow perch. Do you think the lengths (or the ages) of the fish in this sample are something to worry about? Explain your reasoning.
18.5
Activity
Navigate to this activity in the digital version of the materials.
The applet contains two data sets, each containing 127 measurements of wingspan or body length of butterflies captured in an area.
Select one data set to analyze. Use the applet to help create a display that summarizes the information, including any values calculated. Make sure that your display contains:
At least one display of the distribution discussed in this unit, as well as a description of why you chose that method to display the distribution.
A value for a measure of center, along with an explanation of why you chose that measure.
A value for a measure of variability, along with an explanation of why you chose that measure.
A few sentences describing the distribution, including any additional features you notice.
Student Lesson Summary
The data displays show the distribution of stickers on 30 pages.
A box plot and dot plot for "stickers on a page". The numbers 8 through 34, in increments of two, are indicated. The box plot is above the dot plot. The five-number summary for the box plot is as follows: Minimum value, 9. Maximum value, 34. Q1, 16. Q2, 20.5. Q3, 26. For the dot plot a triangle is indicated at 21 stickers. A horizontal line is drawn below the triangle and begins at 15.4 and ends at 26.6. The data for the dot plot are as follows: 9 stickers, 1 dot. 10 stickers, 1 dot. 11 stickers, 2 dots. 12 stickers, 1 dot. 14 stickers, 1 dot. 16 stickers, 2 dots. 17 stickers, 1 dot. 18 stickers, 2 dots. 19 stickers, 1 dot. 20 stickers, 3 dots. 21 stickers, 1 dot. 22 stickers, 3 dots. 23 stickers, 1 dot. 24 stickers, 2 dots. 26 stickers, 2 dots. 28 stickers, 1 dot. 30 stickers, 1 dot. 32 stickers, 2 dots. 33 stickers, 1 dot. 34 stickers, 1 dot.
The mean number of stickers, marked by the triangle, is 21. This tells us that if the total number of stickers were redistributed so that each page had the same number of stickers, then each page would have 21. The MAD is 5.6 stickers, which suggests that a page typically has between 15.4 stickers and 26.6 stickers.
The box plot for the same data set is shown above the dot plot. The median shows that half of the pages have greater than or equal to 20.5 stickers, and half have less than or equal to 20.5 stickers. The box shows that the IQR is 10 and that the middle half of the pages have between 16 and 26 stickers.
In this case, the median number of stickers is very close to the mean number of stickers, and the IQR is about twice the MAD. This tells us that the two pairs of measures of center and spread are very similar.
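The values quoted for this first example can be checked directly from the dot plot data. The following is a minimal sketch in Python, not part of the original materials. The names `dot_counts`, `pages`, `mean_absolute_deviation`, and `quartiles` are made up for illustration, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data.

```python
from statistics import mean, median

# Dot plot data from the first example: {stickers on a page: number of pages}.
dot_counts = {
    9: 1, 10: 1, 11: 2, 12: 1, 14: 1, 16: 2, 17: 1, 18: 2, 19: 1, 20: 3,
    21: 1, 22: 3, 23: 1, 24: 2, 26: 2, 28: 1, 30: 1, 32: 2, 33: 1, 34: 1,
}
# Expand the counts into one sorted list with one entry per page.
pages = sorted(v for v, n in dot_counts.items() for _ in range(n))


def mean_absolute_deviation(values):
    """Average distance of the values from their mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])


def quartiles(values):
    """Return (Q1, median, Q3), assuming Q1 and Q3 are the medians of the
    lower and upper halves of the sorted data (at least 2 values needed)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles(pages)
print("mean:", mean(pages), "MAD:", mean_absolute_deviation(pages))
print("Q1:", q1, "median:", q2, "Q3:", q3, "IQR:", q3 - q1)
```

Under these assumptions, the printed values should agree with the mean, MAD, median, and IQR given in the summary.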
Now let’s look at another example of 30 different pages.
A box plot and a dot plot for "stickers on a page". The numbers 8 through 34, in increments of two, are indicated. For the box plot: The minimum value is 9 and the maximum value is 26. Q1 has a value of 20, Q2 has a value of 23, and Q3 has a value of 24. For the dot plot: A triangle is indicated at 21 stickers. A line indicates the distance from 17.6 to 24.4 stickers. The data are as follows: 9 stickers, 1 dot. 10 stickers, 1 dot. 13 stickers, 1 dot. 14 stickers, 1 dot. 16 stickers, 1 dot. 17 stickers, 1 dot. 19 stickers, 1 dot. 20 stickers, 2 dots. 21 stickers, 2 dots. 22 stickers, 3 dots. 23 stickers, 6 dots. 24 stickers, 5 dots. 25 stickers, 4 dots. 26 stickers, 1 dot.
Here the mean is 21 stickers, and the MAD is 3.4 stickers. This suggests that a page typically has between 17.6 and 24.4 stickers. The median number of stickers is 23, and the box plot shows that the middle half of the data are between 20 and 24 stickers. These two pairs of measures paint very different pictures of the variability of the number of stickers on a page.
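To see where the two "typical" ranges in this second example come from, here is a small Python sketch, not part of the original materials. The names `second_counts`, `second_pages`, and `summarize` are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data.

```python
from statistics import mean, median

# Dot plot data from the second example: {stickers on a page: number of pages}.
second_counts = {
    9: 1, 10: 1, 13: 1, 14: 1, 16: 1, 17: 1, 19: 1,
    20: 2, 21: 2, 22: 3, 23: 6, 24: 5, 25: 4, 26: 1,
}
second_pages = sorted(v for v, n in second_counts.items() for _ in range(n))


def summarize(values):
    """Return both pairs of measures: mean with MAD, and median with quartiles.
    Q1 and Q3 are taken as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    center = mean(ordered)
    return {
        "mean": center,
        "mad": mean([abs(v - center) for v in ordered]),
        "q1": median(ordered[:half]),
        "median": median(ordered),
        "q3": median(ordered[len(ordered) - half:]),
    }


summary = summarize(second_pages)
# Range suggested by the mean and MAD: one MAD below to one MAD above the mean.
mean_range = (summary["mean"] - summary["mad"], summary["mean"] + summary["mad"])
# Range suggested by the box plot: the middle half of the data, Q1 to Q3.
box_range = (summary["q1"], summary["q3"])
print("mean and MAD range:", mean_range)
print("middle half (Q1 to Q3):", box_range)
print("median minus mean:", summary["median"] - summary["mean"])
```

The first range is centered on the mean and the second is located by the median, so the two ranges sit in different places and have different widths when the data are not symmetrical.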
From the dot plot, we can see that the median (23 stickers) is closer to the middle of the big cluster of values. If we were to ignore the pages with very few stickers, the median and IQR would give a more accurate picture of how many stickers are typically on a page.
When a distribution is not symmetrical, the median and IQR are often better measures of center and spread than the mean and MAD are. However, the decision about which pair of measures to use depends on what we want to know about the group we are investigating.
Over a two-week period, Mai records the number of math homework problems she has each school day.
2    15    20    0    5    25    1    0    10    12
Calculate these values. Show your reasoning.
The mean number of math homework problems
The mean absolute deviation (MAD)
Interpret the mean and MAD. What do they tell you about the number of math homework problems Mai had over these two weeks?
Find or calculate the following values and show your reasoning.
The median, quartiles, maximum, and minimum of Mai’s data
The interquartile range (IQR)
Which pair of measures of center and variability—mean and MAD, or median and IQR—do you think summarizes the distribution of Mai’s math homework assignments better? Explain your reasoning.
Jada wants to know whether a dot plot, a histogram, or a box plot would best show the distribution of her homework data.
0    0    4    4    5    5    7    8    10    23
Use the axis to make a dot plot to represent the data, and indicate the mean of 6.6 with a triangle. The MAD is 4.32.
Draw a box plot that represents Jada’s homework data.
Work with your group to draw three histograms to represent Jada’s homework data. The width of the bars in each histogram should represent a different number of homework problems.
The width of one bar represents 10 problems.
The width of one bar represents 5 problems.
The width of one bar represents 2 problems.
Which of the five representations should Jada use to summarize her data? Explain your reasoning.