This unit is a brief overview of some key statistical concepts. First, students learn about populations and study variables associated with a population. They begin by classifying questions as either statistical or non-statistical—based on whether variable data is necessary to answer the question. This leads to further investigation into variability and data displays, such as dot plots and histograms. As students visualize data, they begin to describe the distribution of data more precisely as they work with mean and mean absolute deviation (MAD).
After working with those statistics, students begin to recognize that some distributions are not well-suited to description by mean and MAD. Students are introduced to median, range, and interquartile range as additional measures of center and variability that can be used to describe distributions in some situations. That also leads to the box plot as an additional way to visualize data.
Box plots and dot plots for two sets of data: "pug weights in kilograms" and "beagle weights in kilograms". The numbers 6 through 11 are indicated and there are tick marks midway between each indicated number. Each box plot is above its corresponding dot plot. The approximate data for the box plot for "pug weights in kilograms" are as follows: Minimum value, 6. Maximum value, 8. Q1, 6.5. Q2, 7. Q3, 7.3. The approximate data for the dot plot for "pug weights in kilograms" are as follows: 6 kilograms, 1 dot. 6.2 kilograms, 2 dots. 6.4 kilograms, 2 dots. 6.6 kilograms, 2 dots. 6.8 kilograms, 2 dots. 7 kilograms, 3 dots. 7.2 kilograms, 3 dots. 7.4 kilograms, 1 dot. 7.6 kilograms, 2 dots. 7.8 kilograms, 1 dot. 8 kilograms, 1 dot. The approximate data for the box plot for "beagle weights in kilograms" are as follows: Minimum value, 9. Maximum value, 11. Q1, 9.6. Q2, 10. Q3, 10.5. The approximate data for the dot plot for "beagle weights in kilograms" are as follows: 9 kilograms, 1 x. 9.2 kilograms, 2 x's. 9.4 kilograms, 1 x. 9.6 kilograms, 3 x's. 9.8 kilograms, 1 x. 10 kilograms, 3 x's. 10.2 kilograms, 3 x's. 10.4 kilograms, 1 x. 10.6 kilograms, 2 x's. 10.8 kilograms, 2 x's. 11 kilograms, 1 x.
Next, students examine different ways to collect data from samples within a population to understand why random selection is useful. Then students generate samples and estimate information about the population from sample data.
The unit concludes with an optional section exploring probability. Students are introduced to probability as a way to quantify how likely an event is to happen. They explore the connection between probability and results of repeated experiments, ways to examine the sample space for more complex experiments, and simulating experiments.
Note that mean absolute deviation is used in this unit as an introductory model for understanding variability. Although standard deviation is more mathematically useful, its calculation and meaning may be difficult for students at this level without an understanding of normal distributions. In later courses, once students' understanding of variability and their exposure to additional distributions have expanded, they will learn about standard deviation and move beyond mean absolute deviation.
Progression of Disciplinary Language
In this unit, teachers can anticipate students using language for mathematical purposes, such as comparing, interpreting, and justifying. Throughout the unit, students will benefit from routines designed to grow robust disciplinary language, both for their own sense-making and for building shared understanding with peers. Teachers can formatively assess how students are using language in these ways, particularly when students are using language to:
Compare
Questions that produce numerical and categorical data (Lesson 1).
Dot plots and histograms (Lesson 3).
Features and distributions of data sets (Lessons 4 and 5).
Measures of center with samples (Lesson 9).
Sampling methods (Lesson 10).
Methods for writing sample spaces (Lesson 15).
Interpret
Dot plots (Lessons 2 and 5).
Histograms (Lesson 3).
Mean of a data set (Lesson 4).
Five-number summaries and box plots (Lesson 7).
Situations involving populations and samples (Lesson 8).
Situations involving sample spaces and probability (Lesson 16).
Justify
Reasoning for matching data sets to questions (Lesson 1).
Reasoning about mean and median (Lesson 6).
Which samples are or are not representative of a larger population (Lesson 9).
Which samples correspond with different populations (Lesson 11).
Whether situations are surprising and possible (Lesson 14).
In addition, students are expected to represent data using dot plots, histograms, five-number summaries, and box plots, and to represent probabilities using sample spaces. Students also have opportunities to use language to describe features of a data set, describe patterns observed in repeated experiments, and explain how to use a simulation to answer questions about the situation.
The table shows lessons where new terminology is first introduced in this course, including when students are expected to understand the word or phrase receptively and when students are expected to produce the word or phrase in their own speaking or writing. Terms that appear bolded are in the Glossary. Teachers should continue to support students’ use of a new term in the lessons that follow the one where it was first introduced.
lesson
new terminology
receptive
productive
Acc6.8.1
numerical data
categorical data
dot plot
statistical question
variability
distribution
frequency
Acc6.8.2
center
spread
typical
variability
Acc6.8.3
histogram
bins
distribution
center
spread
Acc6.8.4
average
mean
measure of center
fair share
balance point
Acc6.8.5
mean absolute deviation (MAD)
measure of spread
symmetrical
mean
typical
Acc6.8.6
median
peak
cluster
unusual value
measure of center
Acc6.8.7
range
quartile
interquartile range (IQR)
box plot
whisker
five-number summary
median
measure of spread
minimum
maximum
Acc6.8.8
population
sample
survey
mean absolute deviation (MAD)
Acc6.8.9
representative
Acc6.8.10
random sample
Acc6.8.11
measure of variability
population
sample
random sample
symmetrical
Acc6.8.12
representative
measure of variability
Acc6.8.13
event
chance experiment
outcome
probability
random
sample space
Describe a distribution represented by a dot plot, including informal observations about its center and spread.
Interpret a histogram to answer statistical questions about a data set.
Section Narrative
In this section, students focus on describing distributions. In particular, they learn to describe the center and spread of a distribution by using informal language to refer to a typical value for a distribution and how spread out the data are. They use dot plots and histograms to represent data, and use the visualization to describe features of a distribution such as clusters, peaks, gaps, and symmetry.
A dot plot. The numbers 10 through 35, in increments of 5, are indicated. The 30 data values are as follows: 10 kilograms, 1 dot. 11 kilograms, 1 dot. 12 kilograms, 2 dots. 13 kilograms, 1 dot. 15 kilograms, 1 dot. 16 kilograms, 2 dots. 17 kilograms, 1 dot. 18 kilograms, 2 dots. 19 kilograms, 1 dot. 20 kilograms, 3 dots. 21 kilograms, 1 dot. 22 kilograms, 3 dots. 23 kilograms, 1 dot. 24 kilograms, 2 dots. 26 kilograms, 2 dots. 28 kilograms, 1 dot. 30 kilograms, 1 dot. 32 kilograms, 2 dots. 34 kilograms, 2 dots.
A histogram. The horizontal axis is labeled “dog weights in kilograms” and the numbers 10 through 35, in increments of 5, are indicated. On the vertical axis, the numbers 0 through 10, in increments of 2, are indicated. The data represented by the bars are as follows: Weight from 10 up to 15, 5. Weight from 15 up to 20, 7. Weight from 20 up to 25, 10. Weight from 25 up to 30, 3. Weight from 30 up to 35, 5.
Note that in all histograms in this unit, the left-end boundary of each bin or interval is included and the right-end boundary is excluded. For example, the number 5 would not be included in the 0–5 bin, but would be included in the frequency count for the 5–10 bin. This is only a convention, so check any technology used to create histograms to determine if it matches this convention.
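The bin convention can be made concrete with a short Python sketch. It recounts the 30 dog weights from the dot plot described above into bins of width 5, including each left boundary and excluding each right boundary. The function name `count_bins` and its parameters are illustrative assumptions, not part of the unit materials.

```python
# Dog weights in kilograms, read from the dot plot described above.
dog_weights = [10, 11, 12, 12, 13,
               15, 16, 16, 17, 18, 18, 19,
               20, 20, 20, 21, 22, 22, 22, 23, 24, 24,
               26, 26, 28,
               30, 32, 32, 34, 34]

def count_bins(values, start, width, num_bins):
    """Count values per bin; each bin includes its left boundary
    and excludes its right boundary."""
    counts = [0] * num_bins
    for v in values:
        index = int((v - start) // width)  # floor division picks the bin
        if 0 <= index < num_bins:          # ignore values outside all bins
            counts[index] += 1
    return counts

counts = count_bins(dog_weights, start=10, width=5, num_bins=5)
for i, c in enumerate(counts):
    left = 10 + 5 * i
    print(f"Weight from {left} up to {left + 5}: {c}")
```

The counts come out as 5, 7, 10, 3, and 5, matching the histogram. A value that sits exactly on a boundary, such as 15, is counted in the "15 up to 20" bin.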
Calculate and interpret the mean and mean absolute deviation (MAD) of a data set.
Calculate and interpret the median and interquartile range (IQR) of a data set.
Section Narrative
In this section, students begin to quantify their understanding of center and spread by finding values for the mean and mean absolute deviation (MAD). The mean is explained as a way of fairly sharing as well as a balance point to give additional intuition into the measure of center.
Then, students see that distributions with the same mean can be very different, so a measure of variability is often needed to describe a distribution well. They use mean absolute deviation to describe the variability of a distribution in a meaningful way.
A dot plot for “berry weights in grams.” The numbers 1 through 8 are indicated. The data for the dot plot are as follows: 2 grams, 2 dots. 2.5 grams, 3 dots. 3 grams, 4 dots. 3.5 grams, 4 dots. 4 grams, 2 dots. 4.5 grams, 2 dots. 5.5 grams, 1 dot. 6.5 grams, 1 dot.
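As a worked illustration, the mean and MAD of the berry weights in the dot plot above can be computed with a few lines of Python. This is a minimal sketch; the helper names `mean` and `mean_absolute_deviation` are assumptions chosen for readability.

```python
# Berry weights in grams as (value, number of dots), from the dot plot above.
dot_plot = [(2, 2), (2.5, 3), (3, 4), (3.5, 4),
            (4, 2), (4.5, 2), (5.5, 1), (6.5, 1)]
berry_weights = [value for value, dots in dot_plot for _ in range(dots)]

def mean(values):
    """Fair share: the total divided equally among all data points."""
    return sum(values) / len(values)

def mean_absolute_deviation(values):
    """Average distance between each data point and the mean."""
    center = mean(values)
    return mean([abs(v - center) for v in values])

berry_mean = mean(berry_weights)
berry_mad = mean_absolute_deviation(berry_weights)
print(f"mean = {berry_mean} grams")    # 66.5 / 19 = 3.5
print(f"MAD = {berry_mad:.2f} grams")  # 16 / 19, about 0.84
```

For these 19 berries, the mean is 3.5 grams and the MAD is about 0.84 grams, so a typical berry is within roughly 1 gram of the mean.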
Next, students add “median,” “range,” and “interquartile range” to their methods of describing a measure of center or measure of variability. They use the symmetry of a distribution to determine whether mean or median is likely to be a better description of the center. Then they explore box plots as a way to visualize a summary of data using the five-number summary including the minimum, maximum, median, and 2 other quartiles.
A box plot for “berry weights in grams.” The numbers 1 through 8 are indicated. The five-number summary for the box plot is as follows: Minimum value, 2. Maximum value, 6.5. Q1, 2.5. Q2, 3.5. Q3, 4.
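The five-number summary in the box plot can be checked against the berry weights from the earlier dot plot. The sketch below assumes one common school convention: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the number of data points is odd. Other tools may use different quartile methods and give slightly different values. The function names are illustrative.

```python
# Berry weights in grams as (value, number of dots), from the dot plot.
dot_plot = [(2, 2), (2.5, 3), (3, 4), (3.5, 4),
            (4, 2), (4.5, 2), (5.5, 1), (6.5, 1)]
berry_weights = [value for value, dots in dot_plot for _ in range(dots)]

def median(sorted_values):
    """Middle value, or the mean of the two middle values."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    n = len(data)
    lower_half = data[: n // 2]        # values below the middle position
    upper_half = data[(n + 1) // 2:]   # values above the middle position
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])

summary = five_number_summary(berry_weights)
minimum, q1, q2, q3, maximum = summary
iqr = q3 - q1
data_range = maximum - minimum
print("five-number summary:", summary)
print("IQR:", iqr, "range:", data_range)
```

Under that convention the summary is 2, 2.5, 3.5, 4, and 6.5, which agrees with the box plot, and the IQR is 1.5 grams while the range is 4.5 grams.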
Describe methods to obtain a random sample from a population, and explain why it is representative of the population.
Explain why samples are necessary and describe a sample and population for a given statistical question.
Use the mean of a random sample to make inferences about the population.
Section Narrative
In this section, students consider much larger populations to motivate the need to sample to obtain data. This leads to considering how some samples may be more representative of the population than others and the idea that random selection is more likely to produce representative samples.
Then students use samples to gain information about the populations they represent. In particular, students estimate measures of center for populations based on information from a sample.
population
A dot plot for “height in centimeters.” The numbers 1 through 11 are indicated. The data are as follows: 1 centimeter, 5 dots; 2 centimeters, 7 dots; 3 centimeters, 8 dots; 4 centimeters, 8 dots; 5 centimeters, 5 dots; 6 centimeters, 3 dots; 7 centimeters, 2 dots; 8 centimeters, 2 dots; 9 centimeters, 1 dot; 10 centimeters, 3 dots; 11 centimeters, 5 dots.
sample
A dot plot for “height in centimeters.” The numbers 1 through 11 are indicated. The data are as follows: 1 centimeter, 1 dot; 2 centimeters, 2 dots; 3 centimeters, 4 dots; 4 centimeters, 4 dots; 5 centimeters, 2 dots; 6 centimeters, 1 dot; 7 centimeters, 1 dot; 10 centimeters, 1 dot; 11 centimeters, 2 dots.
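To illustrate how a sample mean can estimate a population mean, the sketch below compares the two dot plots above and then draws one more random sample of the same size. The helper `expand`, the seed value, and the variable names are assumptions for illustration; a different seed would give a different random sample.

```python
import random

def expand(dot_plot):
    """Turn (value, number of dots) pairs into a flat list of data values."""
    return [value for value, dots in dot_plot for _ in range(dots)]

# Heights in centimeters, read from the two dot plots above.
population = expand([(1, 5), (2, 7), (3, 8), (4, 8), (5, 5), (6, 3),
                     (7, 2), (8, 2), (9, 1), (10, 3), (11, 5)])
sample = expand([(1, 1), (2, 2), (3, 4), (4, 4), (5, 2),
                 (6, 1), (7, 1), (10, 1), (11, 2)])

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
print(f"population mean: {population_mean:.2f} cm")  # 242 / 49, about 4.94
print(f"sample mean:     {sample_mean:.2f} cm")      # 88 / 18, about 4.89

# Draw another random sample of the same size, without replacement.
rng = random.Random(0)  # fixed seed so the run is repeatable
random_sample = rng.sample(population, k=len(sample))
random_sample_mean = sum(random_sample) / len(random_sample)
print(f"random sample mean: {random_sample_mean:.2f} cm")
```

The sample mean of about 4.89 centimeters is close to the population mean of about 4.94 centimeters, which is the kind of estimate students make in this section.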
Describe a multi-step experiment that could be used to simulate a compound event in a real-world situation, and justify that it represents the situation.
Interpret or create a list, table, or tree diagram that represents the sample space of a compound event, and use the sample space to write the probability for an event.
Use the results from a repeated experiment to estimate the probability of an event, and justify the estimate.
Use the sample space to determine the probability of an event, and express it as a fraction, decimal, or percentage.
Section Narrative
This section is optional because it gives a light introduction to probability, which is not essential for the unit’s main focus on statistics. In this section, students learn how to quantify the likelihood of events using probability. First, they list the sample space for a chance experiment and use it to assign values to the likelihood, such as 50%. Then students recognize these values as the fraction of times an event is likely to happen after many repeated trials of the chance experiment.
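The link between probability and repeated trials can be sketched with a small simulation. The example assumes a fair coin, which is not taken from the unit materials, and the function name `estimate_probability` and the seed are hypothetical choices.

```python
import random

def estimate_probability(num_trials, rng):
    """Fraction of simulated fair-coin flips that land heads."""
    heads = 0
    for _ in range(num_trials):
        if rng.choice(["heads", "tails"]) == "heads":
            heads += 1
    return heads / num_trials

rng = random.Random(1)  # fixed seed so the run is repeatable
for num_trials in (10, 100, 10_000):
    estimate = estimate_probability(num_trials, rng)
    print(f"{num_trials} flips: fraction of heads = {estimate}")
```

With only a few flips the fraction can be far from 0.5, while with many flips it tends to settle near the probability of 50%.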
Next, students expand their understanding of probability to more complex chance experiments that involve multiple steps, such as rolling a number cube and flipping a coin. They use structures such as tree diagrams, tables, and lists to record the sample spaces. After using those sample spaces to write probabilities, they design their own simulation to estimate the probability of an experiment that would be difficult to repeat.
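A sample space for the number cube and coin experiment can be listed programmatically, much as students would with a table or tree diagram. This sketch assumes a fair number cube and a fair coin, so that all outcomes are equally likely. The event "even number and heads" and the function name `probability` are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

number_cube = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

# Every (roll, flip) pair, like the rows of a table or branches of a tree.
sample_space = list(product(number_cube, coin))

def probability(event, outcomes):
    """Favorable outcomes divided by all equally likely outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))

def even_and_heads(outcome):
    roll, flip = outcome
    return roll % 2 == 0 and flip == "H"

p = probability(even_and_heads, sample_space)
print("number of outcomes:", len(sample_space))  # 12
print("fraction:", p)                            # 1/4
print("decimal:", float(p))                      # 0.25
print("percentage:", f"{float(p):.0%}")          # 25%
```

The same probability is shown as a fraction, a decimal, and a percentage, mirroring the three forms students are asked to use.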
Circular spinner divided into four equal parts. The first part is red and labeled “R,” the second part is blue and labeled “B,” the third part is green and labeled “G,” and the fourth part is yellow and labeled “Y.” The pointer is in the part labeled “B.”
Circular spinner divided into five equal parts. Starting from the top right, and moving clockwise, the first part is labeled 1, the second, 2, the third, 3, the fourth, 4, and the fifth, 5. The pointer is in the part labeled “5.”