enVision Mathematics, Grade 6, Chapter 8: Display, Describe, and Summarize Data

Lesson 7: Summarize Data Distributions

In this Grade 6 lesson from enVision Mathematics Chapter 8, students practice summarizing data distributions by calculating mean, median, interquartile range, and identifying measures of center and variability. Students analyze real-world data sets using box plots and dot plots, describing the overall shape of distributions including clusters, gaps, and the effect of outliers on measures of center. The lesson also includes a 3-Act Mathematical Modeling task on vocal range, applying standards 6.SP.A.2, 6.SP.A.3, and 6.SP.B.5.

Section 1

Dot Plots

Property

A dot plot is a simple graph for numerical data.
To create a dot plot, first draw a number line and then place a dot above the number line at the location of each data value.
If a value is repeated, stack another dot above the dot(s) already at that value.
This type of graph allows us to identify clusters (data points grouped close together), gaps (intervals without any reported values), peaks (values that occur more often than nearby values), and outliers (values that are significantly different from the rest of the data).

Examples

  • A group of friends records the number of pets they own: 1, 0, 2, 1, 1, 3, 5. A dot plot would show a peak at 1, a cluster from 0 to 3, and a gap at 4 that separates the cluster from the value at 5.
  • Students' quiz scores are: 8, 9, 10, 7, 9, 9, 8. The dot plot for this data shows a peak at 9, indicating it's the most frequent score, and all data is clustered between 7 and 10.
  • The number of goals scored in 7 soccer games was: 2, 3, 0, 1, 3, 2, 3. The dot plot has a peak at 3, showing it was the most common number of goals scored in a game.
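
The three examples above can be checked with a short Python sketch. This is only an illustration, not part of the lesson: the helper names `dot_plot` and `peaks_and_gaps` are hypothetical, the plot is drawn sideways (one row per value instead of stacked dots), and "peak" is simplified to mean the most frequent value(s).

```python
from collections import Counter

def dot_plot(values):
    """Return a sideways text dot plot: one row per whole number, one '*' per data point."""
    counts = Counter(values)
    rows = []
    for x in range(min(values), max(values) + 1):
        rows.append(f"{x:>3} | " + "*" * counts[x])
    return "\n".join(rows)

def peaks_and_gaps(values):
    """Return (most frequent values, whole numbers inside the range with no data)."""
    counts = Counter(values)
    highest = max(counts.values())
    peaks = sorted(v for v, c in counts.items() if c == highest)
    gaps = [x for x in range(min(values), max(values) + 1) if counts[x] == 0]
    return peaks, gaps

pets = [1, 0, 2, 1, 1, 3, 5]
print(dot_plot(pets))
print(peaks_and_gaps(pets))  # ([1], [4]): peak at 1, gap at 4
```

Running the same helper on the quiz scores and the soccer goals gives a peak at 9 and at 3 respectively, with no gaps in either data set.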

Explanation

Dot plots are perfect for smaller sets of data. They let you see every single data point at a glance, making it easy to spot where data clumps together (clusters) or where the most common value is (peak).

Section 2

Identifying the Shape of a Distribution

Property

A distribution is symmetric if data is evenly distributed around the center, with $\text{mean} \approx \text{median}$.
A distribution is skewed if data clusters toward one side: left-skewed when $\text{mean} < \text{median}$, right-skewed when $\text{mean} > \text{median}$.

Examples
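
As a rough illustration of the rule above, the sketch below compares the mean and median of the two data sets used in Section 3 of this lesson. The function name `describe_shape` and the `tolerance` cutoff (how close the mean and median must be, as a fraction of the range, to count as "approximately equal") are assumptions made for this example, not part of the lesson.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.05):
    """Classify shape by comparing mean and median.

    `tolerance` is a hypothetical cutoff: the mean and median count as
    approximately equal when they differ by at most this fraction of the range.
    """
    m, med = mean(values), median(values)
    data_range = max(values) - min(values)
    if data_range == 0 or abs(m - med) <= tolerance * data_range:
        return "roughly symmetric"
    return "right-skewed" if m > med else "left-skewed"

balanced = [20, 22, 24, 25, 26, 28, 30]     # mean 25, median 25
with_outlier = [10, 12, 12, 14, 15, 16, 85]  # mean about 23.4, median 14

print(describe_shape(balanced))      # roughly symmetric
print(describe_shape(with_outlier))  # right-skewed
```

Comparing mean and median is a quick check, not a substitute for looking at the dot plot or box plot itself.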

Section 3

Choosing the Best Summary Statistics

Property

When writing a statistical summary of a distribution, your choice of which numbers to report depends entirely on the shape of the data and the presence of outliers.

  • If Symmetric (No Outliers): Report the Mean for the center and the Standard Deviation for the spread.
  • If Skewed (or has Outliers): Report the Median for the center and the IQR for the spread.

Examples

  • Symmetric Dataset: $\{20, 22, 24, 25, 26, 28, 30\}$.

The distribution is perfectly balanced with no extreme values. You should report the Mean (25) as the center and calculate the Standard Deviation to describe the variability.
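
The lesson text does not carry out the standard deviation calculation, so here is a minimal sketch of it for this data set. It assumes Python's built-in `statistics` module and shows both the population form (divide by 7) and the sample form (divide by 6), since the text does not say which one is intended.

```python
from statistics import mean, pstdev, stdev

data = [20, 22, 24, 25, 26, 28, 30]

center = mean(data)                              # 25
squared_devs = [(x - center) ** 2 for x in data]  # 25, 9, 1, 0, 1, 9, 25
total = sum(squared_devs)                        # 70

population_sd = pstdev(data)  # sqrt(70 / 7), about 3.16
sample_sd = stdev(data)       # sqrt(70 / 6), about 3.42

print(center, total)
print(round(population_sd, 2), round(sample_sd, 2))
```

Either way, a typical value sits roughly 3 units from the mean of 25.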

  • Skewed Dataset: $\{10, 12, 12, 14, 15, 16, 85\}$.

The value 85 is an extreme outlier on the right, which skews the distribution. The mean (about 23.4) is larger than six of the seven values, so it misrepresents a typical value, and the standard deviation is inflated as well. Instead, report the Median (14) as the center and the IQR ($16 - 12 = 4$) to describe the variability.
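
The median and IQR above can be reproduced with the sketch below. The helper `quartiles` is a hypothetical name and uses the "median of each half" method (for an odd count, the middle value is left out of both halves). Other quartile conventions exist and can give slightly different answers on other data sets.

```python
from statistics import mean, median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # For an odd count, skip the middle value when forming the upper half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return median(lower), median(data), median(upper)

skewed = [10, 12, 12, 14, 15, 16, 85]
q1, med, q3 = quartiles(skewed)
iqr = q3 - q1
print(q1, med, q3, iqr)  # 12 14 16 4

# Effect of the outlier on each measure of center:
without_outlier = [x for x in skewed if x != 85]
print(round(mean(skewed), 1), round(mean(without_outlier), 1))  # 23.4 13.2
print(median(skewed), median(without_outlier))
```

Dropping 85 moves the mean from about 23.4 to about 13.2, while the median only moves from 14 to 13, which is why the median is the steadier choice here.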

Explanation

Statistics is about telling the most honest story possible about your data. If your data is symmetric with no outliers, the Mean and Standard Deviation summarize it well because they use every value. But if your data is skewed or contains extreme outliers, those same statistics get pulled toward the extreme values and can mislead your audience. In those cases, the Median and IQR give a more honest picture of what is actually "typical."
