.. Copyright (C) Google, Runestone Interactive LLC
This work is licensed under the Creative Commons Attribution-ShareAlike 4.0
International License. To view a copy of this license, visit
http://creativecommons.org/licenses/by-sa/4.0/.
Describing Scatter Plots
========================
In this section you will learn how to identify the relationship between
variables in a scatterplot. These variables are the **explanatory** and the
**explained** variable that were defined earlier in the
:ref:`scatterplots section`. The following activities ask you
to examine the scatterplots pictured and then match them with their description.
.. image:: figures/mult_choice_plots.png
:align: center
:alt: Scatterplots labeled A through F.
.. mchoice:: identify_scatterplot_10
The explanatory variable (x) is age in years and the explained variable (y) is
the annual salary for a sample of working adults between the ages of 18 and 65.
- E
+ Correct: Most adults make more money as they get older, but many other factors such as education and career also impact salary.
- A
- Incorrect
- D
- Incorrect
- F
- Incorrect
.. mchoice:: identify_scatterplot_20
The explanatory variable (x) is the mean commute time in minutes and the explained
variable (y) is height in inches for a sample of employees at a small company.
- F
+ Correct: There’s no real relationship between height and commute time.
- A
- Incorrect
- D
- Incorrect
- B
- Incorrect
.. mchoice:: identify_scatterplot_30
The explanatory variable (x) is the month of the year, starting in January, and the
explained variable (y) is the mean temperature for that month for St. Louis, Missouri
which has cold winters and warm summers.
- C
+ Correct: Cold winters and warm summers means smaller values close to the end points 1 = January and 12 = December, and higher temperatures in the middle.
- A
- Incorrect
- D
- Incorrect
- B
- Incorrect
.. mchoice:: identify_scatterplot_40
The explanatory variable (x) is the city miles per gallon and the explained variable (y)
is the highway miles per gallon for a sample of cars.
- C
- Incorrect
- A
- Incorrect
- D
- Incorrect
- B
+ Correct: Cars with higher city mpg also have higher highway mpg.
.. mchoice:: identify_scatterplot_50
The explanatory variable (x) is the number of hours after e-coli has been introduced to a
petri dish and the explained variable (y) is the estimated number of e-coli cells after t
hours. The number of cells doubles about every 20 minutes.
- C
- Incorrect
- A
+ Correct: Because the number of cells is doubling, the change every 20 minutes at the beginning of the experiment is small compared to the change every 20 minutes at the end of the experiment when there are much more cells dividing.
- D
- Incorrect
- B
- Incorrect
.. mchoice:: identify_scatterplot_60
The explanatory variable (x) is the years of driving experience and the explained variable
(y) is the insurance premium paid for a sample of drivers.
- C
- Incorrect
- A
- Incorrect:
- D
+ Correct: Drivers with more driving experience are considered safer, so they pay smaller premiums. Similarly, drivers with less driving experience are considered riskier and pay greater premiums.
- B
- Incorrect
This exercise would be simpler given uniform adjectives that everyone could
understand. When describing the shape of the scatter plot and the relationship
between the explanatory and explained variable, there are three important
features to discuss.
- The **direction** of a scatter plot can be described as positive or
negative. The direction is positive when the explained variable
increases as the explanatory variable increases, or the points of the
scatterplot go up from left to right. The direction is negative when
the explained variable decreases as the explanatory variable
increases, or the points of the scatterplot go down from left to
right.
- The **strength** of a scatter plot is usually described as weak,
moderate or strong. The more spread out the points are, the weaker
the relationship. If the points are clearly clustered, or closely
follow a curve or line, the relationship is described as strong.
- The **linearity** of scatter plot indicates how close the points are
to a straight line. Scatter plots are described as linear or
nonlinear.
.. image:: figures/january_scatterplot.png
:align: center
:alt: A scatterplot depicting the temperature in January across latitudes.
For example, the scatterplot of latitude and January temperatures had
negative direction, as the greater the latitude, the colder the
temperature. Though there are a few :ref:`outliers` (cities
along the northwest coast of the US that have temperate winters, such as
Portland, OR) there is a strong, linear trend.
Given a new set of scatterplots below, repeat the same exercise, but now with
the new descriptions of strength, linearity and direction.
.. image:: figures/mult_choice_plots_abstract.png
:align: center
:alt: Six scatterplots labeled A through F.
.. dragndrop:: dnd_scatterplot0
:feedback: Try again!
:match_1: A|||Positive, strong, nonlinear
:match_2: B|||Positive, strong, linear
:match_3: C|||Neither positive nor negative, strong, nonlinear
:match_4: D|||Negative, moderate, linear
:match_5: E|||Positive, moderate, linear
:match_6: F|||No relationship
Match each scatterplot above with its description.