The Amazon Basin in South America contains over half of the planet's rain forest. The Amazon rain forest is home to the largest collection of plant and animal species in the world, including more than one-third of all living species. During the 1960s, colonists began cutting down the rain forest to clear land for agriculture. The construction of the Trans-Amazonian Highway in the early 1970s opened large forest areas to development by settlers and commercial interests, increasing the rate of deforestation.
Environmentalists are concerned about the loss of biodiversity which will result from destruction of the forest, and about the release of the carbon contained within the vegetation, which could accelerate global warming.
In Brazil, the Instituto Nacional de Pesquisas Espaciais (INPE, or National Institute of Space Research) uses Landsat satellite photos to monitor the pace of deforestation. According to their data, the original Amazon rain forest biome in Brazil of 4,100,000 square kilometers was reduced to 3,413,000 square kilometers by 2005, representing a loss of 16.8%. The figures for 1987 to 2006 are shown at right, and a plot of the data appears below.
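As a quick check, the 16.8% figure follows from the two areas quoted above (in square kilometers):
\begin{equation*}
\dfrac{4{,}100{,}000-3{,}413{,}000}{4{,}100{,}000}=\dfrac{687{,}000}{4{,}100{,}000}\approx 0.168
\end{equation*}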
Although the data points do not all lie exactly on a straight line, they are very close. One question we might ask is: If deforestation continues at the same rate, when will the Amazon rain forest disappear completely? In this section we learn to find a linear model that approximates a data set.
In most cases, a mathematical model is not a perfect description of reality. Many factors can affect empirical data, including measurement error, environmental conditions, and the influence of related variables. Nonetheless, we can often find an equation that approximates the data in a useful way.
The graph shown is called a scatterplot. The data are not strictly linear, because the slope is not constant: from 1960 to 1965, the minimum wage increased at an average rate of
\begin{equation*}
\dfrac{1.25-1.00}{5}=0.05~ \text{dollars per year}
\end{equation*}
and from 1970 to 1975, the minimum wage increased at a rate of
\begin{equation*}
\dfrac{2.10-1.60}{5}=0.10~ \text{dollars per year}
\end{equation*}
However, the data points do appear to lie close to an imaginary line.
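The two rates above can also be checked with a few lines of code. The following is a minimal sketch in Python that uses only the four minimum wage values quoted in the calculations; the names `wage` and `average_rate` are hypothetical helpers introduced for illustration.

```python
# Minimum wage values quoted in the text (year -> dollars per hour).
wage = {1960: 1.00, 1965: 1.25, 1970: 1.60, 1975: 2.10}


def average_rate(data, start, end):
    """Average rate of change between two years, in dollars per year."""
    return (data[end] - data[start]) / (end - start)


rate_early = average_rate(wage, 1960, 1965)
rate_late = average_rate(wage, 1970, 1975)
print(f"1960 to 1965: {rate_early:.2f} dollars per year")
print(f"1970 to 1975: {rate_late:.2f} dollars per year")
```

Because the two rates differ, no single straight line passes through all four points, which is why we look for a line that only approximates the data.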
We would like to draw a line that comes as close as possible to all the data points, even though it may not pass precisely through any of them. In particular, we try to adjust the line so that we have the same number of points above the line and below the line. One possible solution is shown in the figure at right.
A line that fits the data in a scatterplot is called a regression line. Drawing a regression line by eye is a subjective process. Using technology, we can compute a particular regression line called the least-squares regression line, which is widely used in statistics and modeling.
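To illustrate what the technology computes, here is a minimal sketch of the standard least-squares formulas in Python. It is not the procedure of any particular calculator. The function name `least_squares_line` is hypothetical, the data are only the four minimum wage values quoted earlier in the text rather than the full data set, and we assume the horizontal variable counts years since 1960.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Assumption: x = years since 1960; only the four values quoted in the text.
years = [0, 5, 10, 15]
wages = [1.00, 1.25, 1.60, 2.10]
m, b = least_squares_line(years, wages)
print(f"y = {m:.3f}x + {b:.3f}")
```

For these four points the slope is about 0.073 and the intercept about 0.94. A line fitted to the complete data set would generally differ, because it takes every data point into account.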
We can still find an equation for a line of best fit using the point-slope formula. (To review using the point-slope formula, see Finding a Linear Model in Section 1.5.) We choose two points on the line whose coordinates we can estimate fairly accurately. Note that these two points need not be any of the original data points.
The regression line in the Example above appears to pass through the points \((5, 1.25)\) and \((25, 3.35)\text{.}\) Use those points to find an equation for the regression line.
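One way to carry out the calculation is sketched here. We assume \(x\) denotes the horizontal coordinate and \(y\) the vertical coordinate, since the variable names are not given. First we compute the slope from the two points:
\begin{equation*}
m=\dfrac{3.35-1.25}{25-5}=\dfrac{2.10}{20}=0.105
\end{equation*}
Then we apply the point-slope formula with the point \((5, 1.25)\text{:}\)
\begin{equation*}
y=1.25+0.105(x-5)=0.105x+0.725
\end{equation*}
As a check, substituting \(x=25\) gives \(y=0.105(25)+0.725=3.35\text{,}\) the second point.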
An outdoor snack bar collected the following data showing the number of cups of cocoa, \(C\text{,}\) they sold when the high temperature for the day was \(T\degree\) Celsius.
Read values from your line for the number of cups of cocoa that will be sold when the temperature is \(8\degree\text{C}\) and when the temperature is \(16\degree\text{C}\text{.}\)
Use your equation to predict the number of cups of cocoa that will be sold when the temperature is \(9\degree\text{C}\text{,}\) and when the temperature is \(24\degree\text{C}.\)
The regression line need not pass through any of the data points, but it should be as close as possible. We try to draw the regression line so that there are an equal number of data points above and below the line.
The points \((8, 32)\) and \((16, 12)\) appear to lie on the regression line. According to this model, the snack bar will sell 32 cups of cocoa when the temperature is \(8\degree\text{C}\text{,}\) and 12 cups when it is \(16\degree\text{C}\text{.}\) These values are close to the actual data, but not exact.
To find an equation for the regression line, we use two points on the line, not data points! We will use \((8, 32)\) and \((16, 12)\text{.}\) First we compute the slope, and then we use the point-slope formula.
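The calculation can be reconstructed from the two points named above. This is a sketch based on those points, with \(T\) on the horizontal axis and \(C\) on the vertical axis as in the problem statement. The slope is
\begin{equation*}
m=\dfrac{12-32}{16-8}=\dfrac{-20}{8}=-2.5
\end{equation*}
and the point-slope formula with the point \((8, 32)\) gives
\begin{equation*}
C=32-2.5(T-8)=-2.5T+52
\end{equation*}
At \(T=9\) this equation gives \(C=29.5\text{,}\) or about 29 or 30 cups.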
Using a regression line to estimate values between known data points is called interpolation. If the data points lie fairly close to the regression line, then interpolation will usually give a fairly accurate estimate. In the Example above, the estimate of 29 or 30 cups of cocoa at \(9\degree\text{C} \) seems reasonable in the context of the data.
Making predictions beyond the range of known data is called extrapolation. Extrapolation can often give useful information, but if we try to extrapolate too far beyond our data, we may get unreasonable results. The conditions that produced the data may no longer hold, as in the Example above, or other unexpected conditions may arise to alter the situation.
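The contrast between interpolation and extrapolation can be seen by evaluating the cocoa model at the two temperatures from the Example. The following Python sketch assumes the line through \((8, 32)\) and \((16, 12)\) read from the graph; the function name `cups_of_cocoa` is a hypothetical helper.

```python
def cups_of_cocoa(temp_c):
    """Linear model through (8, 32) and (16, 12), in point-slope form."""
    slope = (12 - 32) / (16 - 8)      # -2.5 cups per degree Celsius
    return 32 + slope * (temp_c - 8)


for temp in (9, 24):
    print(f"{temp} degrees C: {cups_of_cocoa(temp)} cups")
```

At 9 degrees the model gives 29.5 cups, in line with the estimate of 29 or 30 cups. At 24 degrees it gives a negative number of cups, which is impossible: the linear trend cannot continue that far beyond the temperatures at which the data were collected.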
Use your regression equation from the previous Example to predict the number of cups of cocoa sold when the temperature is \(-10^\circ\text{C} \text{.}\)
The data in a scatterplot may show a linear trend, even though the individual points are not clustered closely around a line. Scattering of data is common in the social sciences, where many variables may influence a particular situation. Nonetheless, by analyzing the data, we may be able to detect a connection between some of the variables.
The world's population is growing at different rates in different nations. Many factors, including economic and social forces, influence the birth rate. Is there a connection between birth rates and education levels? The figure below shows the birth rate plotted against the female literacy rate in 148 countries.
\(-0.05\) births per woman per percentage point. The birth rate decreases by 0.05 births per woman for each percentage point increase in the female literacy rate.
On an international flight a passenger may check two bags each weighing 70 kilograms, or 154 pounds, and one carry-on bag weighing 50 kilograms, or 110 pounds. Express the weight, \(p\text{,}\) of a bag in pounds in terms of its weight, \(k\text{,}\) in kilograms.
Ms. Randolph bought a used car in 2010. In 2012 the car was worth $9000, and in 2015 it was valued at $4500. Express the value, \(V\text{,}\) of Ms. Randolph's car in terms of the number of years, \(t\text{,}\) she has owned it.
The number of manatees killed by watercraft in Florida waters has been increasing since 1975. Data are given at 5-year intervals in the table, and a scatterplot with regression line is shown below. (Source: Florida Fish and Wildlife Conservation Commission)
In Problems 7 and 8, the regression lines can be improved by adjusting either \(m\) or \(b\text{.}\) Draw a line that fits the data points more closely.
Newborn blue whales are about 24 feet long and weigh 3 tons. The young whale nurses for 7 months, at which time it is 53 feet long. Estimate the length of a 1-year-old blue whale.
A truck on a slippery road is moving at 24 feet per second when the driver steps on the brakes. The truck needs 3 seconds to come to a stop. Estimate the truck's speed 2 seconds after the brakes were applied.
The temperature of an automobile engine is \(9\degree\) Celsius when the engine is started and \(51\degree\text{C}\) seven minutes later. Use a linear model to predict the engine temperature for both 2 minutes and 2 hours after it started. Are your predictions reasonable?
The elephant at the City Zoo becomes ill and loses weight. She weighed 10,012 pounds when healthy and only 9641 pounds a week later. Predict her weight after 10 days of illness.
Use your line to predict the 400-meter time of a woman who runs the 100-meter dash in 11.2 seconds, and the 400-meter time of a woman who runs the 100-meter dash in 13.2 seconds.
With Americans' increased use of faxes, pagers, and cell phones, new area codes are being created at a steady rate. The table shows the number of area codes in the US each year. (Source: USA Today, NeuStar, Inc.)
The table shows the amount of carbon released into the atmosphere annually from burning fossil fuels, in billions of tons, at 5-year intervals from 1950 to 1995. (Source: www.worldwatch.org)
Let \(t\) represent the number of years after 1950 and plot the data. Scale the \(t\)-axis from 0 to 50 by 5's, and the \(C\)-axis from 0 to 7 by 0.5's.
Male birds with the largest repertoire of songs are the first to acquire mates in the spring. The table shows the number of different songs sung by several sedge warblers, and the days on which they acquired their mates, where day 1 is April 20. (Source: Krebs and Davies, 1993)
One measure of a person's physical fitness is the body mass index, or BMI. Your body mass index is the ratio of your weight in kilograms to the square of your height in meters. The points on the scatterplot show the BMI of Miss America from 1921 to 1991.
The equation of the least-squares regression line for the data is
\begin{equation*}
y = 20.69 - 0.04t
\end{equation*}
where \(t\) is the number of years since 1920. On the figure above, relabel the horizontal axis with values of \(t\text{.}\) Then graph this line and compare to your estimated line of best fit.
The Centers for Disease Control considers a BMI between 18.5 and 24.9 to be healthy. In 2002, Miss America was 5'3" tall and weighed 110 pounds. Calculate her BMI. (You will need to convert inches to meters and pounds to kilograms.)
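To graph the least-squares line \(y = 20.69 - 0.04t\) given above, it is enough to compute two points on it. A minimal Python sketch follows, taking the endpoints of the data range, 1921 and 1991; the function name `bmi_model` is a hypothetical helper.

```python
def bmi_model(t):
    """Least-squares line from the text: y = 20.69 - 0.04 t, t = years since 1920."""
    return 20.69 - 0.04 * t


for year in (1921, 1991):
    t = year - 1920
    print(f"{year}: t = {t}, y = {bmi_model(t):.2f}")
```

Plotting the two resulting points and joining them with a straight line gives the graph of the regression line for comparison with an estimated line of best fit.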