9.10. Predicting Pizza Prices: Linear Regression¶
Linear regression is probably one of the most widely used algorithms in data science, as well as in many other sciences. One of the best things about linear regression is that it lets us learn from things we already know: we use observations and measurements of familiar things to make predictions about new things. These predictions might be about the likelihood of a person buying a product online, or the chance that someone will default on their loan payments. To start, we are going to use a much simpler example: predicting the price of a pizza based on its diameter.
I have made an extensive study of the pizza places in my neighborhood, and here is a table of observations of pizza diameters and their prices.
| Diameter (inches) | Price ($) |
|-------------------|-----------|
| 6                 | 7         |
| 8                 | 9         |
| 10                | 13        |
| 14                | 17.5      |
| 18                | 18        |
Your first task is to put the data into a spreadsheet and make a scatter plot of the diameter versus the price. What you can see pretty easily from this graph is that as the diameter of the pizza goes up, so does the price.
If you were to draw a straight line through the points that came as close as possible to all of them, it would look like this:
The orange line, called the trendline or regression line, is our best guess at a line that describes the data. This is important because we can come up with an equation for the line that will allow us to predict the y value (price) for any given x value (diameter). Linear regression is all about finding the best equation for the line.
How do we do that? There are actually several different ways to come up with the equation for the line. We will look at two different solutions: the first is a closed-form equation that works for any problem like this in just two dimensions; the second is a solution that will allow us to generalize the idea of a best-fit line to many dimensions!
Recall the equation for a line that you learned in algebra: \(y = mx + b\). What we need to do is determine values for m and b. One way we can do that is to simply guess, and then keep refining our guesses until we get to a point where we are not really getting any better. You may think this sounds kind of stupid, but it is actually a pretty fundamental part of many machine learning algorithms.
You may also be wondering how we decide what it means to “get better”. In the case of our pizza problem we have some data to work with, so for a given guess for m and b we can compare the calculated y (price) against the known value of y and measure our error. For example: suppose we guess that b = 5 and that m = .7. For a diameter of 10 we get y = .7 x 10 + 5, or 12. Checking against our table, the value should be 13, so our error is our known value minus our predicted value, 13 - 12, or 1. If we try the same thing for a diameter of 8 we get y = .7 x 8 + 5, or 10.6. The error here is 9 - 10.6, or -1.6.
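To make the arithmetic concrete, here is the same check sketched in Python (the function name `predict` and its default arguments are just illustrative, not part of the exercise):

```python
def predict(diameter, m=0.7, b=5):
    """Price predicted by our guessed line y = m * x + b."""
    return m * diameter + b

# error = known value - predicted value
print(13 - predict(10))   # the 10 inch pizza: 13 - 12 = 1.0
print(9 - predict(8))     # the 8 inch pizza: 9 - 10.6, about -1.6
```

Notice that one error comes out positive and the other negative; that detail matters in a moment.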
Add a column to the spreadsheet that contains the predicted price for the pizza, using the diameter as the x value, a slope of .7, and an intercept of 5.
Now plot the original set of data along with this new table of data. Make the original one color and your calculated table another color. Experiment with some different guesses for the slope and intercept to see how it works.
Next let's add another column to the table where we include the error. Now we have our ‘predicted values’ and a bunch of error measurements. One common way to combine these error measurements is to compute the Mean Squared Error (MSE). This is easy to compute: all we have to do is square each of our errors, add them up, and then divide by the number of error terms we have. Why do we square them first? Did you notice that in our example one of the errors was positive and one was negative? When we add together positive and negative numbers they tend to cancel each other out, making our final mean value smaller than it should be. So we square them to be sure they are all positive. We call this calculation of the MSE an objective function. In many machine learning algorithms our goal is to minimize the objective function, and that is what we want to do here: find the values for m and b that minimize the error.
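If you want to double-check your spreadsheet's MSE column, here is the same calculation sketched in Python (the variable and function names are made up for illustration):

```python
diameters = [6, 8, 10, 14, 18]
prices = [7, 9, 13, 17.5, 18]

def mse(m, b, xs, ys):
    """Mean Squared Error of the line y = m*x + b over the data."""
    errors = [y - (m * x + b) for x, y in zip(xs, ys)]
    return sum(e * e for e in errors) / len(errors)

print(mse(0.7, 5, diameters, prices))  # about 3.17 for our guessed line
```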
Add two cells to your spreadsheet where you can try different values for the slope and intercept. Also update the column where you compute a value for the price to use the values from these cells rather than the hardcoded values of .7 and 5.
Q1: Using a slope of 2.5 and an intercept of 0.5 what is the MSE?
Now let's make use of the Solver functionality to search for the best values for the slope and intercept. Make sure that you have the Frontline Systems Solver add-on installed for Google Sheets. If you haven't used Solver before, you may want to take a look at Optimization with Solver. This problem doesn't even have any constraints: what we want to do is minimize the MSE value by changing the values for the slope and intercept. Note that because we are squaring the errors this is a nonlinear problem and will require the Standard LSGRG Nonlinear solver. Now set up Solver and run it for the pizza problem.
Q2: Fill in the values Solver found for the slope and intercept.
If you are having any trouble, your setup should look like this.
9.10.1. Closed-form Solution¶
The closed form solution to this problem is known to many science students.
slope = \(\frac{\sum{(x_i - \bar{x})(y_i - \bar{y})}}{\sum{(x_i - \bar{x})^2}}\)

intercept = \(\bar{y} - m \bar{x}\)
Let's use the closed-form solution to calculate values for the slope and intercept. To do this you will need to calculate \(\bar{x}\) and \(\bar{y}\), the average values of x and y. You can then add columns to compute \(y_i - \bar{y}\), \(x_i - \bar{x}\), their product, and \((x_i - \bar{x})^2\).
Q3: What values do you get for the slope and intercept?
9.10.2. The Payoff: Supervised Learning¶
The payoff from this exercise with Solver is that we have “learned” values for the slope and intercept that will allow us to predict the price of any pizza! If your friend calls you up and says “I just ate a 7 inch pizza, guess how much it cost?”, you can quickly do the math of 1.97 + 0.98 x 7 and guess $8.83! Won't they be amazed!?
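For instance, that mental math looks like this in Python, using the slope and intercept Solver found:

```python
slope, intercept = 0.98, 1.97    # values learned by Solver
price = intercept + slope * 7    # predicted price of a 7 inch pizza
print(f"${price:.2f}")           # $8.83
```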
In the world of machine learning, using the sample pizza data along with a Solver-like algorithm to find values for the slope and intercept is called supervised learning. That is because we are using the known prices of different pizzas, along with their diameters, to help correct our algorithm as it comes up with values for the slope and intercept. The values that the algorithm learns are called our model. This model is pretty simple because it is just two numbers and the formula for a line. But don't let the simplicity fool you: regression is one of the most commonly used algorithms in a data scientist's arsenal.
In the next section we’ll make a fancier model that uses more data to do a better job of making predictions. If you want to try your hand at writing your own learning algorithm you can do that in the optional section below.
9.10.3. A simple Machine Learning Approach (Optional)¶
To do this we will follow these steps:
1. Pick a random value for m and b.
2. Compute the MSE for all of our known points.
3. Repeat the following steps 1000 times:
   a. Make m slightly bigger and recompute the MSE. Does that make it smaller? If so, use this new value for m. If it doesn't make the MSE smaller, make m slightly smaller and see if that helps.
   b. Make b slightly bigger and recompute the MSE. Does that make it smaller? If so, use this new value for b and go back to step 3a. If not, try a slightly smaller b; if that makes the MSE smaller, keep this value for b and go back to step 3a.
4. After repeating the above enough times we will be very close to the best possible values for m and b. We can now use these values to make predictions for other pizzas where we know the diameter but don't know the price.
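The steps above can be sketched in Python. This is just one possible implementation; the step size of 0.01 and the seeded random start are assumptions made so the example is repeatable:

```python
import random

diameters = [6, 8, 10, 14, 18]
prices = [7, 9, 13, 17.5, 18]

def mse(m, b):
    """Mean squared error of y = m*x + b over the known points."""
    return sum((y - (m * x + b)) ** 2
               for x, y in zip(diameters, prices)) / len(diameters)

random.seed(42)                            # seeded so the run is repeatable
m, b = random.random(), random.random()    # step 1: random starting guesses
step = 0.01

for _ in range(1000):                      # step 3: repeat many times
    # 3a: nudge m up, then down, keeping whichever lowers the MSE
    for dm in (step, -step):
        if mse(m + dm, b) < mse(m, b):
            m += dm
            break
    # 3b: do the same for b
    for db in (step, -step):
        if mse(m, b + db) < mse(m, b):
            b += db
            break

print(round(m, 2), round(b, 2))  # close to the best line; a fixed step
                                 # size limits how precise we can get
```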
Let’s develop some intuition for this whole thing by writing a function and trying to minimize the error.
You will write three functions:

- compute_y(x, m, b) -- compute the predicted y for a single x.
- compute_all_y(list_of_x, m, b) -- compute a list of predicted y values. This should use compute_y.
- compute_mse(list_of_known, list_of_predictions) -- compute the mean squared error from the known and predicted values.
Next write a function that systematically tries different values for m and b in order to minimize the MSE. Call this function in a loop 1000 times and see what your values for m and b are at the end.
Congratulations! You have just written your first “machine learning” algorithm. One fun thing you can do is to save the MSE at the end of each time through the loop and then plot it. You should see the error go down pretty quickly and then level off or decrease very gradually. Note that the error will never go to 0 because the data isn't perfectly linear. Nothing in the real world is!
At this point your algorithm's ability to ‘learn’ is limited by how much you change the slope and intercept values each time through the loop. At the beginning it's good to change them by a lot, but as you get closer to the best answer it's better to tweak them by smaller and smaller amounts. Can you adjust your code above to do this?
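Here is one way that adjustment might look (just a sketch; the halving schedule is an assumption, and many other schedules would work):

```python
diameters = [6, 8, 10, 14, 18]
prices = [7, 9, 13, 17.5, 18]

def mse(m, b):
    """Mean squared error of y = m*x + b over the known points."""
    return sum((y - (m * x + b)) ** 2
               for x, y in zip(diameters, prices)) / len(diameters)

m, b, step = 0.0, 0.0, 1.0          # start with a big step size
for _ in range(1000):
    improved = False
    # try nudging m and b up and down by the current step
    for dm, db in ((step, 0), (-step, 0), (0, step), (0, -step)):
        if mse(m + dm, b + db) < mse(m, b):
            m, b = m + dm, b + db
            improved = True
    if not improved:
        step /= 2                   # no nudge helped, so tweak more finely
print(round(m, 3), round(b, 3))
```

Because the step shrinks whenever progress stalls, this version gets much closer to the best slope and intercept than a fixed step can.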
For two-dimensional data there is even a closed-form solution to this problem, which you could derive using a bit of calculus. It is worthwhile to do this to see that your solution is very, very close to the one you get from the simple formulas slope = covariance / variance and intercept = avg(y) - slope * avg(x). Write a function that will calculate the slope and intercept using this method and compare the results with the slope and intercept you found previously.
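One possible version of that function (the name fit_line is made up for illustration; note that the 1/n factors in the covariance and variance cancel, so plain sums work):

```python
diameters = [6, 8, 10, 14, 18]
prices = [7, 9, 13, 17.5, 18]

def fit_line(xs, ys):
    """Closed-form best-fit line: slope = covariance / variance."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    intercept = y_bar - slope * x_bar
    return slope, intercept

m, b = fit_line(diameters, prices)
print(round(m, 2), round(b, 2))  # 0.98 1.97 -- matching the Solver values
```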