2.5. Challenge: Calculating a Correlation matrix¶
Early on in our exploration of the happiness data, we calculated the correlation between the happiness score and the different factors. Now we have more data and more factors to continue, and we might ask a more general question about which of our columns are correlated with each other. A common way to do this is to build a correlation matrix. The rows and the columns of the correlation matrix represent each of our different factors and the individual values in the cells of the matrix represent the correlation between each pair of factors.
The following table is an example of a correlation matrix. Suppose we have three factors A, B, and C.
A and B have a correlation of 0.7
A and C have a correlation of 0.5
B and C have a correlation of 0.9
A 
B 
C 


A 
1.0 
0.7 
.5 
B 
0.7 
1.0 
.9 
C 
.5 
.9 
1.0 
Notice that the diagonal of this matrix has all values of 1.0. Each factor is perfectly correlated with itself.
In the next steps, you will build a correlation matrix between the various factors of our happiness data. This will challenge your use of the $ in defining ranges as well as your ability to think about having several cells of a spreadsheet work together.
To calculate a correlation matrix we will make use of sheets
INDEX
andCORREL
functions. TheCORREL
function expects two ranges: in our case, two columns of numbers to compute the correlation between. For example, the happiness score and the economy. However, to make our correlation matrix, we need to compute the correlation between all pairs of columns. Using the column letters is more compact so lets write out a few.EE, EF, EG, EH, EI, EJ, EK, EL, FE, FF, FG, FH, FI, FJ, FK, FL, …
Here is a Python snippet that should give you the full idea.
for i in "EFGHIJKL": for j in "EFGHIJKL": print("correlate column ", i, " with column ", j)
The aptly named
INDEX
function is what allows us to do this by writing one clever function and then copying and pasting it to fill out our matrix.
It may take a bit of experimentation to get the indexes and the $ notation correct but you will eventually end up with a matrix where the diagonal is 1. This is a good indicator that you have things right. This link may also be useful for getting this right.
Once you have the numbers, we can make a rough heat map by using conditional formatting. Using some conditional formatting rules, we can change the foreground and background color of the cells. Let’s start by adding a rule that says if the correlation is between 0.75 and 1.0, then color the background green.
You can add other rules to cover different ranges, but you will immediately see which cells we might want to focus on the most.
Q1: What is the correlation between Freedom to Make life choices and Generosity? (round to three decimal places)
Q2: Which two factors have the largest positive correlation (not including the diagonal)? and
Q3: Which two factors have the largest negative correlation?
Lesson Feedback

During this lesson I was primarily in my...
 1. Comfort Zone
 2. Learning Zone
 3. Panic Zone

Completing this lesson took...
 1. Very little time
 2. A reasonable amount of time
 3. More time than is reasonable

Based on my own interests and needs, the things taught in this lesson...
 1. Don't seem worth learning
 2. May be worth learning
 3. Are definitely worth learning

For me to master the things taught in this lesson feels...
 1. Definitely within reach
 2. Within reach if I try my hardest
 3. Out of reach no matter how hard I try