Correlation versus Causation
It is clear that schools with higher median SAT scores also have higher completion rates, but you should be very careful about what conclusions you draw from that information. A strong relationship between two variables does not necessarily mean that one influences the other.
While it may be tempting to conclude that higher average SAT scores lead to higher completion rates, there may be other reasons that the relationship between the two variables is so strong. Perhaps students with higher SAT scores prefer to go to schools with higher completion rates. Alternatively, schools with larger endowments can both be more selective and have more resources to support students towards graduation. In that second scenario, endowment size is an example of what we call a lurking (or confounding) variable.
A lurking variable influences both the explanatory and the explained variable, which can produce an association even when there is no causal relationship between the two. For example, the size of a school’s endowment could explain both average SAT score and completion rate. If endowment really is what drives both, it no longer makes sense for a college to try to raise the average SAT scores of its new students in order to increase its completion rate. Instead, it would make more sense to focus on increasing its endowment.
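To make the idea concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration: the data are synthetic, the coefficients are arbitrary, and the names `endowment`, `sat`, and `completion` are hypothetical stand-ins rather than columns from the real dataset. Endowment drives both of the other variables, and SAT score has no direct effect on completion rate. The sketch then compares the raw correlation with the correlation that remains after the linear effect of endowment is removed from both variables (a partial correlation).

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residuals(ys, xs):
    """What is left of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic schools (assumed numbers): standardized endowment is the lurking variable.
endowment = [rng.gauss(0, 1) for _ in range(2000)]
# Both outcomes depend on endowment plus independent noise, never on each other.
sat = [1100 + 100 * e + rng.gauss(0, 50) for e in endowment]
completion = [0.60 + 0.10 * e + rng.gauss(0, 0.05) for e in endowment]

raw_r = pearson(sat, completion)
partial_r = pearson(residuals(sat, endowment), residuals(completion, endowment))

print(f"correlation of SAT and completion:        {raw_r:.2f}")
print(f"same, after controlling for endowment:    {partial_r:.2f}")
```

In this setup the raw correlation is strong, while the partial correlation is close to zero, because the association comes entirely from the shared dependence on endowment. With real data the picture is rarely this clean: a lurking variable may be unmeasured, and its effect may not be linear.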
As another example, the number of ice cream cones sold in a month is highly correlated with the number of sunburns. Intuitively, you know that eating ice cream does not cause sunburns, just as you know that getting a sunburn doesn’t (directly) make someone eat more ice cream. Both ice cream consumption and sunburn prevalence are higher in warmer months than in cooler months, and in warmer locations than in cooler ones. In this example, temperature is a lurking variable: as temperature increases, both sunburn prevalence and ice cream consumption increase. The strong association between ice cream sales and sunburns is therefore not enough to establish causation. In general, correlation does not imply causation: a strong relationship between two variables does not necessarily mean that one causes the other. This article goes into more detail about when you might like to choose one over the other.
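One way to check for a lurking variable like temperature is to hold it roughly constant and see whether the association survives. The sketch below does this with synthetic data. The numbers are invented for illustration, and `temperature`, `ice_cream`, and `sunburns` are hypothetical names. It compares the correlation across all observations with the correlation inside a narrow temperature band.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic observations (assumed numbers): temperature in degrees Celsius.
temperature = [rng.uniform(0, 35) for _ in range(20000)]
# Each outcome rises with temperature; neither depends on the other.
ice_cream = [20 * t + rng.gauss(0, 60) for t in temperature]
sunburns = [2 * t + rng.gauss(0, 6) for t in temperature]

overall_r = pearson(ice_cream, sunburns)

# Stratify: keep only observations where temperature is between 25 and 26 degrees.
band = [i for i, t in enumerate(temperature) if 25 <= t < 26]
band_r = pearson([ice_cream[i] for i in band], [sunburns[i] for i in band])

print(f"all observations:      r = {overall_r:.2f}")
print(f"25 to 26 degrees only: r = {band_r:.2f} (n = {len(band)})")
```

Across all observations the two variables move together, but within the narrow band, where temperature barely varies, the association largely disappears. That pattern is what you would expect if temperature, rather than ice cream or sunburns, is responsible for the relationship.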
It is important to understand the distinction between correlation and causation so that you can draw your own conclusions, since news reports often present a correlation as though it were evidence of causation.