19.2. Our first data set: Air pollution in the United States

The first data set that we're going to explore comes from The Guardian newspaper's data blog. It is a record of the air pollution in various cities. According to the World Health Organization, particulate pollution of 2.5 micrometers is particularly deadly, because particles that small get deep into our lungs more easily. To give a sense of how dangerous this kind of pollution is, an annual mean amount of just 5 µg/m³ (micrograms per cubic meter of air) was linked with a 13% increased risk of heart attacks.

We will use just the US data. It is in a large text file that looks like this:

Aberdeen, SD :13 :8
Adrian, MI :15 :9
Akron, OH :18 :11
Albany, GA :18 :11
Albany-Lebanon, OR :14 :8
Albany-Schenectady-Troy, NY :13 :8
Albuquerque, NM :12 :7
Alexandria, LA :20 :12

There are three columns separated by colon (‘:’) characters: the city and state, followed by two numeric pollution readings.
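To make the colon-separated layout concrete, here is a minimal sketch of how one line of this file could be split into its three columns in Python. The function name `parse_line` and the names `first_value` and `second_value` are placeholders chosen for illustration; they are not taken from the data set itself.

```python
def parse_line(line):
    """Split one record into its three colon-separated columns."""
    # split(":") yields three pieces, e.g. ["Akron, OH ", "18 ", "11"]
    city, first_value, second_value = line.split(":")
    # strip() removes the space left before the colon;
    # int() ignores surrounding whitespace when converting
    return city.strip(), int(first_value), int(second_value)


# A few lines copied from the sample shown above
sample_lines = [
    "Aberdeen, SD :13 :8",
    "Akron, OH :18 :11",
    "Alexandria, LA :20 :12",
]

for line in sample_lines:
    print(parse_line(line))
```

Splitting on the colon rather than on spaces or commas matters here, because city names such as "Albany-Schenectady-Troy, NY" contain a comma and a space of their own.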


