6.3. Case Study 1: Graphing Infant Mortality on a Map

Let’s take on the seemingly simple task of plotting some of the country data on a map like we did in Google Sheets earlier. We’ll see that this is one area where things are not quite as simple as they are in Sheets. But we can make it work with a bit of effort.

Altair provides us with the facility to make a blank map. But filling in the data requires a bit more work on our part.

This is a good example of learning by example, then extrapolating what you need to do based on understanding the example.

The counties data that is passed to the chart is the data needed to create and outline the map.

import pandas as pd
import altair as alt
from vega_datasets import data
counties = alt.topo_feature(data.us_10m.url, 'counties')
unemp_data = data.unemployment.url


alt.Chart(counties).mark_geoshape().project(
    type='albersUsa').properties(
    width=500,
    height=300
)
Map of the United States divided by counties.

What about our encoding channels?! The primary data needed to draw the map using a mark_geoshape was passed to the Chart, but that is really secondary data for us. What we care about is graphing the unemployment data by county. That is in a different data frame with a column called rate.

With a geoshape, we can encode the county data using color. But, there is no unemployment data in counties, so we have to use a transform_lookup to map from the way counties are identified in the geo data to our DataFrame that contains unemployment data.

unemp_data = pd.read_csv('http://vega.github.io/vega-datasets/data/unemployment.tsv',sep='\t')
unemp_data.head()
id rate
0 1001 0.097
1 1003 0.091
2 1005 0.134
3 1007 0.121
4 1009 0.099

Using the transform_lookup method, we can arrange for the id in the geographic data to be matched against the id in our unemp_data data frame. This allows us to make use of two data frames in one graph. The example below is a bit misleading, in that id is used both as the lookup, as well as the key in the call to LookupData. The lookup value refers to the column name in the DataFrame passed to Chart, whereas the second parameter to the LookupData call is the name of the column in the unemp_data DataFrame. It is just a coincidence that they have the same name in this example.
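
One way to picture what the lookup does is as a left join between two tables. The sketch below uses pandas merge rather than Altair, and the frames and the fips column name are made up for illustration. The two key columns are deliberately given different names so you can see which name belongs to which table.

```python
import pandas as pd

# Hypothetical stand-in for the geographic data: one row per county shape.
shapes = pd.DataFrame({'id': [1001, 1003, 1005]})

# Hypothetical stand-in for the unemployment data, keyed by a column
# with a different name ('fips') to keep the two roles apart.
rates = pd.DataFrame({'fips': [1001, 1003], 'rate': [0.097, 0.091]})

# 'id' plays the role of lookup=, 'fips' plays the role of the key
# given to LookupData, and 'rate' is the field pulled across.
joined = shapes.merge(rates, left_on='id', right_on='fips', how='left')
print(joined[['id', 'rate']])
```

With these made-up names, the matching Altair arguments would be lookup='id' and alt.LookupData(rates, 'fips', ['rate']). County 1005 has no match in rates, so its rate is missing after the join.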

alt.Chart(counties).mark_geoshape(
).encode(
    color='rate:Q'
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(unemp_data, 'id', ['rate'])
).project(
    type='albersUsa'
).properties(
    width=500,
    height=300,
    title='Unemployment by County'
)
Heat map of the U.S. counties based on relative unemployment. Colors range from yellow for high unemployment to blue for low unemployment.

6.3.1. Using a Web API to get Country Codes

Can you make use of the provided example and the Altair documentation to produce a graph of the world where the countries are colored by one of the features in the data?

In this part of the project we will:

  • Learn about using web APIs for data gathering

  • Use a web API to get data that maps country codes to country numbers

  • Learn how to add columns to a data frame using the map function, and possibly learn to use a lambda function if you’ve never used one before

Let’s make a to-do list:

  1. We need to add a column to our wd DataFrame that contains the numerical country id. Where can we get this data? There may be some CSV files with this information already in them, but this is a good chance to learn about a common technique used by data scientists everywhere: web APIs. API stands for Application Programming Interface. Each website will have its own convention for how you ask it for data, and the format in which the data is returned.

  2. Once we have the new column, we can follow the example from above to make a world map and show birthrate data.

The first step is to make use of the awesome requests module. The requests module allows us to easily communicate to databases across the web. The documentation for it is fantastic, so you should use that to learn about requests in more detail. We’ll just give you the bare bones here to get started.

The website called restcountries.eu provides an interface for us to get data from their site rather than a web page. When thinking about a web API, you have to understand how to ask it for the data you want. In this case, we will use /rest/v2/alpha/XXX. Let’s unpack that into pieces and look at what each one is telling us.

  • /rest: Technically, REST stands for REpresentational State Transfer. This uses the HTTP protocol to ask for and respond with data.

  • /v2: This is version 2 of this website’s protocol.

  • /alpha: This tells the website that the next thing we are going to pass it is the three-letter code for the country.

  • XXX: This can be any valid three-letter country code, for example “usa”.
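
The pieces above can be assembled in a small helper. This is only a sketch: the function name alpha_url is made up, and lowercasing the code is an assumption made to match the “usa” example.

```python
BASE_URL = 'https://restcountries.eu'  # the site named in the text

def alpha_url(code):
    """Build the /rest/v2/alpha/XXX URL for a three-letter country code."""
    # Lowercasing is an assumption made to match the "usa" example.
    return f'{BASE_URL}/rest/v2/alpha/{code.lower()}'

print(alpha_url('USA'))
```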

Open a new tab in your browser and paste this URL: https://restcountries.eu/rest/v2/alpha/usa. You will see that you don’t get a web page in response, but rather some information that looks like a Python dictionary. We’ll explore that more below. We can do the same thing from a Python program using the requests library.

import requests
res = requests.get('https://restcountries.eu/rest/v2/alpha/usa')
res.status_code
200

The status code of 200 tells us that everything went fine. If you make a typo in the URL, you may see the familiar status code of 404, meaning not found.
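
If you are curious what a particular status code means, Python’s standard library can tell you. The sketch below does not contact any website, and the helper name describe_status is made up for illustration.

```python
from http import HTTPStatus

def describe_status(status_code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(status_code).phrase

print(describe_status(200))
print(describe_status(404))
```

With requests, you can also call res.raise_for_status(), which raises an exception when the response carries a 4xx or 5xx status code.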

We can also look at the text that was returned.

res.text
'{"name":"United States of America","topLevelDomain":[".us"],"alpha2Code":"US","alpha3Code":"USA","callingCodes":["1"],"capital":"Washington, D.C.","altSpellings":["US","USA","United States of America"],"region":"Americas","subregion":"Northern America","population":323947000,"latlng":[38.0,-97.0],"demonym":"American","area":9629091.0,"gini":48.0,"timezones":["UTC-12:00","UTC-11:00","UTC-10:00","UTC-09:00","UTC-08:00","UTC-07:00","UTC-06:00","UTC-05:00","UTC-04:00","UTC+10:00","UTC+12:00"],"borders":["CAN","MEX"],"nativeName":"United States","numericCode":"840","currencies":[{"code":"USD","name":"United States dollar","symbol":"$"}],"languages":[{"iso639_1":"en","iso639_2":"eng","name":"English","nativeName":"English"}],"translations":{"de":"Vereinigte Staaten von Amerika","es":"Estados Unidos","fr":"États-Unis","ja":"アメリカ合衆国","it":"Stati Uniti D'America","br":"Estados Unidos","pt":"Estados Unidos","nl":"Verenigde Staten","hr":"Sjedinjene Američke Države","fa":"ایالات متحده آمریکا"},"flag":"https://restcountries.eu/data/usa.svg","regionalBlocs":[{"acronym":"NAFTA","name":"North American Free Trade Agreement","otherAcronyms":[],"otherNames":["Tratado de Libre Comercio de América del Norte","Accord de Libre-échange Nord-Américain"]}],"cioc":"USA"}'

That looks like an ugly mess! Fortunately, it’s not as bad as it seems. If you look closely at the data, you will see that it starts with a { and ends with a }. In fact, you may realize this looks a lot like a Python dictionary! If you thought that, you are correct. This is a big long string that represents a Python dictionary. Better yet, we can convert this string into an actual Python dictionary and then access the individual key-value pairs stored in the dictionary using the usual Python syntax!

The format we saw above is called JSON, which stands for JavaScript Object Notation. It’s a good acronym to know, but you don’t have to know anything about JavaScript in order to make use of JSON. You can think of the results as a Python dictionary. It can be a bit daunting at first, as there can be many keys, and JSON is often full of dictionaries of dictionaries of lists of dictionaries. But fear not: you can figure it out with a bit of experimentation.
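
To see the conversion from string to dictionary in isolation, here is a minimal sketch using the standard json module on a trimmed copy of the response text. The requests library offers res.json() as a shortcut that does the same job on the full response.

```python
import json

# A trimmed copy of the response text, keeping only a few keys.
text = ('{"name": "United States of America", "alpha3Code": "USA", '
        '"numericCode": "840", "borders": ["CAN", "MEX"]}')

info = json.loads(text)  # string -> Python dictionary
print(type(info).__name__)
print(info['alpha3Code'])
print(info['borders'])
```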

usa_info = res.json()
usa_info
{'name': 'United States of America',
 'topLevelDomain': ['.us'],
 'alpha2Code': 'US',
 'alpha3Code': 'USA',
 'callingCodes': ['1'],
 'capital': 'Washington, D.C.',
 'altSpellings': ['US', 'USA', 'United States of America'],
 'region': 'Americas',
 'subregion': 'Northern America',
 'population': 323947000,
 'latlng': [38.0, -97.0],
 'demonym': 'American',
 'area': 9629091.0,
 'gini': 48.0,
 'timezones': ['UTC-12:00',
   'UTC-11:00',
   'UTC-10:00',
   'UTC-09:00',
   'UTC-08:00',
   'UTC-07:00',
   'UTC-06:00',
   'UTC-05:00',
   'UTC-04:00',
   'UTC+10:00',
   'UTC+12:00'],
 'borders': ['CAN', 'MEX'],
 'nativeName': 'United States',
 'numericCode': '840',
 'currencies': [{'code': 'USD',
   'name': 'United States dollar',
   'symbol': '$'}],
 'languages': [{'iso639_1': 'en',
   'iso639_2': 'eng',
   'name': 'English',
   'nativeName': 'English'}],
 'translations': {'de': 'Vereinigte Staaten von Amerika',
   'es': 'Estados Unidos',
   'fr': 'États-Unis',
   'ja': 'アメリカ合衆国',
   'it': "Stati Uniti D'America",
   'br': 'Estados Unidos',
   'pt': 'Estados Unidos',
   'nl': 'Verenigde Staten',
   'hr': 'Sjedinjene Američke Države',
   'fa': 'ایالات متحده آمریکا'},
 'flag': 'https://restcountries.eu/data/usa.svg',
 'regionalBlocs': [{'acronym': 'NAFTA',
   'name': 'North American Free Trade Agreement',
   'otherAcronyms': [],
   'otherNames': ['Tratado de Libre Comercio de América del Norte',
     'Accord de Libre-échange Nord-Américain']}],
 'cioc': 'USA'}

For example, timezones is a top level key, which produces a list of the valid timezones in the USA.

usa_info['timezones']
['UTC-12:00',
 'UTC-11:00',
 'UTC-10:00',
 'UTC-09:00',
 'UTC-08:00',
 'UTC-07:00',
 'UTC-06:00',
 'UTC-05:00',
 'UTC-04:00',
 'UTC+10:00',
 'UTC+12:00']

But languages is more complicated. It also returns a list, but each element of the list is a dictionary that corresponds to one of the official languages of the country. The USA has only one official language listed here, but other countries have more. For example, Malta has both Maltese and English as official languages. Notice that the two dictionaries have an identical structure: a key for the two-letter abbreviation, a key for the three-letter abbreviation, the name, and the native name.

[{'iso639_1': 'mt',
  'iso639_2': 'mlt',
  'name': 'Maltese',
  'nativeName': 'Malti'},
{'iso639_1': 'en',
  'iso639_2': 'eng',
  'name': 'English',
  'nativeName': 'English'}]
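
Getting at the values inside a list of dictionaries takes one index for the list and one key for the dictionary. A small sketch, with the Malta list from above typed in by hand rather than fetched from the web:

```python
# The Malta languages list from above, typed in by hand.
malta_languages = [
    {'iso639_1': 'mt', 'iso639_2': 'mlt', 'name': 'Maltese', 'nativeName': 'Malti'},
    {'iso639_1': 'en', 'iso639_2': 'eng', 'name': 'English', 'nativeName': 'English'},
]

# Index into the list first, then use a key on the dictionary.
first_name = malta_languages[0]['name']

# A list comprehension collects the same key from every dictionary.
all_names = [lang['name'] for lang in malta_languages]

print(first_name)
print(all_names)
```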

Now that we have a really nice way to get the additional country information, let’s add the numeric country code as a new column in our wd DataFrame. We can think of adding the column as a transformation of our three-letter country code to a number. We can do this using the map function. You learned about map in the Python Review section of this book. If you need to refresh your memory, see the Python Review section.

When we use Pandas, the difference is that we don’t pass the list as a parameter to map. map is a method of a Series, so we use the syntax df.myColumn.map(function). This applies the function we pass as a parameter to each element of the series and constructs a brand new series.
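
Here is a minimal sketch of Series.map on a toy series. The function to_lower is a made-up stand-in, not the function you will write for this project. The last line shows the same transformation written with a lambda.

```python
import pandas as pd

codes = pd.Series(['AFG', 'ALB', 'DZA'])

def to_lower(code):
    """Toy transformation applied to one element at a time."""
    return code.lower()

# map calls to_lower on each element and builds a brand new Series.
lowered = codes.map(to_lower)
print(lowered.tolist())

# The same thing with a lambda instead of a named function.
same = codes.map(lambda code: code.lower())
```

The original codes series is left unchanged, since map returns a new series.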

For our case, we need to write a function that takes a three-letter country code as a parameter and returns the numeric code we look up, converted to an integer. Let’s call it get_num_code. You have all the details you need to write this function. Once you write this function, you can use the code below.

wd['CodeNum'] = wd.Code.map(get_num_code)
wd.head()
Country Ctry Code CodeNum Region Population Area Pop. Density Coastline Net migration ... Phones Arable Crops Other Climate Birthrate Deathrate Agriculture Industry Service
0 Afghanistan Afghanistan AFG 4.0 ASIA (EX. NEAR EAST) 31056997 647500 48.0 0.00 23.06 ... 3.2 12.13 0.22 87.65 1.0 46.60 20.34 0.380 0.240 0.380
1 Albania Albania ALB 8.0 EASTERN EUROPE 3581655 28748 124.6 1.26 -4.93 ... 71.2 21.09 4.42 74.49 3.0 15.11 5.22 0.232 0.188 0.579
2 Algeria Algeria DZA 12.0 NORTHERN AFRICA 32930091 2381740 13.8 0.04 -0.39 ... 78.1 3.22 0.25 96.53 1.0 17.14 4.61 0.101 0.600 0.298
3 American Samoa American Samoa ASM 16.0 OCEANIA 57794 199 290.4 58.29 -20.71 ... 259.5 10.00 15.00 75.00 2.0 22.46 3.27 NaN NaN NaN
4 Andorra Andorra AND 20.0 WESTERN EUROPE 71201 468 152.1 0.00 6.60 ... 497.2 2.22 0.00 97.78 3.0 8.71 6.25 NaN NaN NaN

5 rows × 23 columns

Warning

DataFrame Gotcha

Be careful: wd.CodeNum and wd['CodeNum'] are ALMOST always interchangeable, except when you create a new column. When you create a new column you MUST write wd['CodeNum'] = some expression. If you write wd.CodeNum = some expression, it will add a CodeNum attribute to the wd object rather than creating a new column. This is consistent with standard Python syntax, which allows you to add an attribute on the fly to most objects.
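
You can see the gotcha for yourself on a toy DataFrame. This is a sketch with made-up values. pandas issues a warning when you use the dot syntax this way, so the example silences it to keep the output tidy.

```python
import warnings
import pandas as pd

df = pd.DataFrame({'Code': ['AFG', 'ALB']})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about this pattern
    df.CodeNum = [4, 8]              # sets a plain attribute, not a column

is_column_after_dot = 'CodeNum' in df.columns
print(is_column_after_dot)

df['CodeNum'] = [4, 8]               # bracket syntax creates the column
is_column_after_bracket = 'CodeNum' in df.columns
print(is_column_after_bracket)
```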

You can make a gray map of the world like this.

countries = alt.topo_feature(data.world_110m.url, 'countries')

alt.Chart(countries).mark_geoshape(
    fill='#666666',
    stroke='white'
).properties(
    width=750,
    height=450
).project('equirectangular')

So, now you have the information you need to use the example of the counties above and apply that to the world below.

base = alt.Chart(countries).mark_geoshape(
).encode(tooltip='Country:N',
         color=alt.Color('Infant mortality:Q', scale=alt.Scale(scheme="plasma"))
).transform_lookup( # your code here

).properties(
    width=750,
    height=450
).project('equirectangular')

base
Gray colored map of the entire world.

Your final result should look like this.

Heat map of the world mapped by infant mortality. Colors range from yellow for high mortality to blue for low mortality.

6.3.2. Using a Web API on Your Own

Find a web API that provides some numeric data that interests you. There is tons of data available in the world of finance, sports, environment, travel, etc. A great place to look is at The Programmable Web. Yes, this assignment is a bit vague and open-ended, but that is part of the excitement. You get to find an API and graph some data that appeals to you, not something some author or professor picked out. You might even feel like you have awesome superpowers by the time you finish this project.

  1. Use the web API to obtain the data. Most sites are going to provide it in JSON format similar to what we saw.

  2. Next, create a graph of your data using Altair.

  3. Take some time to talk about and present the data and the graph you created to the class.
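
Many APIs return a list of records, and that shape converts to a DataFrame in one step. The sketch below uses made-up response text with made-up city and temp_c fields. A real API will have its own keys, and you may need to dig into nested dictionaries first.

```python
import json
import pandas as pd

# Made-up response text standing in for what an API might return.
text = '[{"city": "Oslo", "temp_c": 4.5}, {"city": "Lima", "temp_c": 19.0}]'

records = json.loads(text)   # a list of dictionaries
df = pd.DataFrame(records)   # one row per record, one column per key
print(df)
```

A DataFrame like this can be passed straight to alt.Chart.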
