8.16. Group Work: Reading from CSV Files

It is best to use a POGIL approach with the following. In POGIL students work in groups on activities and each member has an assigned role. For more information see https://cspogil.org/Home.

Note

If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.

Learning Objectives

Students will know and be able to do the following.

Content Objectives:

Process Objectives:

8.16.1. Comma-Separated Values (CSV) Files

One way that we exchange data is by storing it in comma-separated value (CSV) files. These files have values separated by a symbol, which is often a comma. Each row in the file contains the same type of data.

Look at the data in the file below by clicking on the “Show” button. It has a date in day-month-year followed by the opening value, high, low, and closing value. The first line is 3-Dec-01,9848.93,10220.78,9651.87,10021.57.

We can write Python code to read the data and find the date with the highest value at the close.

Run the code below to find the date with the highest value at the close.

Note

Remember to remove the end of line character and convert the string values to integers or floating point numbers before comparing them or using them in calculations.

What if you want to find several things from the data? You wouldn’t want to read the data from the file in every function. You could read all the data into a nested dictionary (a dictionary that contains another dictionary inside of it) and then pass the outer dictionary to every function. In this case we can use the date as a key for the outer dictionary and use “open”, “high”, “low” and “close” as the keys for each inner dictionary: {‘3-Dec-01’: {‘open’: 9848.93, ‘high’: 10220.78, ‘low’: 9651.870000000001, ‘close’: 10021.57}, …

Run the code below to find the date with the highest value at the close and the date with the lowest value at the close.

8.16.2. Nested dictionaries

A dictionary can contain or more more dictioaries inside of it. We call this a nested dictionary. For example, {‘3-Dec-01’: {‘open’: 9848.93, ‘high’: 10220.78, ‘low’: 9651.870000000001, ‘close’: 10021.57}} is a nested dictionary. It has a date as a string as the key for the outer dictionary and the inner dictionary has keys of ‘high’, ‘low’, and ‘close’ and a floating point value for each inner key.

Create a function, get_max_close(date_d, year), that takes a nested dictionary d with the stock data and a two digit year and returns a tuple with the max close value and date of that max value for the given year.

8.16.3. Comma-Separated Values (CSV) Files with a Header Row

Click on the “Show” button to see another sample example CSV file. It contains the number of passengers (in thousands) for transatlantic air travel for each month for the years 1958 to 1960. The first row is a header that explains the data: “Month”, “1958”, “1959”, “1960”. The second row starts with a three letter abbreation for the month followed by the number of passengers (in thousands) for 1958, then 1959, and then 1960 such as: “JAN”, 340, 360, 417. The data is from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html.

We can read the data from the file and store it in a nested dictionary. In this case the outer dictionary will use the month as the key and the inner dictionary will use the years as the keys. It will use the data from the header row for the year keys. The nested dictionary will look like: {‘JAN’: {‘1958’: 340, ‘1959’: 360, ‘1960’: 417}, ‘FEB’: {‘1958’: 318, ‘1959’: 342, ‘1960’: 391} ….

Run the code below. It is supposed to print the nested dictionary and then the total number of passengers (in thousands) for 1958, but there are errors. Fix the errors so that all tests pass.

Fix the code below to work correctly. It should print the month with the highest number of passengers in 1958.

Click on the “Show” button to see another CSV file. It contains the Oscar winners for Best Actress from 1928 to 2016. It has a header row to explain the data in each column: “Index”, “Year”, “Age”, “Name”, “Movie”. The first row of data is: 1, 1928, 22, “Janet Gaynor”, “Seventh Heaven, Street Angel and Sunrise: A Song of Two Humans”.

We can read the data from the file and store it in a list of dictionaires where the keys in the dictionary are ‘year’, ‘age’, ‘name’, and ‘movie’. The first dictionary should be: {‘year’: ‘1928’, ‘age’: ‘22’, ‘name’: ‘Janet Gaynor’, ‘movie’: ‘Seventh Heaven Street Angel and Sunrise: A Song of Two Humans’}.

Run the code below. It should read all the data into a list of dictionaries. Then it should create a new dictionary where the key is the age and the value is the number of actresses who won at that age. It should sort the items in the dictionary by the number of winners descending and return the top five tuples. However, some of the movie titles have commas in them. Fix the code to handle this problem and pass the unit tests.

Change the code above to read from the file for the best actor. This file has a header: “Index”, “Year”, “Age”, “Name”, “Movie”. The first row of data is: 1, 1928, 44, “Emil Jannings”, “The Last Command, The Way of All Flesh”. Are the results different?

If you worked in a group, you can copy the answers from this page to the other group members. Select the group members below and click the button to share the answers.

The Submit Group button will submit the answer for each each question on this page for each member of your group. It also logs you as the official group submitter.

You have attempted of activities on this page