8.16. Group Work: Reading from CSV Files
It is best to use a POGIL (Process Oriented Guided Inquiry Learning) approach with the following activities. In POGIL, students work in groups on activities, and each member has an assigned role. For more information see https://cspogil.org/Home.
If you work in a group, have only one member of the group fill in the answers on this page. You will be able to share your answers with the group at the bottom of the page.
Students will know and be able to do the following.
Process data in CSV files using rstrip, strip, and split.
Read CSV data into a nested dictionary.
Total data using a dictionary.
Sort data from a dictionary.
Modify code that reads from CSV files.
Fix code that reads from a CSV file.
8.16.1. Comma-Separated Values (CSV) Files
One common way to exchange data is to store it in comma-separated values (CSV) files. In these files the values on each line are separated by a delimiter, which is usually a comma. Each row holds the same kinds of values in the same order.
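As a small illustration, the sketch below takes one line of comma-separated values, removes the end-of-line character with rstrip, and splits the line at each comma with split. The sample line is made up and only assumes the date, open, high, low, close layout described on this page.

```python
# A hypothetical line as it might be read from a CSV file (note the trailing newline).
line = "02-01-18,10.5,11.0,10.2,10.8\n"

# rstrip() removes the end-of-line character; split(",") breaks the line at each comma.
values = line.rstrip().split(",")
print(values)  # ['02-01-18', '10.5', '11.0', '10.2', '10.8']
```

Notice that every item in the list is still a string, even the ones that look like numbers.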
Look at the data in the file below. Each line has a date in day-month-year format followed by the opening, high, low, and closing values.
We can write Python code that reads this data and finds the date with the highest closing value. Run the code below to find that date.
Remember to remove the end-of-line character and to convert the string values to integers or floating-point numbers before comparing them or using them in calculations. Otherwise Python compares the values as strings, character by character.
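The short sketch below shows why the conversion matters: two strings are compared character by character, so "9" counts as larger than "10". It then strips the whitespace from some sample text and converts it to numbers before using it.

```python
# Values read from a file are strings, and strings are compared character by character.
text_result = "9" > "10"     # True: the character "9" sorts after "1"
number_result = 9 > 10       # False: numbers are compared by value
print(text_result, number_result)  # True False

# Strip the whitespace and end-of-line character, then convert before doing arithmetic.
count = int(" 417\n".strip())
close = float("10.8\n".strip())
print(count + 1, close)  # 418 10.8
```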
What if you want to find several things from the data? You wouldn’t want to read the file again in every function. Instead, you can read all the data into a nested dictionary once and pass that dictionary to each function. A nested dictionary is a dictionary whose values are themselves dictionaries. Here we can use the date as the key of the outer dictionary and “open”, “high”, “low”, and “close” as the keys of each inner dictionary.
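Here is a minimal sketch of building that nested dictionary. It assumes each line looks like date,open,high,low,close with no header row; the function name read_stock_data and the sample lines are hypothetical. Because the function accepts any iterable of lines, it works the same with a list of strings or an open file object.

```python
def read_stock_data(lines):
    """Build a nested dictionary: date -> {"open": ..., "high": ..., "low": ..., "close": ...}.

    `lines` can be an open file or any iterable of strings such as a list.
    Assumes each line looks like date,open,high,low,close with no header row.
    """
    date_d = {}
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        date, open_val, high, low, close = line.split(",")
        date_d[date] = {
            "open": float(open_val),
            "high": float(high),
            "low": float(low),
            "close": float(close),
        }
    return date_d


# Hypothetical sample lines (made-up values, not the file from this page).
sample = ["02-01-18,10.5,11.0,10.2,10.8\n", "03-01-18,10.8,11.4,10.7,11.3\n"]
stocks = read_stock_data(sample)
print(stocks["03-01-18"]["close"])  # 11.3
```

With a real file you would pass the file object instead, for example `with open("stocks.csv") as f: stocks = read_stock_data(f)`, where the file name is only a placeholder.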
Run the code below to find the dates with the highest and the lowest closing values.
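Once the data is in a nested dictionary, one way to find both dates is to give max and min a key function that looks up each date's closing value. This is a sketch, not the page's own code; the function name max_and_min_close and the sample values are hypothetical.

```python
def max_and_min_close(date_d):
    """Return ((max_close, date), (min_close, date)) from a nested stock dictionary."""
    # Iterating over a dictionary gives its keys (the dates); the key function
    # tells max and min to compare the closing values instead of the dates.
    max_date = max(date_d, key=lambda d: date_d[d]["close"])
    min_date = min(date_d, key=lambda d: date_d[d]["close"])
    return (date_d[max_date]["close"], max_date), (date_d[min_date]["close"], min_date)


# Hypothetical nested dictionary with made-up values.
stocks = {
    "02-01-18": {"open": 10.5, "high": 11.0, "low": 10.2, "close": 10.8},
    "03-01-18": {"open": 10.8, "high": 11.4, "low": 10.7, "close": 11.3},
    "04-01-18": {"open": 11.3, "high": 11.5, "low": 10.1, "close": 10.4},
}
print(max_and_min_close(stocks))  # ((11.3, '03-01-18'), (10.4, '04-01-18'))
```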
Create a function, get_max_close(date_d, year), that takes a nested dictionary date_d with the stock data and a two-digit year, and returns a tuple with the maximum close value for that year and the date on which it occurred.
8.16.2. Comma-Separated Values (CSV) Files with a Header Row
Here is another example CSV file. It contains the number of passengers (in thousands) for transatlantic air travel in each month of the years 1958 to 1960. The first row is a header that describes the data. The data is from https://people.sc.fsu.edu/~jburkardt/data/csv/csv.html.
We can read the data from the file and store it in a nested dictionary. This time the outer dictionary uses the month as the key, and each inner dictionary uses the year as the key. The year keys come from the header row.
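A minimal sketch of this idea follows. It assumes the header looks like Month,1958,1959,1960 and that fields may be wrapped in double quotes or padded with spaces, so each field is cleaned before use. The function names clean and read_passengers and the sample numbers are hypothetical, not the real passenger counts.

```python
def clean(field):
    """Remove surrounding whitespace and double quotes from one field."""
    return field.strip().strip('"')


def read_passengers(lines):
    """Build month -> {year: passengers} using the header row for the year keys.

    Assumes the first line is a header such as Month,1958,1959,1960 and every
    later line holds a month followed by one number per year.
    """
    lines = iter(lines)
    header = [clean(f) for f in next(lines).split(",")]
    years = header[1:]  # skip the "Month" column
    month_d = {}
    for line in lines:
        if not line.strip():
            continue
        fields = [clean(f) for f in line.split(",")]
        month = fields[0]
        month_d[month] = {}
        for year, value in zip(years, fields[1:]):
            month_d[month][year] = int(value)
    return month_d


# Hypothetical sample lines with made-up numbers.
sample = ['"Month", "1958", "1959", "1960"\n', '"JAN", 100, 110, 120\n', '"FEB", 90, 105, 130\n']
print(read_passengers(sample))
# {'JAN': {'1958': 100, '1959': 110, '1960': 120}, 'FEB': {'1958': 90, '1959': 105, '1960': 130}}
```

Because the year keys come straight from the header text, they are strings such as '1958', not integers.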
Run the code below. It is supposed to print the nested dictionary and then the total number of passengers (in thousands) for 1958, but there are errors. Fix the errors so that all tests pass.
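Totaling one year means adding up the inner value for that year across every month. The sketch below assumes the month-to-year nested dictionary described above; the function name total_for_year and the numbers are hypothetical. One easy mistake to watch for is looking up the year with a different type than the keys use, for example 1958 when the keys are the strings from the header, which raises a KeyError.

```python
def total_for_year(month_d, year):
    """Add up the passengers (in thousands) for one year across all months."""
    total = 0
    for month in month_d:
        total += month_d[month][year]
    return total


# Hypothetical nested dictionary with made-up numbers; the year keys are strings.
month_d = {"JAN": {"1958": 100, "1959": 110}, "FEB": {"1958": 90, "1959": 105}}
print(total_for_year(month_d, "1958"))  # 190
```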
Fix the code below so that it prints the month with the highest number of passengers in 1958.
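Finding the month with the most passengers is a "track the best so far" loop over the outer keys. This sketch uses the hypothetical name month_with_most and made-up numbers; `max(month_d, key=lambda m: month_d[m][year])` would give the same result in one line.

```python
def month_with_most(month_d, year):
    """Return the month whose passenger count is highest for the given year."""
    best_month = None
    for month in month_d:
        # Keep this month if it is the first one seen or beats the best so far.
        if best_month is None or month_d[month][year] > month_d[best_month][year]:
            best_month = month
    return best_month


# Hypothetical nested dictionary with made-up numbers.
month_d = {"JAN": {"1958": 100}, "FEB": {"1958": 90}, "MAR": {"1958": 120}}
print(month_with_most(month_d, "1958"))  # MAR
```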
Here is another example CSV file. It contains the Oscar winners for Best Actress from 1928 to 2016 and has a header row that describes each column.
We can read the data from the file and store it in a list of dictionaries, where the keys in each dictionary are ‘year’, ‘age’, ‘name’, and ‘movie’.
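The sketch below builds such a list of dictionaries. It assumes each line holds year,age,name,movie in that order after a header row, and that only the last field (the movie) can contain commas, so `split(",", 3)` stops after the first three commas. If the real file has extra columns, the split count and keys would need adjusting. The function name read_winners and the sample row are hypothetical.

```python
def read_winners(lines):
    """Read lines of year,age,name,movie into a list of dictionaries.

    Assumes the first line is a header and that only the last field (movie)
    can contain commas.
    """
    keys = ["year", "age", "name", "movie"]
    winners = []
    lines = iter(lines)
    next(lines)  # skip the header row
    for line in lines:
        if not line.strip():
            continue
        # maxsplit=3 splits at the first three commas only,
        # so any commas in the movie title stay in the last field.
        fields = [f.strip().strip('"') for f in line.split(",", 3)]
        winner = dict(zip(keys, fields))
        winner["year"] = int(winner["year"])
        winner["age"] = int(winner["age"])
        winners.append(winner)
    return winners


# Hypothetical sample lines (a made-up row, not the real Oscar data).
sample = [
    "Year,Age,Name,Movie\n",
    '2001,30,Ann Example,"Sun, Moon and Stars"\n',
]
print(read_winners(sample))
# [{'year': 2001, 'age': 30, 'name': 'Ann Example', 'movie': 'Sun, Moon and Stars'}]
```

Python's standard csv module (csv.reader) is another option; it understands fields wrapped in quotes, so commas inside a quoted title are not treated as separators.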
Run the code below. It should read all the data into a list of dictionaries and then create a new dictionary where the key is an age and the value is the number of actresses who won at that age. It should then sort the items in that dictionary by the number of winners in descending order and return the top five tuples. However, some of the movie titles contain commas. Fix the code to handle this problem and pass the unit tests.
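Counting and sorting can be sketched as follows: a dictionary accumulates a count per age, and sorted with a key on the count and reverse=True puts the most common ages first. The function name top_ages and the sample list are hypothetical; each sample dictionary holds only the ‘age’ key the function needs.

```python
def top_ages(winners, n=5):
    """Count winners at each age and return the n most common (age, count) tuples."""
    age_counts = {}
    for winner in winners:
        age = winner["age"]
        age_counts[age] = age_counts.get(age, 0) + 1
    # Sort by count, largest first. sorted() is stable even with reverse=True,
    # so ages with equal counts keep the order in which they were first seen.
    items = sorted(age_counts.items(), key=lambda item: item[1], reverse=True)
    return items[:n]


# Hypothetical sample data.
winners = [{"age": 30}, {"age": 26}, {"age": 30}, {"age": 41}, {"age": 26}, {"age": 30}]
print(top_ages(winners, 2))  # [(30, 3), (26, 2)]
```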
Change the code above to read from the Best Actor file instead. Are the results different?