10.14. Exercises
Below are the datafiles that you have been using so far, and will continue to use for the rest of the chapter.
The file below is travel_plans.txt.
This summer I will be travelling.
I will go to...
Italy: Rome
Greece: Athens
England: London, Manchester
France: Paris, Nice, Lyon
Spain: Madrid, Barcelona, Granada
Austria: Vienna
I will probably not even want to come back!
However, I wonder how I will get by with all the different languages.
I only know English!
The file below is school_prompt.txt.
Writing essays for school can be difficult but many students find that by researching their topic that they have more to say and are better informed. Here are the university we require many undergraduate students to take a first year writing requirement so that they can have a solid foundation for their writing skills. This comes in handy for many students. Different schools have different requirements, but everyone uses writing at some point in their academic career, be it essays, research papers, technical write ups, or scripts.
The file below is emotion_words.txt.
Sad upset blue down melancholy somber bitter troubled
Angry mad enraged irate irritable wrathful outraged infuriated
Happy cheerful content elated joyous delighted lively glad
Confused disoriented puzzled perplexed dazed befuddled
Excited eager thrilled delighted
Scared afraid fearful panicked terrified petrified startled
Nervous anxious jittery jumpy tense uneasy apprehensive
- The following sample file, called studentdata.txt, contains one line for each student in an imaginary class. The student's name is the first thing on each line, followed by some exam scores. The number of scores might be different for each student.
joe 10 15 20 30 40
bill 23 16 19 22
sue 8 22 17 14 32 17 24 21 2 9 11 17
grace 12 28 21 45 26 10
john 14 32 25 16 89
Using the text file studentdata.txt, write a program that prints out the names of students that have more than six quiz scores.
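One possible approach is sketched below. It assumes the scores are separated by whitespace and that the first field on each line is the name. So that the sketch runs on its own, it first writes the sample data to a temporary file standing in for studentdata.txt; the helper name students_with_more_than is hypothetical.

```python
import os
import tempfile

# Sample data copied from the exercise; the temporary file stands in for studentdata.txt.
sample = """joe 10 15 20 30 40
bill 23 16 19 22
sue 8 22 17 14 32 17 24 21 2 9 11 17
grace 12 28 21 45 26 10
john 14 32 25 16 89
"""

def students_with_more_than(path, minimum):
    """Return the names of students whose line holds more than `minimum` scores."""
    names = []
    with open(path, "r") as infile:
        for line in infile:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            # fields[0] is the name; everything after it is a score
            if len(fields) - 1 > minimum:
                names.append(fields[0])
    return names

path = os.path.join(tempfile.mkdtemp(), "studentdata.txt")
with open(path, "w") as outfile:
    outfile.write(sample)

many_scores = students_with_more_than(path, 6)
for name in many_scores:
    print(name)
```

Note that grace has exactly six scores, so she is not included when the condition is "more than six".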
- Create a list called destination using the data stored in travel_plans.txt. Each element of the list should contain a line from the file that lists a country and cities inside that country. Hint: each line that has this information also has a colon : in it.
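A minimal sketch of the colon test follows. It writes the travel_plans.txt contents to a temporary file so it can run stand-alone, and it strips the trailing newline from each stored line; whether to keep the newline is an assumption that depends on what your checker expects.

```python
import os
import tempfile

# Contents of travel_plans.txt as shown at the top of this section.
travel_text = """This summer I will be travelling.
I will go to...
Italy: Rome
Greece: Athens
England: London, Manchester
France: Paris, Nice, Lyon
Spain: Madrid, Barcelona, Granada
Austria: Vienna
I will probably not even want to come back!
However, I wonder how I will get by with all the different languages.
I only know English!
"""

path = os.path.join(tempfile.mkdtemp(), "travel_plans.txt")
with open(path, "w") as outfile:
    outfile.write(travel_text)

destination = []
with open(path, "r") as infile:
    for line in infile:
        # Only the country lines contain a colon.
        if ":" in line:
            destination.append(line.rstrip("\n"))

print(destination)
```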
- Create a list called j_emotions that contains every word in emotion_words.txt that begins with the letter “j”.
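One way to approach this is to split every line into words and test each word's first letter. The sketch below assumes the words are separated by whitespace and writes the emotion_words.txt contents to a temporary file so that it is self-contained.

```python
import os
import tempfile

# Contents of emotion_words.txt as shown at the top of this section.
emotion_text = """Sad upset blue down melancholy somber bitter troubled
Angry mad enraged irate irritable wrathful outraged infuriated
Happy cheerful content elated joyous delighted lively glad
Confused disoriented puzzled perplexed dazed befuddled
Excited eager thrilled delighted
Scared afraid fearful panicked terrified petrified startled
Nervous anxious jittery jumpy tense uneasy apprehensive
"""

path = os.path.join(tempfile.mkdtemp(), "emotion_words.txt")
with open(path, "w") as outfile:
    outfile.write(emotion_text)

j_emotions = []
with open(path, "r") as infile:
    for line in infile:
        for word in line.split():
            # startswith is case-sensitive, so only lowercase "j" matches here
            if word.startswith("j"):
                j_emotions.append(word)

print(j_emotions)
```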
10.14.1. Contributed Exercises
Write code that stores the contents of the months list in a file named months.txt. Store one month per line.
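A minimal sketch, assuming months is a list of month-name strings (the exercise supplies the real list; the one below is a stand-in). It writes to a temporary directory rather than the current folder, then reads the file back to confirm one month per line.

```python
import os
import tempfile

# Stand-in for the months list that the exercise provides.
months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

path = os.path.join(tempfile.mkdtemp(), "months.txt")

# write() does not add a newline on its own, so append one per month.
with open(path, "w") as outfile:
    for month in months:
        outfile.write(month + "\n")

# Read the file back to check the result.
with open(path, "r") as infile:
    lines_written = infile.read().splitlines()

print(lines_written)
```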
Write the contents of the x and y lists to a file called xy2.dat with one value from x and y on each line, separated by a comma.
Generate 101 evenly spaced values between -5 and 5 and compute \(x^3\) of each of these values. Write the result to a file called pow3.csv with one value of \(x\) and \(x^3\) per line, separated by a comma. Your first two lines should look like
-5.,-125.
-4.9,-117.649
The seasonal average monthly rainfall in inches recorded at the Van Nuys Airport as of 2019 is provided in the file van_nuys_seasonal_rainfall.dat. Read in the data from the file and calculate and print out the total, mean, and standard deviation, each on a separate line.
The standard deviation is given by
where \(\bar{x}\) is the mean. You may use the sum
function.
2.44
3.12
1.61
0.69
0.23
0.02
0.03
0.02
0.12
0.55
0.69
1.67
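The sketch below shows one way to compute the three statistics with the sum function. It assumes the population form of the standard deviation (dividing by the number of values N); if the formula you were given divides by N - 1 instead, change the divisor accordingly. The twelve values above are written to a temporary file standing in for van_nuys_seasonal_rainfall.dat.

```python
import math
import os
import tempfile

# The twelve readings listed above, one per line.
readings = "2.44\n3.12\n1.61\n0.69\n0.23\n0.02\n0.03\n0.02\n0.12\n0.55\n0.69\n1.67\n"

path = os.path.join(tempfile.mkdtemp(), "van_nuys_seasonal_rainfall.dat")
with open(path, "w") as outfile:
    outfile.write(readings)

values = []
with open(path, "r") as infile:
    for line in infile:
        line = line.strip()
        if line:  # ignore blank lines
            values.append(float(line))

total = sum(values)
mean = total / len(values)
# Assumption: population standard deviation (divide by N, not N - 1).
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = math.sqrt(variance)

print(total)
print(mean)
print(std_dev)
```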
xy.dat contains two floating point numbers per line, separated by a comma. Read the contents of xy.dat and store the first number of each line in a list called x and the second in a list called y. Values should be stored as floats.
-2.0,3.97
-1.75,2.94
-1.5,2.25
-1.25,1.59
-1.0,0.97
-0.75,0.43
-0.5,0.38
-0.25,-0.13
0.0,0.01
0.25,0.17
0.5,0.13
0.75,0.57
1.0,1.03
1.25,1.51
1.5,2.28
1.75,3.14
2.0,3.97
2.25,5.1
2.5,6.07
2.75,7.56
3.0,8.94
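A minimal sketch of the parsing step, assuming one comma-separated pair per line. Only the first three pairs of the data are written to a temporary stand-in for xy.dat to keep the example short.

```python
import os
import tempfile

# First three pairs from xy.dat, one pair per line.
sample = "-2.0,3.97\n-1.75,2.94\n-1.5,2.25\n"

path = os.path.join(tempfile.mkdtemp(), "xy.dat")
with open(path, "w") as outfile:
    outfile.write(sample)

x = []
y = []
with open(path, "r") as infile:
    for line in infile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first, second = line.split(",")
        # float() converts the text of each field to a number
        x.append(float(first))
        y.append(float(second))

print(x)
print(y)
```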
How often do “red” and “scarlet” appear in Sir Arthur Conan Doyle’s “A Study in Scarlet”? Use the scarlet.txt file to determine and return your values as red_count and scarlet_count.
Using altair, plot a histogram of the lengths of the words in the words5000.csv file. Your lengths should be saved in a list called list_len and passed to altair.
Using the code from above, create a csv with the part_o_speech and part_count variables. The data should be comma separated and saved to a file named parts.csv.
Using words5000.csv and scarlet.csv, determine the counts for each part of speech in the story. The counts should be stored in a variable part_count and the parts of speech should be stored in a variable part_o_speech. They should be in the order they appear in the word list. Plot the histogram using altair. If a word isn’t in the 5000 word list, skip it in the count.
- infile = open(myText.txt, "r")
- Beware of variable name versus string value. Python will treat myText.txt as a variable name here (which, by the way, is not a valid variable name). See the Note in 10.2.
- infile = open("myText.txt", "r")
- This is correct. We provide a string with the file name, plus "r", which means read only.
- infile = open("myText.txt", "w")
- Incorrect, since "w" denotes writing, not reading.
Q-1: Which of the following commands is used to open a file called myText.txt in Read-Only mode?
- outfile = open("myText.txt", w)
- w is treated as a variable name here. "w" is needed.
- outfile = open("myText.txt", "r")
- This opens the file in read mode ("r" instead of "w").
- outfile = open(myText.txt, "w")
- Again, myText.txt is treated as a variable name.
- outfile = open("myText.txt", "w")
- Correct.
Q-1: Which of the following commands is used to open a file called myText.txt in Write-Only mode?
- "myText".close()
- This is an error in Python. A string doesn't have a close() method.
- ref_file.close()
- Correct.
- close(ref_file)
- close() must be called as a method on the variable referencing the file.
- close("myText")
- close() must be called as a method on the variable referencing the file.
Q-1: Which command below closes a file myText.txt that was already opened with ref_file = open("myText.txt", "r")?
- filevar.append(somestring)
- append() is used for lists, not files.
- filevar.write("somestring")
- This will write the literal text "somestring" to the file, not "my Sentence" as we wanted.
- filevar.write(somestring)
- Correct.
- somestring.write()
- A string variable doesn't have a write() method.
Q-1: Which of the commands below is used to add the string somestring = "my Sentence" to the end of the file referenced by the filevar variable?
- I
- For each of the three lines in our file, this prints the file handle/reference.
- II
- This is correct. For each line in the names file, it prints THAT line.
- III
- This will print "line" three times. Not what we want.
Q-1: Assume I have a file called names.txt containing the following:
Peter Pan
Cinderella
Moana
Which of the code snippets below prints all the lines/names in this text file?
I
names = open("names.txt", "r")
for line in names:
    print(names)
II
names = open("names.txt", "r")
for line in names:
    print(line)
III
names = open("names.txt", "r")
for line in names:
    print("line")
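The questions above cover opening, writing, reading, and closing a file. As an illustrative sketch only, the steps can be tied together as follows; a temporary directory is used so the example does not touch any real names.txt, and the trailing newline is stripped before printing to avoid blank lines in the output.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "names.txt")

# Open for writing ("w"), write one name per line, then close.
outfile = open(path, "w")
for name in ["Peter Pan", "Cinderella", "Moana"]:
    outfile.write(name + "\n")
outfile.close()

# Open for reading ("r") and iterate over the file line by line.
names = open(path, "r")
lines_read = []
for line in names:
    text = line.rstrip("\n")  # each line still carries its newline
    lines_read.append(text)
    print(text)
names.close()
```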
(10 points)
Create a function named read_file_contents_so(). This function should take no parameters.
The function must open the file so_survey.csv in read access mode. All of the rows in the file after the first (header) row should be read into a list, one row per list element.
The function must return the list containing the file contents.
Note: We will not be using the so_survey.csv data in this exam, but due to a limitation in Runestone, we cannot read in the correct data from an external file. Calling the read_file_contents_coal() function in other code will return the correct data.
(15 points)
Create a function named process_file_contents(). This function should take no parameters.
The function must call read_file_contents_coal() to obtain a list with the records from the source data about West Virginia coal production. This is a comma-separated file with the following columns:
County name
Tons of coal produced in 1900
Tons of coal produced in 1910
Tons of coal produced in 1920
Tons of coal produced in 1930
Tons of coal produced in 1940
Tons of coal produced in 1950
Tons of coal produced in 1960
Tons of coal produced in 1970
Tons of coal produced in 1980
Tons of coal produced in 1990
Tons of coal produced in 2000
Tons of coal produced in 2010
Iterating over the data obtained from read_file_contents_coal() using a while loop, construct a nested dictionary. The key of the top-level dictionary should be the name of the county, and its value should be another dictionary. In the second-level dictionary, the key should be the year and the value should be the amount of coal produced. For example, if you name the dictionary coal_dictionary, you should be able to access the amount of coal produced in Kanawha County in 1910 by accessing coal_dictionary['Kanawha'][1910].
The function must return the entire two-level dictionary.
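A rough sketch of the nested-dictionary construction is shown below. Because the real read_file_contents_coal() is only available inside the exam environment, the version here is a hypothetical stand-in, and the county names and numbers it returns are invented purely for illustration. The sketch also assumes the tonnages are whole numbers and that the years are stored as integer keys.

```python
# Column years in file order: 1900, 1910, ..., 2010 (twelve columns).
YEARS = list(range(1900, 2011, 10))

def read_file_contents_coal():
    """Hypothetical stand-in; these rows are made up and are NOT real data."""
    return [
        "CountyA,1,2,3,4,5,6,7,8,9,10,11,12",
        "CountyB,12,11,10,9,8,7,6,5,4,3,2,1",
    ]

def process_file_contents():
    rows = read_file_contents_coal()
    coal_dictionary = {}
    index = 0
    while index < len(rows):
        fields = rows[index].strip().split(",")
        county = fields[0]
        production = {}
        column = 0
        while column < len(YEARS):
            # fields[0] is the county, so the year columns start at index 1
            production[YEARS[column]] = int(fields[column + 1])
            column += 1
        coal_dictionary[county] = production
        index += 1
    return coal_dictionary

coal_dictionary = process_file_contents()
print(coal_dictionary["CountyA"][1910])
```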
(15 points)
Write a function named calculate_average_production(). Your function must take one parameter, a dictionary containing coal production of the same format returned by process_file_contents().
Your function must generate a dictionary where the key is the county name and the value is the average number of tons produced in that county.
Your function should return the generated dictionary.
(15 points)
Write a function named calculate_total_production(). Your function must take one parameter, a dictionary containing coal production of the same format returned by process_file_contents().
Your function must generate a dictionary where the key is the county name and the value is the total number of tons of coal produced in that county.
Your function should return the generated dictionary.
(20 points)
Write a function named find_peak_production_year(). Your function must take one parameter, a dictionary containing coal production of the same format returned by process_file_contents().
Your function must generate a dictionary where the key is the county name. The value should be the year in which the most coal was produced in that county. If there are multiple years with the same amount of coal produced, you may store any one of those years as the value.
Your function should return the generated dictionary.
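The three summary functions described above all walk the same two-level dictionary. One possible sketch follows; the sample dictionary is invented for illustration and covers only three years to keep it short.

```python
def calculate_total_production(coal):
    """Map each county to the sum of its yearly production."""
    return {county: sum(by_year.values()) for county, by_year in coal.items()}

def calculate_average_production(coal):
    """Map each county to its mean yearly production."""
    return {county: sum(by_year.values()) / len(by_year)
            for county, by_year in coal.items()}

def find_peak_production_year(coal):
    """Map each county to a year with the largest production.

    max() returns the first such year when there is a tie.
    """
    return {county: max(by_year, key=by_year.get)
            for county, by_year in coal.items()}

# Made-up sample in the same shape as the process_file_contents() result.
sample = {
    "CountyA": {1900: 10, 1910: 40, 1920: 25},
    "CountyB": {1900: 7, 1910: 3, 1920: 5},
}

totals = calculate_total_production(sample)
averages = calculate_average_production(sample)
peaks = find_peak_production_year(sample)
print(totals, averages, peaks)
```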
(15 points)
Write a function named print_county_stats(). Your function must take three parameters, dictionaries with the total production, average production, and peak year, in that order.
Your function must go through each county, printing a message for each indicating the county name, the total number of tons of coal produced, the average number produced, and the peak year for mining. Round off the average tons of coal produced so it has no decimal places.
Your function does not need to return anything.
This function does not have unit tests available.
(10 points)
Write code (not a function) to connect the functions we wrote today.
Your code must:
Call process_file_contents() to load the production data into a dictionary.
Call calculate_average_production() to generate a dictionary with average production by county.
Call calculate_total_production() to create a dictionary with total production by county.
Call find_peak_production_year() to obtain a dictionary with the year of peak production for each county.
Call print_county_stats() to output summary data on each county.
This code does not have unit tests available.
In the first Chapter 10 project, you worked with a file of the 5000 most common words in English. (You may also wish to review that project in the text for more info on the fields in that CSV file.)
This exercise is the last exercise in that project. Using altair, let’s look at the distribution of the different parts of speech in this 5000 word dataset. Create a bar chart, where the part of speech is on the x-axis and the number of words on that list which fall into that category is on the y-axis.
(Remember our altair examples handout.)
If you want to check your work, your graph should look something like this graph.
I’ve created a file called exam2_file.txt (shown above). Use Python to open the file and create lists called tall and short from the supplied file. You should end up with tall = ['Great Dane', 'Tiger', 'Giraffe', 'Whale Shark'] and short = ['Weiner Dog', 'House Cat', 'Okapi', 'Dwarf lanternshark'].
Create a text file called data_set.txt and load the supplied x and y data sets so that the file matches the style below.
x,y
1,2
2,4
3,6
4,8
5,10
Assume that the final grade for a course is determined based on this scale - A: 900+ points, B: 800-899 points, C: 700-799 points, D: 600-699 points, F: 599 or fewer points.
Write a function named get_letter_grade() that takes the number of points the student has earned as a parameter. It should return a string containing (only) the letter grade the student will receive.
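A minimal sketch of the grading scale as a chain of comparisons, checked from the highest cutoff down so that each branch only needs a lower bound:

```python
def get_letter_grade(points):
    """Return the letter grade for the given number of points."""
    if points >= 900:
        return "A"
    elif points >= 800:
        return "B"
    elif points >= 700:
        return "C"
    elif points >= 600:
        return "D"
    else:
        return "F"

for earned in [950, 899, 700, 650, 599]:
    print(earned, get_letter_grade(earned))
```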
Write code to open the file so_survey.csv and read its contents into a list named survey_results. Each entry in the list should correspond with one line from the original file. Ensure the file is closed when you are done reading from it.
Strip whitespace from each element in survey_results, then split the contents of each element by the | symbol. Construct a new list named split_results that contains the results of the splits.
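The strip-then-split step might look like the sketch below. The two entries in survey_results are hypothetical placeholders, not rows from the real so_survey.csv, and the field meanings are assumed.

```python
# Hypothetical stand-in for lines read from so_survey.csv.
survey_results = [
    "Yes|Moderately satisfied|Python\n",
    "  No|Extremely dissatisfied|Java  \n",
]

split_results = []
for entry in survey_results:
    # strip() removes leading/trailing whitespace, including the newline
    cleaned = entry.strip()
    split_results.append(cleaned.split("|"))

print(split_results)
```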
Read in the file travel_plans.txt. Then print ONLY the lines from that file that contain a country followed by cities. Do not hard-code. Use the fact that those lines have the character “:”.
The code below creates a deck of cards, and a list with 5 cards in it. Write at least THREE functions that test for any of the following in a hand of cards:
a flush, or all cards having the same suit i.e. all hearts or all clubs,
two cards with the same face value i.e. two sevens or two queens,
three cards with the same face value i.e. three sevens or three queens,
four cards with the same face value i.e. four sevens or four queens,
two pairs, like two queens and two kings
a full house - three cards share a face value, and the other two cards also share a different face value.
A straight, where the five cards have sequential values, i.e. 3, 4, 5, 6, 7.
You must develop your code in runestone. Reminder, to check for a pair, your code must eliminate three or four of a kind. (If you had three queens, you wouldn’t say you had a pair of queens.)
Create a text file called data_set.txt and load the supplied x and y data sets so that the file matches the style below.
When writing the data to the file, be aware that you don’t want a newline on the last line.
x,y
1,2
2,4
3,6
4,8
5,10
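One way to avoid a trailing newline is to build the list of lines first and join them with "\n", which only places newlines between lines. The sketch below assumes x and y are the lists shown in the sample output and writes to a temporary directory.

```python
import os
import tempfile

# Assumed contents of the supplied data sets, taken from the sample output.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

path = os.path.join(tempfile.mkdtemp(), "data_set.txt")

lines = ["x,y"]
for x_value, y_value in zip(x, y):
    lines.append(str(x_value) + "," + str(y_value))

with open(path, "w") as outfile:
    # join() puts "\n" between lines only, so the last line has no newline
    outfile.write("\n".join(lines))

with open(path, "r") as infile:
    content = infile.read()

print(content)
```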
How often does the word “Holmes” appear in Sir Arthur Conan Doyle’s “A Study in Scarlet”? Use the scarlet.txt file to determine and return your value as holmes_count. Hint: Consider punctuation.
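To illustrate the punctuation hint, the sketch below strips leading and trailing punctuation from each word before comparing. The sample text is invented (it is not from scarlet.txt), and the helper name count_word is hypothetical. Possessives such as "Holmes's" are not counted by this approach; whether they should be is a decision the exercise leaves to you.

```python
import string

def count_word(text, target):
    """Count whitespace-separated words equal to `target` once
    surrounding punctuation has been stripped."""
    count = 0
    for word in text.split():
        if word.strip(string.punctuation) == target:
            count += 1
    return count

# Made-up sample text standing in for the contents of scarlet.txt.
sample_text = 'Holmes laughed. "Holmes!" cried Watson. It was Holmes, of course.'

holmes_count = count_word(sample_text, "Holmes")
print(holmes_count)
```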
Write a program to determine whether the words in word_list are found in the file named words5000.csv. If the word is in the file, set the corresponding boolean in found_list to True.
Create a list called destination using the data stored in travel_plans.txt. Each element of the list should contain a line from the file that lists a country and cities inside that country. Hint: each line that has this information also has a colon : in it.
Create a list called j_emotions that contains every word in emotion_words.txt that begins with the letter “j”.
(5 points) Recall the Stack Overflow developer survey file (so_survey.csv) from the project with which we recently worked.
Write a Python program that reads that file, and then uses the altair module to produce a bar graph showing the number of developers who report each level of satisfaction (e.g. Extremely dissatisfied, Moderately satisfied, etc.). Place the satisfaction classifications on the X axis and the number of developers on the Y axis.
You may wish to revisit that Textbook Project section in the textbook to review the format of the so_survey.csv file.
Our altair examples handout may also be useful.