# 10.14. Exercises¶

Below are the datafiles that you have been using so far, and will continue to use for the rest of the chapter.

The file below is travel_plans.txt.

This summer I will be travelling.
I will go to...
Italy: Rome
Greece: Athens
England: London, Manchester
France: Paris, Nice, Lyon
Spain: Madrid, Barcelona, Granada
Austria: Vienna
I will probably not even want to come back!
However, I wonder how I will get by with all the different languages.
I only know English!


The file below is school_prompt.txt.

Writing essays for school can be difficult but
many students find that by researching their topic that they
have more to say and are better informed. Here are the university
we require many undergraduate students to take a first year writing requirement
so that they can
have a solid foundation for their writing skills. This comes
in handy for many students.
Different schools have different requirements, but everyone uses
writing at some point in their academic career, be it essays, research papers,
technical write ups, or scripts.


The file below is emotion_words.txt.

Sad upset blue down melancholy somber bitter troubled
Angry mad enraged irate irritable wrathful outraged infuriated
Happy cheerful content elated joyous delighted lively glad
Confused disoriented puzzled perplexed dazed befuddled
Excited eager thrilled delighted
Scared afraid fearful panicked terrified petrified startled
Nervous anxious jittery jumpy tense uneasy apprehensive

1. The following sample file called studentdata.txt contains one line for each student in an imaginary class. The students name is the first thing on each line, followed by some exam scores. The number of scores might be different for each student.

joe 10 15 20 30 40
bill 23 16 19 22
sue 8 22 17 14 32 17 24 21 2 9 11 17
grace 12 28 21 45 26 10
john 14 32 25 16 89


Using the text file studentdata.txt write a program that prints out the names of students that have more than six quiz scores.

1. Create a list called destination using the data stored in travel_plans.txt. Each element of the list should contain a line from the file that lists a country and cities inside that country. Hint: each line that has this information also has a colon : in it.

1. Create a list called j_emotions that contains every word in emotion_words.txt that begins with the letter “j”.

## 10.14.1. Contributed Exercises¶

Write code that stores the contents of the months list in a file named months.txt. Store one month per line.

Write the contents of the x and y lists to a file called xy2.dat with one value from x and y on each line, separated by a comma.

Generate 101 evenly spaced values between -5 and 5 and compute $$x^3$$ of each of these values. Write the result to a file called pow3.csv with one value of $$x$$ and $$x^3$$ per line, separated by a comma. Your first two lines should look like

-5.,-125.
-4.9,-117.649


The seasonal average monthly rainfall in inches recorded at the Van Nuys Airport as of 2019 are provide in the file van_nuys_seasonal_rainfall.dat. Read in the data from the file and calculate and print out the

1. total,

2. mean, and

3. standard deviation,

each on a separate line.

The standard deviation is given by

$\sigma = \sqrt{\frac{1}{N-1}\sum_{i=1}^N (x_i - \bar{x})^2 },$

where $$\bar{x}$$ is the mean. You may use the sum function.

2.44
3.12
1.61
0.69
0.23
0.02
0.03
0.02
0.12
0.55
0.69
1.67


xy.dat contains two floating point numbers per line, separated by a comma. Read the contents of xy.dat and store the first number of each line in a list called x and the second in a list called y. Values should be stored as floats.

-2.0,3.97
-1.75,2.94
-1.5,2.25
-1.25,1.59
-1.0,0.97
-0.75,0.43
-0.5,0.38
-0.25,-0.13
0.0,0.01
0.25,0.17
0.5,0.13
0.75,0.57
1.0,1.03
1.25,1.51
1.5,2.28
1.75,3.14
2.0,3.97
2.25,5.1
2.5,6.07
2.75,7.56
3.0,8.94


How often does “red” and “scarlet” appear in Sir. Arthur Conan Doyle’s “The Study in Scarlet”. Use the scarlet.txt file to determine and return your values as red_count and scarlet_count.

Using altair, plot a histogram of lengths of the words in the words5000.csv file. Your lengths should be saved in a list called list_len and passed to altair.

Using the code from above, create a csv with the part_o_speech and part_count variables. They data should be comma separated and saved to a file named parts.csv

Using words5000.csv and scarlet.csv . Determine the counts for each part of speech in the story. The counts should be stored in a variable part_count and the parts of speech should be stored in a variable part_o_speech, they should be in the order the appear in the word list. Plot the histogram using altair. If the word isn’t in the 5000 word list skip it in the count.

Q-1: Which of the following commands is used to open a file called myText.txt in Read-Only mode?

• infile = open(myText.txt, “r”)
• Beware of variable name versus string value. Python will think myText.txt is a variable name here (which by the way is not a valid variable name). Check Note in 10.2
• infile = open("myText.txt", “r”)
• This is correct. We provide a string with file name + "r" which means read only.
• infile = open("myText.txt", “w”)
• Incorrect since "w" denotes writing, not readig.

Q-1: Which of the following commands is used to open a file called myText.txt in Write-Only mode?

• outfile = open("myText.txt", w)
• w is considered a variable name here. "w" needed.
• outfile = open("myText.txt", “r”)
• opening file in read mode ("r" instead of "w")
• outfile = open(myText.txt, “w”)
• again myText.txt considered as variable name.
• outfile = open("myText.txt", “w”)
• correct.

Q-1: Which command below closes an already open file myText.txt with ref_file = open("myText.txt", "r")??

• "myText".close()
• error in Python. String doesn't have function close().
• ref_file.close()
• correct.
• close(ref_file)
• close() must be called on a variable referencing the file.
• close("myText")
• close() must be called on a variable referencing the file.

Q-1: Which of the commands below is used to add a string somestring = "my Sentence" to the end of the file referenced by filevar variable.

• filevar.append(somestring)
• append() is not used for files, but lists
• filevar.write("somestring")
• this will write "somestring" to the file, and not "my Sentence" as we wanted.
• filevar.write(somestring)
• correct.
• somestring.write()
• string type variable doesn't have a function write().

Q-1: Assume I have a file called names.txt containing the following:

Peter Pan

Cinderella

Moana

Which of the code snippets below prints all the lines/names in this text file?

I

names = open("names.txt", "r")
for line in names:
print(names)


II

names = open("names.txt", "r")
for line in names:
print(line)


III

names = open("names.txt", "r")
for line in names:
print("line")

• I
• This will for each three lines in our file print the file handle/reference .
• II
• This is correct. For each line in names file print THAT line.
• III
• This will print "line" three times. Not what we want.

(10 points)

Create a function named read_file_contents_so(). This function should take no parameters.

The function must open the file so_survey.csv in read access mode. All of the rows in the file after the first (header) row should be read into a list, one row per list element.

The function must return the list containing the file contents.

Note: We will not be using the so_survey.csv data in this exam, but due to a limitation in Runestone, we cannot read in the correct data from an external file. Calling the read_file_contents_coal() function in other code will return the correct data.

(15 points)

Create a function named process_file_contents(). This function should take no parameters.

The function must call read_file_contents_coal() to obtain a list with the records from the source data about West Virginia coal production. This is a comma-separated file with the following columns:

1. County name

2. Tons of coal produced in 1900

3. Tons of coal produced in 1910

4. Tons of coal produced in 1920

5. Tons of coal produced in 1930

6. Tons of coal produced in 1940

7. Tons of coal produced in 1950

8. Tons of coal produced in 1960

9. Tons of coal produced in 1970

10. Tons of coal produced in 1980

11. Tons of coal produced in 1990

12. Tons of coal produced in 2000

13. Tons of coal produced in 2010

Iterating over the data obtained from read_file_contents_coal() using a while loop, construct a nested dictionary. The key of the top-level dictionary should be the name of the county, and its value should be another dictionary. In the second-level dictionary, the key should be the year and the value should be the amount of coal produced. For example, if you name the dictionary coal_dictionary, you should able to access the amount of coal produced in Kanawha County in 1910 by accessing coal_dictionary['Kanawha'][1910].

The function must return the entire two-level dictionary.

(15 points)

Write a function named calculate_average_production(). Your function must take one parameter, a dictionary containing coal production of the same format returned by process_file_contents().

Your function must generate a dictionary where the key is the county name and the value is the average number of tons produced in that county.

Your function should return the generated dictionary.

(15 points)

Write a function named calculate_total_production(). Your function must take one parameter, a dictionary containing coal production of the same format returned by process_file_contents().

Your function must generate a dictionary where the key is the county name and the value is the total number of tons of coal produced in that county.

Your function should return the generated dictionary.

(20 points)

Write a function named find_peak_production_year(). Your function must take one parameter, a dictionary containing coal production of the same format returned by process_file_contents().

Your function must generate a dictionary where the key is the county name. The value should be the year in which the most coal was produced in that county. If there are multiple years with the same amount of coal produced, you may store any one of those years as the value.

Your function should return the generated dictionary.

(15 points)

Write a function named print_county_stats(). Your function must take three parameters, dictionaries with the total production, average production, and peak year, in that order.

Your function must go through each county, printing a message for each indicating the county name, the total number of tons of coal produced, the average number produced, and the peak year for mining. Round off the average tons of coal produced so it has no decimal places.

Your function does not need to return anything.

This function does not have unit tests available.

(10 points)

Write code (not a function) to connect the functions we wrote today.

Your code must:

1. Call process_file_contents() to load the production data into a dictionary.

2. Call calculate_average_production() to generate a dictionary with average production by county.

3. Call calculate_total_production() to create a dictionary with total production by county.

4. Call find_peak_production_year() to obtain a dictionary with the year of peak production for each county.

5. Call print_county_stats() to output summary data on each county.

This code does not have unit tests available.

In the first Chapter 10 project, you worked with a file of the 5000 most common words in English. (You may also wish to review that project in the text for more info on the fields in that CSV file.)

This exercise is the last exercise in that project. Using altair, let’s look at the distribution of the different parts of speech in this 5000 word dataset. Create a bar chart, where the part of speech is on the x-axis and the number of words on that list which fall into that category is on the y-axis.

(Remember our altair examples handout.)

If you want to check your work, your graph should look something like this graph.

I’ve created a file called exam2_file.txt (shown above) use python to open the file and create lists called tall and short from the supplied file. You should end up with tall=[‘Great Dane’, ‘Tiger’, ‘Giraffe’, ‘Whale Shark’] and short=[‘Weiner Dog’, ‘House Cat’, ‘Okapi’, ‘Dwarf lanternshark’].

Create a text file called data_set.txt and load the supplied x and y data sets so that the file matches the style below.

x,y
1,2
2,4
3,6
4,8
5,10


Assume that the final grade for a course is determined based on this scale - A: 900+ points, B: 800-899 points, C: 700-799 points, D: 600-699 points, F: 599 or fewer points.

Write a function named get_letter_grade() that takes the number of points the student has earned as a parameter. It should return a string containing (only) the letter grade the student will receive.

Write code to open the file so_survey.csv and read its contents into a list named survey_results. Each entry in the list should correspond with one line from the original file. Ensure the file is closed when you are done reading from it.

Strip whitespace from each element in survey_results, then split the contents of each element by the | symbol. Construct a new list named split_results that contains the results of the splits.

Read in the file travel_plans.txt. Then print ONLY the lines from that file that contain a country followed by cities. Do not hard-code. Use the fact that those lines have the character “:”.

The code below creates a deck of cards, and a list with 5 cards in it. Write at least THREE functions that test for any of the following in a hand of cards:

1. a flush, or all cards having the same suit i.e. all hearts or all clubs,

2. two cards with the same face value i.e. two sevens or two queens,

3. three cards with the same face value i.e. three sevens or three queens,

4. four cards with the same face value i.e. four sevens or four queens,

5. two pairs, like two queens and two kings

6. a full house - three cards share a face value, and the other two cards also share a different face value.

7. A straight, where the five cards have sequential values, i.e. 3, 4, 5, 6, 7.

You must develop your code in runestone. Reminder, to check for a pair, your code must eliminate three or four of a kind. (If you had three queens, you wouldn’t say you had a pair of queens.)

Create a text file called data_set.txt and load the supplied x and y data sets so that the file matches the style below. When writing the data to the file, be aware that you don’t want a newline on the last line.

x,y
1,2
2,4
3,6
4,8
5,10


How often does the word “Holmes” appear in Sir. Arthur Conan Doyle’s “The Study in Scarlet”. Use the scarlet.txt file to determine and return your value as holmes_count. Hint: Consider punctuation.

Write a program to determine whether the words in word_list are found in the file named words5000.csv. If the word is in the file, set the corresponding boolean in found_list to True.

You have attempted of activities on this page