11.6. The Most Common Words
Coming back to our running example of the text from Romeo and Juliet Act 2, Scene 2, we can use the technique from the last section to write a program that prints the ten most common words in the text.
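A minimal sketch of such a program follows. It makes a few assumptions of its own: the logic is wrapped in a hypothetical function `top_words(lines, n)` that accepts any iterable of lines, and each line is lowercased and stripped of ASCII punctuation before it is split into words. The sample lines are only there to make the sketch runnable.

```python
import string


def top_words(lines, n=10):
    """Return the n most common (count, word) tuples from an iterable of lines."""
    counts = dict()
    # Translation table that deletes every ASCII punctuation character
    no_punct = str.maketrans('', '', string.punctuation)
    for line in lines:
        # Assumed normalization: drop punctuation and lowercase the line
        line = line.translate(no_punct).lower()
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1

    # Put the count first in each tuple so sorting compares counts first
    lst = []
    for key, val in counts.items():
        lst.append((val, key))

    # Largest counts first; equal counts fall back to reverse alphabetical order
    lst.sort(reverse=True)
    return lst[:n]


if __name__ == '__main__':
    sample = [
        "But soft, what light through yonder window breaks?",
        "It is the east, and Juliet is the sun.",
    ]
    for count, word in top_words(sample, 3):
        print(count, word)
```

Because iterating over an open file yields one line at a time, the same function can be called directly on a file object opened on your copy of the scene.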
The first part of the program reads the file and builds the dictionary that maps each word to the number of times it appears in the document.
For this program, instead of simply printing out the counts and ending the program, we construct a list of (value, key) tuples and then sort the list in reverse order.
Since the value is first, it will be used for the comparisons. If there is more than one tuple with the same value, it will look at the second element (the key), so tuples whose values are equal will be further sorted in reverse alphabetical order of the key.
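The tie-breaking rule can be checked on a handful of tuples. The sketch below borrows five (count, word) pairs from the output shown later in this section; two pairs of counts are equal, so the words decide their order.

```python
# (count, word) tuples; 34 and 32 each appear twice
pairs = [(34, 'the'), (61, 'i'), (34, 'to'), (32, 'juliet'), (32, 'thou')]

# Tuples compare element by element: count first, then the word on a tie
print((34, 'to') > (34, 'the'))  # True, because 'o' sorts after 'h'

pairs.sort(reverse=True)
print(pairs)
# [(61, 'i'), (34, 'to'), (34, 'the'), (32, 'thou'), (32, 'juliet')]
```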
At the end, we write a for loop that uses multiple assignment to unpack each tuple and prints the ten most common words by iterating through a slice of the list containing its first ten elements.
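A small sketch of that final loop, using a shortened, already sorted list (the name `lst` is our own choice). It also shows that a slice reaching past the end of a list is simply truncated, so taking the first ten elements does not fail when fewer than ten distinct words exist.

```python
# An already sorted list of (count, word) tuples
lst = [(61, 'i'), (42, 'and'), (40, 'romeo'), (34, 'to'), (34, 'the')]

# Multiple assignment: each tuple is unpacked into count and word
for count, word in lst[:3]:
    print(count, word)

# Slicing beyond the end returns whatever elements exist, with no IndexError
print(len(lst[:10]))  # 5
```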
Now, the output finally looks like what we want for our word frequency analysis.
61 i
42 and
40 romeo
34 to
34 the
32 thou
32 juliet
30 that
29 my
24 thee
The fact that this complex data parsing and analysis can be done with an easy-to-understand Python program is one reason why Python is a good choice as a language for exploring information.
Construct a block of code that uses tuples to keep track of the word counts in the file ‘heineken.txt’. Then print the 10 most frequently appearing words, each preceded by the number of times it appears.