8.5. Searching through a file
When you are searching through data in a file, a very common pattern is to read through the file, ignore most of the lines, and process only the lines that meet a particular condition. We can combine the pattern for reading a file with string methods to build simple search mechanisms.
For example, if we wanted to read a file and only print out lines which started with the prefix “From:”, we could use the string method startswith to select only those lines with the desired prefix:
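A minimal sketch of that loop follows. In a real program the handle would come from open(); to keep the example runnable anywhere, an io.StringIO object holding a few made-up lines stands in for the file. The helper name find_prefixed_lines is hypothetical.

```python
import io

def find_prefixed_lines(fhand, prefix):
    """Return the lines from fhand that start with prefix.

    Lines are kept exactly as read, so each one still ends with a newline.
    """
    matches = []
    for line in fhand:              # iterating a file handle yields one line at a time
        if line.startswith(prefix):
            matches.append(line)
    return matches

# Hypothetical sample data; io.StringIO behaves like a file opened for reading.
sample = io.StringIO(
    "From: alice@uct.ac.za\n"
    "Subject: weekly report\n"
    "From: bob@example.com\n"
)

for line in find_prefixed_lines(sample, "From:"):
    print(line)                     # a blank line appears after each match
```

Returning the matches instead of printing inside the loop keeps the filtering separate from the output, which makes the behavior easy to check.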
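The output looks great since the only lines we are seeing are those which start with “From:”, but why are we seeing the extra blank lines? This is due to that invisible newline character. Each of the lines ends with a newline, so print outputs the string in the variable line, including its newline, and then adds a newline of its own, which produces the double spacing.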
We could use line slicing to print all but the last character, but a simpler approach is to use the rstrip method which strips whitespace from the right side of a string as follows:
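The sketch below shows the rstrip version. As before, io.StringIO with hypothetical data replaces a real file, and the helper name stripped_prefixed_lines is ours rather than a library function. It also uses repr() to make the newline visible and contrasts rstrip with slicing.

```python
import io

def stripped_prefixed_lines(fhand, prefix):
    """Return the lines that start with prefix, with trailing whitespace removed."""
    matches = []
    for line in fhand:
        line = line.rstrip()        # drops the newline plus any trailing spaces or tabs
        if line.startswith(prefix):
            matches.append(line)
    return matches

# repr() makes the invisible newline visible before and after rstrip.
raw = "From: alice@uct.ac.za\n"
print(repr(raw))                    # 'From: alice@uct.ac.za\n'
print(repr(raw.rstrip()))           # 'From: alice@uct.ac.za'

# Hypothetical data; the final line deliberately has no trailing newline.
sample = io.StringIO(
    "From: alice@uct.ac.za\n"
    "Subject: weekly report\n"
    "From: bob@example.com"
)
for line in stripped_prefixed_lines(sample, "From:"):
    print(line)                     # one line per match, no blank lines between

# Slicing always removes the last character, even when it is not a newline.
last = "From: bob@example.com"
print(last[:-1])                    # From: bob@example.co
print(last.rstrip())                # From: bob@example.com
```

The last two lines show one reason rstrip is the safer choice: a file's final line does not always end with a newline, and line[:-1] would then cut off a real character.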
As your file processing programs get more complicated, you may want to structure your search loops using continue. The basic idea of the search loop is that you are looking for “interesting” lines and effectively skipping “uninteresting” lines. Then, when we find an interesting line, we do something with that line.
We can structure the loop to follow the pattern of skipping uninteresting lines as follows:
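Here is a sketch of the same filter written in the skip-the-uninteresting-lines style. The function name filter_with_continue and the sample data are hypothetical, and io.StringIO again stands in for an opened file.

```python
import io

def filter_with_continue(fhand, prefix):
    """Collect lines starting with prefix by skipping every other line."""
    matches = []
    for line in fhand:
        line = line.rstrip()
        # Skip "uninteresting" lines
        if not line.startswith(prefix):
            continue
        # Process the "interesting" line
        matches.append(line)
    return matches

sample = io.StringIO(
    "From: alice@uct.ac.za\n"
    "Subject: weekly report\n"
    "From: bob@example.com\n"
)
for line in filter_with_continue(sample, "From:"):
    print(line)
```

The advantage of this shape is that each new skip rule becomes one more `if ...: continue` at the top of the loop, while the processing code below stays at a single indentation level.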
The output of the program is the same. In English, the uninteresting lines are those which do not start with “From:”, which we skip using continue. For the “interesting” lines (i.e., those that start with “From:”) we perform the processing on those lines.
We can use the find string method to simulate a text editor search that finds lines where the search string appears anywhere in the line. Since find looks for an occurrence of one string within another and returns either the position of the string or -1 if the string was not found, we can write the following loop to show lines which contain the string “@uct.ac.za” (i.e., they come from the University of Cape Town in South Africa):
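A possible version of that loop is sketched below, using find with continue. The helper name lines_containing and the sample lines are hypothetical, and io.StringIO replaces the real file.

```python
import io

def lines_containing(fhand, needle):
    """Return stripped lines that contain needle anywhere, like an editor search."""
    matches = []
    for line in fhand:
        line = line.rstrip()
        if line.find(needle) == -1:     # -1 means needle does not occur in line
            continue
        matches.append(line)
    return matches

# find returns the index where the match starts, or -1 if there is none.
print("From: alice@uct.ac.za".find("@uct.ac.za"))   # 11
print("Subject: weekly report".find("@uct.ac.za"))  # -1

sample = io.StringIO(
    "From: alice@uct.ac.za\n"
    "Author: alice@uct.ac.za\n"
    "From: bob@example.com\n"
    "Subject: weekly report\n"
)
for line in lines_containing(sample, "@uct.ac.za"):
    print(line)
```

Python's `needle in line` expresses the same membership test; find is used here because it is the method under discussion, and its return value also tells you where the match begins.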