17.4. Filtering

Filtering data is the process of restricting a query to a subset of rows based on some column condition. Maybe we only want to consider trips that start from a particular location. Or rides the have a certain duration.

To filter data with SQL, we use the WHERE keyword. You can append WHERE column_condition to any SQL query, and the result will be filtered only to rows that satisfy the column condition. For example, you might want to look only at bike trips that are at least one hour (3600 seconds):

Try running this and examining the duration column… the records that are returned by the SELECT should now all be ones with a duration of at least 3600.

When writing WHERE clauses, we use the same kinds of comparison operators that we do in Python: ==, !=, >, >=, … Just like in Python, we write numbers without commas like 3600. Also like in Python, if we want to write something that is a string, we have to put quotes around it (either single ' or double " quotes work). To select the records where member_type was Member we would write:

...
WHERE
    member_type == "Member"
    --Use quotes around Member because it is a string

Finally, notice that the WHERE clause is part of the query as the ; does not appear until after the WHERE. If there was a ; at the end of the FROM trip_data, the statement would end there and the WHERE would not be considered to be part of the rest of the query. Try putting a ; after trip_data and running the program. In the modified program, the SELECT is still valid and returns the 3 desired columns, but does it for trips of all durations. Then, SQL tries to interpret WHERE duration >= 3600 on its own, which makes no sense (what does it apply to???) and we get an error about the WHERE.

Warning

For the simple programs we write, there will always just be one query. You should only ever have one ; at the end of your program. You also can just leave off the ; entirely in which case your program is assumed to be all one statement.

It is also possible to filter by multiple criteria by using AND or OR. For example, to look at bike trips that are 60 minutes or more and the member_type is MEMBER, the query would be as below.

Write a query to find the ending station and duration (in that order) of all trips by bike number 'W00153' that lasted over 8 hours (remember that duration is measured in seconds, you will have to figure out how many seconds are in 8 hours).

The bike “number” - ‘W00153’ - is not really a number, it is a string, so make sure to use quotes around it in your query.

You have attempted of activities on this page