Ifs and Cases

In SQL, just as in many languages, one of the most basic and useful concepts is the if/else syntax. Just as in Sheets, you can use IF to return different values based on whether a condition is true.

For example, going back to the bikeshare data, suppose you are interested in counting the number of trips that are shorter than one hour and longer than one hour. The duration column contains the duration of the trips in seconds, so you can use this to create a column that tells you whether the duration is over one hour (3600 seconds).

The syntax of the IF statement is IF(condition, value if condition is true, value if condition is false). This should look familar, as it is exactly the same as the syntax for the IF function in Sheets.

Note the use of the TRUE and FALSE keywords, which are also similar in Sheets. You do not need to use these specific keywords, you can use any indicator you like. For example, you could write IF(duration > 3600, "over_one_hour", "not_over_one_hour") or IF(duration > 3600, 1, 0). In this case, the column name is_over_one_hour indicates what this field represents, so it is sufficient to use the inbuilt keywords TRUE and FALSE.

Q-2: Give meaningful names to the columns in the following query and explain what question th query is trying to answer.

SELECT
  IF(duration > 0.5 * 3600, TRUE, FALSE) AS column_1,
  SUM(IF(start_station = 31111, 1, 0)) AS column_2
FROM
  trip_data
GROUP BY
  column_1

Q-3: Write a query, using IF, to count the number of trips of member type Casual for trips that start and end at the same station compared to trips that start and end at different stations. Hint: Your query should have only 2 columns in the SELECT clause and only one WHERE clause.

SQL also provides an even more versatile set of keywords that allow you to have as many “ifs” and “elses” as you like. This is done using the CASE structure. This structure comprises several keywords.

  1. The structure is opened with the CASE keyword. This tells SQL to start looking at each statement and evaluating if it is true or not.

  2. Each new case is identified with the WHEN keyword, followed by a statement. For example, WHEN duration > 3600.

  3. Each case must have a THEN keyword after the WHEN statement. If the statement after the WHEN keyword is true, then the field evaluates to the value after the THEN keyword.

  4. If none of the statements after the WHEN keyword are met, the structure will look for an ELSE keyword. In this case, the field evaluates to whatever follows the ELSE keyword.

  5. The structure is closed with the END keyword. This indicates there are no more WHEN statements.

This is perhaps best illustrated with an example. The following query counts the number of trips for different durations. The durations are grouped manually using the CASE structure.

The CASE structure in the field duration_grouped checks the statements after each WHEN keyword. For the first statement that is met, the field evaluates to whatever follows the corresponding THEN keyword. For example, suppose the duration of a trip is 2468 seconds.

As soon as a condition is met, the field is evaluated, so the column duration_grouped would be "30-60 minutes" for this trip.

Q-5: Explain what the following query is returning.

SELECT
  CASE
    WHEN member_type = 'Casual' AND start_station = end_station THEN "casual_same_station"
    WHEN start_station = end_station THEN "member_same_station"
    WHEN member_type = 'Casual' THEN "casual_different_station"
    ELSE "member_different_station"
  END AS ride_classification,
  AVG(IF(duration > 3600, 1, 0)) AS proportion_trips_over_one_hour
FROM
  trip_data
WHERE
  member_type IN ('Casual', 'Member')
GROUP BY
  ride_classification
You have attempted of activities on this page