In the last chapter, we learned how to aggregate data across all of the records retrieved by a query - to find the last (MAX) date of a ride taken, or the total number (COUNT) of rides taken at a particular station. As a refresher, here are the common aggregations:
COUNTcounts the rows. We can also do COUNT(DISTINCT column_name) to get the number of different value that appear in a column.
SUMadds the values of a numeric column
MINcalculates the minimum of a numeric column
MAXcalculates the maximum of a numeric column
AVGcalculates the mean of a numeric column
MEDIANcalculates the median (middle value) of a numeric column
MODEcalculates the mode (most common value) of a numeric column
Sometimes while aggregating data, you want to aggregate records in groups. Maybe you want to find the longest trip that was taken at each individual station. Or maybe you want to find the number of trips taken on each bike.
GROUP BY statement allows you to group data as you aggregate it.
The following query counts the number of trips (
COUNT(*)) for each member
GROUP BY member_type).
The result of this query has one rows for each distinct
member_type, and the
n_trips counts the number of rows for each type. Recall that
AS _______ gives a name to a calculated value - any time we do an aggregation,
it will be helpful to name the result so we know what we are seeing in the output.
By giving the aggregated value a name, we can use it in the
ORDER BY to sort
by the total number of trips we calculated.
Once you start grouping records, it only makes sense to talk about aggregated data.
You can’t group the records by
member_type and try to select the
There is no one start date for all of those records. You would instead have to ask
You must include the
GROUP BY column(s) in your
All columns that are not in the
GROUP BY clause must be selected using an
The query below calculates the minimum and maximum trip duration
for each start station. Because we are grouping on
start_station. All the other columns we are selecting data from (
which is selected from twice) have an aggregation function (
You can still use other columns that you are not selecting to filter the data that
gets used to calculate the aggregated value. You just need to make sure to put the
WHERE before teh
GROUP BY. For example, you can calculate the number
of trips started from each station for just the bike ‘W000001’ with this query:
The following code will calculate the mean trip duration for trips by each member type and sort the results so that the member type with longest average trip is displayed first.
Put the instructions in the right order and indentation. You will use all of the blocks.