18.1. Aggregating Data¶
Say we want data about how many rides lasted longer than an hour. At this point we
don’t care about where they started, or which bike they were on, we just want to
know how many such rides there were.
To ask for the total number of records a query finds, we can select COUNT(*)
.
The AS
keyword is used after COUNT(*)
to give the data that is returned a name. If we
leave it off, the “column” of data will just be labeled COUNT(*)
. Calling it number_rides
makes it much clearer what we are looking at.
COUNT
is one of the aggregation functions provided by SQL. Aggregation is the process
of combining data and COUNT
combines all the records and tells us how many there are.
But there are other ways we can aggregate data with SQL:
COUNT
counts the rowsSUM
adds the values of a numeric columnMIN
calculates the minimum of a numeric columnMAX
calculates the maximum of a numeric columnAVG
calculates the mean of a numeric columnMEDIAN
calculates the median (middle value) of a numeric columnMODE
calculates the mode (most common value) of a numeric column
For example, the query below calculates the minimum and maximum trip duration:
You use WHERE
filtering with aggregating functions. The
query below calculates the longest trip duration just for trips made by Casual
member type users.
18.1.1. Counting Unique Values¶
It is often helpful to not only count the number of rows, but count the number
of unique values of a column. You can do this using the DISTINCT
keyword.
To count the distinct values of a column, you can simply use COUNT
along
with DISTINCT
. For example, the query below counts the number of bike
numbers used.