9.5. Group¶

Group is the fourth unary operator. It calculates aggregate values (simple statistics) from its input relation’s cell values. About Group there are four key notions to learn. Read these now, then review them later after you study the examples that follow.

Group has 2 circumstances where a statistic or statistics are calculated by:

Partitioning (or grouping – thus the operator name) the input relation into one set of rows, Grouping over no columns. An example English query for this circumstance is:

“Find the count of all creatures.”
Partitioning the input relation into many sets of rows, Grouping over one or a few columns that have the same value(s). An example English query for this circumstance is:

“For each creatureId, count the number of its achievements”

Every Group has a non-relation input which is a list of aggregate functions with result column names, for example, “achievementCount” for 1.b. above. The aggregate functions specifiable in SQL include: Sum, Count, Minimum, Maximum, Average, Variance, and Standard Deviation. In computing these aggregates, SQL generally ignores NULL values. However, Count has two forms. You can specify Count(*) , or Count(<column-name>). For each group of rows, the former ignores cell values and just counts the number of rows, while the latter counts the number of non-NULL cell values in that column.
Group’s computed columns differ from computed columns in Project. When you compute values with a Project you think horizontally. A Project computed column “numProficiencyValues = maxProficiency - minProficiency” uses values within one row (horizontally related) to compute a new column. You think about scalar variables, constants, and simple arithmetic functions. However, with Group you think vertically, and “achievementCount = count(*)” computes a value for each group of many vertically-related rows. In programming terms, you think about array or vector, not scalar arithmetic.

Important

Group seems “easy.” However, some novices never master it. I think that failure often can be attributed to not learning to think vertically.

Grouping over one or a few columns emerges from Reduce. Look at the relation in the first example SQL result below and cover the computed aggregate column. What remains visible is what would result if you Reduce Achievement to creatureId. You can more quickly master Group if you see a Reduce lurking within it. Because it is like Reduce, Group is dangerous in much the same ways that Reduce was. Because it is like Reduce, the symbol that was use in precedence charts is the same.

9.5.1. Example 1: Group over one column¶

Let’s remember what is in the achievement data relation, since we will try some queries on it.

**Achievement**¶
achId	creatureId	skillCode	proficiency	achDate	test_townId
1	1	A	3	2020-07-24 21:37:53	a
2	1	E	3	2017-09-15 15:35:00	d
3	1	A	3	2018-07-14 14:00:00	a
4	1	E	3	2020-07-24 21:37:53	d
5	5	Z	6	2016-04-12 15:42:30	t
6	3	Z	4	2018-07-15 00:00:00	be
7	3	Z	4	2018-07-15 00:00:00	be
8	3	Z	4	2018-07-15 00:00:00	be
9	4	Z	3	2018-06-10 00:00:00	a
10	11	PK	10	1998-08-15 00:00:00	le
11	12	PK	10	2016-05-24 00:00:00	le
12	13	PK	10	2012-08-06 00:00:00	le
13	8	PK	1	NULL	t
14	9	THR	10	2018-08-12 14:30:00	mv
15	10	THR	10	2018-08-12 14:30:00	mv
16	7	B2	19	2017-01-10 16:30:00	d
17	9	B2	19	2017-01-10 16:30:00	d
18	4	TR4	85	2012-07-30 00:00:00	le
19	5	TR4	85	2012-07-30 00:00:00	le
20	2	TR4	85	2012-07-30 00:00:00	le
21	1	TR4	85	2012-07-30 00:00:00	le
22	9	D3	8	2020-07-24 16:37:53	sw
23	13	D3	8	2020-07-24 16:37:53	sw
24	7	D3	8	2020-07-24 16:37:53	sw

Note

Some creatures, such as that with creatureId=1, have achieved some skills more than once.

English Query:

For each creatureId, find the count of its achieved skills.

Note that this is different than counting all of a particular creature’s achievements, even if the same skill was achieved more than once.

Precedence Chart:

Corresponding SQL:

DROP TABLE IF EXISTS achievement;
CREATE TABLE achievement (
achId              INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT,
creatureId         INTEGER,
skillCode          VARCHAR(3),
proficiency        INTEGER,
achDate            TEXT,
test_townId VARCHAR(3) REFERENCES town(townId),     -- foreign key
FOREIGN KEY (creatureId) REFERENCES creature (creatureId),
FOREIGN KEY (skillCode) REFERENCES skill (skillCode)
);

-- Bannon floats in Anoka (where he aspired)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (1, 'A', 3, datetime('now'), 'a');

-- Bannon swims in Duluth (he aspired in Bemidji)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (1, 'E', 3, datetime('2017-09-15 15:35'), 'd');
-- Bannon floats in Anoka (where he aspired)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (1, 'A', 3, datetime('2018-07-14 14:00'), 'a');

-- Bannon swims in Duluth (he aspired in Bemidji)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (1, 'E', 3, datetime('now'), 'd');

-- Bannon doesn't gargle
-- Mieska gargles in Tokyo (had no aspiration to)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (5, 'Z', 6, datetime('2016-04-12 15:42:30'), 't');

-- Neff #3 gargles in Blue Earth (but not to his aspired proficiency)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (3, 'Z', 4, datetime('2018-07-15'), 'be');
-- Neff #3 gargles in Blue Earth (but not to his aspired proficiency)
-- on same day at same proficiency, signifying need for arbitrary id
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (3, 'Z', 4, datetime('2018-07-15'), 'be');

-- Beckham achieves PK in London
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (11, 'PK', 10, datetime('1998-08-15'), 'le');
-- Kane achieves PK in London
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (12, 'PK', 10, datetime('2016-05-24'), 'le');
-- Rapinoe achieves PK in London
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (13, 'PK', 10, datetime('2012-08-06'), 'le');
-- Godizilla achieves PK in Tokyo poorly with no date
-- had not aspiration to do so- did it on a dare ;)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (8, 'PK', 1, NULL, 't');

-- -------------------- -------------------- -------------------
-- Thor achieves three-legged race in Metroville (with Elastigirl)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (9, 'THR', 10, datetime('2018-08-12 14:30'), 'mv');
-- Elastigirl achieves three-legged race in Metroville (with Thor)
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (10, 'THR', 10, datetime('2018-08-12 14:30'), 'mv');

-- Kermit 'pilots' 2-person bobsledding  (pilot goes into contribution)
--       with Thor as brakeman (brakeman goes into contribution) in Duluth,
--    achieve at 76% of maxProficiency
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (7, 'B2', 19, datetime('2017-01-10 16:30'), 'd');
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (9, 'B2', 19, datetime('2017-01-10 16:30'), 'd');

-- 4 people form track realy team in London:
--   Neff #4, Mieska, Myers, Bannon
--    achieve at 85% of maxProficiency
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (4, 'TR4', 85, datetime('2012-07-30'), 'le');
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (5, 'TR4', 85, datetime('2012-07-30'), 'le');
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (2, 'TR4', 85, datetime('2012-07-30'), 'le');
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (1, 'TR4', 85, datetime('2012-07-30'), 'le');

-- Thor, Rapinoe, and Kermit form debate team in Seattle, WA and
-- achieve at 80% of maxProficiency
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (9, 'D3', 8, datetime('now', 'localtime'), 'sw');
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (13, 'D3', 8, datetime('now', 'localtime'), 'sw');
INSERT INTO achievement (creatureId, skillCode, proficiency,
                         achDate, test_townId)
                VALUES (7, 'D3', 8, datetime('now', 'localtime'), 'sw');

The keyword distinct in the query above is important. Notice what happens to creatureId 1’s count if you remove it and run it again. In the latter case, you are simply counting all achievements, regardless of whether the same skill is duplicated because it was achieved twice or more by some creature.

9.5.2. Example 2: Group over multiple columns¶

English Query:

For each creatureId and skillCode, find the count of how many times the creature achieved the skill.

Note that if the same skill was achieved more than once, we can now count how many times that happened.

Precedence Chart:

Corresponding SQL:

Note that we can simply count the number of rows by using an asterisk, ‘*’ inside the function parentheses, as shown in the above example.

9.5.3. Example 3: Group over no columns¶

English Query:

Find the maximum proficiency of any achieved skill.

Note

It is important that you realize that this query returns a single cell value in one row and one column in the result relation.

Precedence Chart:

Note

Think: In this above chart, contemplate why the result relation is named the way it is. In these cases, the base becomes a somewhat long name so that it accurately reflects the single statistic that will be computed.

Corresponding SQL:

Note

If you scroll back up and look at the achievement relation’s data, the only column on which we could compute a function like max, min, sum, or average is proficiency, because it contains an integer value. In some systems you can consider using max and min on date values, but that is highly system dependent.

You can also compute several of the statistics at once, like this:

Precedence Chart:

Note

Think: In this above chart, once again contemplate why the result relation is named the way it is (and that it is different than the previous example). As before, the base has a somewhat long name so that it accurately reflects what will be computed.

You try. Here is the previous query in SQL. You try adding to it to get the minimum and the average proficiency also.

9.5.4. Summary of Group Characteristics¶

As you have seen, the nature of the result relation is quite different between the two kinds of group. Let’s summarize the characteristics of each.

9.5.4.1. Group over no columns¶

For a Group over no columns circumstance: you specify a list of aggregate function – column name pairs; and the entire input relation is treated as one set of rows to aggregate over. The result relation has these notable name and structure characteristics.

Name characteristics:

Its base comes from the specified function columns.
It has a row modifier formed from the input relation’s name.
It has no column modifier.
The result relation’s identifier consists of all of its function columns.

Structure characteristics:

The result relation is one row tall, unless the input relation has zero rows.
It is as wide as the number of function columns specified.

9.5.4.2. Group over one or more columns¶

For a Group over one or a few columns circumstance, you specify, as before, an input relation, and a list of aggregate functions. Now you also specify one or a few columns to Group over. The result relation has these notable name and structure characteristics:

Name characteristics:

Its base comes from the specified Over columns.
It has a row modifier formed from the input relation’s name.
It has a column modifier formed from the function column names.
The result relation’s identifier consists of its Over columns.

Structure characteristics:

The result relation is as tall as the number of distinct rows there are when looking at just the Over columns.
It is as wide as the number of Over and function columns specified.

9.5.5. Bad Cases: Group, like Reduce, is dangerous!¶

There are two ways in which a bad group can arise.

Group with bad Over column. An Over column of a Group cannot have a NULL value in it because such a column is an identifying one in the result relation. For example, Grouping over idol_creatureId in Creature (shown below) yields a table, not a relation.

**Creature**¶
creatureId	creatureName	creatureType	reside_townId	idol_creatureId
1	Bannon	person	p	10
2	Myers	person	a	9
3	Neff	person	be	NULL
4	Neff	person	b	3
5	Mieska	person	d	10
6	Carlis	person	p	9
7	Kermit	frog	g	8
8	Godzilla	monster	t	6
9	Thor	superhero	as	NULL
10	Elastigirl	superhero	mv	13
11	David Beckham	person	le	9
12	Harry Kane	person	le	11
13	Megan Rapinoe	person	sw	10

DROP TABLE IF EXISTS creature;
CREATE TABLE creature (
creatureId          INTEGER      NOT NULL PRIMARY KEY,
creatureName        VARCHAR(20),
creatureType        VARCHAR(20),
reside_townId VARCHAR(3) REFERENCES town(townId),     -- foreign key
idol_creatureId     INTEGER,
FOREIGN KEY(idol_creatureId) REFERENCES creature(creatureId)
);

INSERT INTO creature VALUES (1,'Bannon','person','p',10);
INSERT INTO creature VALUES (2,'Myers','person','a',9);
INSERT INTO creature VALUES (3,'Neff','person','be',NULL);
INSERT INTO creature VALUES (4,'Neff','person','b',3);
INSERT INTO creature VALUES (5,'Mieska','person','d', 10);
INSERT INTO creature VALUES (6,'Carlis','person','p',9);
INSERT INTO creature VALUES (7,'Kermit','frog','g',8);
INSERT INTO creature VALUES (8,'Godzilla','monster','t',6);
INSERT INTO creature VALUES (9,'Thor','superhero','as',NULL);
INSERT INTO creature VALUES (10,'Elastigirl','superhero','mv',13);
INSERT INTO creature VALUES (11,'David Beckham','person','le',9);
INSERT INTO creature VALUES (12,'Harry Kane','person','le',11);
INSERT INTO creature VALUES (13,'Megan Rapinoe','person','sw',10);

Note

Run the above query. Notice how idol_creatureId should be the identifier of the new result, but one of its values is null.

Group with carry columns. For a Group with Over columns, carry columns are dangerous. You can avoid errors if you learn to see a Reduce within a Group. For example, if you look at the Creature relation above, imagine if we Reduce to creatureType. When grouping over creatureType, you might be tempted to want to carry a column like reside_townId, but what should it contain? The following code runs in SQLite, but is a dangerous, misleading result- try it.

Important

Though some systems allow this bad query, you really should avoid it in practice.

Carry columns are also dangerous for a Group with no Over columns. In general, it is once again simply a bad practice to try it.

9.5.6. Exercises¶

When working on precedence charts for these, recognize which ones are grouping over one or more columns and which ones are grouping over nothing. Which one is a potentially bad group?

How many towns are there?

Count each type of creature.

Count achieved skillCode of skills per tested town of achievement.

Count skillCode per originTownId of skill.

Count creatureId per resideTownId of creature.

Find the count of the number of creatures.

Find the count of the number of skills.

Find the highest maximumProficiency of any of the skills.

Find the skillDescription and lowest maximumProficiency of any of the skills.

Once you have a chart, try some SQL for 7 - 9 above. Here is what the Skill relation data looks like. Then below is a place for you to try completing the query. See whether the potentially bad query is indeed bad.

**Skill**¶
skillCode	skillDescription	maxProficiency	minProficiency	origin_townId
A	float	10	-1	b
B2	2-crew bobsledding	25	0	d
C2	2-person canoeing	12	1	t
D3	Australasia debating	10	1	NULL
E	swim	5	0	b
O	sink	10	-1	b
PK	soccer penalty kick	10	1	le
THR	three-legged race	10	0	g
TR4	4x100 meter track relay	100	0	be
U	walk on water	5	1	d
Z	gargle	5	1	a

DROP TABLE IF EXISTS skill;

CREATE TABLE skill (
 skillCode          VARCHAR(3)      NOT NULL PRIMARY KEY,
 skillDescription   VARCHAR(40),
 maxProficiency     INTEGER,     -- max score that can be achieved for this skill
 minProficiency     INTEGER,     -- min score that can be achieved for this skill
 origin_townId      VARCHAR(3)     REFERENCES town(townId)     -- foreign key
 );

INSERT INTO skill VALUES ('A', 'float', 10, -1,'b');
 INSERT INTO skill VALUES ('E', 'swim', 5, 0,'b');
 INSERT INTO skill VALUES ('O', 'sink', 10, -1,'b');
 INSERT INTO skill VALUES ('U', 'walk on water', 5, 1,'d');
 INSERT INTO skill VALUES ('Z', 'gargle', 5, 1,'a');
 INSERT INTO skill VALUES ('B2', '2-crew bobsledding', 25, 0,'d');
 INSERT INTO skill VALUES ('TR4', '4x100 meter track relay', 100, 0,'be');
 INSERT INTO skill VALUES ('C2', '2-person canoeing', 12, 1,'t');
 INSERT INTO skill VALUES ('THR', 'three-legged race', 10, 0,'g');
 INSERT INTO skill VALUES ('D3', 'Australasia debating', 10, 1,NULL);
 INSERT INTO skill VALUES ('PK', 'soccer penalty kick', 10, 1, 'le');

Here is the bad group query, which asks for a carry column, skillDescription. Study the output to see what it is bad and misleading.

Find the skillDescription and lowest maximumProficiency of any of the skills.

You have attempted of activities on this page