9.4. Reduce

Reduce is a unary operator that, like Project, is column oriented. Unlike Project, Reduce produces a result relation with a different base and identifier than its input relation’s base and identifier. With Reduce you specify one or two lists of column names as inputs:

A Reduce’s result relation has these notable name and structure characteristics:

Name characteristics:

Structure characteristics:

Important

The first bullet is important. Unlike for Project where the bases and identifiers must be the same, for Reduce they must differ. Some novices never distinguish between Project and Reduce, and so fail to progress towards mastery.

9.4.1. Example

We will continue using our old friend, the creature relation:

Creature

creatureId

creatureName

creatureType

reside_townId

idol_creatureId

1

Bannon

person

p

10

2

Myers

person

a

9

3

Neff

person

be

NULL

4

Neff

person

b

3

5

Mieska

person

d

10

6

Carlis

person

p

9

7

Kermit

frog

g

8

8

Godzilla

monster

t

6

9

Thor

superhero

as

NULL

10

Elastigirl

superhero

mv

13

11

David Beckham

person

le

9

12

Harry Kane

person

le

11

13

Megan Rapinoe

person

sw

10

English Query:

Find the creatureType of each Creature.

The corresponding chart for this looks like this:


../_images/ReduceCreature.png

Note what is shown in the operator: the new identifying column (can be columns) is creatureType, no additional columns are carried along in the result (because nothing is written after the word carry), and no additional computed columns are needed.

Also note how we changed the base noun of the result relation. Deciding what this should be can be tricky. You need to think carefully about what is now identifying the resulting relation. Using the phrase ‘of creature’ is important so that you know where the column value came from. Such precision seems overkill in simple examples, but is invaluable in real situations with dozens or hundreds of raw and result relations.

An alternate name for the result relation is “creatureType of one or more Creatures.” The latter name is longer, but it makes explicit what the former only implies.

Corresponding SQL:

Explanation of the SQL queries:

The query in the first tab displays a correct result relation. Why 4 rows? The reason is that in Creature there are 13 creatureType values, but there are only 4 distinct ones, and Reduce should remove duplicate rows automatically. It must do so because, if it did not, as in the table (not relation) shown in the second tab, then the result could not be a relation, because it violates the rules for identifier – more than one row contains “person.” You might imagine that Reduce using the distinct keyword in SQL first sorts the rows into like-valued bunches, then works on each bunch of perhaps- many duplicate rows and put one copy in the result relation.

Thus, in the first tab, the keyword DISTINCT is absolutely necessary. The columns we wish to keep in the result are in a comma-separated list after the keywords SELECT DISTINCT. The input relation, in this case creature, is shown after the keyword FROM.

9.4.2. Let’s think about this operator

Many times the result is shorter than the original input after removing duplicates, but not always. Let’s look at the data for the Contribution relation:

Contribution

creatureId

achId

skillCode

roleName

1

19

TR4

anchor leg

2

18

TR4

third leg

4

16

TR4

first leg

5

17

TR4

second leg

7

14

B2

pilot

7

22

D3

first speaker

9

12

THR

right leg

9

15

B2

brakeman

9

20

D3

second speaker

10

13

THR

left leg

13

21

D3

team captain

The nature of this data at this particular time would given us a result with the same height if we asked this:

Find each roleName of Contribution.

Make sure you see why this is- this can happen.

9.4.3. Comparing Reduce to Project

Look at these two precedence charts as a means to solidify the difference between the Project and Reduce operators.


../_images/ReducevsProject.png

Note how the base of the result relation changes with reduce, along with what identifies it. We use a different symbol for the reduce operator so that we see this more readily. Note how each of them results in a relation with skills in them, represented by their skillCode. But the one on the left is all the skills, and the name of the one on the right needed to be changed to make it clear that it only contains achieved skills.

9.4.4. Naming the result relation

Here is another reduce query:

Find each creatureName of one or more creatures

Draw this one for yourself.You might have been tempted to keep Creature as its base but cannot because creatureName is not Creature’s identifier.

Important

Seeing the difference between bases is necessary for mastery. Beware, some students skim this crucial notion to get to the snazzier operators and never grasp the fundamental importance of choosing precise names. You need to become sensitive to language and, in particular, to internalize that Reduce yields a different base than the input relation, while Project yields the same base.

9.4.5. Bad Case: Reduce is dangerous!

Consider what happens if you Reduce a relation and specify an identifying column with a NULL value in it. Let’s use the town relation from our small database as an example. Here is what is currently in it.

Town

townId

townName

State

Country

townNickname

townMotto

a

Anoka

MN

United States

Halloween Capital of the world

NULL

as

Asgard

NULL

NULL

Home of Odin’s vault

Where magic and science are one in the same

b

Bemidji

MN

United States

B-town

The first city on the Mississippi

be

Blue Earth

MN

United States

Beyond the Valley of the Jolly Green Giant

Earth so rich the city grows!

d

Duluth

MN

United States

Zenith City

NULL

g

Greenville

MS

United States

The Heart & Soul of the Delta

The Best Food, Shopping, & Entertainment In The South

le

London

England

United Kingdom

The Smoke

Domine dirige nos

mv

Metroville

NULL

NULL

Home of the Incredibles

Still Standing

p

Philadelphia

PA

United States

Philly

Let brotherly love endure

sw

Seattle

Washington

United States

The Emerald City

The City of Goodwill

t

Tokyo

Kanto

Japan

NULL

NULL

The following query and its chart cannot result in a relation (so it is drawn with a rounded rectangle to represent a table). When you try the SQL code, you get an SQL table– note the row with null values.

English query:

Find each state and country of town.


../_images/BadReduce.png

9.4.6. Bad Case: Computed columns in Reduce are dangerous.

A Reduce, like a Project, can compute columns, although this is a convenience, not a necessity, because a Project preceding the Reduce can serve to compute columns. Such computed columns are dangerous, since they could have NULL, or non-singleton values. It is best to avoid this practice unless you are absolutely certain neither of these is the case. For most data, you never can be sure of that some value somewhere is NULL or that when you reduced you eliminated some duplicates, rendering a computation in error.

9.4.7. Exercises

Work on precedence charts for these. Decide which of these is bad. Note that for each one of these, there will be one input relation and one result relation from the Reduce operator.

Beware: though you may be tempted to just write SQL, show some restraint and simply draw the charts. To help with this, try making a copy of this drawio operator template in your browser. To do this, you will need to access this Google Drive file and make a copy in your own Google Drive space. You can use it in diagrams.net.

  1. Find each country of town.

  2. Find each creatureName of creature.

  3. Find each creatureId of creature who has achieved.

  4. Find each achieved skillCode.

  5. Find each Achieved skill of creature with its proficiency (id: skillCode, creatureId).

If you haven’t yet, you can download This Small Creature Database Schema and keep a copy for your own reference as you work on the precedence charts.

You have attempted of activities on this page