17.1. Category Relations

In the chapter describing the Divide operator, we introduced the idea of creating a new relation, called the Pattern relation, that would assist us in performing a powerful analysis. In this chapter we will once again use this concept of creating new relations that can help us perform analyses. In this case, we are going to create relations that keep knowledge about the domain of the data in the database as new data of its own. This is a very powerful concept that you can use in many ways once you understand how it is done.

17.1.1. Making missing information new data

In many cases with data in databases, we find ourselves wanting to ask queries like this:

Find each creature who is a fair swimmer.

In this type of query, our client must explain to us just what constitutes fair in swimming. In addition, this query opens up many more very similar queries, asking about poor 2-man bobsledding, good swimming, poor swimming, good debating, excellent penalty kicking, etc. The general concept at work here is that a range of values of proficiency constitutes a category of performance.

What we will show is that we can create a new relation that will codify into data what is meant by poor swimming, fair swimming, good swimming, and other categories of the other skills in this database. Let’s start by first examining the skill relation, which has some information in it about the range of values that proficiency can take on per skill. Noticing in this relation the maxProficiency and minProficiency data values:

Skill

skillCode

skillDescription

maxProficiency

minProficiency

origin_townId

A

float

10

-1

b

B2

2-crew bobsledding

25

0

d

C2

2-person canoeing

12

1

t

D3

Australasia debating

10

1

NULL

E

swim

5

0

b

O

sink

10

-1

b

PK

soccer penalty kick

10

1

le

THR

three-legged race

10

0

g

TR4

4x100 meter track relay

100

0

be

U

walk on water

5

1

d

Z

gargle

5

1

a

Given this information, and in consultation with a client, we might be able to decide that for the ‘swim’ skill, the various categories of swimming achievement might be something like this:

SkillCode

Category

low proficiency value

high proficiency value

E

poor

0

1

E

fair

2

3

E

good

4

5

Because the full range of possible values of swim proficiency in the skill table is 0 to 5, we could develop the above categories, or levels of performance. This assumes that proficiency values are integers and can be interpreted as the following: if an achievement is performed at a proficiency equal to or greater than 0, and less than or equal to 1, the swim achievement is poor.

Different categories can be developed for each skill- we will see another example below.

We can take the above categories and create a new table containing this information as new data.

Once we have this new relation, now we can use Compare Join to match achievements of this particular skill and determine whether the proficiency falls into the range. In this query, we report all swim categories, but it turns out there is only one creature who has achieved swimming.

In this case, there was only one creature who swims, and the proficiency he happens to have achieved makes it fall into the fair category. A far more normal occurrence with large databases is that multiple creatures might fall into multiple categories. We can see a brief glimpse of this if we examine the gargling skill. Here is a very similar distribution of skill categories in another new table:

Note

It is important to see that these category tables are tables until we decide what columns might be sufficient to identify them. In this example and in most cases you might develop, the combination of skillCode and category serves to identify the gargling_category relation.

Now here is the Compare Join between achievement and this new gargling_category, along with a Reduce to two columns (which will identify the new result relation):

What we have now is each gargling creature and the category their proficiency score falls into. Since creatures achieve the same skill more than once, it is likely helpful to add the date of the achievement along with its category, like this:

Important

The above query results in a relation where all three columns identify it. You might be tempted to think that we can simply carry the date as an additional column, but there actually are 2 good gargling achievements by the creature whose creatureId is 3 on exactly the same date in the underlying data. Below is a check of that. Thus we are reducing, making all three columns the identifier of the result relation.

Here is SQL for a table of the creatureId, skillCode, proficiency, and achDate of gargling achievements, so you can see the ones by creatureId 3.

We can go one step further with the category relation and ask specifically about the good gargling creatures.

Notice how we can very easily change this to ask for the fair gargling creatures with their category and date achieved. Here is a precedence chart that shows how if we perform the last filter late, we can reuse the compare join to get either result.


Gargling Creature Category chart

17.1.2. Other Queries to try:

Note for these that you can choose how much to report and you will be able to decide what a score range means for categories of the penalty kick skill. However, for completing the precedence chart similar to that in this chapter, you can assume that you have a relation called “penalty kick category”, for example.

  1. Find each creatureId of Creature who aspires to achieve good swimming.

  2. Find each creatureId of Creature who is {poor, good} at penalty kick (skillCode = ‘PK’).

  3. Find each creature who achieves penalty kick (skillCode = ‘PK’) and what category they fall into.

In case it would help, here are Aspiration and Achievement again:

Aspiration

creatureId

skillCode

aspiredProficiency

desired_townId

1

A

3

a

1

C2

9

b

1

E

4

b

1

TR4

85

sw

1

Z

3

be

2

A

3

NULL

3

A

8

b

3

Z

5

be

4

E

3

g

5

TR4

85

sw

5

Z

10

d

6

C2

9

b

6

Z

3

le

7

B2

20

d

7

D3

8

sw

7

E

3

b

8

O

4

t

9

D3

8

sw

11

PK

10

le

12

PK

10

le

13

D3

8

sw

13

PK

10

le

Achievement

achId

creatureId

skillCode

proficiency

achDate

test_townId

1

1

A

3

2020-07-24 21:37:53

a

2

1

E

3

2017-09-15 15:35:00

d

3

1

A

3

2018-07-14 14:00:00

a

4

1

E

3

2020-07-24 21:37:53

d

5

5

Z

6

2016-04-12 15:42:30

t

6

3

Z

4

2018-07-15 00:00:00

be

7

3

Z

4

2018-07-15 00:00:00

be

8

3

Z

4

2018-07-15 00:00:00

be

9

4

Z

3

2018-06-10 00:00:00

a

10

11

PK

10

1998-08-15 00:00:00

le

11

12

PK

10

2016-05-24 00:00:00

le

12

13

PK

10

2012-08-06 00:00:00

le

13

8

PK

1

NULL

t

14

9

THR

10

2018-08-12 14:30:00

mv

15

10

THR

10

2018-08-12 14:30:00

mv

16

7

B2

19

2017-01-10 16:30:00

d

17

9

B2

19

2017-01-10 16:30:00

d

18

4

TR4

85

2012-07-30 00:00:00

le

19

5

TR4

85

2012-07-30 00:00:00

le

20

2

TR4

85

2012-07-30 00:00:00

le

21

1

TR4

85

2012-07-30 00:00:00

le

22

9

D3

8

2020-07-24 16:37:53

sw

23

13

D3

8

2020-07-24 16:37:53

sw

24

7

D3

8

2020-07-24 16:37:53

sw

Here is a space where you can try to develop the SQL queries, with a start on the swimming categories from above.

You have attempted of activities on this page