16.1. A Variation: Compare Join¶
The Match Join binary operator chapter was the largest in this book because there are some many different circumstances when it can be used. Thus, it takes some practice to master its use. Hopefully you have been doing some practice with it outside of this book, using another RDBMS. When you are comfortable with exactly matching columns after doing a times, you can now start thinking about cases in which we might use non-exact matching instead.
16.1.1. What can we do after a Times?¶
Recall that the basis of the Match Join operator was to first complete a Times, then follow it with an exact match filter, and a Reduce or Project. One of the Match Join cases that was slightly unsettling and not particularly satisfying is the circumstance of same relation. Here was the example we had from the Match Join chapter:
The slightly unsatisfactory aspect of this example is that the result relation contains an ordered pair of creatures, and even contains the same creature paired with itself. This is a result of the underlying Times operation. To see this more clearly, let’s use a slightly smaller example, by starting with non-person creature as the input relation, then complete a times. Run the following:
In this example, we left out the exact match of the reside_townId for now. Notice how the non-person creatures are each paired with themselves and the pairs of creatures without the same creatureId appear twice, with the creatureId values in both possible orders. What we truly may want is unordered pairs of creatures where the creatureId is not the same.
To see this in a different way, let’s try to visualize each of the entries in the result of the above times operation, looking only at the creatureId values. There are four non-person creatures, and their creatureId values are 7, 8, 9, and 10 respectively. There are three groups of pairs of these creatureId values that result from the times, which are shown in the following matrix:
Here is some SQL where we are projecting to get simply the pairs of creatureId values, as depicted above.
When you run this, note all of the pairs an where they fall in the matrix visualization of them. Now here is that code again below. What where clause can you add to the second SELECT statement to get the yellow boxes only, which depict an unordered pair that does not include a non-person creature paired with itself?
If you get stuck, then:
WHERE C1.creatureId < C2.creatureId
You can also try getting the lower half of the matrix (the green boxes) as an equally valid unordered pair.
Now let’s return to the original Match Join example of each ordered pair of creatures (including person creatures) that live in the same town. Except now let’s use what we have just visualized with the smaller set on non-person creatures to get the following:
Find each unordered pair of creatures who reside in the same town.
Here is the SQL for this query:
This particular result is so much more useful and satisfying when your goal is to try to find potential training partners or mentors for a mentee who live in the same town. This is the power of realizing what you can perform after a times operation.
Here is a long precedence chart for this, where there is an extra Project to eliminate the duplicated same reside_townId.
Because we can have almost any clauses that we want in the filter and we may or may not have the last project/reduce, each of which does change the result, we typically draw out Compare Join in this way, rather than using an operator symbol to compact it. Note that in the following example, we would want to keep each reside_townId and eliminate the Project.
You can change the above SQL slightly (and chart a bit more to remove the Project) to also answer this question:
Find each unordered pair of creatures that do not live in the same town.
16.1.2. Other Circumstances also Apply¶
We showed a same-relation example, but other circumstances also can be used in a similar way. Here are a couple of example result queries that are reasonably precise. Try to devise the chart for them.
Find each same-SkillCode, but not same test_townId as origin_townId, (achievement with its skill data)
Find each creature who aspires to contribute and contributed the same skillCode to a team where their contributed roleName is not the same as their aspired roleName.
The first query is a different base example, following from the straightforward Match Join, or natural join.
The second query is a same base example, because each is in essence a creature-skill pair, in this case referring to contributing to a team contribution as special case of achievement and aspiration to achieve a team contribution. Here is the current data for these two relations:
creatureId |
achId |
skillCode |
roleName |
---|---|---|---|
1 |
19 |
TR4 |
anchor leg |
2 |
18 |
TR4 |
third leg |
4 |
16 |
TR4 |
first leg |
5 |
17 |
TR4 |
second leg |
7 |
14 |
B2 |
pilot |
7 |
22 |
D3 |
first speaker |
9 |
12 |
THR |
right leg |
9 |
15 |
B2 |
brakeman |
9 |
20 |
D3 |
second speaker |
10 |
13 |
THR |
left leg |
13 |
21 |
D3 |
team captain |
creatureId |
skillCode |
roleName |
---|---|---|
1 |
C2 |
bow paddler |
1 |
TR4 |
anchor leg |
5 |
TR4 |
third leg |
6 |
C2 |
stern paddler |
7 |
B2 |
pilot |
7 |
D3 |
first speaker |
9 |
D3 |
second speaker |
13 |
D3 |
team captain |
Here is some SQL code for the second query:
Note that you would not need to include the skillCode twice in this case, but I did it so that you could see that a contribution is for the same skill as aspired, but that the roleName is different.
Here is a place for you to try the first query above for practice.
16.1.3. Queries to try¶
Try creating the precedence charts for these queries. Use a drawing tool such as draw.io.
English Query:
Find each unordered pair of skills that originates in the same town.
Find each unordered pair of achievements that were tested in the same town.