SQL Relationships

Introduction

We briefly mentioned the idea of rows referencing other rows back when we introduces SQL. We will talk about it in some more detail here.

This idea of relationships between tables is a major cornerstone of relational databases. Without it you simply have structured data, not related structured data. You will end up working with both and it is important to understand how relational data works and why it is used.

What Relationships Solve

So why do we care? Lets go back to the table of BSG characters and the planets they are from.

BSG People and Planets

We could just have the homeworld column contain the text ‘Caprica’ or ‘Gemenon’ and that might work. But what if the name of a planet were to change? We would need to make sure to go back and update every instance of that planet name. Doable, but not awesome.

Another issue is that planets have associated data. They have a capital city, language and population. This is where it gets tricky. If we simply included that information in the row of a person we would basically double the size of our database. What used to be rows of 5 attributes becomes a row of 9 attributes. If we had millions of people in our database that would be a major difference.

Yet another issue is what if multiple things reference a planet? Maybe space ships are based there. Maybe we have a list of episodes and what planets are in them. Now we have 4 or 5 tables all referencing the same planet. It becomes a huge problem if that planets information is updated to figure out all the tables it exists in and update all the references. But if we don’t then we have two versions of the planet data in our database. Some current and some old.

So it should be pretty apparent that relationships help solve a variety of problems. Instead of having a lot of copies of a planet, we store a relationship between a person and a planet by storing the persons homeworld as the primary key of the planet they are from.

Foreign keys

The tool we will be using to do all of this relationship work are foreign keys. We say a table has a foreign key reference to another table. When it does the values of that column containing a foreign key reference must be contained in the column of the table it is referencing or be NULL. If it is some other value the DBMS will reject the operation.

You need to be a little bit careful deleting things. Depending on configuration the DBMS may not let you delete things that are being referenced. If we had people with Caprica as their homeworld, we would not be able to delete the Caprica row until all those references were changed or removed.

One to Many Relationships

This kind of relationship, between people and their home planet is a one-to-many relationship. A person is related to one planet but a planet can be related to many people.

The structure should be fairly intuitive. The thing that references one thing stores the key of the thing it is referencing. The thing being referenced does not need to do anything special in this case.

Many to Many Relationships

Lets take a look at a different kind of relationship, the many-to-many relationship. We can imagine in this universe that people might hold certificates in the military to do different things.

bsg_cert

id title
1 Raptor
2 Viper
3 Mechanic
4 Command

The problem is that people might hold more than one certification. We can’t use the trick we did with people and homeworlds. You might be tempted to add multiple certification columns to your bsg_people table, but that is a recipe for disaster when some some overachiever comes through and gets certified in everything and now you need 20 columns.

bsg_cert_people

cid pid
2 2
4 2
4 3
2 4
4 6
1 7

So we make a new relationship table. This table holds two foreign keys. One referencing a person and one referencing a certification. Each row represents a person holding a particular certification. In this case cid is the primary key of the certificate that is held and pid is the primary key of the person who holds the certificate.

Joins

So now we have all this nice related data, we need a way to put it all together. That is where joins come in. They come in different flavors, we are going to talk about three, the inner, left and right join.

Inner Join

Inner joins are probably the most common type of join. At least it is the variety of join that I use the most in my database interactions. An inner join provides the intersection of two tables. Lets say we want to join these two tables.

id fname lname homeworld age
5 Gaius Baltar 3 NULL
6 Saul Tigh NULL 71
7 Karl Agathon 1 NULL
8 Galen Tyrol 1 32
id name population language capital
1 Gemenon 2800000000 Old Gemenese Oranu
2 Leonis 2600000000 Leonese Luminere
3 Caprica 4900000000 Caprican Caprica City

The command for this would be

SELECT * 
FROM  bsg_people 
INNER JOIN bsg_planets 
ON bsg_people.homeworld = bsg_planets.id;

And it would generate the following table

id fname lname homeworld age id name population language capital
5 Gaius Baltar 3 NULL 3 Caprica 4900000000 Caprican Caprica City
7 Karl Agathon 1 NULL 1 Gemenon 2800000000 Old Gemenese Oranu
8 Galen Tyrol 1 32 1 Gemenon 2800000000 Old Gemenese Oranu

The newlines are just added for clarity, this can be done on a single line or broken up for readability. It is often convention to break it on keywords like is done here.

So lets look at what we added to the SELECT statement to make this happen. We SELECT from columns from a table like usual, in this case we selected everything. We could limit this to specific columns if they are uniquely named by specifying the column name, if they are not unique like id in this case we would need to prefix it with the table it is coming from, like we did with bsg_planets.id. So that is all fairly similar to what we did before except we can select columns from multiple tables if we want.

The INNER JOIN is where things get different. We specify the table we want to join our first table to. In inner joins the order does not really matter.

Finally we specify a join condition. In this case the condition that means the rows go together is if the homeworld attribute of a person is equal to the id of a planet.

This query generates a list of all people who have a homeworld along with the details of their homeworld. We can see that some people in the original table are not included because they don’t have a homeworld. Likewise some planets are not included because no one is from that planet.

Left and Right Joins

Left and right joins are syntactically the same as an inner joins other than the name of the join, but the result sets are different.

SELECT * 
FROM  bsg_people 
LEFT JOIN bsg_planets 
ON bsg_people.homeworld = bsg_planets.id
id fname lname homeworld age id name population language capital
5 Gaius Baltar 3 NULL 3 Caprica 4900000000 Caprican Caprica City
6 Saul Tigh NULL 71 NULL NULL NULL NULL NULL
7 Karl Agathon 1 NULL 1 Gemenon 2800000000 Old Gemenese Oranu
8 Galen Tyrol 1 32 1 Gemenon 2800000000 Old Gemenese Oranu

A left join includes every row from the table on the left (the first table we list) and matching rows on the right. So now we also get people who have no homeworld.

A right join does the same thing but lists all the columns on the right, even if there is not a matching column on the left.

SELECT * 
FROM  bsg_people 
RIGHT JOIN bsg_planets 
ON bsg_people.homeworld = bsg_planets.id
id fname lname homeworld age id name population language capital
7 Karl Agathon 1 NULL 1 Gemenon 2800000000 Old Gemenese Oranu
8 Galen Tyrol 1 32 1 Gemenon 2800000000 Old Gemenese Oranu
NULL NULL NULL NULL NULL 2 Leonis 2600000000 Leonese Luminere
5 Gaius Baltar 3 NULL 3 Caprica 4900000000 Caprican Caprica City

Activity

The syntax can be a bit confusing for joins, but once you get the hang of it there is not a whole lot of variation to it. Start by trying to join the tables we did here. Then if you go to the sakila database you will see lots of tables like film_actor or film_category. Try to see if you can get all of these joined together into a big table, you may need to filter on some of the results to keep the result set a reasonable size, but it is good practice to see if you can string together 3 or 4 tables and keep things organized.