This blog post is part of a series on databases. Read Part 1 here.
When it comes to databases, there are generally two schools of thought. There are those people who love them, and there are those people who just hate them. While I wholeheartedly admit that I belong to the former camp of believers, I can understand why someone would subscribe to the latter group. Databases are hard! They’re beautiful and super fun once you understand how to manipulate them, but until you get to that point, they’re pretty much just hard.
I think that a large part of what makes databases hard to understand is the sheer amount of things you can do with the data it contains. Between database migrations, which we unpacked last week, and writing SQL queries that actually do what you want them to do, it’s really easy to just throw your hands up in the air and give up completely. But, you shouldn’t! And I hope that, after reading this post, you won’t! Because databases are beautiful (that should be on a t-shirt somewhere), and you just have to get to know them a little bit.
So, what should we get to know about them? Well, the scariest part, of course: join tables. Join tables are used to combine two sets of data from two different tables. Depending on what you query the database for, different values can be returned. No matter the size of your application, you’re probably going to have at least a few tables, and usually many more. Most of the time, the data in a single table by itself isn’t super useful; when it’s combined with another database’s information, however, then things really get cooking. Join tables are how we get specific information from two different datasets (or two different database tables). There are seven different types of join tables, but there are three in particular that I’ve encountered time and again. Let’s explore the differences between inner joins, left outer joins, and right outer joins.
Know Thy Data
Before we get too deep into join syntax, let’s take a look at our data! We’ll stick with the schema from our eCommerce bookstore application. Since we don’t have any real users yet, we can just populate our database with some fake, super fun seed data.
Disclaimer: I may or may not have gotten carried away when making the seed data. Anyways, here’s what our databases look like:
We’ll be working with the same two tables: a Users
table, and an Orders
table. Each User
instance has a first_name
and a last_name
(and realistically speaking, probably a bunch of other attributes…but let’s keep it simple for now).
Each Order
object belongs_to
a User
, which means it has a user_id
column to store the foreign key from the Users
table. Each Order
also has a unique, randomly-generated number
(think order confirmation number), and an order total
, which is stored as a Ruby BigDecimal object.
Inner Joins
The most common and easiest join table syntax to understand is the inner join. Inner joins are our tool of choice if we’re looking for a specific set of data that only matches the parameters we identify in our query.
For example, say we’re looking for a subset of data that shows only users
with orders
. We’re not interested in any Users
that haven’t placed an order; we only want the data of a User
that actually has placed an order. This is a set of data that might be useful for something like an admin panel that we might want to build out in our next feature, so let’s return just some basic information right now, like the User
’s last_name
and their user_id
.
So, what would that look like in a SQL query? Well, we know that we want to select only the last_name
and the id
of the User
instance. But, we don’t want ALL Users
by their ids
; we actually want only the Users
that are associated with an user_id
on an Order
. So, that’s what we’ll select:
1 2 |
|
Cool, we’re halfway there. Now for the join. Let’s think for a second…we want to join the data in our Users
table with the data in our Orders
table. And, we want to match the rows by ids
. What does that mean, exactly? Well, it means that we only want to return an User
row where that User's
id
exists in a row of the Order
database, where it’ll be under the user_id
foreign key column. We’ll also order our data by the Order's
id
for now, but we’ll change that as we play around a bit more. So let’s add all that to our query:
1 2 3 4 5 |
|
And if we run this query, what’ll happen? We’ll get this fabulousness:
Nice! So, we get five records returned to us, and it returns exactly what we asked for: the User's
last_name
and the Order's
user_id
. But there are some User
instances that are there twice, and one of the Users
(which is apparently T.S. Eliot) doesn’t even show up in our records!
If we take a closer look at our Orders
database table, that’s exactly what we should expect to happen. The User
with an id
of 1
placed two Orders
, so there are two records with his last_name
and id
that are returned to us. And Eliot, who hasn’t placed any Orders
yet, isn’t returned to us at all! That makes sense though, since we’re asking for a subset of information – only the rows that fit the requirements of our query – or, the inner information of our database, and nothing superfluous, meaning that all the outer, extra information is ignored.
Left Outer Joins
Now that we’ve covered the relatively easier inner join, it’s time to get to the slightly trickier stuff: outer joins! Outer joins come in two shapes and sizes: left outer joins and right outer joins. The concept behind the two is fundamentally the same, but they kind of work like opposites. Let’s look at a left outer join to start.
Let’s say that our admins actually want to see all the Users
at once in the admin panel, regardless of whether they’ve placed an order or not. This means that we want to widen our data query to ALL Users
, and all the Orders
associated with any of those Users
. We’ll begin our SQl query the same way as before, selecting the User's
last_name
.
But our admin’s also want to see the Order
totals
for each User
– if the User
has placed an order, obviously. So, we’ll add that to our select statement as well.
1 2 |
|
Now, we want to join our Users
table with our Orders
table again. Except this time, we don’t want to do an inner join. Instead, we want all the data from our Users
table, and any data that might correspond to all of our Users
from the Orders
database. We’ll still do something similar to before and join our tables on the id
columns, but we’ll order by each User's
id
:
1 2 3 4 5 |
|
If we look at our data, we’ll remember that the User
with an id
of 4
never placed any orders! How might this affect what data is returned to us? Let’s run our query and find out:
Whoa! Since our User
T.S. Eliot never placed an order, he didn’t have any row in the Orders
database that corresponded to him. Because there was no Order
total
column that was attributed to his id
, what got returned instead? NULL
!
And that’s exactly what outer joins will do if they don’t find anything! If there is no match on the right table that is being “joined” (in our case, the Orders
database), the default return value will always be NULL
. Now that we know that…how do you think that a right outer join might work?
Right Outer Joins
A right outer join is almost the same as a left outer join. The difference is that the table that you’re joining is on the right. What does that mean for your return set of data?
Well, it means that all the data on the right table will be returned – even if it doesn’t have a corresponding value. Our query might look something like this:
1 2 3 4 5 |
|
Here, we are ordering by the Order's
id
, and we’re only looking at the Order
number
and total
. When we run our query, this is what we’ll get:
In our case, our data doesn’t look super different with a right outer join. In fact, it looks a lot like the inner join we started with! So why do right inner joins matter, then?
Well, what if our admins change their minds (again!) and instead want ONLY orders to be returned. And what if you have an order that doesn’t have a User
, because that user deleted their account? Well, in that case, our Order
would still show up in the returned data, and if we selected the corresponding User
ids
, then the deleted User
would have a NULL
id
value! Pretty cool, right?
See, databases can be pretty awesome once you understand a little bit about how they work! In fact, I love them so much that I give ‘em two thumbs way, way up!
tl;dr?
- Join tables allow us to combine two different sets of data. You can make your data subset be as specific or broad as you want, depending on what you query for!
- An inner join will only return values that exist in both tables.
- A left outer join will return all the values from the left database, and any values that correspond from the right database. A right outer join will return all the values from the right database, and any values that correspond form the left. Any values that don’t have a corresponding row will be returned as
NULL
. - Still curious about inner vs outer joins? Check out this blog post.
- Want to know about ALL the different join tables out there? Here’s a dope graphic that explains it well.