This blog post is part of a series on databases. Read Part 1 here.
When it comes to databases, there are generally two schools of thought. There are those people who love them, and there are those people who just hate them. While I wholeheartedly admit that I belong to the former camp of believers, I can understand why someone would subscribe to the latter group. Databases are hard! They’re beautiful and super fun once you understand how to manipulate them, but until you get to that point, they’re pretty much just hard.
I think that a large part of what makes databases hard to understand is the sheer amount of things you can do with the data it contains. Between database migrations, which we unpacked last week, and writing SQL queries that actually do what you want them to do, it’s really easy to just throw your hands up in the air and give up completely. But, you shouldn’t! And I hope that, after reading this post, you won’t! Because databases are beautiful (that should be on a t-shirt somewhere), and you just have to get to know them a little bit.
So, what should we get to know about them? Well, the scariest part, of course: join tables. Join tables are used to combine two sets of data from two different tables. Depending on what you query the database for, different values can be returned. No matter the size of your application, you’re probably going to have at least a few tables, and usually many more. Most of the time, the data in a single table by itself isn’t super useful; when it’s combined with another database’s information, however, then things really get cooking. Join tables are how we get specific information from two different datasets (or two different database tables). There are seven different types of join tables, but there are three in particular that I’ve encountered time and again. Let’s explore the differences between inner joins, left outer joins, and right outer joins.
Know Thy Data
Before we get too deep into join syntax, let’s take a look at our data! We’ll stick with the schema from our eCommerce bookstore application. Since we don’t have any real users yet, we can just populate our database with some fake, super fun seed data.
Disclaimer: I may or may not have gotten carried away when making the seed data. Anyways, here’s what our databases look like:
We’ll be working with the same two tables: a
Users table, and an
Orders table. Each
User instance has a
first_name and a
last_name (and realistically speaking, probably a bunch of other attributes…but let’s keep it simple for now).
User, which means it has a
user_id column to store the foreign key from the
Users table. Each
Order also has a unique, randomly-generated
number (think order confirmation number), and an order
total, which is stored as a Ruby BigDecimal object.
The most common and easiest join table syntax to understand is the inner join. Inner joins are our tool of choice if we’re looking for a specific set of data that only matches the parameters we identify in our query.
For example, say we’re looking for a subset of data that shows only
orders. We’re not interested in any
Users that haven’t placed an order; we only want the data of a
User that actually has placed an order. This is a set of data that might be useful for something like an admin panel that we might want to build out in our next feature, so let’s return just some basic information right now, like the
last_name and their
So, what would that look like in a SQL query? Well, we know that we want to select only the
last_name and the
id of the
User instance. But, we don’t want ALL
Users by their
ids; we actually want only the
Users that are associated with an
user_id on an
Order. So, that’s what we’ll select:
Cool, we’re halfway there. Now for the join. Let’s think for a second…we want to join the data in our
Users table with the data in our
Orders table. And, we want to match the rows by
ids. What does that mean, exactly? Well, it means that we only want to return an
User row where that
id exists in a row of the
Order database, where it’ll be under the
user_id foreign key column. We’ll also order our data by the
id for now, but we’ll change that as we play around a bit more. So let’s add all that to our query:
1 2 3 4 5
And if we run this query, what’ll happen? We’ll get this fabulousness:
Nice! So, we get five records returned to us, and it returns exactly what we asked for: the
last_name and the
user_id. But there are some
User instances that are there twice, and one of the
Users (which is apparently T.S. Eliot) doesn’t even show up in our records!
If we take a closer look at our
Orders database table, that’s exactly what we should expect to happen. The
User with an
1 placed two
Orders, so there are two records with his
id that are returned to us. And Eliot, who hasn’t placed any
Orders yet, isn’t returned to us at all! That makes sense though, since we’re asking for a subset of information – only the rows that fit the requirements of our query – or, the inner information of our database, and nothing superfluous, meaning that all the outer, extra information is ignored.
Left Outer Joins
Now that we’ve covered the relatively easier inner join, it’s time to get to the slightly trickier stuff: outer joins! Outer joins come in two shapes and sizes: left outer joins and right outer joins. The concept behind the two is fundamentally the same, but they kind of work like opposites. Let’s look at a left outer join to start.
Let’s say that our admins actually want to see all the
Users at once in the admin panel, regardless of whether they’ve placed an order or not. This means that we want to widen our data query to ALL
Users, and all the
Orders associated with any of those
Users. We’ll begin our SQl query the same way as before, selecting the
But our admin’s also want to see the
totals for each
User – if the
User has placed an order, obviously. So, we’ll add that to our select statement as well.
Now, we want to join our
Users table with our
Orders table again. Except this time, we don’t want to do an inner join. Instead, we want all the data from our
Users table, and any data that might correspond to all of our
Users from the
Orders database. We’ll still do something similar to before and join our tables on the
id columns, but we’ll order by each
1 2 3 4 5
If we look at our data, we’ll remember that the
User with an
4 never placed any orders! How might this affect what data is returned to us? Let’s run our query and find out:
Whoa! Since our
User T.S. Eliot never placed an order, he didn’t have any row in the
Orders database that corresponded to him. Because there was no
total column that was attributed to his
id, what got returned instead?
And that’s exactly what outer joins will do if they don’t find anything! If there is no match on the right table that is being “joined” (in our case, the
Orders database), the default return value will always be
NULL. Now that we know that…how do you think that a right outer join might work?
Right Outer Joins
A right outer join is almost the same as a left outer join. The difference is that the table that you’re joining is on the right. What does that mean for your return set of data?
Well, it means that all the data on the right table will be returned – even if it doesn’t have a corresponding value. Our query might look something like this:
1 2 3 4 5
Here, we are ordering by the
id, and we’re only looking at the
total. When we run our query, this is what we’ll get:
In our case, our data doesn’t look super different with a right outer join. In fact, it looks a lot like the inner join we started with! So why do right inner joins matter, then?
Well, what if our admins change their minds (again!) and instead want ONLY orders to be returned. And what if you have an order that doesn’t have a
User, because that user deleted their account? Well, in that case, our
Order would still show up in the returned data, and if we selected the corresponding
ids, then the deleted
User would have a
id value! Pretty cool, right?
See, databases can be pretty awesome once you understand a little bit about how they work! In fact, I love them so much that I give ‘em two thumbs way, way up!
- Join tables allow us to combine two different sets of data. You can make your data subset be as specific or broad as you want, depending on what you query for!
- An inner join will only return values that exist in both tables.
- A left outer join will return all the values from the left database, and any values that correspond from the right database. A right outer join will return all the values from the right database, and any values that correspond form the left. Any values that don’t have a corresponding row will be returned as
- Still curious about inner vs outer joins? Check out this blog post.
- Want to know about ALL the different join tables out there? Here’s a dope graphic that explains it well.