Words and Code

One writer’s journey from words to code.

The Secret Life of Your Database, Part 2: Join Tables


This blog post is part of a series on databases. Read Part 1 here.

When it comes to databases, there are generally two schools of thought. There are those people who love them, and there are those people who just hate them. While I wholeheartedly admit that I belong to the former camp of believers, I can understand why someone would subscribe to the latter group. Databases are hard! They’re beautiful and super fun once you understand how to manipulate them, but until you get to that point, they’re pretty much just hard.

I think that a large part of what makes databases hard to understand is the sheer amount of things you can do with the data it contains. Between database migrations, which we unpacked last week, and writing SQL queries that actually do what you want them to do, it’s really easy to just throw your hands up in the air and give up completely. But, you shouldn’t! And I hope that, after reading this post, you won’t! Because databases are beautiful (that should be on a t-shirt somewhere), and you just have to get to know them a little bit.

So, what should we get to know about them? Well, the scariest part, of course: join tables. Join tables are used to combine two sets of data from two different tables. Depending on what you query the database for, different values can be returned. No matter the size of your application, you’re probably going to have at least a few tables, and usually many more. Most of the time, the data in a single table by itself isn’t super useful; when it’s combined with another database’s information, however, then things really get cooking. Join tables are how we get specific information from two different datasets (or two different database tables). There are seven different types of join tables, but there are three in particular that I’ve encountered time and again. Let’s explore the differences between inner joins, left outer joins, and right outer joins.

Know Thy Data

Before we get too deep into join syntax, let’s take a look at our data! We’ll stick with the schema from our eCommerce bookstore application. Since we don’t have any real users yet, we can just populate our database with some fake, super fun seed data.

Disclaimer: I may or may not have gotten carried away when making the seed data. Anyways, here’s what our databases look like:

We’ll be working with the same two tables: a Users table, and an Orders table. Each User instance has a first_name and a last_name (and realistically speaking, probably a bunch of other attributes…but let’s keep it simple for now).

Each Order object belongs_to a User, which means it has a user_id column to store the foreign key from the Users table. Each Order also has a unique, randomly-generated number (think order confirmation number), and an order total, which is stored as a Ruby BigDecimal object.

Inner Joins

The most common and easiest join table syntax to understand is the inner join. Inner joins are our tool of choice if we’re looking for a specific set of data that only matches the parameters we identify in our query.

For example, say we’re looking for a subset of data that shows only users with orders. We’re not interested in any Users that haven’t placed an order; we only want the data of a User that actually has placed an order. This is a set of data that might be useful for something like an admin panel that we might want to build out in our next feature, so let’s return just some basic information right now, like the User’s last_name and their user_id.

So, what would that look like in a SQL query? Well, we know that we want to select only the last_name and the id of the User instance. But, we don’t want ALL Users by their ids; we actually want only the Users that are associated with an user_id on an Order. So, that’s what we’ll select:

1
2
SELECT users.last_name, orders.user_id
FROM users

Cool, we’re halfway there. Now for the join. Let’s think for a second…we want to join the data in our Users table with the data in our Orders table. And, we want to match the rows by ids. What does that mean, exactly? Well, it means that we only want to return an User row where that User's id exists in a row of the Order database, where it’ll be under the user_id foreign key column. We’ll also order our data by the Order's id for now, but we’ll change that as we play around a bit more. So let’s add all that to our query:

1
2
3
4
5
SELECT users.last_name, orders.user_id
FROM users
INNER JOIN orders
ON users.id=orders.user_id
ORDER BY orders.id;

And if we run this query, what’ll happen? We’ll get this fabulousness:

Nice! So, we get five records returned to us, and it returns exactly what we asked for: the User's last_name and the Order's user_id. But there are some User instances that are there twice, and one of the Users (which is apparently T.S. Eliot) doesn’t even show up in our records!

If we take a closer look at our Orders database table, that’s exactly what we should expect to happen. The User with an id of 1 placed two Orders, so there are two records with his last_name and id that are returned to us. And Eliot, who hasn’t placed any Orders yet, isn’t returned to us at all! That makes sense though, since we’re asking for a subset of information – only the rows that fit the requirements of our query – or, the inner information of our database, and nothing superfluous, meaning that all the outer, extra information is ignored.

Left Outer Joins

Now that we’ve covered the relatively easier inner join, it’s time to get to the slightly trickier stuff: outer joins! Outer joins come in two shapes and sizes: left outer joins and right outer joins. The concept behind the two is fundamentally the same, but they kind of work like opposites. Let’s look at a left outer join to start.

Let’s say that our admins actually want to see all the Users at once in the admin panel, regardless of whether they’ve placed an order or not. This means that we want to widen our data query to ALL Users, and all the Orders associated with any of those Users. We’ll begin our SQl query the same way as before, selecting the User's last_name.

But our admin’s also want to see the Order totals for each User – if the User has placed an order, obviously. So, we’ll add that to our select statement as well.

1
2
SELECT users.last_name, orders.total
FROM users

Now, we want to join our Users table with our Orders table again. Except this time, we don’t want to do an inner join. Instead, we want all the data from our Users table, and any data that might correspond to all of our Users from the Orders database. We’ll still do something similar to before and join our tables on the id columns, but we’ll order by each User's id:

1
2
3
4
5
SELECT users.last_name, orders.total
FROM users
LEFT OUTER JOIN orders
ON users.id=orders.user_id
ORDER BY users.id;

If we look at our data, we’ll remember that the User with an id of 4 never placed any orders! How might this affect what data is returned to us? Let’s run our query and find out:



Whoa! Since our User T.S. Eliot never placed an order, he didn’t have any row in the Orders database that corresponded to him. Because there was no Order total column that was attributed to his id, what got returned instead? NULL!

And that’s exactly what outer joins will do if they don’t find anything! If there is no match on the right table that is being “joined” (in our case, the Orders database), the default return value will always be NULL. Now that we know that…how do you think that a right outer join might work?

Right Outer Joins

A right outer join is almost the same as a left outer join. The difference is that the table that you’re joining is on the right. What does that mean for your return set of data?

Well, it means that all the data on the right table will be returned – even if it doesn’t have a corresponding value. Our query might look something like this:

1
2
3
4
5
SELECT orders.number, orders.total
FROM users
RIGHT OUTER JOIN orders
ON users.id=orders.user_id
ORDER BY orders.id;

Here, we are ordering by the Order's id, and we’re only looking at the Order number and total. When we run our query, this is what we’ll get:



In our case, our data doesn’t look super different with a right outer join. In fact, it looks a lot like the inner join we started with! So why do right inner joins matter, then?

Well, what if our admins change their minds (again!) and instead want ONLY orders to be returned. And what if you have an order that doesn’t have a User, because that user deleted their account? Well, in that case, our Order would still show up in the returned data, and if we selected the corresponding User ids, then the deleted User would have a NULL id value! Pretty cool, right?

See, databases can be pretty awesome once you understand a little bit about how they work! In fact, I love them so much that I give ‘em two thumbs way, way up!

tl;dr?

  • Join tables allow us to combine two different sets of data. You can make your data subset be as specific or broad as you want, depending on what you query for!
  • An inner join will only return values that exist in both tables.
  • A left outer join will return all the values from the left database, and any values that correspond from the right database. A right outer join will return all the values from the right database, and any values that correspond form the left. Any values that don’t have a corresponding row will be returned as NULL.
  • Still curious about inner vs outer joins? Check out this blog post.
  • Want to know about ALL the different join tables out there? Here’s a dope graphic that explains it well.