Words and Code

One writer’s journey from words to code.

Querying at Warp Drive: Using ActiveRecord Includes

#technicaltuesdays, computer science, rails, refactoring

There comes a time in the life of every developer when you stop thinking about whether your code works. Instead, you concern yourself with something else entirely: how efficiently your code works.

In fact, you can trace this back to Kent Beck’s philosophy of “Make it work, make it right, make it fast.” Once you’ve got your code doing what you want it to do, you need to make it better. But what do you do with your improved code once you’ve set it right? Make sure that it’s working as fast as it possibly can, of course!

Code efficiency is a bit of an advanced topic, and I’m definitely no expert in it. But I recently learned a pretty awesome querying method that’s neither difficult to understand nor too complex to implement. In fact, this method has always been right under your nose, hidden inside of the magic that is ActiveRecord. So what is this magical method, exactly? Well, it’s called includes, and once you start using it, you’ll never make database queries the same way again.

Inquiries on Querying

The best way to know how to make your code run at warp drive is by first figuring out how fast it is to start with. But before we get into the issue of speed, let’s check out what our code base looks like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class Order < ActiveRecord::Base
  belongs_to :user
  attr_accessor :details
end


class User < ActiveRecord::Base
  has_many :orders

  class << self
    def all_order_details
      User.all.each |user|
        user.orders.each do |order|
          order.details
        end
      end
    end
  end
end

Nothing too fancy happening here. We’ve got a one-to-many relationship here, with a User object having many Orders, and each Order belonging to a User. There’s also a class method called all_order_details in our User class. This method queries for all of the User objects in our database, and loops through each User instance, grabbing each order, and calling details on it. And if we take a look at our Order class, we’ll see that details is just an attribute on Order. (Forget what that weird class << self syntax means? Jog your memory with this post from a few weeks ago.)

Our plan is to use this method in an admin panel or dashboard, which is probably the only place where someone would want to see all the order details of every single order placed by every single user in our system. But even though we’re not going to be using this method within all of our views, it’s still important that we take a look at what it’s doing in order to understand whether we can refactor it and make it faster.

So, this code does pretty much exactly what we want it to do. But how fast does it do its job? The only way to find out is by checking out what happens in our server logs when we call our method:

1
2
3
4
5
6
7
8
9
10
11
12
13
2.2.0 :001 > User.all_order_details
  User Load (1.1ms)  SELECT "users".* FROM "users"
  Order Load (0.6ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 1]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 2]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 3]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 4]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 5]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 6]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 7]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 8]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 9]]
  Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 10]]
 => # returns an array of order details

Whoa, whoa, whoa – hold up. We’ve got a problem.

Sure, we got an array of Order details, which is great. But doesn’t it seem kind of weird that we had to make so many queries? Are the red flags going off in your head, too? Time to investigate.

The N + 1 Problem

Okay, so maybe you didn’t recoil in horror when you saw all those queries. But let me explain why you probably should. In our database, we currently have 10 Users, each of whom has one Order. If we look at the queries we’re currently making, you’ll notice that we’re first loading all of our User objects:

1
User Load (1.1ms)  SELECT "users".* FROM "users"

Loading all of our Users takes 1.1 milliseconds. But what about our Order details? We need to go through our Users table in order to grab each Order’s details. That’s exactly what’s happening in each one of the queries that looks like this:

1
2
3
Order Load (0.6ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 1]]
Order Load (0.1ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" = ?  [["user_id", 2]]
# more queries for each user's orders

We’re first loading all the Orders when we look for a User with an id of 1, which takes us 0.6 milliseconds. And as we continue querying through the remaining 9 Users in our database, each query takes us 0.1 milliseconds.

Right now, this might not seem like that big of a deal. But we have to keep in mind that we only have ten users so far! Hopefully, we’ll have a lot more as our user base grows. And that’s where our problem comes in. Actually, our problem is quite common, and it’s even got a name: the n + 1 problem.

The crux of the n + 1 problem is essentially this: the number of queries you make will directly correspond to the number of objects (n) that you have in your database. This means that we’ll have 1 query to get all of our User objects, and n number of additional queries for each User object we want to get order details from.

With only 10 users, we’ll make 11 queries in total: 1 query for our users, and 10 for each user we query to load the user’s order details. But what if we have 10,000 users? Or if we’re being optimistic, 100,000 user? …Or even a million? Making over a hundred thousand queries is just way too slow and simply not acceptable.

Thankfully, this problem is so common that Rails has a fantastic solution, tailor-made to handle it.

Smarter Querying

Rails is a pretty smart cookie – it allows you to load a bunch of stuff at once ahead of time if you know exactly what you want to query. This is referred to as eager loading, and a really easy way to eager load your associations is by using the ActiveRecord method includes.

You can call includes on an object an pass it a parameter of the table you want to load as a symbol name. We can implement this directly on our all_order_details method:

1
2
3
4
5
6
7
def all_order_details
  User.all.includes(:orders).each |user|
    user.orders.each do |order|
      order.details
    end
  end
end

This may look pretty similar to our original code at first, but when we look at our server logs, we can see a world of difference:

1
2
3
4
2.2.0 :002 > User.all_order_details
  User Load (0.2ms)  SELECT "users".* FROM "users"
  Order Load (0.3ms)  SELECT "orders".* FROM "orders" WHERE "orders"."user_id" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
 => # returns an array of order details

Nice. We have something really cool happening here. Instead of making “n + 1” number of queries, we’re only making two queries. We spend 0.2 milliseconds getting our users, and 0.3 milliseconds loading our orders. We’re only spending 0.5 milliseconds compared to the 2.6 milliseconds we were spending before.

How does this work, exactly? Eager loading allows you to specify exactly which tables you want to load into memory. Just by loading them, we’re reducing the number of queries we’re making. It’s like telling ActiveRecord, Hey, I already know I want all the Users and I know I want to do something with their associated Orders. So instead of querying the table one instance at a time, why don’t you just be super efficient and grab ‘em all for me in one shot? And that’s pretty much exactly what Rails will do.

I also like the way that the Odin Project explains how includes works:

“Includes basically takes the name of one or more associations that you’d like to load at the same time as your original object and brings them into memory. You can chain it onto other methods like where or order clauses.”

Other helpful methods that do similar things to solve the n + 1 problem are joins, used for joining multiple tables together, and pluck, which grabs a specific column and returns it in an array format.

Big O Puts the O In “Uh Oh”

Dealing with issues of efficiency and speed is a pretty advanced topic, mostly because it gets into the deep depths of computer science and the dreaded concept of Big O Notation. Big O deals with the space time complexity of different algorithms.

The n + 1 problem is actually nothing more than a Big O problem, because it has to do with how well your algorithm runs. The Big O complexity of an n + 1 problem is O(n), pronounced “Oh of n”. An O(n) algorithm is one whose performance will grow linearly, and will be directly proportional to the size of input data (the more Users we have, the larger our query of each user’s order details).

Big O should always be taken into consideration while building an app. But that being said, it should never be the first thing that you think about – particularly if you are a reatively junior developer. The space time complexity of your algorithm is something you can think about while refactoring your code, or rethinking how you’re approaching or solving a problem.

Being able to evaluate an algorithm’s efficiency is definitely something to work towards, but it’s also something that takes practice. Implementing ActiveRecord’s includes method is a good way to start playing around with speed and efficiency – plus, you’ll make some kickass queries in the process, and that always feels pretty awesome. (I mean, obviously it’s not as awesome as going warp speed on the Enterprise, but hey, it’s an acceptable alternative.)

tl;dr?

  • The includes method can be chained on to any ActiveRecord model, and takes parameters of an association/table name, passed in as a symbol. Read more about it in the Rails docs.
  • If you’re still curious about the n + 1 problem, check out another good example of it over here.
  • Big O Notation got you down? Check out this awesome resource, which has graphs to go along with each code snippet example of different types of notation.