Hunting Down the Scoop on ActiveRecord Scopes

Over the past forty or so Tuesdays (has it really been that many?!), I’ve written on a spread of topics. There’s a slight problem with this: sometimes I forget what I have and haven’t written about. Here’s a case in point for you: last week, I wrote about finder objects, and casually tossed some scopes into my models. It turns out, I’ve never actually written about how scopes work, or what they really do!

I know, I know, that’s pretty terrible of me. I actually learned about scopes a while ago, and now I use them fairly often in my applications. However, I got so used to writing them that I never really thought that much about how they work behind the scenes. In fact, when I sat down to write this post, I had to go on a hunt into the Rails source code and Ruby blogosphere to figure out what was going on under the hood every single time I implemented a scope in my code.

The main reason that I like to use ActiveRecord scopes is that they allow us to specify commonly-used queries, but encapsulate these queries into methods, which we can then call on our models or association objects. However, my hunt led me to find out that scopes have been around for a while in Railsland, so they’re not exactly that new. But, what’s interesting about them is how their implementation has changed and grown with different releases of Rails. There’s also a lot of debate over how and when scopes are different from their simpler counterparts, class methods. But what makes a scope exactly? Well, it’s finally time for us to hunt down the answer to that question.

The simplest of scopes

While developing applications, we often run into a situation where we want to select a particular group of objects (read: rows from our database) based on the characteristics that they share. The basic implementation of scopes can be summed up as this simple idea: being able to narrow down the objects that we want into a specific subset, based on some given parameters. We can tell ActiveRecord (an Object Relational Mapper, or ORM) to select a certain group of objects by implementing a scope that is specific to a model. Luckily, scopes aren’t too difficult to define, and mostly adhere to a simple structure.

In our bookstore app, for example, we have a Review model, which lets our users write about the Books that they purchase through our store. For now, our Reviews belong to a User, and they have some basic attributes that map to columns in our database. One of these is a published_at datetime attribute, which we set when our User clicks the submit button; that saves their “drafted” review and turns it into a “published” review.

However, one side effect of having this attribute (and effectively, two different states or “types” of reviews) is that we now have no obvious way of selecting only our “published” reviews, that is to say, reviews that have a published_at date attribute set on them. How can we fix this? Well, we can write a class method that, when invoked, will run a query on our ActiveRecord model and only return the reviews that have this attribute. If we did that, our model might look something like this:

class Review < ActiveRecord::Base
  belongs_to :user

  def self.published
    where('published_at IS NOT NULL')
  end
end

Okay, that’s a good start. Remember that the implicit self in the body of this class method is our Review class, so we’re basically running Review.where('published_at IS NOT NULL'). But now we run into another problem: this query isn’t all that specific, is it? What makes a published review, exactly? Well, it’s not just the fact that the published_at date should be set; we also need to account for the fact that some reviews could be set to be published in the future, at a later date. What we really want to select are our reviews that have a published_at date that has already happened; in other words, a date which occurred in the past. We can modify our class method to account for this:

class Review < ActiveRecord::Base
  belongs_to :user

  def self.published
    where('published_at IS NOT NULL AND published_at <= ?', Time.zone.now)
  end
end

If we try out this class method, we can see the exact SQL that gets executed:

♥ rails c
Loading development environment (Rails 4.1.4)
irb(main):001:0> Review.published
# SELECT "reviews".* FROM "reviews" WHERE
"reviews"."published_at" IS NOT NULL AND
"reviews"."published_at" <= 2015-10-27 08:07:36 -0400

However, instead of writing this functionality into the body of a class method, we could accomplish the exact same thing by using a scope:

class Review < ActiveRecord::Base
  belongs_to :user

  scope :published, -> {
    where('published_at IS NOT NULL AND published_at <= ?', Time.zone.now)
  }
end

which allows us to invoke a method in the console that pretty much looks like the method we had before:

irb(main):002:0> Review.published
  Review Load (2.6ms)  SELECT "reviews".* FROM
  "reviews" WHERE "reviews"."published_at" IS NOT NULL
  AND "reviews"."published_at" <= 2015-10-27 08:07:36 -0400
=> #<ActiveRecord::Relation []>

Okay, wait — what’s going on here?! How did that even happen? Well, let’s break it down:

First, we’re using something called the scope method. This class method is defined within the ActiveRecord::Scoping::Named module.
Second, the scope class method requires two important arguments: a name for the scope, and a callable object that contains our query criteria. That last part about passing a callable object is pretty important: the body has to respond to call, which in practice almost always means a proc or a lambda. In fact, that -> {} syntax that we’re using is just another way of writing a lambda in Ruby.
Third, and most interestingly, the return value of our scope was an ActiveRecord::Relation object. This is significant because ActiveRecord::Relation objects are not loaded eagerly; they’re loaded lazily. Lazy-loading basically means that we’re never going to query the database until we actually need to. What makes this really awesome is that lazy-loading allows us to call more methods (read: scopes galore!) on our returned ActiveRecord::Relation object.
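
To make those three points a little more concrete, here’s a toy sketch in plain Ruby, with no Rails involved. ToyRelation, ToyModel, and the in-memory ROWS array are all hypothetical stand-ins that I made up for illustration. This is not how ActiveRecord::Scoping::Named is actually implemented; it only mimics the general shape of the idea: a scope macro that turns a name and a callable into a class method, and a relation that does no work until we ask it for records.

```ruby
# A toy, in-memory stand-in for ActiveRecord::Relation (hypothetical).
class ToyRelation
  def initialize(rows, filters = [])
    @rows = rows
    @filters = filters
    @loaded = false
  end

  def loaded?
    @loaded
  end

  # Chaining only remembers the condition; nothing is evaluated yet.
  def where(&condition)
    ToyRelation.new(@rows, @filters + [condition])
  end

  # The "query" runs only when we actually ask for the records.
  def to_a
    @loaded = true
    @rows.select { |row| @filters.all? { |filter| filter.call(row) } }
  end
end

# A toy model (hypothetical) with a hand-rolled scope macro.
class ToyModel
  ROWS = [
    { id: 1, published_at: Time.at(0) },
    { id: 2, published_at: nil }
  ].freeze

  def self.all
    ToyRelation.new(ROWS)
  end

  # Define a class method called `name` that runs the callable `body`.
  def self.scope(name, body)
    raise ArgumentError, 'body must respond to #call' unless body.respond_to?(:call)

    define_singleton_method(name) { |*args| body.call(*args) }
  end

  # A lambda is the usual choice for a scope body...
  scope :published, -> { all.where { |row| !row[:published_at].nil? } }
  # ...but anything that responds to call works in this sketch.
  scope :everything, method(:all)
end

relation = ToyModel.published
puts relation.loaded?                        # false: nothing has run yet
puts relation.to_a.map { |row| row[:id] }.inspect
puts relation.loaded?                        # true: to_a forced the "query"
```

The part worth noticing is that ToyModel.published hands back a relation immediately, and the filtering only happens once to_a is called, which is what leaves room for chaining more conditions in between.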

It looks like perhaps there’s some funky stuff going on here. But, all of these things still don’t really answer our burning question: why use a scope when we could just write a class method?!

Class methods by any other name

What’s in a scope? A class method by any other name would smell just as sweet! Oops, I got carried away there. Enough poetry, let’s talk prose. Or scopes, rather, and why we might want to use them.

We want to change the implementation of our published class method such that it accepts an argument that makes our query more flexible. Let’s say that we want to be able to filter our Reviews by a specific publication date. We might now have a class method that looks like this:

class Review < ActiveRecord::Base
  belongs_to :user

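  def self.published(on)
    # Only filter when a date is given. Careful: when `on` is
    # blank, this method returns nil instead of a relation.
    where('published_at IS NOT NULL AND published_at <= ?', on) if on.present?
  end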
end

The on parameter would ideally be a Date or a DateTime object that would dynamically change the rows that we’ll query for in our database. This will behave exactly like we want it to, until…it breaks. How can we break this? Well, let’s say that we now want to order our published reviews by their position attribute, which for the time being, is just an integer. No problem, we can do that, right?

irb(main):003:0> Review.published(Time.zone.now)
  .order(position: :asc)
  Review Load (0.2ms)  SELECT "reviews".* FROM
  "reviews"  WHERE (published_at IS NOT NULL AND
  published_at <= 2015-11-02 08:07:36 -0400)
  ORDER BY "reviews"."position" ASC
=> #<ActiveRecord::Relation [#<Review id: 1, published_at:
"2015-08-02 00:04:22", position: 5>, #<Review id: 2,
published_at: "2015-10-02 00:02:00", position: 10>]>

Sure, no problem! This returns exactly what we’d expect. But what if we’re relying on this method elsewhere and somehow end up passing nil to our published method? What happens then?

irb(main):004:0> Review.published(nil).order(position: :asc)
NoMethodError: undefined method `order' for nil:NilClass

BOOM! Everything broke. Oops. What happened here? We tried to call the order method on a falsy object (aka nil). Obviously Ruby is unhappy, because it looks like Review.published(nil) returns nil, which doesn’t respond to a method called order!
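
The underlying Ruby behaviour is easy to reproduce without Rails. In this minimal sketch, REVIEWS and published_reviews are hypothetical names operating on a plain array, and I’m assuming the method guards its body with a trailing if. A method whose last expression is a modifier-if that doesn’t fire returns nil, and whatever we try to chain next has nothing to hang on to.

```ruby
REVIEWS = [
  { id: 1, published_at: Time.at(0) },
  { id: 2, published_at: Time.at(60) }
].freeze

# Hypothetical helper that only filters when a date is given.
# When `on` is nil, the trailing `if` makes the method return nil.
def published_reviews(on)
  REVIEWS.select { |review| review[:published_at] <= on } if on
end

# With a real date we get an array back, so chaining works.
sorted = published_reviews(Time.at(120)).sort_by { |review| review[:id] }
puts sorted.map { |review| review[:id] }.inspect

# With nil there is nothing to chain onto.
begin
  published_reviews(nil).sort_by { |review| review[:id] }
rescue NoMethodError => error
  puts "Chaining failed: #{error.message}"
end
```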

Now, let’s fast forward to our new scope implementation in the Review class:

class Review < ActiveRecord::Base
  belongs_to :user

  scope :published, -> (on) {
    # Skip the date filter when no date is given.
    where('published_at IS NOT NULL AND published_at <= ?', on) if on.present?
  }
end

We’ve changed our callable object to accept a parameter, which is how we’re going to determine our published_at date, and it only applies the filter when a date is actually present. We can be pretty certain that this will execute the same query if we pass an actual date to this scope. But what if we pass nil again?

irb(main):005:0> Review.published(nil)
  .order(position: :asc)
   Review Load (0.2ms)  SELECT "reviews".*
   FROM "reviews" ORDER BY "reviews"."position" ASC
=> #<ActiveRecord::Relation [#<Review id: 1, published_at:
nil, position: 1, created_at: "2015-11-02 00:45:22",
updated_at: "2015-11-02 00:45:22">, #<Review id: 2,
published_at: nil, position: 2, created_at:
"2015-11-02 00:46:22", updated_at: "2015-11-02 00:46:22">]>

Well, would you look at that! It didn’t break! Our lambda returned nil, which is exactly the situation that blew up our class method, but a scope never hands that nil back to us. When a scope’s body returns nil or false, Rails falls back to all, which is an ActiveRecord::Relation for every review. So we didn’t call order on nil; we just kept chaining on to our query. The first part of our query (responsible for finding any reviews that were published on or before a date) was skipped entirely, but the second part (responsible for just ordering whatever got returned) did work! That’s why the SQL above has no WHERE clause at all, and why we got back reviews that haven’t even been published.

An important thing to note: if we had passed in a real date, the WHERE clause would have narrowed down our reviews before they were ordered, and since none of these reviews has a published_at date, we would have gotten an empty relation:

SELECT "reviews".* FROM "reviews" WHERE
(published_at IS NOT NULL AND published_at
<= 2015-11-02 08:07:36 -0400)
ORDER BY "reviews"."position" ASC

Notice that the ORDER BY clause isn’t grouped inside of the WHERE conditions, but instead exists outside of them: it only sorts whatever rows the WHERE clause lets through. An empty relation is still a relation, though, so chaining more scopes onto it is always safe.
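
One way to picture why the scope version keeps chaining is the following toy sketch in plain Ruby. ToyReview and its scope macro are hypothetical and use arrays instead of relations; the one thing borrowed from Rails is the documented behaviour that a scope whose body returns nil or false falls back to all. The trailing if in the lambda is my own assumption about how such a scope would skip its filter.

```ruby
# A toy model (hypothetical) whose "relations" are plain arrays.
class ToyReview
  ROWS = [
    { id: 1, position: 2, published_at: nil },
    { id: 2, position: 1, published_at: nil }
  ].freeze

  # Stand-in for a relation containing every row.
  def self.all
    ROWS.dup
  end

  # Toy scope macro: if the body returns nil or false, fall back to `all`.
  def self.scope(name, body)
    define_singleton_method(name) { |*args| body.call(*args) || all }
  end

  # The trailing `if` makes this lambda return nil when `on` is nil.
  scope :published, ->(on) {
    all.select { |row| row[:published_at] && row[:published_at] <= on } if on
  }
end

# nil date: the filter is skipped, so we order every row.
ordered = ToyReview.published(nil).sort_by { |row| row[:position] }
puts ordered.map { |row| row[:id] }.inspect

# Real date: the filter runs, nothing matches, and we get an empty
# (but still chainable) collection rather than nil.
puts ToyReview.published(Time.at(0)).sort_by { |row| row[:position] }.inspect
```

Either way the caller gets something it can keep chaining on, which is the whole point of the fallback.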

While we’re on the topic of relations, it’s also important to note that the method we have right now does not return a single Review record to us! A relation wraps a query and behaves like a collection; it is not one of our records. We’d need to explicitly ask for a record if we wanted to return it:

irb(main):006:0> Review.published(nil)
  .order(position: :asc).first
=> #<Review id: 1, published_at: nil, position: 1, 
created_at: "2015-11-02 00:45:22",
updated_at: "2015-11-02 00:45:22">

Hopefully we should now be able to easily see that the order method that we’re chaining on right there at the end could really be abstracted into its own scope! Let’s fix that, shall we?

class Review < ActiveRecord::Base
  belongs_to :user

  scope :published, -> (on) {
    # Skip the date filter when no date is given.
    where('published_at IS NOT NULL AND published_at <= ?', on) if on.present?
  }

  scope :ordered, -> { order(position: :asc) }
end

Much better. Now we can just chain our ordered scope on to our published scope without ever having to worry that our scopes will break. But wait, there’s even more we can do with scopes!

Special scope tricks

Because scopes accept lambdas and procs, we can pass in different arguments. We did that before when we passed in a datetime parameter. But this kind of flexibility can be especially powerful, because we can do things like pass in limits:

class Review < ActiveRecord::Base
  scope :published, -> (limit: 20) {
    where('published_at IS NOT NULL AND published_at <= ?', Time.zone.now)
      .limit(limit)
  }
end

This will run our same SQL query, but will add LIMIT 20 to the end of it by default; calling Review.published(limit: 10) would add LIMIT 10 instead. We can customize this scope further, or we can add more if we need to. We also might want to just perpetually apply a scope to all queries on a specific model. When we run into this situation, we can use the default_scope method.

class Review < ActiveRecord::Base
  default_scope -> { order(published_at: :desc) }
end

This will automatically append ORDER BY "reviews"."published_at" DESC to all of our SQL queries on this model. What’s really nice about having a default scope is that we don’t need to write and perpetually call a method named something like by_published_date on this model; it will be applied by default to every query that we run against this class.

According to the documentation, if we want to get super fancy with our default scope and have so much logic that it’s bursting from our callable object’s so-called seams, we can also define it in an alternate way as a class method:

class Review < ActiveRecord::Base
  def self.default_scope
    # Get fancy in here, but just make sure 
    # to return an ActiveRecord::Relation.
    # Otherwise, any scopes we chain onto 
    # this will automatically break!
  end
end

We’re also not limited to just using the where method! We can use plenty of other ActiveRecord::Relation methods, such as joins, or includes, which will eager load associated records when we want to. Here’s a handy scope we could add to our Shipment model that we built out last week:

class Shipment < ActiveRecord::Base
  default_scope -> { includes(:order, :line_items) }
end

This is pretty cool because we’re using our default_scope method to automatically eager-load our associated order and line_items whenever we load shipments, instead of firing off extra queries for every single shipment (the classic n+1 situation). As is always the case with includes, it might not be a good idea to do this everywhere, since we could be loading far more records than we actually need on queries that never touch those associations. But if we know what we’re doing and are sure that this scope is necessary, it can be pretty powerful.

We can also merge two scopes together, which effectively allows us to mix and match different WHERE conditions and group them together in SQL with an AND:

class Shipment < ActiveRecord::Base
  scope :shipped, -> { where(state: 'shipped') }
  scope :damaged, -> { where(condition: 'damaged') }
end

which we can then merge into a single SQL query by chaining our scopes together:

irb(main):007:0> Shipment.shipped.damaged
=> SELECT "shipments".* FROM "shipments" WHERE
"shipments"."state" = 'shipped' AND "shipments".
"condition" = 'damaged'

We’ll notice that in this situation, our WHERE clauses are grouped together with an AND, which can help us when it comes to writing super specific queries.
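
If it helps to see the AND behaviour outside of SQL, here’s a small plain-Ruby sketch. ToyShipments is a hypothetical wrapper around an array of hashes, not anything from ActiveRecord; each "scope" narrows the rows it was given and returns a new wrapper, so chaining two of them only keeps rows that satisfy both conditions.

```ruby
# A toy, chainable collection (hypothetical) standing in for a relation.
class ToyShipments
  def initialize(rows)
    @rows = rows
  end

  # Each "scope" filters the current rows and returns a new wrapper.
  def shipped
    ToyShipments.new(@rows.select { |row| row[:state] == 'shipped' })
  end

  def damaged
    ToyShipments.new(@rows.select { |row| row[:condition] == 'damaged' })
  end

  def ids
    @rows.map { |row| row[:id] }
  end
end

shipments = ToyShipments.new([
  { id: 1, state: 'shipped', condition: 'damaged' },
  { id: 2, state: 'shipped', condition: 'intact' },
  { id: 3, state: 'pending', condition: 'damaged' }
])

# Only the first row is both shipped AND damaged.
puts shipments.shipped.damaged.ids.inspect
# The order of chaining doesn't change which rows survive.
puts shipments.damaged.shipped.ids.inspect
```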

tl;dr?

ActiveRecord scopes give us a lot of flexibility, even though they are effectively defining a class method on a model. The fundamental difference between them, however, is that scopes always return an ActiveRecord::Relation object, which makes them forever chainable!
How does the scope method actually work? I’m not sure that I understand all of it, but perhaps you will! Check it out in the Rails source code!
There are a few great primers on writing effective scopes, like this one, and this other one.

Words and Code

One writer’s journey from words to code.
