Words and Code

One writer’s journey from words to code.

Methods to Remember Things by: Ruby Memoization

#technicaltuesdays, computer science, rails, ruby


A couple of months ago, I wrote a blog post on some basic Ruby keywords including begin, end, rescue, and ensure. A few days after publishing said post, another Rubyist friend of mine sent me a message telling me to read about memoization. “You basically describe memoization in your post without ever explicitly explaining it,” she had said. At the time, I had added it to my ever-growing list of “things to learn more about”, but promptly forgot to make the time to learn about the concept.

Cut to last week, when I was trying to write a controller action that had to do something a bit more complex than simply rendering a JSON-formatted response of a given resource. So, I started off by implementing a begin end block to execute a bunch of code that needed to run on its own. I remembered writing about how to use these two keywords, so I pulled up my post and was suddenly reminded of…memoization. It turned out that I actually needed to use memoization in this controller action, and had already been using it elsewhere in the very same project! But, I still didn’t understand what it was, or in what way I had been using it so far.

After putting it off for months, it was finally time to learn about this memoization business. For those of us (myself included!) who haven’t quite gotten the memo on memoization, here’s the brief lowdown: the term dates back to the year 1968, when it was coined by Donald Michie, a British artificial intelligence researcher who worked alongside Alan Turing at the Government Code and Cypher School at Bletchley Park during WWII. The basic idea of a memoized function is that it can “remember” the result that corresponds to a given set of inputs. In other words, it stores the result of an expensive function call so that when you call the same function again with the same parameters, you don’t need to rerun the function. In the context of computer science specifically, this is an optimization technique that trades a little memory for speed by avoiding repeated work. There are a few different ways that memoization pops up in Ruby and Rails, which is exactly what we’ll need to learn about next!
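To make the idea concrete before we get to Rails, here’s a minimal plain-Ruby sketch. The Squarer class and its computations counter are made up purely for illustration; squaring a number obviously isn’t expensive, it just stands in for something that is.

```ruby
# Hypothetical example class: "remembers" results keyed by input.
class Squarer
  attr_reader :computations

  def initialize
    @cache = {}          # input => previously computed result
    @computations = 0    # how many times the real work actually ran
  end

  def square(n)
    # If we've seen this input before, hand back the stored result.
    return @cache[n] if @cache.key?(n)

    @computations += 1
    @cache[n] = n * n    # do the work once and remember it
  end
end

squarer = Squarer.new
squarer.square(12)
squarer.square(12)       # served from the cache, no recomputation
squarer.square(7)
puts squarer.computations
```

Three calls, but the work only ran twice: once for 12 and once for 7.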

Multiple Memoization


There are effectively two types of memoization when it comes to Rails applications: simple, single-line memoization, and the more complex form found in multiple-line memoization. They are still the same concept, and we can implement them using the same conditional operator; the fundamental difference between the two types hinges upon how much logic needs to be run for the object that we’re trying to “remember”, or in other words, memoize.

Let’s start with a simple memoization example. In our bookstore application, we have a piece of functionality that allows users to write reviews for books that they have purchased. Currently however, our ReviewsController doesn’t account for that functionality. It only has a simple index action that is currently rendering all the Reviews that have been published:

class ReviewsController < ApplicationController
  def index
    render json: Review.published
  end
end

The published method that we’re chaining on here is just a simple scope that we learned about last week, and added on to our Review model:

class Review < ActiveRecord::Base
  scope :published, -> {
    where('published_at IS NOT NULL AND published_at <= ?', Time.zone.now)
  }
end

We can implement some simple memoization by abstracting out what’s currently happening in the index action of our ReviewsController. Since memoization roughly translates to the concept of “remembering” the return value of a function without having to call it again, we could use a Ruby instance variable to store the return value of an expensive function call. In our case, the function that we’re calling and saving to our instance variable is the published scope on our Review class:

class ReviewsController < ApplicationController
  def index
    render json: reviews
  end

  private

  def reviews
    @reviews ||= Review.published
  end
end

Now our index action is calling the private reviews method, which is “remembering”, or essentially assigning and saving the return value of Review.published to the instance variable @reviews. Right now it doesn’t look like much, but this could help keep our code clean as we continue to build out this controller.

The tricky thing to keep in mind with controllers is that they are nothing more than Ruby classes. This is important to remember because this instance variable will exist for the lifespan of a single request; if we make a network call (probably a GET request) when we query the index endpoint of our ReviewsController, the @reviews instance variable will be assigned and exist for the duration of that request. Once that request has completed, that instance of the controller is no longer needed and a new one would be created. Right now, we’re not doing very much in our existing codebase with this piece of functionality. But why might that be important? Let’s find out.
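Here’s a rough plain-Ruby sketch of that lifespan. FakeReviewsController is a made-up stand-in (it is not a real Rails controller), and creating a new instance by hand is only an approximation of what happens when a new request comes in.

```ruby
# Hypothetical stand-in for a controller; no Rails required.
class FakeReviewsController
  class << self
    attr_accessor :query_count   # counts "queries" across all instances
  end
  self.query_count = 0

  def index
    reviews   # first call runs the "query" and stores the result
    reviews   # second call reuses @reviews on this same instance
  end

  private

  def reviews
    @reviews ||= expensive_query
  end

  def expensive_query
    self.class.query_count += 1
    ["Review A", "Review B"]     # placeholder data
  end
end

# Each "request" gets a brand new instance, so nothing carries over.
FakeReviewsController.new.index
FakeReviewsController.new.index
puts FakeReviewsController.query_count
```

Within one instance the query runs once no matter how many times reviews is called, but the second instance starts from scratch, so the total count is 2.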

Imagine that the index action of our ReviewsController needs to be rewritten to account for a new piece of functionality. Instead of merely loading all of our published book reviews, we now want to be able to account for some query params. For example, if a user navigated to a route such as /the-sound-and-fury/reviews, they should be able to see all the published book reviews for that specific book, based on the book slug that is used in the URL. We immediately know that we need to change our reviews method. But it’s not going to be as simple as just chaining on another method; we have slightly more complicated logic going on here.

First, we’ll need to check whether there’s a book_slug parameter being passed in. If there is, we’ll need to query for the correct book reviews based on that query param. If there is no parameter being passed in, we’ll just want to return our published Reviews. To account for this new feature, our method may now look something like this:

class ReviewsController < ApplicationController
  def index
    render json: reviews
  end

  private

  def reviews
    @reviews ||= begin
      _reviews = Review.published

      if params[:book_slug].present?
        _reviews = _reviews.where(book_slug: params[:book_slug])
      end

      _reviews
    end
  end
end

Here, we’re implementing the multiple-line form of memoization, which calls for the use of our favorite Ruby keywords, begin and end. We’re first setting a local _reviews variable to all the published reviews; if there’s a book_slug query parameter being passed in for this GET request, we’re reassigning this variable to select only the published reviews that have a book_slug attribute that matches the query param that was passed in. Ultimately, we’re returning our _reviews variable, which will either be a collection of all the published reviews, or only the published reviews that match our query parameter.

We don’t necessarily have to use a variable name prepended with an _ underscore, but I’ve seen other developers do this in their code and I’ve come to realize that this can be one way of denoting to other developers that this variable is being modified but not explicitly used. It can be a way of indicating that this variable is only necessary to assign the instance variable @reviews, and is never called or referenced outside of our begin end code block. We should also note that our index action hasn’t changed one bit. All of our modified logic still lives in the same method, and is still accessible from our @reviews instance variable, from any action within this controller.

Sometimes, the begin end block for multiple-line memoization is simply used because all of the code won’t fit on a single line. The begin end block groups the code into a single expression whose value is the last line inside it, which effectively encapsulates the block of code as though it were written on a single line, but makes it look much prettier and far easier to read.
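If you want to poke at the multiple-line form without spinning up a Rails app, here’s a small plain-Ruby sketch of the same pattern. The ReviewList class, its hard-coded ALL_REVIEWS data, and the runs counter are all hypothetical; an in-memory array stands in for the database.

```ruby
# Hypothetical plain-Ruby version of the multi-line memoization pattern.
class ReviewList
  ALL_REVIEWS = [
    { book_slug: "beloved", body: "Unforgettable." },
    { book_slug: "emma", body: "Sharp and funny." }
  ].freeze

  attr_reader :runs

  def initialize(params)
    @params = params
    @runs = 0   # how many times the begin...end body executed
  end

  def reviews
    @reviews ||= begin
      @runs += 1
      selected = ALL_REVIEWS

      if @params[:book_slug]
        selected = selected.select { |r| r[:book_slug] == @params[:book_slug] }
      end

      selected  # last expression becomes the value of the whole block
    end
  end
end

list = ReviewList.new({ book_slug: "beloved" })
list.reviews
list.reviews
puts list.reviews.length
puts list.runs
```

Even though reviews is called three times, the body of the begin end block only runs once, and whatever its last line evaluates to is what ends up stored in @reviews.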

Crazy For Conditionals


In order to really understand what’s going on with memoization, it’s important to identify the behind-the-scenes action of Ruby’s “or equals” (sometimes referred to as the “double pipe”) operator: ||=.

When I first learned about this operator, I initially thought that it functioned by telling the Ruby interpreter something equivalent to, Hey, if a value for this variable already exists, please return that. Otherwise, if this variable doesn’t have a value yet, assign it to whatever block of code comes next. But apparently, that’s not exactly what’s going on here. In actuality, this operator is far more nuanced than most people may initially think it to be. Peter Cooper’s Ruby Inside blog post does a fantastic job of unpacking all the different edge cases of the or equals operator, including the various scenarios when it can be a bit problematic. I really like the way that he summarizes the misconception behind the “or equals” operator quite simply as follows:

A common misconception is that a ||= b is equivalent to a = a || b, but it behaves like a || a = b. In a = a || b, a is set to something by the statement on every run, whereas with a || a = b, a is only set if a is logically false (i.e. if it’s nil or false) because || is ‘short circuiting’. That is, if the left hand side of the || comparison is true, there’s no need to check the right hand side.

In other words, what he’s saying here is that when we write something like this:

@review ||= Review.find(params[:id])

what we’re actually doing is saying something along these lines to the Ruby interpreter: If @review currently evaluates to a falsey value, then set it to the return value of Review.find(params[:id]). But, if @review is not a falsey value, don’t assign or set the variable to anything. Just skip the right-hand side and hand back the value that @review already holds.
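One way to see the difference between the two forms for ourselves is to count how often an assignment actually happens. In this sketch, the Tracker class and its assignments counter are invented for the demonstration; the custom value= setter is what lets us observe whether Ruby performed an assignment at all.

```ruby
# Hypothetical helper: counts every call to the value= setter.
class Tracker
  attr_reader :value, :assignments

  def initialize(value)
    @value = value
    @assignments = 0
  end

  def value=(new_value)
    @assignments += 1
    @value = new_value
  end
end

# a ||= b : the setter is skipped when the current value is truthy.
with_or_equals = Tracker.new(:existing)
with_or_equals.value ||= :fallback
puts with_or_equals.assignments   # 0

# a = a || b : the setter runs every time, even though the value is unchanged.
with_plain_or = Tracker.new(:existing)
with_plain_or.value = with_plain_or.value || :fallback
puts with_plain_or.assignments    # 1

# When the current value is nil, ||= does assign.
empty = Tracker.new(nil)
empty.value ||= :fallback
puts empty.value.inspect          # :fallback
```

For a plain instance variable the difference is invisible most of the time, but it starts to matter once the left-hand side is a setter method or a hash lookup with side effects.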

It’s also worth bringing up the fact that both nil and false are “falsey” values, which means that if @review was nil or false when this line runs, the expression would not short circuit; Ruby would go on to evaluate the right-hand side of the ||= operator and assign the result to the @review instance variable. This is significant if you are integrating with an external API where you can’t be sure if the value you’re memoizing will be falsey or not:

class Review < ActiveRecord::Base
  def goodreads_reviews
    @goodreads_reviews ||= begin
      # Some logic here that uses a third-party
      # API like Goodreads and returns an array
      # of reviews, if any happen to exist.
    end
  end
end

In this case, if our API endpoint that we’re querying happens to return nil for a set of reviews or for a particular book that may have no reviews, every single place that we’re calling this method will be running the logic inside of the begin end block. This pretty much makes our idea of “memoizing” the result of this expensive query a moot point, because we’re not “remembering” the return value, but instead just running that line of code again and again. We could fix this by writing a less beautiful but more flexible method like this:

class Review < ActiveRecord::Base
  def goodreads_reviews
    # defined? is truthy once the variable has been assigned,
    # even if the value that was assigned is nil or false.
    unless defined?(@goodreads_reviews)
      @goodreads_reviews = begin
        # This will only execute if @goodreads_reviews
        # has never been assigned, so a nil result
        # is remembered just like any other value.
      end
    end

    @goodreads_reviews
  end
end

This isn’t as big of an issue if we’re using an ActiveRecord method or a scope, which would return an empty array [], and not nil. But it’s important to keep the memoization of falsey values in mind, since we could very easily be making a lot more queries to our database than we might realize.
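To watch the falsey-value problem happen (and the defined? fix work), here’s a small self-contained sketch. FakeGoodreadsLookup and its api_calls counter are hypothetical, and fetch_reviews simply returns nil to imitate an API that has no reviews for a book.

```ruby
# Hypothetical class simulating an API lookup that returns nil.
class FakeGoodreadsLookup
  attr_reader :api_calls

  def initialize
    @api_calls = 0
  end

  # Naive version: a nil result is never "remembered".
  def naive_reviews
    @naive_reviews ||= fetch_reviews
  end

  # Guarded version: checks whether the variable was ever assigned.
  def guarded_reviews
    return @guarded_reviews if defined?(@guarded_reviews)

    @guarded_reviews = fetch_reviews
  end

  private

  def fetch_reviews
    @api_calls += 1
    nil   # pretend the API found nothing
  end
end

naive = FakeGoodreadsLookup.new
3.times { naive.naive_reviews }
puts naive.api_calls      # 3

guarded = FakeGoodreadsLookup.new
3.times { guarded.guarded_reviews }
puts guarded.api_calls    # 1
```

The naive version hits the “API” on every call because nil never satisfies ||=, while the guarded version only hits it once.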

Finally, there’s another tricky situation when it comes to memoizing a method that accepts an argument. Justin Weiss’ blog post explains how to get around this by using the Ruby Hash initializer method (Hash.new), which ensures that the only time a block will be executed is if we try to access a key that doesn’t yet have a value assigned to it in the context of our hash. This can be a little hard to understand at first, but is pretty useful for more complex forms of method memoization.
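Here’s a minimal sketch of that Hash.new technique, written from the description above rather than copied from Justin’s post. The ReviewCounter class, the count_for method, and the lookups counter are all made up, and slug.length is just a placeholder for an expensive per-book query.

```ruby
# Hypothetical example: memoizing a method that takes an argument.
class ReviewCounter
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  def count_for(book_slug)
    counts[book_slug]
  end

  private

  def counts
    # The block only runs when a key is missing from the hash.
    @counts ||= Hash.new do |hash, slug|
      @lookups += 1
      hash[slug] = slug.length   # placeholder for an expensive query
    end
  end
end

counter = ReviewCounter.new
counter.count_for("beloved")
counter.count_for("beloved")   # already in the hash, block is skipped
counter.count_for("emma")
puts counter.lookups
```

Each distinct argument triggers the block exactly once, so we see 2 lookups for 3 calls. A nice side effect is that the block stores the key even when the computed value is nil, so falsey results get remembered too.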

Of memos long gone

Memoization has clearly been around for a long time in the computer science world, but interestingly, it’s had a bit of a rocky history in Railsland. It turns out that there actually used to be an entire ActiveSupport::Memoizable module back in an older version of Rails 3! Apparently, there was a lot of controversy surrounding that particular module, and it was deprecated and, eventually, completely removed in 2011.

At the time of deprecation, the Rails core team encouraged everyone to use the ||= “or equals” operator format of method memoization, and what’s really cool about this is that you can actually see examples of how the core team members changed the code in the exact commit where the Memoizable module was removed. Here’s one example in the Rails source code of method memoization in the DateTimeSelector class:

class DateTimeSelector
  def month_names
    @month_names ||= begin
      month_names = @options[:use_month_names] || translated_month_names
      month_names.unshift(nil) if month_names.size < 13
      month_names
    end
  end
end

Pretty cool, right!?

Of course, some Rubyists were not big fans of this commit and module deprecation. In fact, some developers have fought to keep the module alive in the form of gems! The two that are the most popular are the memoizable gem as well as the memoist gem. Both of them ultimately allow us to write a memoizable method along these lines (shown here with memoist):

require 'memoist'

class Order
  extend Memoist

  def card_last_4
    # Logic to decrypt and
    # return last 4 digits
    # of credit card on the
    # order, properly formatted.
  end
  memoize :card_last_4
end

Effectively, this continues what the ActiveSupport::Memoizable module used to allow. In the example above, the result of calling card_last_4 on an instance of the Order class would be calculated only once, and would be memoized from that point on.
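To get a feel for what a memoize class method might be doing under the hood, here’s a deliberately tiny sketch. To be clear, SimpleMemoize is a hypothetical module written for this post; it is not how Memoist or ActiveSupport::Memoizable are actually implemented, and it only handles methods that take no arguments and have plain names (no ? or ! suffix).

```ruby
# Hypothetical, simplified take on a `memoize` class macro.
module SimpleMemoize
  def memoize(method_name)
    original = instance_method(method_name)   # grab the unmemoized method
    ivar = "@_memoized_#{method_name}"

    # Redefine the method so it checks the instance variable first.
    define_method(method_name) do
      return instance_variable_get(ivar) if instance_variable_defined?(ivar)

      instance_variable_set(ivar, original.bind(self).call)
    end
  end
end

class Order
  extend SimpleMemoize

  attr_reader :decryptions

  def initialize(card_number)
    @card_number = card_number
    @decryptions = 0
  end

  def card_last_4
    @decryptions += 1          # stands in for expensive decryption
    @card_number[-4, 4]
  end
  memoize :card_last_4
end

order = Order.new("4111111111111111")   # made-up test card number
order.card_last_4
order.card_last_4
puts order.card_last_4
puts order.decryptions
```

Because the sketch uses instance_variable_defined? rather than ||=, it would also remember nil and false results.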

I haven’t used either of these gems because I personally would prefer to follow Rails conventions. But, I plan on playing around with them a bit in order to try and understand why the module was deprecated, and why it was implemented in the first place. Of course, we could also read the entire GitHub discussion that took place at the time of deprecation, but that’s a whole lot of comments to read.

No matter what form of method memoization we choose to use, there are certain times when it makes a lot of sense and is clearly the right tool for the job. Anytime we find ourselves making repeated database queries, or time-consuming expensive calculations, or repeated calculations that are never really going to change for an instance of a class or a controller, memoization using Ruby’s ||= operator is probably our best bet. And now that we know the theory and history behind Ruby method memoization, we’ll never forget!

I hope.


tl;dr?

  • Memoization is a long-standing computer science concept that is basically the idea of “remembering” the value of a function to avoid running expensive method calls and calculations multiple times in our code.
  • The crux of Ruby’s memoization techniques relies upon using the conditional “or equals” operator ||=, which evaluates the right-hand side and assigns its value only if the variable being assigned is falsey (i.e. nil or false).
  • There are some great blog posts on the basics of Ruby memoization. This two-part series (part one and part two) is a pretty good place to start.