A couple of months ago, I wrote a blog post on some basic Ruby keywords including `begin`, `end`, `rescue`, and `ensure`. A few days after publishing said post, another Rubyist friend of mine sent me a message telling me to read about memoization. “You basically describe memoization in your post without ever explicitly explaining it,” she said. At the time, I added it to my ever-growing list of “things to learn more about”, but promptly forgot to make the time to learn about the concept.
Cut to last week, when I was trying to write a controller action that had to do something a bit more complex than simply render a JSON-formatted response of a given resource. So, I started off by implementing a `begin`/`end` block to execute a bunch of code that needed to run on its own. I remembered writing about how to use these two keywords, so I pulled up my post and was suddenly reminded of…memoization. It turned out that I actually needed to use memoization in this controller action, and had already been using it elsewhere in the very same project! But, I still didn’t understand what it was, or in what way I had been using it so far.
After putting it off for months, it was finally time to learn about this memoization business. For those of us (myself included!) who haven’t quite gotten the memo on memoization, here’s the brief lowdown: the term dates back to the year 1968, when it was coined by Donald Michie, a British artificial intelligence researcher who worked alongside Alan Turing at the Code and Cypher School at Bletchley Park during WWII. The basic idea of a memoization function is that it can “remember” the result that corresponds to a set of given inputs. In other words, it can store the results of an expensive function call by “saving” the result such that when you call the same function again with the same parameters, you don’t need to rerun the function. In the context of computer science specifically, this is a kind of optimization technique that is used to make programs more efficient and much faster. There are a few different ways that memoization pops up in Ruby and Rails, which is exactly what we’ll need to learn about next!
Multiple Memoization
There are effectively two types of memoization when it comes to Rails applications: simple, single-line memoization, and the more complex form found in multiple-line memoization. They still are the same concept and we can implement them using the same conditional operator; the fundamental difference between the two types hinges upon how much logic needs to be run for the object that we’re trying to “remember”, or in other words, memoize.
Let’s start with a simple memoization example. In our bookstore application, we have a piece of functionality that allows users to write reviews for books that they have purchased. Currently, however, our `ReviewsController` doesn’t account for that functionality. It only has a simple `index` action that renders all the `Reviews` that have been `published`.
The `published` method that we’re chaining on here is just a simple scope that we learned about last week, and added on to our `Review` model.
We can implement some simple memoization by abstracting out what’s currently happening in the `index` action of our `ReviewsController`. Since memoization roughly translates to the concept of “remembering” the return value of a function without having to call it again, we could use a Ruby instance variable to store the return value of an expensive function call. In our case, the function that we’re calling and saving to our instance variable is the `published` scope on our `Review` class.
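As a rough sketch of the refactoring described above, here is a plain-Ruby stand-in that runs without Rails. The `Review` class with its `query_count` counter and the stripped-down `ReviewsController` are assumptions made purely for illustration; in a real app, `Review.published` would be an ActiveRecord scope and `index` would render the result.

```ruby
# Stand-in for an ActiveRecord model; counts how often the "query" runs.
class Review
  class << self
    attr_accessor :query_count

    # Pretend scope: in Rails this would hit the database.
    def published
      self.query_count = query_count.to_i + 1
      ["Great read", "Could not put it down"]
    end
  end
end

# Stand-in for a Rails controller (no rendering, it just returns the data).
class ReviewsController
  def index
    reviews # first call runs the query and stores the result
    reviews # second call reuses @reviews
  end

  private

  # Memoized: Review.published only runs while @reviews is nil or false.
  def reviews
    @reviews ||= Review.published
  end
end

ReviewsController.new.index
puts Review.query_count # prints 1
```

Even though `index` asks for `reviews` twice, the pretend query only runs once for that controller instance.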
Now our `index` action is calling the private `reviews` method, which is “remembering”, or essentially assigning and saving the return value of `Review.published` to the instance variable `@reviews`. Right now it doesn’t look like much, but this could help keep our code clean as we continue to build out this controller.
The tricky thing to keep in mind with controllers is that they are nothing more than Ruby classes. This is important to remember because this instance variable will exist for the lifespan of a single request; if we make a network call (probably a `GET` request) when we query the `index` endpoint of our `ReviewsController`, the `@reviews` instance variable will be assigned and exist for the duration of that request. Once that request has completed, that instance of the controller is no longer needed and a new one would be created. Right now, we’re not doing very much in our existing codebase with this piece of functionality. But why might that be important? Let’s find out.
Imagine that the `index` action of our `ReviewsController` needs to be rewritten to account for a new piece of functionality. Instead of merely loading all of our `published` book reviews, we now want to be able to account for some query params. For example, if a user navigated to a route such as `/the-sound-and-fury/reviews`, they should be able to see all the published book reviews for that specific book, based on the book slug that is used in the URL. We immediately know that we need to change our `reviews` method. But it’s not going to be as simple as just chaining on another method; we have a bit more complicated logic going on here.
First, we’ll need to check whether there’s a `book_slug` parameter being passed in. If there is, we’ll need to query for the correct book reviews based on that query param. If there is no parameter being passed in, we’ll just want to return our `published` Reviews. To account for this new feature, our method needs more than a single line of logic.
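Here is one way the multiple-line version might look, again as a plain-Ruby sketch rather than real Rails code. The `Review` struct, its sample data, and the hand-passed `params` hash are all assumptions for illustration; in an actual controller we would more likely narrow the scope with a database query than with `select`.

```ruby
# Stand-in for an ActiveRecord model; the data and attribute names are made up.
Review = Struct.new(:title, :book_slug) do
  def self.published
    [
      new("A classic", "the-sound-and-fury"),
      new("Slow start", "the-sound-and-fury"),
      new("Loved it", "beloved")
    ]
  end
end

# Stand-in for a Rails controller; params is passed in by hand here.
class ReviewsController
  attr_reader :params

  def initialize(params = {})
    @params = params
  end

  def index
    reviews
  end

  private

  def reviews
    @reviews ||= begin
      _reviews = Review.published

      if params[:book_slug]
        _reviews = _reviews.select { |r| r.book_slug == params[:book_slug] }
      end

      _reviews # the block's last expression is what gets memoized
    end
  end
end

puts ReviewsController.new.index.size                                  # prints 3
puts ReviewsController.new(book_slug: "the-sound-and-fury").index.size # prints 2
```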
This is where we implement the multiple-line form of memoization, which calls for the use of our favorite Ruby keywords, `begin` and `end`. We’re first setting a local `_reviews` variable to all the `published` reviews; if there’s a `book_slug` query parameter being passed in for this `GET` request, we’re modifying this variable to select only the published reviews that have a `book_slug` attribute that matches the query param that was passed in. Ultimately, we’re returning our `_reviews` variable, which will either be just an array of all the published reviews, or the published reviews that match our query parameter.
We don’t necessarily have to use a variable name prefixed with an `_` underscore, but I’ve seen other developers do this in their code and I’ve come to realize that this can be one way of denoting to other developers that this variable is being modified but not explicitly used. It can be a way of indicating that this variable is only necessary to assign the instance variable `@reviews`, and is never called or referenced outside of our `begin`/`end` code block. We should also note that our `index` action hasn’t changed one bit. All of our modified logic still lives in the same method, and is still accessible from our `@reviews` instance variable, from any action within this controller.
Sometimes, the `begin`/`end` block for multiple-line memoization is simply used because all of the code won’t fit on a single line. The `begin`/`end` block ensures that the code will be executed together in a single chunk, which effectively encapsulates the block of code as though it were written on a single line, but makes it look much prettier and far easier to read.
Crazy For Conditionals
In order to really understand what’s going on with memoization, it’s important to identify the behind-the-scenes action of Ruby’s “or equals” (sometimes referred to as the “double pipe”) operator: `||=`.
When I first learned about this operator, I initially thought that it functioned by telling the Ruby interpreter something equivalent to, Hey, if a value for this variable already exists, please return that. Otherwise, if this variable doesn’t have a value yet, assign it to whatever block of code comes next. But apparently, that’s not exactly what’s going on here. In actuality, this operator is far more nuanced than most people may initially think it to be. Peter Cooper’s Ruby Inside blog post does a fantastic job of unpacking all the different edge cases of the or equals operator, including the various scenarios when it can be a bit problematic. I really like the way that he summarizes the misconception behind the “or equals” operator quite simply as follows:
“A common misconception is that `a ||= b` is equivalent to `a = a || b`, but it behaves like `a || a = b`. In `a = a || b`, `a` is set to something by the statement on every run, whereas with `a || a = b`, `a` is only set if `a` is logically false (i.e. if it’s `nil` or `false`) because `||` is ‘short circuiting’. That is, if the left hand side of the `||` comparison is true, there’s no need to check the right hand side.”
In other words, what he’s saying here is that when we write something like `@reviews ||= Review.find(params[:id])`, what we’re actually doing is saying something along these lines to the Ruby interpreter: If `@reviews` is currently falsey, then set it to the return value of `Review.find(params[:id])`. But, if `@reviews` is not a falsey value, don’t assign or set the variable to anything. Just skip the right-hand side and hand back the value that’s already there.
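To watch the short-circuiting happen, here is a small self-contained experiment. The `or_equals_demo` method and its `expensive` lambda are hypothetical names invented for this sketch; the lambda simply counts how many times the right-hand side of `||=` actually gets evaluated.

```ruby
# Exercises ||= on local variables and reports what happened.
def or_equals_demo
  calls = 0
  expensive = -> { calls += 1; "result" }

  a = nil
  a ||= expensive.call # a is nil, so the right-hand side runs and is assigned
  a ||= expensive.call # a is truthy now, so the right-hand side is skipped
  calls_after_a = calls

  b = false
  b ||= expensive.call # false is falsey too, so this one assigns as well

  { a: a, calls_after_a: calls_after_a, b: b, total_calls: calls }
end

p or_equals_demo
```

The two `||=` lines for `a` only trigger one call to `expensive`, while the `false` starting value for `b` triggers another.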
It’s also worth bringing up the fact that both `nil` and `false` are “falsey” values, which means that if `@reviews` was `nil` when this line runs, the expression would not short circuit; Ruby would evaluate the code after the `||=` operator and assign the result to the `@reviews` instance variable. This is significant if you are integrating with an external API where you can’t be sure if the value you’re memoizing will be falsey or not.
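To make that concrete, here is a sketch of how a `nil` response can quietly defeat `||=`. Everything in it is hypothetical: `FakeReviewsApi` stands in for whatever external service we might be calling, and `BookReviews` is just a small wrapper class invented for the example.

```ruby
# Hypothetical API client; it returns nil when a book has no reviews.
class FakeReviewsApi
  attr_reader :request_count

  def initialize
    @request_count = 0
  end

  def fetch_reviews(_book_slug)
    @request_count += 1
    nil # pretend the endpoint had nothing to return
  end
end

class BookReviews
  def initialize(api, book_slug)
    @api = api
    @book_slug = book_slug
  end

  # Looks memoized, but a nil result is never "remembered".
  def reviews
    @reviews ||= begin
      @api.fetch_reviews(@book_slug)
    end
  end
end

api = FakeReviewsApi.new
book = BookReviews.new(api, "the-sound-and-fury")
3.times { book.reviews }
puts api.request_count # prints 3
```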
In a case like this, if our API endpoint that we’re querying happens to return `nil` for a set of reviews or for a particular book that may have no reviews, every single place that we’re calling this method will be running the logic inside of the `begin`/`end` block. This pretty much makes our idea of “memoizing” the result of this expensive query a moot point, because we’re not “remembering” the return value, but instead just running that line of code again and again. We could fix this by writing a less beautiful but more flexible method.
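One common way to write that more flexible method, and this is my assumption about the shape of the fix rather than the only option, is to ask whether the instance variable has been assigned at all using Ruby’s `defined?` keyword, instead of asking whether it is truthy. The `FakeReviewsApi` and `BookReviews` classes below are the same kind of hypothetical stand-ins as before.

```ruby
# Hypothetical API client; it returns nil when a book has no reviews.
class FakeReviewsApi
  attr_reader :request_count

  def initialize
    @request_count = 0
  end

  def fetch_reviews(_book_slug)
    @request_count += 1
    nil
  end
end

class BookReviews
  def initialize(api, book_slug)
    @api = api
    @book_slug = book_slug
  end

  def reviews
    # defined? is truthy once @reviews has been assigned, even to nil.
    return @reviews if defined?(@reviews)

    @reviews = begin
      @api.fetch_reviews(@book_slug)
    end
  end
end

api = FakeReviewsApi.new
book = BookReviews.new(api, "the-sound-and-fury")
3.times { book.reviews }
puts api.request_count # prints 1
```

The trade-off is an extra guard line, but a `nil` or `false` result now counts as “remembered”.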
This isn’t as big of an issue if we’re using an `ActiveRecord` method or a scope, which would return an empty collection `[]` (which is truthy in Ruby), and not `nil`. But it’s important to keep the memoization of falsey values in mind, since we could very easily be making a lot more queries to our database than we might realize.
Finally, there’s another tricky situation when it comes to memoizing a method that accepts an argument. Justin Weiss’ blog post explains how to get around this by using the Ruby `Hash` initializer method (`Hash.new`), which ensures that the only time a block will be executed is if we try to access a key that doesn’t yet have a value assigned to it in the context of our hash. This can be a little hard to understand at first, but is pretty useful for more complex forms of method memoization.
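A minimal sketch of the `Hash.new` approach might look like the following. The `ReviewStats` class, its `average_rating` method, and the sample ratings are all made up for illustration; the important part is that the block passed to `Hash.new` only runs for keys that are not in the hash yet.

```ruby
class ReviewStats
  attr_reader :computations

  def initialize
    @computations = 0
  end

  # Memoizes per argument: the block runs only for keys not yet in the hash.
  def average_rating(book_slug)
    @average_rating ||= Hash.new do |cache, slug|
      @computations += 1
      cache[slug] = expensive_average(slug)
    end
    @average_rating[book_slug]
  end

  private

  # Hypothetical slow calculation; the ratings are made-up sample data.
  def expensive_average(slug)
    ratings = { "the-sound-and-fury" => [5, 4, 3], "beloved" => [5, 5] }
    values = ratings.fetch(slug, [])
    values.empty? ? nil : values.sum.to_f / values.size
  end
end

stats = ReviewStats.new
stats.average_rating("the-sound-and-fury")
stats.average_rating("the-sound-and-fury")
stats.average_rating("beloved")
puts stats.computations # prints 2
```

Because the block stores its result under the key, even a `nil` average is only computed once per slug.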
Of memos long gone
Memoization has clearly been around for a long time in the computer science world, but interestingly, it’s had a bit of a rocky history in Railsland. It turns out that there actually used to be an entire `ActiveSupport::Memoizable` module back in an older version of Rails 3! Apparently, there was a lot of controversy surrounding that particular module, and it was deprecated and, eventually, completely removed in 2011.
At the time of deprecation, the Rails core team encouraged everyone to use the `||=` “or equals” operator format of method memoization, and what’s really cool about this is that you can actually see examples of how the core team members changed the code in the exact commit where the Memoizable module was removed. One example in the Rails source code is the method memoization in the `DateTimeSelector` class.
Pretty cool, right!?
Of course, some Rubyists were not big fans of this commit and module deprecation. In fact, some developers have fought to keep the module alive in the form of gems! The two that are the most popular are the `memoizable` gem as well as the `memoist` gem. Both of them ultimately allow us to declare a method as memoized, rather than writing the `||=` logic by hand.
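To get a feel for what that style looks like without installing anything, here is a tiny hand-rolled stand-in. To be clear, `SimpleMemoize` is a hypothetical module written for this post and is not the code of either gem; it only mimics the general shape of declaring `memoize :card_last_4` after defining the method. The `Order` class and its `lookups` counter are likewise assumptions.

```ruby
# Hand-rolled stand-in for a `memoize` macro; NOT the code of either gem.
module SimpleMemoize
  def memoize(method_name)
    original = instance_method(method_name)
    ivar = "@_memoized_#{method_name}"

    # Replace the method with a wrapper that caches the first result.
    define_method(method_name) do
      return instance_variable_get(ivar) if instance_variable_defined?(ivar)

      instance_variable_set(ivar, original.bind(self).call)
    end
  end
end

class Order
  extend SimpleMemoize

  attr_reader :lookups

  def initialize(card_number)
    @card_number = card_number
    @lookups = 0
  end

  def card_last_4
    @lookups += 1
    @card_number[-4, 4]
  end
  memoize :card_last_4
end

order = Order.new("4242424242424242")
order.card_last_4
puts order.card_last_4 # prints 4242
puts order.lookups     # prints 1
```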
Effectively, this continues what the `ActiveSupport::Memoizable` module used to allow. For example, calling a memoized `card_last_4` method on an instance of an `Order` class would only run the calculation once, and the result would be memoized from that point on.
I haven’t used either of these gems because I personally would prefer to follow Rails conventions. But, I plan on playing around with them a bit in order to try and understand why the module was deprecated, and why it was implemented in the first place. Of course, we could also read the entire GitHub discussion that took place at the time of deprecation, but that’s a whole lot of comments to read.
No matter what form of method memoization we choose to use, there are certain times when it makes a lot of sense and is clearly the right tool for the job. Anytime we find ourselves making repeated database queries, or time-consuming expensive calculations, or repeated calculations that are never really going to change for an instance of a class or a controller, memoization using Ruby’s `||=` operator is probably our best bet. And now that we know the theory and history behind Ruby method memoization, we’ll never forget!
I hope.
tl;dr?
- Memoization is a long-standing computer science concept that is basically the idea of “remembering” the value of a function to avoid running expensive method calls and calculations multiple times in our code.
- The crux of Ruby’s memoization techniques relies upon using the conditional “or equals” operator `||=`, which evaluates the code on its right-hand side and assigns the result only if the variable being assigned is falsey (i.e. `nil` or `false`).
- There are some great blog posts on the basics of Ruby memoization. This two-part series (part one and part two) is a pretty good place to start.