Words and Code

One writer’s journey from words to code.

Refactoring to Reveal Rails Group_by

There are a lot of opportunities in programming to feel pretty silly about yourself. I’m sure that the more coding experience you have, the less often these moments actually occur. But early on in your career, they seem to happen quite often – or they do to me, at least.

I had one of those moments a couple weeks ago, when I wrote a super long method and then asked a senior developer to take a look at it. While I was writing it, something seemed inherently wrong. I was sure there was a better way to do it, but I guess my Googling skills aren’t quite superb yet, because I couldn’t find quite the right answer anywhere. It was then that I decided to ask someone who would know exactly which tool to reach for.

What happened next was pretty awesome, albeit slightly depressing. I watched my code be refactored from ten lines down to a single line. It blew my mind – and not just because I didn’t know that this method even existed, but because I wanted to know how it worked! So, I did some digging and learned a bit about the method that I wrote which, as it turns out, already existed: the Rails group_by method on Enumerables.

Data Is For Manipulating

I started off writing my super long method because I wanted to structure my data in a very specific way. In fact, we’ll probably want to structure the data in our Bookstore eCommerce app in a very similar way, too, so let’s use that as our working example.

For our admin panel, we want a list of Author objects, categorized by genre. Because our collection of Books is going to grow extensively, it would be helpful for an admin to know which authors are included in a genre or time period. Eventually, this could be used by admins to add new authors by a genre, to filter or sort by a genre, or to calculate an author count per genre, and figure out which authors to add to our collection of books.

Right now, our collection of Author objects isn’t very big, but has just enough information for us to start implementing this functionality:

> Author.all
=> [{#<Author:0x192ajk21a6d0b0 last_name: "Shakespeare", genre: "Renaissance">}, 
{#<Author:0w917qwl38f6s8v6 last_name: "Homer", genre: "Classics">}, 
{#<Author:0x390akd23a5d9m4 last_name: "Faulkner", genre: "Southern Gothic">}, 
{#<Author:1r103aur58b7c4r2 last_name: "Marlowe", genre: "Renaissance">}]

Even though our data is easy to read now, we can be sure that it isn’t going to stay that way. But we know that if we structure each of our objects correctly, we could have something simple, like this, in our view:

div
  - Author.sort_by_genre.each do |genre, authors|
    h2 = genre
    - authors.each do |a|
      p = a.last_name

I’m a big fan of slim, which is what I’ve used above, but this view would still be pretty minimal when using another templating language such as erb.

Given that this is the view we want to render, we can use this information to structure our data. I’m thinking a hash is the tool for the job, with each key being a genre name, and the value being an array of Author objects that we can iterate through for each specific genre.

It would be nice if we could call something like Author.sort_by_genre and have it return a structure like this:

=> {"Renaissance" => [Shakespeare, Marlowe],
"Southern Gothic" => [Faulkner],
"Classics" => [Homer]}

So now that we know what we want our data to look like, let’s write it the ugly way, just like I did!

The First Iteration

To start with, we know that we want to return a hash. So we can start by instantiating a hash, which will be our authors_by_genre. We also know that we’ll need all the Author objects in an array; since we want to file each author under a genre, we can pair each Author with its genre up front. And we can return our empty hash, since that will eventually be filled up:

def sort_by_genre
  authors_by_genre = {}
  authors = Author.all.collect { |author| [author.genre, author] }

  authors_by_genre
end

Okay, so now we need to fill up our hash. We have our authors variable, which holds one two-element array per Author. We will need to iterate through all of them, and put each Author in the correct array for the right genre key. If the right genre key doesn’t exist, we’ll need to create a key for that author’s genre. We can accomplish this with another iteration. Now our method looks like this:

class Author
  class << self
    def sort_by_genre
      authors_by_genre = {}
      # Pair each author with its genre: [genre, author]
      authors = Author.all.collect { |author| [author.genre, author] }

      authors.each do |genre, author|
        if authors_by_genre[genre]
          authors_by_genre[genre] << author
        else
          authors_by_genre[genre] = [author]
        end
      end

      authors_by_genre
    end
  end
end

Okay…so hopefully, by this point, you should be thinking to yourself: This isn’t just ugly, it’s also super inefficient. And you’d be right. It’s pretty bad.


If you weren’t sure how bad it really is, think about how much work we’re doing! We load every Author, build a throwaway array of pairs with collect, and then iterate all over again to fill the hash. As soon as I finished writing this and got it working, I knew it was definitely not the right approach. But the first step is to get it working. Now, we need to seriously consider how we can make it right and make it fast.

You know what this means, right? Refactoring time.

Group All The Things

A quick way to refactor some of the messiness from our first iteration is by first changing how we initialize our hash object, and also modifying how we go about deciding whether to create a new key or add to a key that already exists.

Enter the each_with_object method! This is a pretty rad method I learned about while refactoring my first iteration of this sort_by_genre method. The each_with_object method requires a single argument: the object that you want to pass to it in each iteration. In our case, we’ll pass it a hash. And since the items we want to actually “categorize” are our Author objects, we’ll call each_with_object on our collection of Authors:

class Author
  class << self
    def sort_by_genre
      Author.all.each_with_object({}) {  }
    end
  end
end
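
To see what each_with_object does on its own, here is a small plain-Ruby sketch that runs without Rails. The word list is made up purely for illustration. The block receives each element plus the object we passed in, and the method returns that same object once the iteration finishes:

```ruby
# Hypothetical sample data, not part of the Bookstore app.
words = %w[apple avocado banana blueberry cherry]

# The hash passed as the argument is yielded to the block on every
# iteration (as the second block parameter) and returned at the end.
lengths = words.each_with_object({}) do |word, memo|
  memo[word] = word.length
end

p lengths
```

Notice that the block never has to return the hash; whatever we do to it inside the block sticks, because it is the same object every time.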

Now, what about that block – what goes inside? Well, we can think about what we want to do with each of our Author objects that we’re iterating over. Inside of our hash, which we’re passing explicitly as an argument to each_with_object, we want to either find the correct key and put the current Author into that array, or create a new key based on the current Author object’s genre. We can write that quite nicely by using the ||= (“or equals”) operator, which assigns whatever is to the right of the operator to whatever is to the left of it, but only if the left side is currently nil or false:

class Author
  class << self
    def sort_by_genre
      Author.all.each_with_object({}) do |author, hash|
        (hash[author.genre] ||= []) << author
      end
    end
  end
end

Much better, right? In this second iteration, we’re passing a hash directly to the each_with_object method, and basically telling it, Find the key in the hash I just passed you that is equal to this author’s genre. And if no such key exists, make one, set it equal to an empty array, and then put this current author into that array.

The ||= operator works like the || operator: it evaluates what is to its right only if what is to its left is nil or false. So when hash[author.genre] already holds an array, nothing is assigned and the existing array is reused; when the key is missing, the lookup returns nil and a fresh empty array is assigned to it. This is what keeps our method from overwriting a genre’s array every time we meet another author in that genre. The parentheses matter too: they make sure the << runs on whatever the ||= expression returns, which is the array for that genre. The super cool thing about the ||= operator is that it assigns an empty array to a new key and hands that array back in one step, which cuts out a lot of extra lines we had in our first iteration!
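
Here is a tiny sketch of that behavior in isolation. It uses plain strings in place of Author objects (an assumption made so it runs without Rails), and calls the same expression twice for the same key:

```ruby
authors_by_genre = {}

# First use: the key is missing, so the lookup returns nil and
# ||= assigns a new empty array before << appends to it.
(authors_by_genre["Renaissance"] ||= []) << "Shakespeare"

# Second use: the key now holds an array (truthy), so ||= leaves it
# alone and << appends to the existing array.
(authors_by_genre["Renaissance"] ||= []) << "Marlowe"

p authors_by_genre
```

Both names end up in one array under a single key, which is exactly what the if/else branch in the first iteration was doing by hand.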

Okay, so this second iteration has been a vast improvement. But I think it’s time for some serious refactoring magic. Are you ready? Okay. This entire method can be rewritten into a single, simple line:

class Author
  class << self
    def sort_by_genre
      Author.all.group_by(&:genre)
    end
  end
end

Yup. I kid you not.


This is the magic of the group_by method, which collects an enumerable into sets, grouping it by the result of a block. (In current Ruby versions it lives in Ruby’s own Enumerable module, which Rails collections support.) Here, instead of writing out a block, we pass &:genre: the ampersand turns the symbol :genre into a proc and hands it to group_by as its block (which we started using last week!). The symbol :genre names an attribute on each Author object, which corresponds to a genre column in the authors table. So, we are effectively grouping all of our Author objects by the result of calling .genre on each object. In other words, we’re grouping by the genre attribute, since Rails defines a reader method for each column on the model.
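
If the ampersand shortcut still feels a little magical, this plain-Ruby sketch may help. It assumes a Struct as a stand-in for the Active Record model (so it runs without a database) and shows that passing &:genre builds the same hash as spelling out the block:

```ruby
# Stand-in for the Active Record model; the Struct is an assumption
# made so the example runs without a database.
Author = Struct.new(:last_name, :genre)

authors = [
  Author.new("Shakespeare", "Renaissance"),
  Author.new("Homer", "Classics"),
  Author.new("Faulkner", "Southern Gothic"),
  Author.new("Marlowe", "Renaissance")
]

# &:genre turns the symbol into a proc that calls #genre on each
# element, so these two calls build the same hash.
with_symbol = authors.group_by(&:genre)
with_block  = authors.group_by { |author| author.genre }

puts with_symbol == with_block
p with_symbol.keys
```

The keys come back in the order each genre was first seen, and each value is an array of the original objects, not copies.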

And now, if we call our sort_by_genre class method, we get the exact data structure we were hoping for:

> Author.sort_by_genre
=> {"Renaissance" => [{#<Author:0x192ajk21a6d0b0 last_name: "Shakespeare", genre: "Renaissance">}, 
{#<Author:1r103aur58b7c4r2 last_name: "Marlowe", genre: "Renaissance">}], 
"Southern Gothic" => [{#<Author:0x390akd23a5d9m4 last_name: "Faulkner", genre: "Southern Gothic">}], 
"Classics" => [{#<Author:0w917qwl38f6s8v6 last_name: "Homer", genre: "Classics">}]}
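
Once the data is in this shape, the admin features mentioned earlier (like an author count per genre) become short follow-ups. A rough sketch, again assuming a Struct stand-in for the model and Ruby 2.4 or newer for transform_values:

```ruby
# Stand-in for the Active Record model (an assumption for this sketch).
Author = Struct.new(:last_name, :genre)

authors = [
  Author.new("Shakespeare", "Renaissance"),
  Author.new("Homer", "Classics"),
  Author.new("Faulkner", "Southern Gothic"),
  Author.new("Marlowe", "Renaissance")
]

grouped = authors.group_by(&:genre)

# Keep the genre keys, but replace each array of authors with its size.
counts = grouped.transform_values(&:size)

# Pull just the last names for one genre; fetch falls back to an
# empty array if the genre has no authors yet.
renaissance = grouped.fetch("Renaissance", []).map(&:last_name)

p counts
p renaissance
```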

Hooray! Or maybe not hooray. Maybe instead of hooray, you feel like I did when I realized that you could refactor all of this into one line.


Although it made me feel pretty silly, the actual process of writing the group_by method from scratch was a really great learning experience. I ended up using the examples above in a talk I gave on refactoring at Red Dot Ruby Conference in Singapore last week. And I actually recreated the exact same functionality when I had to write my own groupBy function in JavaScript for a React component I had to build! I guess that however deep of a rabbit hole refactoring can be, sometimes it’s exactly the right kind of deep dive you need to learn how something works under the hood. I still totally feel like that cat in the bathtub whenever I do it – but I’m kind of okay with that.
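
For the curious, here is roughly what a hand-rolled version looks like when you put the pieces from this post together. The name my_group_by is hypothetical, and this is only a sketch of the idea, not how Ruby implements the real thing:

```ruby
# Hypothetical helper; Ruby's built-in equivalent is Enumerable#group_by.
def my_group_by(collection)
  collection.each_with_object({}) do |item, groups|
    # The caller's block decides which key each item belongs under.
    key = yield(item)
    (groups[key] ||= []) << item
  end
end

numbers = [1, 2, 3, 4, 5, 6]
by_parity = my_group_by(numbers) { |n| n.even? ? :even : :odd }

p by_parity
```

It is the second iteration of sort_by_genre, with the hard-coded author.genre swapped out for whatever block the caller passes in.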

tl;dr?

  • The group_by method takes a block, which it uses to group a collection of objects. The each_with_object method takes an object as an argument, and a block which tells it how to add each item of the collection to that object.
  • Read more on the group_by method in the Rails docs, which also has a great example!
  • Looking for another example of how to implement Rails’ group_by in a view? Check out this blog post.
  • Did you know that group_by is also available in plain Ruby, on any Enumerable? It’s great when you want to return a hash where the keys are evaluated by a block!