There are a lot of opportunities in programming to feel pretty silly about yourself. I’m sure that the more coding experience you have, the less often these moments actually occur. But early on in your career, they seem to happen quite often – or they do to me, at least.
I had one of those moments a couple weeks ago, when I wrote a super long method and then asked a senior developer to take a look at it. While I was writing it, something seemed inherently wrong. I was sure there was a better way to do it, but I guess my Googling skills aren’t quite superb yet, because I couldn’t find quite the right answer anywhere. It was then that I decided to ask someone who would know exactly which tool to reach for.
What happened next was pretty awesome, albeit slightly depressing. I watched my code be refactored from ten lines down to a single line. It blew my mind – and not just because I didn’t know that this method even existed, but because I wanted to know how it worked! So, I did some digging and learned a bit about the method that I wrote which, as it turns out, already existed: the Rails group_by method on Enumerables.
Data Is For Manipulating
I started off writing my super long method because I wanted to structure my data in a very specific way. In fact, we’ll probably want to structure the data in our Bookstore eCommerce app in a very similar way, too, so let’s use that as our working example.
For our admin panel, we want a list of Author objects, categorized by genre. Because our collection of Books is going to grow extensively, it would be helpful for an admin to know which authors are included in a genre or time period. Eventually, this could be used by admins to add new authors by a genre, to filter or sort by a genre, or to calculate an author count per genre, and figure out which authors to add to our collection of books.
Right now, our collection of Author objects isn’t very big, but has just enough information for us to start implementing this functionality:
Even though our data is easy to read now, we can be sure that it isn’t going to stay that way. But we know that if we structure each of our objects correctly, we could have something simple, like this, in our view:
I’m a big fan of slim, which is what I’ve used above, but this view would still be pretty minimal when using another templating language such as erb.
Given that this is the view we want to render, we can use this information to structure our data. I’m thinking a hash is the tool for the job, with each key being a genre name, and the value being an array of Author objects that we can iterate through for each specific genre.
It would be nice if we could call something like Author.sort_by_genre and have it return a structure like this:
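As a rough sketch of that target shape, here is a plain Ruby version. The Struct standing in for the Author model and the sample authors are made up for illustration; they are not the app’s real data.

```ruby
# Hypothetical stand-in for the Author model: a plain Struct with the two
# attributes this post relies on (last_name and genre).
Author = Struct.new(:last_name, :genre)

# Made-up sample records, purely for illustration.
christie = Author.new("Christie", "Mystery")
doyle    = Author.new("Doyle", "Mystery")
asimov   = Author.new("Asimov", "Science Fiction")

# The shape we are aiming for: genre name => array of Author objects.
authors_by_genre = {
  "Mystery"         => [christie, doyle],
  "Science Fiction" => [asimov]
}

# A view could then walk the hash one genre at a time.
authors_by_genre.each do |genre, authors|
  puts "#{genre}: #{authors.map(&:last_name).join(', ')}"
end
```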
So now that we know what we want our data to look like, let’s write it the ugly way, just like I did!
The First Iteration
To start with, we know that we want to return a hash. So we can start by instantiating a hash, which will be our authors_by_genre. We also know that we’ll need all the Author objects in an array; since we’re specifically looking for an author’s last_name and genre, we can query for those directly. And we can return our empty hash, since that will eventually be filled up:
Okay, so now we need to fill up our hash. We have our authors variable, which is set to the collection of all Author objects. We will need to iterate through all of them, and put that Author in the correct array for the right genre key. If the right genre key doesn’t exist, we’ll need to create a key for that author’s genre. We can accomplish this with another iteration. Now our method looks like this:
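To give a feel for that long-hand approach, here is a deliberately clunky sketch in plain Ruby. It is not the original method: the Author Struct, the sample data, and the sort_by_genre_longhand name are all assumptions, and it works on an in-memory array instead of querying a database.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:last_name, :genre)

# Made-up sample data for illustration only.
AUTHORS = [
  Author.new("Christie", "Mystery"),
  Author.new("Asimov", "Science Fiction"),
  Author.new("Doyle", "Mystery")
]

def sort_by_genre_longhand(authors)
  authors_by_genre = {}

  # First pass: make sure every genre has a key pointing at an empty array.
  authors.each do |author|
    unless authors_by_genre.key?(author.genre)
      authors_by_genre[author.genre] = []
    end
  end

  # Second pass: for each genre, walk every author again and keep the matches.
  authors_by_genre.each_key do |genre|
    authors.each do |author|
      authors_by_genre[genre] << author if author.genre == genre
    end
  end

  authors_by_genre
end

result = sort_by_genre_longhand(AUTHORS)
```

It works, but it loops over the full list once per genre on top of the first pass, which is exactly the kind of repetition worth refactoring away.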
Okay…so hopefully, by this point, you should be thinking to yourself: This isn’t just ugly, it’s also super inefficient. And you’d be right. It’s pretty bad.
If you weren’t sure how bad it really is, think about all the queries we’re making! And how many times we’re iterating! As soon as I finished writing this and got it working, I knew it was definitely not the right approach. But the first step is to get it working. Now, we need to seriously consider how we can make it right and make it fast.
You know what this means, right? Refactoring time.
Group All The Things
A quick way to refactor some of the messiness from our first iteration is by first changing how we initialize our hash object, and also modifying how we go about deciding whether to create a new key or add to a key that already exists.
Enter the each_with_object method! This is a pretty rad method I learned about while refactoring my first iteration of this sort_by_genre method. The each_with_object method requires a single argument: the object that you want to pass to it in each iteration. In our case, we’ll pass it a hash. And since the items we want to actually “categorize” are our Author objects, we’ll call each_with_object on our collection of Authors:
Now, what about that block – what goes inside? Well, we can think about what we want to do with each of our Author objects that we’re iterating over. Inside of our hash, which we’re passing explicitly as an argument to each_with_object, we want to either find the correct key and put the correct Author into that array, or create a new key based on the current Author object’s genre. We can write that quite nicely by using the ||= (“or equals”) operator, which assigns whatever is to the right of the operator to whatever is to the left of it, but only when the left side is currently nil or false:
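A minimal sketch of this second iteration might look like the following. The Author Struct and the sample data are hypothetical, and the method takes the collection as an argument here so that it runs in plain Ruby without a database.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:last_name, :genre)

# Made-up sample data for illustration only.
authors = [
  Author.new("Christie", "Mystery"),
  Author.new("Asimov", "Science Fiction"),
  Author.new("Doyle", "Mystery")
]

def sort_by_genre(authors)
  # each_with_object yields every element together with the object we passed
  # in (an empty hash), and returns that same object once the loop is done.
  authors.each_with_object({}) do |author, authors_by_genre|
    # Create the genre's array only if the key has no value yet.
    authors_by_genre[author.genre] ||= []
    authors_by_genre[author.genre] << author
  end
end

grouped = sort_by_genre(authors)
```

Note the block argument order: the element comes first and the accumulated object second, which is the reverse of what inject uses.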
Much better, right? In this second iteration, we’re passing a hash directly to the each_with_object method, and basically telling it, Find the key in the hash I just passed you that is equal to this author’s genre. And if no such key exists, make one, set it equal to an empty array, and then put this current author into that array.
The order of the operands around our or equals operator is particularly important: the hash lookup has to be on the left, and the empty array on the right. The ||= operator is like the || operator in that it will run what is to its right only if what is to its left evaluates to nil or false. This is what keeps our method from overwriting a genre’s array again and again, and instead makes it reuse an existing key first. The super cool thing about the ||= operator is that it actually assigns an empty array to a brand new key, which cuts out a lot of extra lines we had in our first iteration!
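Here is a small standalone illustration of how ||= behaves on a hash. The variable names and values are made up for the example.

```ruby
genres = {}

# The key is missing, so the lookup returns nil and the empty array is assigned.
genres["Mystery"] ||= []
genres["Mystery"] << "Christie"

# The key now holds a (truthy) array, so the right-hand side is skipped and
# the existing array is left alone.
genres["Mystery"] ||= []
genres["Mystery"] << "Doyle"

# false counts as "missing" too, which matters if a stored value can be false.
setting = false
setting ||= "default"
```

After this runs, genres["Mystery"] still holds both names, because the second ||= never replaced the array.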
Okay, so this second iteration has been a vast improvement. But I think it’s time for some serious refactoring magic. Are you ready? Okay. This entire method can be rewritten into a single, simple line:
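In plain Ruby, a sketch of that one-liner could look like this. The Author Struct and the sample data are hypothetical; in a Rails model the method would presumably be a class method that calls group_by on the model’s own collection rather than taking an argument.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:last_name, :genre)

# Made-up sample data for illustration only.
authors = [
  Author.new("Christie", "Mystery"),
  Author.new("Asimov", "Science Fiction"),
  Author.new("Doyle", "Mystery")
]

def sort_by_genre(authors)
  # Builds a hash whose keys are the distinct genres and whose values are
  # arrays of the authors that returned that genre.
  authors.group_by(&:genre)
end

grouped = sort_by_genre(authors)
```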
Yup. I kid you not.
This is the magic of the group_by method, which collects an enumerable into groups and returns a hash keyed by the result of a block. (Rails collections get it from Ruby’s Enumerable module.) Here, the ampersand shortcut (which we started using last week!) turns the symbol :genre into a proc and passes it to group_by as its block. :genre is an attribute on each Author object, and corresponds to a genre column in the authors table. So, we are effectively grouping all of our Author objects by the result of calling .genre on each object. In other words, we’re grouping by the genre attribute, since Rails gives every Author a genre reader method for that column.
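To make the ampersand shortcut concrete, the sketch below shows three spellings that should produce the same grouping. The Author Struct and its sample data are assumptions for the example.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:last_name, :genre)

# Made-up sample data for illustration only.
authors = [
  Author.new("Christie", "Mystery"),
  Author.new("Asimov", "Science Fiction"),
  Author.new("Doyle", "Mystery")
]

# Shortcut: & calls to_proc on the symbol and passes the result as the block.
with_shortcut = authors.group_by(&:genre)

# The same thing written as an explicit block.
with_block = authors.group_by { |author| author.genre }

# The same thing with the symbol-to-proc conversion spelled out.
genre_proc = :genre.to_proc
with_proc  = authors.group_by(&genre_proc)
```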
And now, if we call our sort_by_genre class method, we get the exact data structure we were hoping for:
Hooray! Or maybe not hooray. Maybe instead of hooray, you feel like I did when I realized that you could refactor all of this into one line:
PROGRAMMING:
Write a 10-line method.
Feel proud when it works.
Find out you can write the same thing in a single line.
Cry in the corner.
— Vaidehi Joshi (@vaidehijoshi) May 20, 2015
Although it made me feel pretty silly, the actual process of writing the group_by method from scratch was a really great learning experience. I ended up using the examples above in a talk I gave on refactoring at Red Dot Ruby Conference in Singapore last week. And I actually recreated the exact same functionality when I had to write my own groupBy function in JavaScript for a React component I had to build! I guess that however deep of a rabbit hole refactoring can be, sometimes it’s exactly the right kind of deep dive you need to learn how something works under the hood. I still totally feel like that cat in the bathtub whenever I do it – but I’m kind of okay with that.
tl;dr?
- The group_by method takes a block, which it uses to group a collection of objects. The each_with_object method takes an object as an argument, and a block which tells it how to sort the collection you call the method upon.
- Read more on the group_by method in the Rails docs, which also has a great example!
- Looking for another example of how to implement Rails’ group_by in a view? Check out this blog post.
- Did you know that group_by is also part of Ruby’s own Enumerable module? It’s great when you want to return a hash where the keys are evaluated by a block!