There are a lot of opportunities in programming to feel pretty silly about yourself. I’m sure that the more coding experience you have, the less often these moments actually occur. But early on in your career, they seem to happen quite often – or they do to me, at least.
I had one of those moments a couple weeks ago, when I wrote a super long method and then asked a senior developer to take a look at it. While I was writing it, something seemed inherently wrong. I was sure there was a better way to do it, but I guess my Googling skills aren’t quite superb yet, because I couldn’t find quite the right answer anywhere. It was then that I decided to ask someone who would know exactly which tool to reach for.
What happened next was pretty awesome, albeit slightly depressing. I watched my code be refactored from ten lines down to a single line. It blew my mind – and not just because I didn’t know that this method even existed, but because I wanted to know how it worked! So, I did some digging and learned a bit about the method that I wrote which, as it turns out, already existed: the Rails group_by method on Enumerables.
Data Is For Manipulating
I started off writing my super long method because I wanted to structure my data in a very specific way. In fact, we’ll probably want to structure the data in our Bookstore eCommerce app in a very similar way, too, so let’s use that as our working example.
For our admin panel, we want a list of Author objects, categorized by genre. Because our collection of Books is going to grow extensively, it would be helpful for an admin to know which authors are included in a genre or time period. Eventually, this could be used by admins to add new authors by a genre, to filter or sort by a genre, or to calculate an author count per genre, and figure out which authors to add to our collection of books.
Right now, our collection of Author objects isn’t very big, but has just enough information for us to start implementing this functionality:
Even though our data is easy to read now, we can be sure that it isn’t going to stay that way. But we know that if we structure each of our objects correctly, we could have something simple, like this, in our view:
I’m a big fan of slim, which is what I’ve used above, but this view would still be pretty minimal when using another templating language such as erb.
Given that this is the view we want to render, we can use this information to structure our data. I’m thinking a hash is the tool for the job, with each key being a genre name, and the value being an array of Author objects that we can iterate through for each specific genre.
It would be nice if we could call something like Author.sort_by_genre and have it return a structure like this:
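Here is a minimal sketch of the shape we're after. It uses a plain Ruby Struct as a hypothetical stand-in for the Author model, and the author names and genres are sample data chosen for illustration, not the app's real records.

```ruby
# Hypothetical stand-in for the Author model: a plain Struct, not ActiveRecord.
Author = Struct.new(:name, :genre)

# Sample records, for illustration only.
austen = Author.new("Jane Austen", "Romance")
heyer  = Author.new("Georgette Heyer", "Romance")
asimov = Author.new("Isaac Asimov", "Science Fiction")

# The structure we want back: genre names as keys,
# arrays of Author objects as values.
authors_by_genre = {
  "Romance"         => [austen, heyer],
  "Science Fiction" => [asimov]
}

# Walking the hash is all a view would need to do.
authors_by_genre.each do |genre, authors|
  puts "#{genre}: #{authors.map(&:name).join(', ')}"
end
```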
So now that we know what we want our data to look like, let’s write it the ugly way, just like I did!
The First Iteration
To start with, we know that we want to return a hash. So we can start by instantiating a hash, which will be our authors_by_genre. We also know that we’ll need all the Author objects in an array; since we’re specifically looking for an author’s genre, we can query for those directly. And we can return our empty hash, since that will eventually be filled up:
Okay, so now we need to fill up our hash. We have our authors variable, which is set to the collection of all Author objects. We will need to iterate through all of them, and put that Author in the correct array for the right genre key. If the right genre key doesn’t exist, we’ll need to create a key for that author’s genre. We can accomplish this with another iteration. Now our method looks like this:
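To give a feel for the long-hand approach, here is a rough sketch of a two-pass version. It is not the exact method from my first attempt: it assumes a plain Struct in place of the Author model, an in-memory array in place of a database query, and a hypothetical method name, sort_by_genre_long.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:name, :genre)

# Sample in-memory data standing in for a query such as Author.all.
AUTHORS = [
  Author.new("Jane Austen", "Romance"),
  Author.new("Isaac Asimov", "Science Fiction"),
  Author.new("Georgette Heyer", "Romance")
].freeze

def sort_by_genre_long(authors)
  authors_by_genre = {}

  # First pass: make sure every genre has a key pointing at an empty array.
  authors.each do |author|
    unless authors_by_genre.key?(author.genre)
      authors_by_genre[author.genre] = []
    end
  end

  # Second pass: for every genre, walk all of the authors again
  # and collect the ones that match.
  authors_by_genre.each_key do |genre|
    authors.each do |author|
      authors_by_genre[genre] << author if author.genre == genre
    end
  end

  authors_by_genre
end

result = sort_by_genre_long(AUTHORS)
puts result.keys.inspect
```

The nested loop in the second pass is the part that hurts: every genre triggers another full walk over the authors.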
Okay…so hopefully, by this point, you should be thinking to yourself: This isn’t just ugly, it’s also super inefficient. And you’d be right. It’s pretty bad.
If you weren’t sure how bad it really is, think about all the queries we’re making! And how many times we’re iterating! As soon as I finished writing this and got it working, I knew it was definitely not the right approach. But the first step is to get it working. Now, we need to seriously consider how we can make it right and make it fast.
You know what this means, right? Refactoring time.
Group All The Things
A quick way to refactor some of the messiness from our first iteration is by first changing how we initialize our hash object, and also modifying how we go about deciding whether to create a new key or add to a key that already exists.
Enter the each_with_object method! This is a pretty rad method I learned about while refactoring my first iteration of this sort_by_genre method. The each_with_object method requires a single argument: the object that you want to pass to it in each iteration. In our case, we’ll pass it a hash. And since the items we want to actually “categorize” are our Author objects, we’ll call each_with_object on our collection of authors:
Now, what about that block – what goes inside? Well, we can think about what we want to do with each of our Author objects that we’re iterating over. Inside of our hash, which we’re passing explicitly as an argument to each_with_object, we want to either find the correct key and put the correct Author into that array, or create a new key based on the current Author object’s genre. We can write that quite nicely by using the ||= ("or equals") operator, which assigns whatever is to the right of the operator to whatever is to the left of it, but only when the left side doesn’t already hold a value:
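A minimal sketch of that second iteration might look like the following. As before, the Struct, the sample authors, and the method name sort_by_genre_with_object are assumptions made so the example runs on its own without Rails.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:name, :genre)

def sort_by_genre_with_object(authors)
  # The {} passed in here is handed to the block as its second argument
  # on every iteration, and is also what each_with_object returns.
  authors.each_with_object({}) do |author, authors_by_genre|
    # Create the key with an empty array only if it isn't there yet...
    authors_by_genre[author.genre] ||= []
    # ...then add the current author to that genre's array.
    authors_by_genre[author.genre] << author
  end
end

# Sample data, for illustration only.
sample_authors = [
  Author.new("Jane Austen", "Romance"),
  Author.new("Isaac Asimov", "Science Fiction"),
  Author.new("Georgette Heyer", "Romance")
]

grouped = sort_by_genre_with_object(sample_authors)
puts grouped.keys.inspect
```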
Much better, right? In this second iteration, we’re passing a hash directly to the each_with_object method, and basically telling it, Find the key in the hash I just passed you that is equal to this author’s genre. And if no such key exists, make one, set it equal to an empty array, and then put this current author into that array.
The order of our or equals operator is particularly important, because if it were switched, it would never run what is on the right side of the pipes. The ||= operator is like the || operator in that it will run what is to its right only if what is to its left evaluates to false or nil. This is what keeps our method from trying to create the same key again and again, and instead forces it to find an existing key first. The super cool thing about the ||= operator is that it also does the assignment for us, setting a new key’s value to an empty array, which cuts out a lot of extra lines we had in our first iteration!
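To see the or equals behavior in isolation, here is a small sketch using a bare hash and strings instead of Author objects (both are simplifications for illustration):

```ruby
authors_by_genre = {}

# The key is missing, so the lookup returns nil and [] gets assigned.
authors_by_genre["Romance"] ||= []
authors_by_genre["Romance"] << "Jane Austen"

# The key now holds a truthy value (an array), so the right side
# is never evaluated and the existing array is left alone.
authors_by_genre["Romance"] ||= []

puts authors_by_genre.inspect
```

Note that an empty array is still truthy in Ruby, so a genre with no authors yet would not be overwritten either.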
Okay, so this second iteration has been a vast improvement. But I think it’s time for some serious refactoring magic. Are you ready? Okay. This entire method can be rewritten into a single, simple line:
Yup. I kid you not.
This is the magic of the Rails group_by method, which collects an enumerable into sets, grouping it by the result of a block. This method takes a proc using the ampersand shortcut as an argument (which we started using last week!). The group_by method is passed the symbol :genre, which is an attribute on each Author object, and corresponds to a genre column in the authors table. So, we are effectively grouping all of our Author objects by the result of calling .genre on each object. In other words, we’re grouping by the genre attribute since the attribute corresponds to an attr_accessor method in the class.
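As a sketch of how the ampersand shortcut and the block form line up, here is group_by called on a plain array. The Struct and sample authors are assumptions for illustration; in the app, the collection would come from the Author model instead.

```ruby
# Hypothetical stand-in for the Author model.
Author = Struct.new(:name, :genre)

# Sample data, for illustration only.
all_authors = [
  Author.new("Jane Austen", "Romance"),
  Author.new("Isaac Asimov", "Science Fiction"),
  Author.new("Georgette Heyer", "Romance")
]

# Ampersand shortcut: the symbol is turned into a proc that calls
# .genre on each element.
by_symbol = all_authors.group_by(&:genre)

# The same grouping with the block written out.
by_block = all_authors.group_by { |author| author.genre }

puts by_symbol.keys.inspect
puts by_symbol == by_block
```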
And now, if we call our sort_by_genre class method, we get the exact data structure we were hoping for:
Hooray! Or maybe not hooray. Maybe instead of hooray, you feel like I did when I realized that you could refactor all of this into one line:
PROGRAMMING: Write a 10-line method. Feel proud when it works. Find out you can write the same thing in a single line. Cry in the corner.— Vaidehi Joshi (@vaidehijoshi) May 20, 2015
Although it made me feel pretty silly, the actual process of writing the group_by method from scratch was a really great learning experience. I ended up using the examples above in a talk I gave on refactoring at Red Dot Ruby Conference in Singapore last week. And I actually recreated the exact same functionality when I had to write my own
group_by method takes a block, which it uses to group a collection of objects. The each_with_object method takes an object as an argument, and a block which tells it how to sort the collection you call the method upon.
- Read more on the group_by method in the Rails docs, which also has a great example!
- Looking for another example of how to implement Rails’ group_by in a view? Check out this blog post.
- Did you know that Ruby also has a similar group_by method? It’s great when you want to return a hash where the keys are evaluated by a block!