Writing Smart Migrations: References, Reversible, and Indexes

All it takes is a single feature to make you realize how well – or perhaps not so well – you’ve written your application. Depending on if you’ve done a good job of separating concerns, abstracting and encapsulating bits of functionality into cohesive code, and just generally not repeating yourself…well, you’ll probably be way more enthusiastic about adding a huge new feature to an already massive application. But if you haven’t done any of those things…well, you might find yourself doing a massive rewrite of your codebase just to implement a single feature.

Luckily, the application I started adding a feature to last week is pretty well-built, which makes it flexible enough to add new functionality relatively easily. Yet even the most well-thought-out applications need to be teased apart and glued back together during the course of their lives. Often times, a big part of building out a new feature is restructuring the architecture of your application. When you’re working with larger code bases (legacy code in particular), that can be the most daunting task. But a lot of the intimidation of that begins to dissipate once you know how to go about restructuring a schema which already exists without breaking all the the things.

The first step in this process is writing migrations to change how the objects in your application and database relate to one another. Of course, knowing how to write the proper migration to suit your needs is just half the battle; the other half is writing a migration that makes object lookup in your database efficient and super fast. Even though I’ve written about migrations before, I’m learning that there’s always so much more to know. So, let’s get learning!

Referencing Your References

A really quick way to clean up our code is by changing the way that we create associations between objects from within our migrations. One feature we’ll want to add is to have our Books belong to a Genre. So far, we’ve categorized our Authors by genre, but we’ll want to be able to sort and organize our Book objects in a similar way.

We often use belongs_to and has_many inside of our migrations, which is totally valid. But in this case, it probably makes more sense to use something called references. According to the Rails source code, references is just an alias of belongs_to. But, a Book doesn’t really “belong to” a Genre, it’s more that it should be able to “reference” the genre that it’s associated with. We can add a reference fairly easily just like this:

class AddGenreToBooks < ActiveRecord::Migration
  def change
    add_reference :books, :genre
  end
end

This will create a genre_id column inside of our Books table. We could also have done this initially inside of a change method, and written something like t.references(:genre) inside of our migration method. We also have the option to add a type, and specify whether the reference is a polymorphic association or not, as explained by the Rails API:

references: Adds a reference. Optionally adds a type column, if :polymorphic option is provided. The reference column will be an integer by default, the :type option can be used to specify a different type. A foreign key will be created if a foreign_key option is passed.

Using references instead of a typical has_many and belongs_to gives us a bit more flexibility moving forward, especially if we know that we’ll be adding different types of associations in the long run.

Reverse, reverse!

One big feature we want to add to our Bookstore app is giving our Books a rating. There’ll be some slightly complicated logic on the backend for this feature to actually work, since we’ll need to update a Book’s rating every single time a User rates it. But, let’s not worry too much about how that logic will be implemented just yet. Instead, it’ll be more helpful to break down this larger problem into a few smaller, bite-sized pieces.

First things first: do what we know. And we know that we’ll need to give every Book a rating attribute. We’ll want this to use Ruby BigDecimal, so we’ll specify that in our migration. This migration will start out pretty simple, since all we want to do is just give our Books a new attribute:

class AddRatingColumnToBooks < ActiveRecord::Migration
  def change
    add_column :books, :rating, :decimal
  end
end

If we run rake db:migrate, and then open up our console, we can very quickly check that this added a column, just as we expected:

♥ rails c
> Book.all.pluck(:rating)
# => [nil, nil, nil, nil, nil, nil]

Uh oh, that’s not so great. We really don’t want all of the Books that are already inside of our database to have nil values for their rating. Honestly, a more accurate representation of their rating would just be 0, since none of our Users have actually given these products a rating yet.

What would be really nice is if we could do two things within this single migration: give our Books a rating column (done!), and set all of our pre-existing Books to have a rating of 0. How can we do that? Well, by using reversible, of course!

We already know that we can use the up and down methods to migrate “up” and “down” the history or “life” of our database. But there’s also another handy method that can help us out here, called reversible. Here’s what the Rails Guides say about reversible:

Complex migrations may require processing that Active Record doesn’t know how to reverse. You can use reversible to specify what to do when running a migration and what else to do when reverting it. Using reversible will ensure that the instructions are executed in the right order too.

So how can we implement reversible here? Well, to start, we need a reversible block, which takes a direction. If you’re thinking that the direction should be up or down, you’d be right! Since we’re only adding a column in this migration, we don’t need to worry about a down method, since migrating down will automatically just drop the rating column that we’re adding. What we do need to do is make sure that when we migrate up, we’re also updating our current Book records. We can do that by specifying the up method on our direction, and then pass a block to reversible telling it what to do:

class AddRatingColumnToBooks < ActiveRecord::Migration
  def change
    add_column :books, :rating, :decimal

    reversible do |direction|
      direction.up { Book.update_all(rating: 0.0) }
    end
  end
end

Cool, this migration seems great. Now, we’re not only adding a rating column to our Books table, but we’re also updating all of our pre-existing Book objects so that they have some default value for their rating. Nothing about this should break ever, right?

Wrong! Why? Because we can rename anything we want at any time…and then forget to change all of our files. When we decide to start selling music or movies and go full-on Amazon in our application, we’ll probably want to rename our Books table to something like Product or Item. We’ll create a new migration to rename our table, change our model names, and everything will be fine. Except for one teeny, tiny little problem: when we decide to deploy to Heroku and try to run rake db:migrate, everything will explode. And why might that happen? Well, because we renamed our table, and now we’re trying to run a migration on a Books table for a Book model, which, at this moment in time, doesn’t even exist.

So, how can we fix this? Well, we can actually define a Book model inside of this very migration file. And, we can do it all in a single line:

class AddRatingColumnToBooks < ActiveRecord::Migration
  class Book < ActiveRecord::Base; end

  def change
    add_column :books, :rating, :decimal

    reversible do |direction|
      direction.up { Book.update_all(rating: 0.0) }
    end
  end
end

Nice! Now, even if we forget to rename our old migrations, we can be sure that ActiveRecord won’t complain if it can’t find a Book model when this migration tries to run. In fact, it will always find a model no matter what we might rename our model or table to in the future because we’ve defined an empty Book class inside of this migration file. We can be sure that this is the case because Ruby will first look for a class declaration inside of a file before searching anywhere else!

Index All The Things

Okay, time to get efficient. Let’s look back at our first migration, where we were adding Genres to Books:

class AddGenreToBooks < ActiveRecord::Migration
  def change
    add_reference :books, :genre
  end
end

This is fine, but there’s one slight problem here: what if we only have a few Genres, but a ton of Books? When we want to run a method like where or sort_by, how will our application handle this? Well, without even testing out our query, we know it’s not going to go so well. And why not? Because, as it stands, our application will have to sort through every single one of our books to find the ones that match our query. And after a couple hundred or thousands of books, this is going to massively slow down our application.

But, we’re smart enough to foresee that problem! We’ll outsmart our application from the get-go, and just write a smart migration to begin with:

class AddGenreToBooks < ActiveRecord::Migration
  def change
    add_reference :books, :genre, index: true
  end
end

If we open up our db/structure.sql, we’ll see this:

CREATE TABLE books (
  rating numeric(10,2),
  genre_id integer
);

CREATE INDEX index_books_on_genre_id ON books USING btree (genre_id);

This created an index on our Books table, which is referenced to Genre. This is pretty amazing, because it means that our table will now use integers (not strings or any other data type) to look up an item on our table. Integer lookup is the cheapest and fastest way to look up an item in a database, which means that it’s going to speed up our database like crazy!

We also could have just written a completely new migraiton for this as well, which would implement the add_index method:

class AddGenreIndexToBooks < ActiveRecord::Migration
  def change
    add_index(:books, :genre)
  end
end

We could also use the add_index method to create unique indexes, or named indexes, as highlighted in the documentation. Sometimes, it’s actually safer to use the add_index method rather than index: true.

Now our migrations are not just clear and concise, but they’re also helping our application run smoother and quicker in the process. Writing smart migrations can not only save us a lot of time down the road, but they make feature-writing a lot more fun! Not as fun as running wildly through the safari, but hey – it’s a close second:

tl;dr?

Using references in a migration is the same as belongs_to, but can make your code more clear. The reversible method is great for specifying how your migration should handle a reverse migration. Indexes are an easy way of adding an iteger column to speed up database queries.
Need to see another example of manipulating data from within a migraiton? This blog post walks through some great examples using the up and down methods.
Check out this great post on how to implement reversible in different situations!

Words and Code

One writer’s journey from words to code.

Writing Smart Migrations: References, Reversible, and Indexes

Referencing Your References

Reverse, reverse!

Index All The Things

tl;dr?