Modify the Data in Your Rails App via Data Migrations

Although it's generally not a great idea to modify the data in your production database, you may have a few reasons for needing to do so. For example, you might be changing the set of statuses your app uses to describe the state of a record. Or you might need to delete some orphaned records to clean things up. These are 1-time tasks that we need to make sure are carried out on every environment. How can we make these changes as safely as possible?

Some people like to do this as a 1-time rake task. However, this may be dangerous if you have multiple environments. You would need to make sure to run this task in every single environment, and since it's only a 1-time thing, there will be some manual intervention involved in your production environment. And manual intervention is bad news, since it lends itself to human error.

What if I told you there was a way to run a 1-time task that is completely automated in every environment, so surprises in production are far less likely? You can do this as part of your migrations! You already have migrations running every time you deploy, and you know that each one will only run once. There's the answer.

I'll give a simple example of how to do a "data migration." Let's say you have an order system where your order statuses are "received", "queued", "shipping", and "arrived". Now you decide that it doesn't make sense to have both "received" and "queued": the distinction has been confusing to lots of people, and there's no good business reason for it anyway. So you'd like to clean these statuses up and consolidate them both into just "queued". Yay for clean and understandable code!

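You'll need to generate a migration, just like you would generate any other migration: rails g migration update_statuses_on_orders. You'll get your regular migration file to work with, containing def up and def down methods. (Depending on your Rails version, the generator may give you a single def change method instead; replace it with up and down.) Note that once you change all of the "received" statuses to "queued", you won't be able to undo it, because you can no longer tell which orders used to be "received". This means you'll want to leave the def down method empty or write a comment inside of it: #irreversible. Another option is to raise ActiveRecord::IrreversibleMigration there, so that an attempted rollback fails loudly.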

Next, we'll go ahead and write the 'up' part of the migration:

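class UpdateStatusesOnOrders < ActiveRecord::Migration
  # On Rails 5 and later, add the version suffix, e.g. ActiveRecord::Migration[5.0].
  def up
    # find_each loads records in batches instead of all at once.
    Order.where(status: "received").find_each do |order|
      order.status = "queued"
      # save! raises if the record cannot be saved, so failures are not silent.
      order.save!
    end
  end

  def down
    # irreversible
  end
end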

There's still something missing that can potentially break. What if one day in the future you decide to rename the orders table, and therefore its class? Somebody new will join your project, pull the latest code down, and try to run the migrations. They will fail, complaining that there's no such class as Order anymore, even though at that point in the migrations the table is still called 'orders'. There's a way to safeguard against this happening: we need to add a class definition to the migration:

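class UpdateStatusesOnOrders < ActiveRecord::Migration
  # On Rails 5 and later, add the version suffix, e.g. ActiveRecord::Migration[5.0].

  # Migration-local model: maps to the orders table and does not depend
  # on app/models/order.rb still existing.
  class Order < ActiveRecord::Base
  end

  def up
    # find_each loads records in batches instead of all at once.
    Order.where(status: "received").find_each do |order|
      order.status = "queued"
      # save! raises if the record cannot be saved, so failures are not silent.
      order.save!
    end
  end

  def down
    # irreversible
  end
end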
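If you're wondering why the class defined inside the migration wins over the one in your app, it comes down to how Ruby looks up constants. Here's a minimal plain-Ruby sketch of that lookup, with no Rails involved. The classes are stand-ins I made up for illustration (the origin and model_used methods are hypothetical), and in a real migration the inner class would inherit from ActiveRecord::Base as shown above.

```ruby
# Stand-in for the app's model (app/models/order.rb in a real project).
class Order
  def self.origin
    "app model"
  end
end

# Plain class standing in for the migration; no Rails needed for this demo.
class UpdateStatusesOnOrders
  # Migration-local class. Ruby resolves constants lexically first, so inside
  # this class body `Order` means UpdateStatusesOnOrders::Order.
  class Order
    def self.origin
      "migration-local class"
    end
  end

  def model_used
    Order.origin
  end
end

migration = UpdateStatusesOnOrders.new
puts migration.model_used   # prints "migration-local class"

# Simulate the app model being renamed away some time later.
Object.send(:remove_const, :Order)
puts migration.model_used   # still prints "migration-local class"
```

The second call is the interesting one: the top-level Order is gone, yet the code inside the migration class keeps working because it never referred to that constant in the first place.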

Note that if your migration uses any of a model's custom methods, you should redefine them in the class definition inside the migration. The class inside the migration is a separate class, so it doesn't pick up anything from your app's model, and methods in your app are likely to change or go away over time. It's best to stick as closely to the schema definition as possible since that's the one thing you can be sure of: the structure of your schema at the time of writing the migration.
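To see why the redefinition is needed, here's another small plain-Ruby sketch, again using made-up stand-ins rather than real Rails classes. The helper name legacy_statuses and the BareStubMigration class are hypothetical and only there to show the contrast between a stub that copies the helper and one that doesn't.

```ruby
# Stand-in for the app model, which has a custom helper.
class Order
  def self.legacy_statuses
    ["received"]
  end
end

# Migration stand-in whose local Order carries its own copy of the helper.
class UpdateStatusesOnOrders
  class Order
    # Copied in on purpose: this class does not inherit from the app's Order.
    def self.legacy_statuses
      ["received"]
    end
  end

  def statuses_to_rewrite
    Order.legacy_statuses
  end
end

# Same idea, but the helper was not copied into the local stub.
class BareStubMigration
  class Order
  end

  def statuses_to_rewrite
    Order.legacy_statuses
  end
end

puts UpdateStatusesOnOrders.new.statuses_to_rewrite.inspect  # prints ["received"]

begin
  BareStubMigration.new.statuses_to_rewrite
rescue NoMethodError => e
  puts "bare stub failed with #{e.class}"  # prints "bare stub failed with NoMethodError"
end
```

The copy inside the migration is frozen at the moment you write it, which is exactly what you want: it keeps describing the data as it was then, no matter what happens to the app's model later.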