Seed Data In Ruby On Rails


To run automated tests for your Ruby on Rails webapp, you need not only your latest database structure deployed to the test database (created by rake db:test:prepare), but also seed data for lookup tables such as zip codes.

Common approaches like adding seed data through Rails migrations are discouraged, and plugins like seed_fu only work well for small amounts of seed data. In seed_fu, you call a seed method on your ActiveRecord models like so:

User.seed(:login, :email) do |s|
  s.login = "bob"
  s.email = "bob@bobson.com"
  s.first_name = "Bob"
  s.last_name = "Bobson"
end

Running the rake db:seed task provided by seed_fu will add all defined models to your test database.
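To make the idea behind the key columns (:login, :email) concrete, here is a minimal plain-Ruby sketch of "seed by key": look up a row by the given columns, update it if it exists, insert it otherwise. This is not seed_fu's implementation. SeedTable and the User struct are hypothetical in-memory stand-ins, chosen only so the example runs without Rails or a database.

```ruby
# Hypothetical stand-in for a model's table; NOT seed_fu's real code.
User = Struct.new(:login, :email, :first_name, :last_name)

class SeedTable
  attr_reader :rows

  def initialize(row_class)
    @row_class = row_class
    @rows = []
  end

  # Build a candidate row from the block, then update the existing row that
  # matches on all key columns, or insert the candidate if none matches.
  def seed(*keys)
    candidate = @row_class.new
    yield candidate
    existing = @rows.find { |row| keys.all? { |key| row[key] == candidate[key] } }
    if existing
      # Simplification: only attributes that were actually set are copied over.
      candidate.each_pair { |name, value| existing[name] = value unless value.nil? }
    else
      @rows << candidate
    end
    existing || candidate
  end
end

users = SeedTable.new(User)
2.times do
  users.seed(:login, :email) do |s|
    s.login = "bob"
    s.email = "bob@bobson.com"
    s.first_name = "Bob"
    s.last_name = "Bobson"
  end
end
puts users.rows.size # prints 1: the second run updates the row instead of duplicating it
```

The point of seeding by key is that the seed code can be run repeatedly without piling up duplicate rows.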

DHH has even standardized a way to load seed data for Rails 3, making the rake db:seed task part of Rails and setting up a file called db/seeds.rb for maintaining your seeding code. Using that file, you can load your seed data however you see fit, e.g. with seed_fu.

How to Deal With Big Amounts of Seed Data


So far, so good. There are ways to load seed data into your Rails test database using Ruby code. But what if, like in our case, you have to seed more than 60,000 Points of Interest and over 16,000 cars? We definitely don’t want to write Ruby code for each of them. The only sane way of handling such amounts of data is a database dump. So I added my own rake db:seed:dump and rake db:seed:load tasks to our Rails 2.3.2 application. As soon as we move to Rails 3, we can call the load task from within db/seeds.rb.

Short and sweet (and completely MySQL specific and dependent on MySQL living in your path 😉 ) here are my two rake tasks:

namespace :db do
  namespace :seed do
    require 'db/seed_tables'
    
    desc "Dump the tables holding seed data to db/RAILS_ENV_seed.sql. SEED_TABLES must be defined in db/seed_tables.rb."
    task :dump => :environment do
      config = ActiveRecord::Base.configurations[RAILS_ENV]
      dump_cmd = "mysqldump --user=#{config['username']} --password=#{config['password']} #{config['database']} #{SEED_TABLES.join(" ")} > db/#{RAILS_ENV}_seed.sql"
      system(dump_cmd)
    end

    desc "Load the dumped seed data from db/RAILS_ENV_seed.sql into the test database"
    task :load => :environment do
      config = ActiveRecord::Base.configurations['test']
      system("mysql --user=#{config['username']} --password=#{config['password']} #{config['database']} < db/#{RAILS_ENV}_seed.sql")
    end
  end
end

Note that I use a file called db/seed_tables.rb to define which tables are dumped. It just holds an array of table names like so:

SEED_TABLES = [
  "auxilary_services",
  "background_informations",
  "pois"
]
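The tasks above interpolate the database settings straight into a shell string, so a password or table name containing a space or a shell metacharacter would break the command. Below is a hedged sketch of building the same two command lines with Ruby's standard Shellwords module. The helper names (mysql_connection_args, seed_dump_command, seed_load_command) are hypothetical and not part of the original tasks. As a further assumption, blank password and host settings are skipped rather than passed as empty flags.

```ruby
require 'shellwords'

# Connection flags shared by mysqldump and mysql; blank settings are skipped.
def mysql_connection_args(config)
  args = ["--user=#{config['username']}"]
  args << "--password=#{config['password']}" unless config['password'].to_s.empty?
  args << "--host=#{config['host']}" unless config['host'].to_s.empty?
  args
end

# Command line that dumps the given tables to db/<env>_seed.sql.
def seed_dump_command(config, tables, env)
  args = ["mysqldump", *mysql_connection_args(config), config['database'], *tables]
  "#{Shellwords.join(args)} > #{Shellwords.escape("db/#{env}_seed.sql")}"
end

# Command line that loads db/<env>_seed.sql into the configured database.
def seed_load_command(config, env)
  args = ["mysql", *mysql_connection_args(config), config['database']]
  "#{Shellwords.join(args)} < #{Shellwords.escape("db/#{env}_seed.sql")}"
end

# Example settings in the shape of a database.yml entry (made-up values).
config = { 'username' => 'root', 'password' => 'p@ss word', 'database' => 'app_development' }
puts seed_dump_command(config, %w[pois], 'development')
puts seed_load_command(config, 'development')
```

Inside the tasks, system(seed_dump_command(config, SEED_TABLES, RAILS_ENV)) would then take the place of the hand-built string. Like the original tasks, this still depends on the MySQL client tools being in your path.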

Using two basic rake tasks and database dumps eases the pain of handling test data for us. How do you manage your test data? Let us know in the comments!

12 thoughts on “Seed Data In Ruby On Rails”

  1. Hey thanks for the writeup! I’ve been using shell scripts to do the same, but this is a nicer way to handle it.

    You should consider making this a Rails plugin.

  2. In Rails 2.3.8 you don’t need the load task anymore. You can just drop

    config = ActiveRecord::Base.configurations[RAILS_ENV]
    system("mysql --user=#{config['username']} --password=#{config['password']} --host=#{config['host']} #{config['database']} < db/#{RAILS_ENV}_seed.sql") 
    

    into the given db/seeds.rb file and run:

    rake db:seed
    

    or use rake db:setup to create your database and load the seed data from your SQL file.

  3. Can this technique also be used to upload data to a Rails hosting company, like Heroku? Can you give me a little hint as to how I would seed my “categories” table to my database up on Heroku?

  4. Matt,

    if you add your db/#{RAILS_ENV}_seed.sql file to git and push it, you should be able to run $ heroku rake db:seed from your box. I have not tried it myself, so it would be great to hear whether it works for you.

  5. I got sick of having seed data and then layering rake tasks for environmental data on top, so I put together seedbank, which gives you common seeds under db/seeds/*.seeds.rb and seeds for your environment under db/seeds/ENV/*.seeds.rb

    This gives me the ability to have an entire working db in place just by using:
    $ rake db:setup

    This will load all the common seeds and my development environment seeds in one go.

  6. Yep. It’s right there on github. I didn’t need anything fancy like loading SQL files and had some need to use factories when large amounts of data needed creating so extending the seeds.rb approach worked best for me.
