
Seed dev with production-like data by default

What

Use PostgreSQL's generate_series, as described in https://gitlab.com/gitlab-org/gitlab-ce/snippets/33946#note_23319454, to insert millions of rows into dev databases during initial setup.
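A minimal sketch of the idea, not the approach from the linked snippet: a small Ruby helper that builds one INSERT ... SELECT statement so PostgreSQL generates the rows itself. The helper name generate_series_insert_sql, the projects table, and its columns are assumptions chosen for illustration and are not taken from GitLab's schema.

```ruby
# Builds a single INSERT ... SELECT statement that lets PostgreSQL generate
# the rows with generate_series, so no per-row work happens in Ruby.
# Table and column names are hypothetical; adjust them to the real schema.
def generate_series_insert_sql(table:, columns:, count:)
  unless count.is_a?(Integer) && count.positive?
    raise ArgumentError, 'count must be a positive integer'
  end

  # columns maps each column name to the SQL expression that fills it.
  # The series value is exposed to those expressions as "i".
  column_list = columns.keys.join(', ')
  select_list = columns.values.join(', ')

  <<~SQL
    INSERT INTO #{table} (#{column_list})
    SELECT #{select_list}
    FROM generate_series(1, #{count}) AS s(i);
  SQL
end

sql = generate_series_insert_sql(
  table: 'projects', # assumed table name
  columns: {
    'name' => "'seed_project_' || i",
    'path' => "'seed-project-' || i",
    'created_at' => "NOW() - i * INTERVAL '1 minute'"
  },
  count: 1_000_000
)
puts sql
```

Inside a Rails seed file the resulting string could be run with ActiveRecord::Base.connection.execute(sql). The helper interpolates its arguments directly into the SQL, so it is only suitable for trusted, hard-coded seed definitions.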

Why

Dev instances perform so differently from production that it is hard to reason about performance properly. Production-like data on dev isn't as good as the real thing, but it allows much quicker iteration on performance problems and makes some types of issue less likely to slip through.

Considerations

  • Using generate_series via the active_record-pg_generate_series gem is an order of magnitude faster than bulk_insert, which in turn is an order of magnitude faster than creating the records through Rails.
  • Data must be similar enough to production that it forces similar query plans.
  • Seeds must be kept up to date.
  • Because creating rows with generate_series bypasses Rails, related records such as ProjectFeature do not get created.
  • Which tables do we need to seed? How many rows are there in production? Which values do typical queries check for?
  • Some things will improve just by having data in the tables, but others need related data that is actually exercised, e.g. a large number of issues with labels in the projects we actually have open, viewed as a non-admin user.
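One possible way to address the missing related records noted above is to backfill them with a second set-based statement after the bulk insert. The sketch below is an assumption, not an agreed design: the helper name backfill_missing_rows_sql is hypothetical, and the project_features table, its project_id foreign key, and its timestamp columns are assumed rather than confirmed here. Any other required columns or defaults would have to be added.

```ruby
# Builds an INSERT ... SELECT that adds one child row for every parent row
# that does not have one yet. All table and column names are assumptions.
def backfill_missing_rows_sql(parent:, child:, foreign_key:)
  <<~SQL
    INSERT INTO #{child} (#{foreign_key}, created_at, updated_at)
    SELECT p.id, NOW(), NOW()
    FROM #{parent} p
    WHERE NOT EXISTS (
      SELECT 1 FROM #{child} c WHERE c.#{foreign_key} = p.id
    );
  SQL
end

backfill_sql = backfill_missing_rows_sql(
  parent: 'projects',          # assumed parent table
  child: 'project_features',   # assumed table behind ProjectFeature
  foreign_key: 'project_id'    # assumed foreign key column
)
puts backfill_sql
```

Because of the NOT EXISTS guard the statement can be re-run safely: parents that already have a child row are skipped, so rows created normally through Rails are left alone.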

Related