WIP: Exportable developer seed environment

James Edwards-Jones requested to merge jej/shared-database-seed into master

What

A system for seeding large volumes of data and sharing the result as a backup, as suggested in https://gitlab.com/gitlab-org/gitlab-ce/issues/28149#note_44553225

Outline

  1. Seed the database using both SeedFu and PostgreSQL's generate_series
  2. Create a backup tar and upload it to S3
  3. Developers restore from this backup, after first checking out the commit used to create it
  4. Run the steps above on a regular basis
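
As a rough illustration of step 1, the sketch below builds an `INSERT ... SELECT` statement driven by `generate_series`, which lets PostgreSQL create many rows in a single statement. The helper name `build_seed_sql`, the `issues` table, and its columns are assumptions chosen for illustration and are not taken from this MR.

```python
def build_seed_sql(table, columns, row_count):
    """Return an INSERT ... SELECT statement that uses generate_series.

    `columns` maps a column name to a SQL expression. Expressions may
    reference `i`, the value produced by generate_series for each row.
    Identifiers are interpolated as-is, so only pass trusted constants.
    """
    if row_count < 1:
        raise ValueError("row_count must be at least 1")
    names = ", ".join(columns)
    expressions = ", ".join(columns.values())
    return (
        f"INSERT INTO {table} ({names})\n"
        f"SELECT {expressions}\n"
        f"FROM generate_series(1, {int(row_count)}) AS s(i);"
    )


# Hypothetical table and columns, for illustration only.
sql = build_seed_sql(
    "issues",
    {
        "title": "'Seeded issue ' || i",
        "project_id": "1",
        "created_at": "NOW()",
    },
    100000,
)
print(sql)
```

Rows generated this way bypass model validations and callbacks, which is presumably why the outline pairs this approach with SeedFu for the models that need them.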

Why

Allow for large amounts of data in the dev database, in a way that can be kept up to date.

The performance of dev instances is so far from production that it is hard to reason about performance properly. Having production-like data on dev isn't as good as the real thing, but it allows much quicker iteration on performance problems and makes some types of issue less likely to slip through.

Todo in this MR

  • Implement basic seed using generate_series
  • Find a way to run this on a weekly basis
  • Create a job which will run the rake task
  • Get help setting up long-running CI runners for this
  • Upload the commit SHA used for the backup to S3
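
One possible way to handle the last item is to store a small manifest next to the backup tar that records the commit SHA. The sketch below only builds that manifest. The function names, the manifest fields, the `.manifest.json` key suffix, and the example filename and SHA are all assumptions, and the S3 upload itself is left out because it depends on the client in use.

```python
import json
import subprocess
from datetime import datetime, timezone


def current_commit_sha(repo_dir="."):
    """Return the SHA of HEAD in the given repository via `git rev-parse`."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_manifest(backup_filename, commit_sha, created_at=None):
    """Describe a backup so it can later be matched to the right commit."""
    created_at = created_at or datetime.now(timezone.utc)
    return {
        "backup_file": backup_filename,
        "commit_sha": commit_sha,
        "created_at": created_at.isoformat(),
    }


def manifest_key(backup_filename):
    """Hypothetical naming scheme: store the manifest beside the tar."""
    return backup_filename + ".manifest.json"


# Placeholder filename and SHA; in a real job, current_commit_sha() would supply the SHA.
manifest = build_manifest(
    "seed_gitlab_backup.tar",
    "0123456789abcdef0123456789abcdef01234567",
    datetime(2018, 1, 1, tzinfo=timezone.utc),
)
print(manifest_key(manifest["backup_file"]))
print(json.dumps(manifest, indent=2))
```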

Todo, possibly in further MRs / issues

  • Increase number of models seeded by SeedFu
  • Automate restoration process
    • Override naming convention for backups
    • Check how old backups are managed
    • Create a rake task which downloads the tar, checks out the right commit, and restores the backup
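
The restoration flow described above could look roughly like the sketch below, which only assembles the commands rather than running them. It assumes the backup keeps GitLab's `_gitlab_backup.tar` suffix, that backups live in `tmp/backups`, and that the tar is reachable over HTTPS. The function name, the URL, the filename, and the SHA are placeholders.

```python
BACKUP_SUFFIX = "_gitlab_backup.tar"


def restore_commands(backup_file, commit_sha, download_url, backup_dir="tmp/backups"):
    """Return the commands needed to fetch and restore a shared backup.

    The commands are returned as argument lists so a caller can inspect
    them or pass each one to subprocess.run.
    """
    if not backup_file.endswith(BACKUP_SUFFIX):
        raise ValueError(f"expected a file ending in {BACKUP_SUFFIX}")
    # gitlab:backup:restore takes the part of the name before the suffix.
    backup_id = backup_file[: -len(BACKUP_SUFFIX)]
    return [
        ["curl", "--fail", "--location",
         "--output", f"{backup_dir}/{backup_file}", download_url],
        ["git", "checkout", commit_sha],
        ["bundle", "exec", "rake", "gitlab:backup:restore", f"BACKUP={backup_id}"],
    ]


# Placeholder values, for illustration only.
commands = restore_commands(
    "1515000000_gitlab_backup.tar",
    "0123456789abcdef0123456789abcdef01234567",
    "https://example.com/1515000000_gitlab_backup.tar",
)
for command in commands:
    print(" ".join(command))
```

Checking out the commit before restoring matters because the schema in the backup has to match the code that reads it. Migrations added after that commit can then be run once the developer returns to their own branch.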

Are there points in the code the reviewer needs to double check?

Acceptance criteria

Related

Edited by James Edwards-Jones