Proposal: Always expect no-downtime migrations

Context

A downtime check process, introduced 4 years ago in gitlab-foss!4911 (merged) / gitlab-foss#14545 (closed), allows migrations to take GitLab installations offline. According to the issue, the goals of allowing downtime in migrations were:

  • To help Release Managers identify which migrations require downtime, and,
  • To notify the developers when a change might require downtime.

From a quick git search, the last migration that required downtime was introduced 3 years ago.

Problem

In our current development/infrastructure setup, we have no strategy in place for handling a migration that requires downtime:

  • If a merge request with DOWNTIME = true is opened, CI pipelines don't fail.
  • Release Managers no longer verify whether a migration requires downtime; they assume all migrations are zero-downtime migrations.
  • When deploying to GitLab.com, the deployer pipeline doesn't take this into account.
  • Furthermore, we have several strategies in place that help us write zero-downtime migrations.

With our continuous delivery model, the MTTP and GitLab.com availability goals, there shouldn't be any case where a migration requires downtime, either for GitLab.com or for self-hosted instances.

Proposal

Remove the checks around DOWNTIME = true, remove the constant from the migration template, and update the documentation to state that all migrations should be zero-downtime migrations:

  1. Remove DowntimeCheck rake task
  2. Remove the rake task check from CI
  3. Remove the DOWNTIME constant from the template used to generate migrations
  4. Update the development docs to make it clear that we don't allow downtime, and remove any mention of the approval process for downtime
  5. Announce this in #backend, #development, #quality and add an item about this in the Engineering Week in Review.
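
To make step 3 concrete, here is a minimal sketch of what dropping the constant from a migration looks like, plus a small helper that reports whether a migration's source still declares it. Everything here is an assumption for illustration: the helper `downtime_status`, the regular expression, and the example migration (class, table and column names) are hypothetical and are not GitLab's actual tooling or template.

```ruby
# Hypothetical sketch, not GitLab's actual tooling.
# Matches a declaration such as `DOWNTIME = false` or `DOWNTIME=true`.
DOWNTIME_DECLARATION = /^\s*DOWNTIME\s*=\s*(true|false)\b/

# Returns :requires_downtime, :zero_downtime, or :undeclared
# for the source text of a single migration.
def downtime_status(source)
  match = source.match(DOWNTIME_DECLARATION)
  return :undeclared unless match

  match[1] == 'true' ? :requires_downtime : :zero_downtime
end

# Illustrative old-style migration (all names are made up).
# It is kept as a string, so ActiveRecord is not needed to run this sketch.
OLD_STYLE = <<~RUBY
  class AddExampleColumn < ActiveRecord::Migration[6.0]
    DOWNTIME = false

    def change
      add_column :examples, :note, :text
    end
  end
RUBY

# The same migration once the constant is no longer part of the template.
NEW_STYLE = OLD_STYLE.sub(/^\s*DOWNTIME\s*=.*\n\n?/, '')

puts downtime_status(OLD_STYLE)
puts downtime_status(NEW_STYLE)
```

A helper along these lines could be used to find migrations or templates that still carry the constant after the check is removed. It only inspects text and does not load or run any migration.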
Edited by Dylan Griffith