Support multi-version GitLab upgrade path in Operator

Summary

The GitLab Operator currently sets SKIP_POST_DEPLOYMENT_MIGRATIONS in the pre-migration job and always splits regular (pre-deployment) and post-deployment migrations. This enforces a ZDU (Zero Downtime Upgrade) migration pattern, but the upgrade guidance does not call out this enforcement explicitly. As a result, migrations can fail when an upgrade skips versions, as demonstrated in gitlab-org/gitlab#578331 (closed).

Current behavior

When upgrading GitLab using the Operator (e.g., from 18.3.5 to 18.5.1):

  1. The Operator sets SKIP_POST_DEPLOYMENT_MIGRATIONS=true in the pre-migration job
  2. Regular migrations run first (18.4 → 18.5 regular migrations)
  3. Webservice/Sidekiq restart
  4. Post-deployment migrations run later (18.4 → 18.5 post migrations)

This creates a problematic migration order when skipping versions:

  • 18.4 regular migrations → 18.5 regular migrations → 18.4 post migrations → 18.5 post migrations

However, the expected order for a downtime upgrade should be:

  • 18.4 regular → 18.4 post → 18.5 regular → 18.5 post (all at once)
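
The two orderings above can be sketched in a few lines of Python. This is only an illustration of the ordering, not Operator code; the function names `split_order` and `sequential_order` are hypothetical.

```python
def split_order(versions):
    """Order when post-deployment migrations are deferred for the whole upgrade."""
    regular = [f"{v} regular" for v in versions]
    post = [f"{v} post" for v in versions]
    return regular + post


def sequential_order(versions):
    """Order when each version runs its regular, then its post migrations."""
    return [f"{v} {kind}" for v in versions for kind in ("regular", "post")]


versions = ["18.4", "18.5"]
print(" → ".join(split_order(versions)))
# 18.4 regular → 18.5 regular → 18.4 post → 18.5 post
print(" → ".join(sequential_order(versions)))
# 18.4 regular → 18.4 post → 18.5 regular → 18.5 post
```

The orderings only differ when the upgrade spans more than one version; for a single version both functions return the same list.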

Problem

The split migration approach can cause failures when:

  • A post-deployment migration in version N drops/modifies data
  • A regular migration in version N+1 assumes that change has already been applied
  • Example: an 18.4 post-deployment migration drops a table, and an 18.5 regular migration assumes the table is already gone, but in the split order it runs before the drop
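
A minimal sketch of how such a dependency breaks under the split order. The scenario is hypothetical: the table names, the migration functions, and the `MigrationError` class are invented for illustration and are not taken from the linked issue. Here the 18.4 post-deployment migration drops a legacy table, and the 18.5 regular migration only succeeds once that drop has run.

```python
class MigrationError(Exception):
    """Raised when a migration's precondition is not met."""


def post_18_4(tables):
    # Hypothetical 18.4 post-deployment migration: drops a legacy table.
    tables.discard("legacy_widgets")


def regular_18_5(tables):
    # Hypothetical 18.5 regular migration: assumes the legacy table is gone.
    if "legacy_widgets" in tables:
        raise MigrationError("legacy_widgets still exists")
    tables.add("widgets_v2")


def run(order, tables):
    """Apply migrations in order to a copy of the schema and return the result."""
    tables = set(tables)
    for migration in order:
        migration(tables)
    return tables


initial = {"legacy_widgets"}
sequential = [post_18_4, regular_18_5]  # 18.4 post before 18.5 regular
split = [regular_18_5, post_18_4]       # 18.5 regular before 18.4 post

print(run(sequential, initial))  # {'widgets_v2'}
try:
    run(split, initial)
except MigrationError as exc:
    print("split order failed:", exc)
```

The same set of migrations succeeds or fails depending only on the order in which they run.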

Expected behavior

The Operator should either:

  1. Document the requirement to upgrade one minor version at a time when using the Operator
  2. Split multi-version upgrades into ZDU-mode steps, or introduce a non-ZDU mode for such paths
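
Either option starts with detecting that an upgrade skips a minor version. A minimal sketch of such a check, assuming plain `major.minor.patch` version strings; `skips_minor_version` is a hypothetical helper, not an existing Operator function, and treating every cross-major upgrade as multi-version is a simplifying assumption.

```python
def parse_version(version):
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def skips_minor_version(current, target):
    """Return True when the upgrade crosses more than one minor version."""
    cur_major, cur_minor, _ = parse_version(current)
    tgt_major, tgt_minor, _ = parse_version(target)
    if cur_major != tgt_major:
        # Assumption: cross-major upgrades are always treated as multi-version.
        return True
    return tgt_minor - cur_minor > 1


print(skips_minor_version("18.3.5", "18.5.1"))  # True
print(skips_minor_version("18.4.2", "18.5.1"))  # False
```

A check like this could gate the upgrade: warn or refuse when it returns True, or switch to a stepwise or non-ZDU path.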

Versions

  • Operator: All versions

Edited by Nailia Iskhakova