Support multi-version GitLab upgrade path in Operator
Summary
The GitLab Operator currently sets SKIP_POST_DEPLOYMENT_MIGRATIONS in the pre-migration job and always splits pre- and post-deployment migrations. This enforces a ZDU (Zero Downtime Upgrade) migration pattern, but the upgrade guidance does not state this explicitly. As a result, migrations can fail when an upgrade skips versions, as demonstrated in gitlab-org/gitlab#578331 (closed).
Current behavior
When upgrading GitLab using the Operator (e.g., from 18.3.5 to 18.5.1):
- The Operator sets SKIP_POST_DEPLOYMENT_MIGRATIONS=true in the pre-migration job
- Regular migrations run first (18.4 → 18.5 regular migrations)
- Webservice/Sidekiq restart
- Post-deployment migrations run later (18.4 → 18.5 post migrations)
This creates a problematic migration order when skipping versions:
- 18.4 regular migrations → 18.5 regular migrations → 18.4 post migrations → 18.5 post migrations
However, the expected order for a downtime upgrade should be:
- 18.4 regular → 18.4 post → 18.5 regular → 18.5 post (all at once)
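The two orderings above can be reproduced with a small sketch. This is only an illustration of the sequencing, not Operator code; the function names `split_order` and `sequential_order` are hypothetical.

```python
def split_order(versions):
    """Order when regular and post-deployment migrations are split across the
    whole jump: every version's regular migrations, then every version's post
    migrations."""
    return [f"{v} regular" for v in versions] + [f"{v} post" for v in versions]


def sequential_order(versions):
    """Order expected for a downtime upgrade: each version's regular
    migrations followed immediately by that version's post migrations."""
    order = []
    for v in versions:
        order += [f"{v} regular", f"{v} post"]
    return order


# Versions crossed when upgrading from 18.3.5 to 18.5.1
versions = ["18.4", "18.5"]
print(" → ".join(split_order(versions)))
print(" → ".join(sequential_order(versions)))
```

With a single version in the list, both functions return the same sequence, which is why the split only becomes a problem when versions are skipped.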
Problem
The split migration approach can cause failures when:
- A post-deployment migration in version N drops/modifies data
- A regular migration in version N+1 expects that data to exist
- Example: 18.4 post-deployment migration drops a table, but 18.5 regular migration expects it to exist
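Whatever the specific data dependency, the invariant that the downtime order guarantees and the split order breaks is that version N post-deployment migrations finish before version N+1 regular migrations start. The following minimal sketch checks a given order against that invariant; the step strings and the `ordering_violations` helper are hypothetical and only model the sequencing, not real migrations.

```python
def parse(step):
    """Split a step such as '18.4 post' into ((18, 4), 'post')."""
    version, kind = step.split()
    major, minor = version.split(".")
    return (int(major), int(minor)), kind


def ordering_violations(order):
    """Return (earlier_step, later_step) pairs where a newer version's regular
    migrations run before an older version's post-deployment migrations."""
    violations = []
    for i, first in enumerate(order):
        v_first, kind_first = parse(first)
        if kind_first != "regular":
            continue
        # Look for older-version post migrations scheduled after this step.
        for later in order[i + 1:]:
            v_later, kind_later = parse(later)
            if kind_later == "post" and v_later < v_first:
                violations.append((first, later))
    return violations


split = ["18.4 regular", "18.5 regular", "18.4 post", "18.5 post"]
sequential = ["18.4 regular", "18.4 post", "18.5 regular", "18.5 post"]
print(ordering_violations(split))
print(ordering_violations(sequential))
```

The split order yields one violating pair (18.5 regular ahead of 18.4 post), while the sequential order yields none.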
Expected behavior
The Operator should either:
- Document the requirement to upgrade one minor version at a time when using the Operator
- Split multi-version upgrades into single-version ZDU steps, or introduce a non-ZDU mode for such paths
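As a rough illustration of the first option, a multi-version jump can be broken into one stop per minor version. This is a sketch under simplifying assumptions: `minor_steps` is a hypothetical helper, it only handles upgrades within a single major version, and it does not choose which patch release to use at each stop or account for any officially required upgrade stops.

```python
def minor_steps(current, target):
    """List the minor versions to stop at when going from current to target,
    one minor at a time. Assumes both versions share the same major."""
    cur_major, cur_minor = (int(p) for p in current.split(".")[:2])
    tgt_major, tgt_minor = (int(p) for p in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("sketch only handles upgrades within one major version")
    if tgt_minor < cur_minor:
        raise ValueError("target is older than current")
    return [f"{cur_major}.{m}" for m in range(cur_minor + 1, tgt_minor + 1)]


# The upgrade from the example in this issue
print(minor_steps("18.3.5", "18.5.1"))
```

Each returned entry would then be upgraded to separately, so that a version's post-deployment migrations complete before the next version's regular migrations begin.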
Versions
- Operator: All versions
Related discussions
Edited by Nailia Iskhakova