Upgrade didn't fail with incomplete migrations
Summary
A user performs an upgrade and reconfigure finishes without issues, but some regular DB migrations didn't actually run and have status down.
Potential root cause, as @stanhu noted: db:migrate might have prematurely aborted for some reason with an exit code of 0, and we aren't handling that properly.
When Omnibus runs gitlab-rake db:migrate, it saves a file in /var/opt/gitlab/gitlab-rails/upgrade-status/db-migrate-* that stores the exit code. Omnibus assumes the migrations are complete if that exit code is 0 (https://gitlab.com/gitlab-org/omnibus-gitlab/-/blob/ce43cc045a3eefa340c7f98e02c0becf8b5b1a45/files/gitlab-cookbooks/gitlab/libraries/rails_migration_helper.rb#L28).
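To make the gap concrete, here is a minimal sketch of the kind of check described above. It is not the Omnibus implementation (which is Ruby and selects one specific status file). The function name is hypothetical, and it assumes the status file contains only the exit code as plain text.

```python
from pathlib import Path


def marked_complete(status_dir):
    """Return True if any db-migrate-* file in status_dir records exit code 0.

    Assumption: each status file holds just the exit code as text.
    The real helper picks a single file; this sketch accepts any match.
    """
    status_files = sorted(Path(status_dir).glob("db-migrate-*"))
    return any(f.read_text().strip() == "0" for f in status_files)
```

A check like this only looks at the recorded exit code, so a db:migrate run that aborts early but still exits with 0 would be treated as complete.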
Steps to reproduce
The issue does not reproduce consistently. The team has seen several cases where migrations didn't complete but the exit code was 0.
One example: it happened with the customer in gitlab-environment-toolkit#856 (comment 1818546335) when upgrading from 16.6.4 to 16.7.7. The migration 20231114100444 Add can create organization to application settings is marked as down, but reconfigure and all previous upgrade steps passed for them.
What is the current bug behavior?
Upgrade exits with 0 even though some migrations are in the down state.
What is the expected correct behavior?
Upgrade exits with 1 if any migrations are in the down state.
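One possible way to get this behavior is to cross-check the db:migrate exit code against the output of gitlab-rake db:migrate:status. The sketch below is an illustration only, not a proposed patch. The function names are hypothetical, it assumes the usual Rails status table layout (Status, Migration ID, Migration Name), and the first row of the sample is made up.

```python
import re

# Matches table rows such as "  down    20231114100444  Add can create ..."
ROW = re.compile(r"^\s*(up|down)\s+(\d+)\s+(.*\S)\s*$")


def down_migrations(status_output):
    """Return (id, name) pairs for every migration reported as down."""
    pending = []
    for line in status_output.splitlines():
        match = ROW.match(line)
        if match and match.group(1) == "down":
            pending.append((match.group(2), match.group(3)))
    return pending


def upgrade_exit_code(migrate_exit_code, status_output):
    """Propagate a failing db:migrate code; otherwise return 1 if anything is down."""
    if migrate_exit_code != 0:
        return migrate_exit_code
    return 1 if down_migrations(status_output) else 0


# Sample status output; the "up" row is a hypothetical placeholder.
SAMPLE = """\
 Status   Migration ID    Migration Name
--------------------------------------------------
   up     20230101000000  Hypothetical earlier migration
  down    20231114100444  Add can create organization to application settings
"""

if __name__ == "__main__":
    print(down_migrations(SAMPLE))
    print(upgrade_exit_code(0, SAMPLE))  # prints 1
```

With this check, the case reported above (db:migrate exit code 0, one migration still down) would produce exit code 1 instead of 0.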
Relevant logs
Logs have been requested in gitlab-environment-toolkit#856 (comment 1843963172).