Migration differs between .com and self-managed.
Context
!100609 (merged) added a column using a post-migration, which caused problems when tagging and deploying the release candidate for 15.5: pods had to be restarted to pick up the new column.
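One way to picture why a restart was needed is a process that reads a table's column list once and keeps it in memory. The sketch below is a toy model in plain Ruby, not ActiveRecord or GitLab code; `ToyTable`, `ToyModel` and their methods are hypothetical names used only for illustration, and it assumes the running pods held an outdated view of the schema.

```ruby
# Toy model only: this is not ActiveRecord, just an illustration of a
# process-local column cache going stale after a schema change.
class ToyTable
  attr_reader :columns

  def initialize(columns)
    @columns = columns.dup
  end

  # Stands in for a migration that adds a column.
  def add_column(name)
    @columns << name
  end
end

class ToyModel
  def initialize(table)
    @table = table
    @cached_columns = nil
  end

  # Column names are read from the table once and then memoized,
  # so later schema changes are invisible to this process.
  def column_names
    @cached_columns ||= @table.columns.dup
  end

  # Stands in for a pod restart (or an explicit cache reset).
  def reset_column_information
    @cached_columns = nil
  end
end

table = ToyTable.new(%w[id name])
model = ToyModel.new(table)

before_migration = model.column_names               # cache is filled at "boot"
table.add_column('default_compliance_framework_id') # post-migration runs after boot
stale = model.column_names                          # still the old column list
model.reset_column_information                      # "restart the pod"
fresh = model.column_names                          # now includes the new column
```

In this model the new column only becomes visible after the reset, which mirrors the observation that the failing test passed only once the relevant pods had been restarted.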
Slack conversation details
Slack link - Internal only
Valery Burton
- Looks like this was the MR that introduced default_compliance_framework_id, which should also be a db column, so it seems this could be similar to the last incident: !100609 (merged). For the last incident, I remember sidekiq, web, and api pods were all restarted, and the issue was reproducible on both the UI & API. I ran the failing test after the sidekiq and web restart and the test was still failing, but restarting the api did the trick.
Ahmad
- A restart of the web pods seems to make it work for me. I guess we can restart the rest for good measure.
Ahmad
- OK, web, api and sidekiq pods are restarted
Nailia Iskhakova
- Passed
Amy Phillips
- Does this mean all self-managed users will need to restart after installing?
To ensure the problem didn't affect self-managed instances and to continue with the release preparation steps, the migration was moved from a post-migration to a regular one in !101658 (merged). Unfortunately, the MR didn't go through due to a failure in the rspec fail-fast job:
Failed examples:
rspec ./spec/migrations/change_public_projects_cost_factor_spec.rb:48 # ChangePublicProjectsCostFactor#down when on SaaS resets the cost factor to 0 only for shared runners that were updated
The failure was caused by a schema-refresh problem, and a fix was submitted and merged in !101613 (merged), but the MR then failed again with a different failure:
1303) Migrations Validation migration: #<struct ActiveRecord::MigrationProxy name="CleanupOrphansApprovalProjectRules", version=20220411173544, filename="/builds/gitlab-org/gitlab/db/post_migrate/20220411173544_cleanup_orphans_approval_project_rules.rb", scope=""> uses one of the allowed migration classes
Failure/Error: super(levels&.map { |level| Gitlab::VisibilityLevel.level_value(level) })
NoMethodError:
super: no superclass method `restricted_visibility_levels=' for #<ApplicationSetting >
To unblock the release candidate, the commit from !101658 (merged) was cherry-picked directly into the stable branch, and the pipeline succeeded there.
Problem
Although cherry-picking the commit into the stable branch unblocked the release preparation, it created a divergence between self-managed and SaaS:
- Self-managed will execute the migration as a regular one
- GitLab.com executed the migration as a post-migration
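The practical difference between the two cases is where the migration sits relative to the new code starting. The following is a minimal sketch, assuming a rollout that runs regular migrations before the new code is deployed and post-migrations afterwards; `ToyMigration` and `rollout_steps` are hypothetical names and do not correspond to GitLab's deployment tooling.

```ruby
# Hypothetical simulation of rollout ordering; not GitLab's deploy tooling.
ToyMigration = Struct.new(:version, :name, :kind) # kind is :regular or :post

# Returns the ordered list of steps for a rollout, assuming regular
# migrations run before the deploy and post-migrations run after it.
def rollout_steps(migrations)
  regular = migrations.select { |m| m.kind == :regular }.sort_by(&:version)
  post    = migrations.select { |m| m.kind == :post }.sort_by(&:version)

  regular.map { |m| "migrate #{m.name}" } +
    ['deploy new code'] +
    post.map { |m| "post-migrate #{m.name}" }
end

# The same migration, classified differently on each side of the divergence.
saas   = [ToyMigration.new(1, 'add_column', :post)]
stable = [ToyMigration.new(1, 'add_column', :regular)]

saas_steps   = rollout_steps(saas)   # column is added after the new code starts
stable_steps = rollout_steps(stable) # column is added before the new code starts
```

Under these assumptions the end state of the schema is the same in both cases; only the point at which the column appears relative to the deploy differs.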
Although this divergence should not pose a problem (I think), we should still fix it. The purpose of this issue is to continue investigating why !101658 (merged) keeps failing and to fix the divergence.