Ensure half-complete batched background migrations are correctly picked up after the CI decomposition failover
Read more about the failover process at https://about.gitlab.com/handbook/engineering/development/enablement/sharding/migrate-ci-tables-to-new-database-plan.html#rolling-back. The TL;DR is that we're replicating the whole database to a separate CI database. At a certain point we'll stop the replication and split our reads and writes so that all CI-related tables, as defined by https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/database/gitlab_schemas.yml, go to the new CI Patroni cluster. Since the background migration tracking tables are considered gitlab_shared, this presents a tricky situation: a half-complete migration will effectively be split across 2 databases after the failover. We need to ensure:
- We don't duplicate our background migration efforts by wastefully running the migration against the wrong database with stale data
- We don't miss a background migration after the failover
- We don't run the background migration against stale data on the wrong database
Proposal: Store the allowed_schemas_for_connection alongside the background migration data and ensure it's picked up by the correct worker before/after the migration
From #359951 (comment 928357451)
Since background migrations are always scheduled from a regular migration, with the new Migration[2.0] we know the schema from which each one is being scheduled (like gitlab_main):
- We could store this information in the tracking table.
- The worker would then only query background tasks for `where(gitlab_schema: [allowed_schemas_for_connection])`.
- Now, since we will enqueue background migrations ahead of time with `Migration[2.0]`, it would hold a correct `gitlab_schema`, since we already know the schema that is used for querying data.
- Now, once we split the databases, the new CI Worker would only fetch the migrations with the correct schema.
- One outcome would be some stale records whose schema lies outside of a given database, but these could be cleaned up in a follow-up phase when we clean up unrelated data.
- Any background migrations that need to be run on `gitlab_shared` tables will need to be scheduled twice (once for `ci` and once for `main`). Before CI decomposition, however, it may be simpler to forbid any migration that needs to backfill/update data on `gitlab_shared` tables. I assume this is a rare edge case that we probably won't need in the next couple of months, as these are only a few tables.
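
The filtering idea in the list above can be sketched in plain Ruby, without ActiveRecord. This is only an illustration under stated assumptions: `TrackedMigration`, `runnable_migrations`, the `status` values, and the job class names are all hypothetical stand-ins, not the real tracking table or worker code. The one schema allowed per connection (`gitlab_main` for main, `gitlab_ci` for CI) is also an assumption made to keep the example small.

```ruby
# Hypothetical stand-in for one row of the background migration tracking table.
# The gitlab_schema value is recorded when the migration is scheduled.
TrackedMigration = Struct.new(:job_class, :gitlab_schema, :status, keyword_init: true)

# Plain-array equivalent of where(gitlab_schema: [allowed_schemas_for_connection]),
# restricted to migrations that are still in progress (assumed :active status).
def runnable_migrations(tracking_rows, allowed_schemas)
  tracking_rows.select do |row|
    row.status == :active && allowed_schemas.include?(row.gitlab_schema)
  end
end

# Rows scheduled before the failover. The shared-table migration is scheduled
# twice, once per schema, as suggested in the last bullet above.
rows = [
  TrackedMigration.new(job_class: 'BackfillExampleMainColumn',   gitlab_schema: :gitlab_main, status: :active),
  TrackedMigration.new(job_class: 'BackfillExampleCiColumn',     gitlab_schema: :gitlab_ci,   status: :active),
  TrackedMigration.new(job_class: 'BackfillExampleSharedColumn', gitlab_schema: :gitlab_main, status: :active),
  TrackedMigration.new(job_class: 'BackfillExampleSharedColumn', gitlab_schema: :gitlab_ci,   status: :active),
  TrackedMigration.new(job_class: 'OldFinishedMigration',        gitlab_schema: :gitlab_ci,   status: :finished)
]

# Replication copies the tracking table, so after the split both databases
# start out holding the same rows, including ones that are stale for them.
main_rows = rows.map(&:dup)
ci_rows   = rows.map(&:dup)

# Each worker only fetches rows whose schema matches its own connection.
main_jobs = runnable_migrations(main_rows, [:gitlab_main]).map(&:job_class)
ci_jobs   = runnable_migrations(ci_rows, [:gitlab_ci]).map(&:job_class)

puts "main worker: #{main_jobs.inspect}"
puts "ci worker:   #{ci_jobs.inspect}"
```

In this sketch each active row is fetched by exactly one worker after the split. The rows a worker skips are the stale records mentioned above, which would remain in that database until a later cleanup.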