Fix 16.5 upgrade failure due to unique_batched_background_migrations_queued_migration_version

Zendesk: https://gitlab.zendesk.com/agent/tickets/486680

Error:

PG::UniqueViolation: ERROR:  duplicate key value violates unique constraint "unique_batched_background_migrations_queued_migration_version"
DETAIL:  Key (queued_migration_version)=(20230721095222) already exists.

Details:

  1. !132264 (merged) added the unique index unique_batched_background_migrations_queued_migration_version in %"16.5 (expired)".
  2. !127212 (merged) created DeleteOrphansScanFindingLicenseScanningApprovalRules2 migration to enqueue 2 BBMs in the same milestone %"16.5 (expired)".
  3. Both (1) and (2) got merged on the same day (28th Sep).
  4. With the new unique index, only one BBM can be enqueued per post-migration, yet DeleteOrphansScanFindingLicenseScanningApprovalRules2 somehow went through without errors on GitLab.com's gstg and gprd.

Cause:

The above error can occur when upgrading to 16.5 (or directly to a later version, up to 16.8) using a zero-downtime upgrade.

Reason:

DeleteOrphansScanFindingLicenseScanningApprovalRules2 enqueues 2 BBMs in the same migration, and AddQueuedMigrationVersionToBatchedBackgroundMigrations added a unique index on queued_migration_version. The first migration has the earlier timestamp (20230721095222), but because it is a post-deployment migration, it is executed after the second migration (20230921081527) when the customer performs a zero-downtime upgrade. By then the unique index is already in place, so enqueuing the second BBM with the same queued_migration_version fails.
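
To check whether an instance is in this state, the queries below are a minimal sketch. They assume direct SQL access to the GitLab PostgreSQL database, for example through a psql console. The first uses the standard pg_indexes view to see whether the unique index already exists. The second lists any rows already enqueued for the two BBMs named in the unblock steps.

```sql
-- Is the unique index already in place?
SELECT indexname, indexdef
  FROM pg_indexes
 WHERE tablename = 'batched_background_migrations'
   AND indexname = 'unique_batched_background_migrations_queued_migration_version';

-- Which of the two BBMs were enqueued, and with which version?
SELECT id, job_class_name, queued_migration_version
  FROM batched_background_migrations
 WHERE job_class_name IN ('DeleteOrphansApprovalMergeRequestRules2', 'DeleteOrphansApprovalProjectRules2');
```

If the first query returns a row while the post-deployment migration is still pending, the upgrade is likely to hit the error above.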

To Unblock:

  1. Drop the unique index:

    DROP INDEX unique_batched_background_migrations_queued_migration_version;

  2. Re-run the upgrade. The DeleteOrphansScanFindingLicenseScanningApprovalRules2 migration should now pass without errors.

  3. After the upgrade succeeds, run the SQL commands below to restore the unique constraint:

    UPDATE batched_background_migrations SET queued_migration_version = NULL 
      WHERE job_class_name IN ('DeleteOrphansApprovalMergeRequestRules2', 'DeleteOrphansApprovalProjectRules2');
    
    CREATE UNIQUE INDEX unique_batched_background_migrations_queued_migration_version
      ON batched_background_migrations USING btree (queued_migration_version);
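
The UPDATE above sets queued_migration_version to NULL for both BBMs so that the two rows no longer collide. If the CREATE UNIQUE INDEX statement still fails, or to confirm beforehand that no duplicates remain, a check along these lines may help. It is a minimal sketch using only the table and column named above:

```sql
-- Expect zero rows; any row returned would block the unique index.
-- NULLs are excluded because PostgreSQL unique indexes treat NULLs
-- as distinct by default, so they never conflict with each other.
SELECT queued_migration_version, COUNT(*) AS occurrences
  FROM batched_background_migrations
 WHERE queued_migration_version IS NOT NULL
 GROUP BY queued_migration_version
HAVING COUNT(*) > 1;
```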

Note: You might have to restart GitLab with sudo gitlab-ctl restart on a Rails node if it doesn't come back on its own. Restarting Sidekiq nodes might also be necessary if you notice hanging or failing background jobs.

Edited by Ben Prescott (ex-GitLab)