
Add the BG migration worker for the ci database

Patrick Bair requested to merge 343047-add-ci-database-bg-migration-worker into master

What does this MR do and why?

Related to #343047 (closed)

Adds a new worker to process background migrations that target the ci database.

Because the worker enqueues jobs based on the database name, in a single-database setup all background migrations will continue to be executed by BackgroundMigrationWorker (the database name will always be main). This simplifies management of background migrations for self-managed customers, who won't operate decomposed databases.
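
The routing rule described above can be sketched in plain Ruby. This is a simplified illustration under stated assumptions, not the code in this MR: `WORKER_BY_DATABASE`, `worker_for`, and the ci worker name `BackgroundMigration::CiDatabaseWorker` are hypothetical names chosen for the example. Only `BackgroundMigrationWorker` and the database names main and ci come from the description.

```ruby
# Hypothetical mapping from database name to Sidekiq worker class name.
# Only 'BackgroundMigrationWorker' is named in this MR description; the ci
# worker name below is an assumption made for illustration.
WORKER_BY_DATABASE = {
  'main' => 'BackgroundMigrationWorker',
  'ci' => 'BackgroundMigration::CiDatabaseWorker'
}.freeze

# Returns the worker class name for the given database name.
# Raises ArgumentError when no worker is registered for that database.
def worker_for(database_name)
  WORKER_BY_DATABASE.fetch(database_name.to_s) do
    raise ArgumentError, "no background migration worker for '#{database_name}'"
  end
end

# In a single-database setup the name is always 'main', so every job is
# routed to the existing worker.
puts worker_for('main')
puts worker_for(:ci)
```

Keeping the lookup keyed on the database name means a single-database installation never reaches the ci entry, which matches the behaviour the paragraph above describes.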

The job won't be used yet on GitLab.com either, at least until we have migration tooling to run DML within the gitlab_schema. At that point, we could enable the new worker by updating https://gitlab.com/gitlab-org/gitlab/-/blob/a4e148198fe318e4caaab61efac727efa42e3d0b/lib/gitlab/database/migrations/background_migration_helpers.rb#L190 to use the current connection name of the migration.

How to set up and validate locally

In a multi-database Rails console (for example, one started with GITLAB_USE_MODEL_LOAD_BALANCING=true rails c):

  1. Create a test migration job:
    module Gitlab
      module BackgroundMigration
        class MyTestMigration < BaseJob
          def perform(start_id, stop_id, table_name)
            num_rows = connection.select_value("select count(*) from #{table_name}")
            puts "#{num_rows} rows in #{table_name} on #{connection.pool.db_config.name} database"
            mark_jobs_as_succeeded(start_id, stop_id, table_name)
          end
    
          private
    
          def mark_jobs_as_succeeded(*arguments)
            Gitlab::Database::BackgroundMigrationJob.mark_all_as_succeeded(self.class.name.demodulize, arguments)
          end
        end
      end
    end
  2. Schedule a job on both the main and ci database:
    main_coordinator = Gitlab::BackgroundMigration.coordinator_for_database('main')
    ci_coordinator = Gitlab::BackgroundMigration.coordinator_for_database('ci')
    
    main_coordinator.perform_in(1.hour, 'MyTestMigration', [1, 100, 'projects'])
    Gitlab::Database::SharedModel.using_connection(Project.connection) do
      Gitlab::Database::BackgroundMigrationJob.create!(class_name: 'MyTestMigration', arguments: [1, 100, 'projects'])
    end
    
    ci_coordinator.perform_in(1.hour, 'MyTestMigration', [1, 100, 'ci_builds'])
    Gitlab::Database::SharedModel.using_connection(Ci::Build.connection) do
      Gitlab::Database::BackgroundMigrationJob.create!(class_name: 'MyTestMigration', arguments: [1, 100, 'ci_builds'])
    end
  3. Verify the scheduled set and tracking records:
    Sidekiq::ScheduledSet.new.select { |j| j.args.first == 'MyTestMigration' }
    
    select status, arguments from background_migration_jobs where class_name = 'MyTestMigration';
    -- on main
     status |       arguments
    --------+-----------------------
          0 | [1, 100, "projects"]
    -- on ci
     status |       arguments
    --------+-----------------------
          0 | [1, 100, "ci_builds"]
  4. Run the jobs:
    main_coordinator.steal('MyTestMigration') # should output something like "8 rows in projects on main database"
    ci_coordinator.steal('MyTestMigration') # should output something like "4 rows in ci_builds on ci database"
  5. Verify the status is 1 on both main and ci tracking records:
    select status from background_migration_jobs where class_name = 'MyTestMigration';
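
The flow that steps 2 to 5 exercise can be pictured with a toy in-memory model. This is only a sketch for reasoning about the expected results: `ToyCoordinator` and its methods are hypothetical and do not reflect how the real coordinator, Sidekiq, or the tracking table are implemented. The status values 0 and 1 are taken from the output shown in the steps above.

```ruby
# Toy stand-in for one database's job queue and tracking table.
# Status values mirror the walkthrough: 0 before the job runs, 1 after.
PENDING = 0
SUCCEEDED = 1

class ToyCoordinator
  attr_reader :database, :queue, :tracking

  def initialize(database)
    @database = database
    @queue = []    # scheduled jobs as [class_name, arguments]
    @tracking = [] # tracking records as { class_name:, arguments:, status: }
  end

  # Schedules a job and creates its tracking record (step 2, combined).
  def schedule(class_name, arguments)
    queue << [class_name, arguments]
    tracking << { class_name: class_name, arguments: arguments, status: PENDING }
  end

  # Takes every queued job of the given class off the queue and marks the
  # matching tracking records as succeeded (steps 4 and 5).
  # Returns the number of jobs that were taken.
  def steal(class_name)
    stolen, remaining = queue.partition { |name, _| name == class_name }
    @queue = remaining
    stolen.each do |_, arguments|
      tracking.each do |record|
        next unless record[:class_name] == class_name && record[:arguments] == arguments

        record[:status] = SUCCEEDED
      end
    end
    stolen.size
  end

  # Status of every tracking record for the given class on this database.
  def statuses(class_name)
    tracking.select { |r| r[:class_name] == class_name }.map { |r| r[:status] }
  end
end

main = ToyCoordinator.new('main')
ci = ToyCoordinator.new('ci')
main.schedule('MyTestMigration', [1, 100, 'projects'])
ci.schedule('MyTestMigration', [1, 100, 'ci_builds'])

main.steal('MyTestMigration')
p main.statuses('MyTestMigration') # prints [1]
p ci.statuses('MyTestMigration')   # prints [0], the ci job has not run yet
```

The point of the model is that each database keeps its own queue and its own tracking records, so stealing on main leaves the ci record untouched until the ci coordinator runs.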

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Patrick Bair
