
Store number of affected rows in metrics for batched background migrations

What does this MR do and why?

Update Gitlab::Database::BackgroundMigration::BatchMetrics and add #instrument_operation. In addition to the execution time of each sub-batch, this new method stores the number of affected records, assuming the instrumented operation in the background migration returns that count (as ActiveRecord's update_all does).

This is needed to improve automatic batch size optimization, so that it can take into account not only how long a sub-batch took, but also how much work it did.
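To make the described behavior concrete, here is a minimal standalone Ruby sketch of an #instrument_operation-style method. It is not GitLab's actual BatchMetrics implementation: the class name SketchBatchMetrics and its to_h method are hypothetical, and the 'cmd_tuples' key only mirrors the key in the sample metrics output from the validation steps. The sketch times the block and records its return value as the affected row count only when that value is an Integer.

```ruby
# Hypothetical sketch; not GitLab's Gitlab::Database::BackgroundMigration::BatchMetrics.
class SketchBatchMetrics
  attr_reader :timings, :affected_rows

  def initialize
    # Each label maps to an array with one entry per instrumented sub-batch.
    @timings = Hash.new { |hash, key| hash[key] = [] }
    @affected_rows = Hash.new { |hash, key| hash[key] = [] }
  end

  # Times the block and, if it returns an Integer (e.g. update_all's row count),
  # records it as the number of affected rows. Returns the block's result unchanged.
  def instrument_operation(label)
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    timings[label] << Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    affected_rows[label] << result if result.is_a?(Integer)
    result
  end

  # Hypothetical serialization shaped like the metrics shown in the MR.
  def to_h
    { 'timings' => timings.to_h, 'cmd_tuples' => affected_rows.to_h }
  end
end

metrics = SketchBatchMetrics.new

# Two fake "sub-batches"; counting even IDs stands in for update_all's return value.
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]].each do |sub_batch|
  metrics.instrument_operation(:update_all) do
    sub_batch.count(&:even?)
  end
end

p metrics.affected_rows[:update_all] # => [2, 3]
```

Because the method returns the block's result, callers can keep using the value update_all returns while the metrics are collected as a side effect.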

#351786

How to set up and validate locally

  1. Create a batched background migration such as the following:
# frozen_string_literal: true

# lib/gitlab/background_migration/dummy_bbm.rb
module Gitlab
  module BackgroundMigration
    class DummyBbm < BaseJob
      include Gitlab::Database::DynamicModelHelpers

      def perform(start_id, end_id, batch_table, batch_column, sub_batch_size, pause_ms)
        parent_batch_relation = relation_scoped_to_range(batch_table, batch_column, start_id, end_id)

        parent_batch_relation.each_batch(column: batch_column, of: sub_batch_size) do |sub_batch|
          batch_metrics.instrument_operation(:update_all) do
            sub_batch.where('id % 2 = 0').update_all('id = id')
          end
        end
      end

      def batch_metrics
        @batch_metrics ||= Gitlab::Database::BackgroundMigration::BatchMetrics.new
      end

      private

      def relation_scoped_to_range(source_table, source_key_column, start_id, stop_id)
        define_batchable_model(source_table, connection: connection).where(source_key_column => start_id..stop_id)
      end
    end
  end
end

  2. In the Rails console, execute the following:
helpers = ActiveRecord::Migration.new.extend(Gitlab::Database::MigrationHelpers)

helpers.queue_batched_background_migration(
  'DummyBbm',
  :projects,
  :id,
  job_interval: 1,
  batch_size: 100,
  max_batch_size: 100,
  sub_batch_size: 5
)

active_migration = Gitlab::Database::BackgroundMigration::BatchedMigration.find(12) # Replace 12 with the ID of the migration queued by the previous command

Gitlab::Database::BackgroundMigration::BatchedMigrationRunner.new.run_entire_migration(active_migration)

pp active_migration.batched_jobs.map(&:metrics)
[{"timings"=>
   {"update_all"=>
     [0.00703599996631965,
      0.001476000004913658,
      0.0024839999969117343,
      0.0013969999854452908,
      0.0019730000058189034]},
  "cmd_tuples"=>{"update_all"=>[2, 3, 2, 3, 0]}}] # <- number of rows updated for each sub-batch
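As a rough illustration of why row counts are useful alongside durations for batch size optimization, here is a hypothetical Ruby helper (sub_batch_throughput is not part of GitLab) that turns the timings and cmd_tuples entries of a job's metrics into rows processed per second. It assumes both arrays line up one-to-one per sub-batch, as they do in the sample output above; the input values are made up for the example.

```ruby
# Hypothetical helper, not part of GitLab: rows processed per second for each
# instrumented sub-batch, given a metrics hash shaped like the output above.
def sub_batch_throughput(metrics, label)
  durations = metrics.fetch('timings').fetch(label)
  rows = metrics.fetch('cmd_tuples').fetch(label)

  # Skips sub-batches without a recorded row count or with a zero duration.
  durations.zip(rows).filter_map do |duration, count|
    count.to_f / duration if count && duration.positive?
  end
end

metrics = {
  'timings' => { 'update_all' => [0.25, 0.5] },
  'cmd_tuples' => { 'update_all' => [2, 3] }
}

p sub_batch_throughput(metrics, 'update_all') # => [8.0, 6.0]
```

A duration-only heuristic would treat these two sub-batches as a fast one and a slow one; dividing by the rows actually touched shows how much work each unit of time bought.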

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #351786

Edited by Krasimir Angelov
