Adjust BBM job efficiency when only part of the batch was affected

Krasimir Angelov requested to merge 351786-bbm-adjust-job-efficiency into master

What does this MR do and why?

When data distribution is uneven, for example when a migration iterates over the whole table but updates only the rows that match a filter, the batch optimizer can be misled into scaling the batch size up to an unsafe level.

This MR adjusts the job efficiency for such batches, based on the number of affected rows, in order to avoid this problem.

The adjusted efficiency is capped at 0.95, so that jobs whose adjusted efficiency would be higher (this happens when very few rows, or no rows at all, were affected) do not cause the batch size to decrease.
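To make the idea concrete, here is a minimal sketch of such an adjustment. It is not the code added by this MR: the method name `adjusted_efficiency`, its arguments, and the linear scaling by `batch_size / affected_rows` are assumptions made for illustration. Only the 0.95 cap comes from the description above.

```ruby
# Hypothetical helper for illustration; not the code added by this MR.
MAX_ADJUSTED_EFFICIENCY = 0.95 # the cap named in the description

def adjusted_efficiency(raw_efficiency, affected_rows, batch_size)
  # A fully affected batch (or an invalid batch size) needs no adjustment.
  return raw_efficiency if batch_size <= 0 || affected_rows >= batch_size

  # Assumption: duration grows linearly with the number of affected rows, so
  # estimate the efficiency the job would have had if every row were touched.
  scaled =
    if affected_rows > 0
      raw_efficiency * batch_size / affected_rows.to_f
    else
      Float::INFINITY
    end

  # Cap at 0.95, and never report less than the measured efficiency.
  [[scaled, MAX_ADJUSTED_EFFICIENCY].min, raw_efficiency].max
end
```

Under these assumptions, a job that used a tenth of its interval and affected no rows is reported at the cap, so it gives the optimizer no reason to grow the batch, and, as stated above, no reason to shrink it either.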

#351786

Screenshots or screen recordings

The data for the charts below was generated with the following migration, which attempts to simulate the problem described in #351786.

module Gitlab
  module BackgroundMigration
    class DummyBbm < BaseJob
      include Gitlab::Database::DynamicModelHelpers

      def perform(start_id, end_id, batch_table, batch_column, sub_batch_size, pause_ms, rest)
        model = define_batchable_model(batch_table, connection: connection)

        batch_metrics.instrument_operation(:update_all) do
          case start_id
          when 100_000..3_000_000 # simulate batches with no work done
            sleep(0.5)
            0
          else
            sleep(2.5)
            end_id - start_id
          end
        end
      end

      def batch_metrics
        @batch_metrics ||= Gitlab::Database::BackgroundMigration::BatchMetrics.new
      end
    end
  end
end

This iterates over the following table:

create table to_migrate (
  id bigint primary key,
  type smallint
);

insert into to_migrate(id) select i from generate_series(1, 5000000) as i;

Before

We can see that batch size is constantly increasing:

image
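The growth can be reproduced with simple arithmetic. The sketch below assumes the optimizer multiplies the batch size by `target / efficiency`, limited by a maximum multiplier. The constant values and the helper `next_batch_size` are illustrative assumptions, not values taken from this MR. The 0.5 second duration and 5 second interval come from the dummy migration and the `job_interval` used in the setup steps.

```ruby
# Illustrative constants, assumed for this sketch only.
TARGET_EFFICIENCY = 0.95
MAX_MULTIPLIER = 1.2

# Hypothetical helper: scale the batch size towards the target efficiency.
def next_batch_size(batch_size, efficiency)
  multiplier = [TARGET_EFFICIENCY / efficiency, MAX_MULTIPLIER].min
  (batch_size * multiplier).round
end

interval = 5.0 # seconds, as in job_interval: 5.seconds
duration = 0.5 # seconds, the no-work branch of DummyBbm
efficiency = duration / interval

# The no-work branch takes the same time whatever the batch size, so the
# measured efficiency stays low and every job asks for a bigger batch.
sizes = [1_000]
5.times { sizes << next_batch_size(sizes.last, efficiency) }
puts sizes.inspect
```

Because the no-work batches finish quickly regardless of their size, nothing in this loop ever pushes the efficiency back up, which matches the steadily rising curve in the chart.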

After

Here we see that batch size is not changed for the batches that have done no work:

image

Here are the raw CSV files used to generate these charts:

How to set up and validate locally

  1. To execute the above migration, first apply the following changes:
diff --git a/lib/gitlab/database/background_migration/batched_migration_runner.rb b/lib/gitlab/database/background_migration/batched_migration_runner.rb
index 59ff9a9744f..a6d5c865acc 100644
--- a/lib/gitlab/database/background_migration/batched_migration_runner.rb
+++ b/lib/gitlab/database/background_migration/batched_migration_runner.rb
@@ -129,6 +129,7 @@ def run_migration_while(migration, status)
           while migration.status_name == status
             run_migration_job(migration)
 
+            migration.reload
             migration.reload_last_job
           end
         end
diff --git a/lib/gitlab/database/migrations/batched_background_migration_helpers.rb b/lib/gitlab/database/migrations/batched_background_migration_helpers.rb
index 0261ade0fe7..3d38037c3fb 100644
--- a/lib/gitlab/database/migrations/batched_background_migration_helpers.rb
+++ b/lib/gitlab/database/migrations/batched_background_migration_helpers.rb
@@ -77,7 +77,7 @@ def queue_batched_background_migration( # rubocop:disable Metrics/ParameterLists
             return
           end
 
-          job_interval = BATCH_MIN_DELAY if job_interval < BATCH_MIN_DELAY
+          # job_interval = BATCH_MIN_DELAY if job_interval < BATCH_MIN_DELAY
 
           batch_max_value ||= connection.select_value(<<~SQL)
             SELECT MAX(#{connection.quote_column_name(batch_column_name)})
  2. In the Rails console, execute:
helpers = ActiveRecord::Migration.new.extend(Gitlab::Database::MigrationHelpers)

helpers.queue_batched_background_migration(
  'DummyBbm',
  :to_migrate,
  :id,
  Time.now.to_i,
  job_interval: 5.seconds,
  batch_size: 1_000,
  sub_batch_size: 100
)

active_migration = Gitlab::Database::BackgroundMigration::BatchedMigration.find ? # Use the id of the migration created above

Gitlab::Database::BackgroundMigration::BatchedMigrationRunner.new(connection: helpers.connection).run_entire_migration(active_migration)

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Krasimir Angelov