Adjust BBM job efficiency when only part of the batch was affected
What does this MR do and why?
When data is not evenly distributed, for example when a migration iterates over the whole table but updates only the rows that match a filter, the batch optimizer can misread the batches that do little or no work and scale the batch size up to an unsafe level.
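To illustrate the failure mode, here is a minimal sketch of an efficiency-driven optimizer. It is not the GitLab implementation: the method name `next_batch_size`, the target efficiency of 0.95, and the growth limit of 1.2 per step are assumptions chosen for illustration only.

```ruby
# Hypothetical model of a batch optimizer. All names and constants here are
# assumptions for illustration, not the actual GitLab code.
TARGET_EFFICIENCY = 0.95 # assumed target ratio of job duration to job interval
MAX_MULTIPLIER = 1.2     # assumed limit on growth per optimization step

# Returns the batch size to use next, given the efficiency of recent jobs
# (efficiency = job duration / job interval).
def next_batch_size(batch_size, efficiency)
  return batch_size if efficiency.nil? || efficiency <= 0

  multiplier = [TARGET_EFFICIENCY / efficiency, MAX_MULTIPLIER].min
  (batch_size * multiplier).round
end

# Ten consecutive batches that touch no rows and finish in 0.5s of a 5s interval.
size = 1_000
10.times { size = next_batch_size(size, 0.5 / 5) }
puts size # several times larger than the initial 1_000
```

Under these assumptions, every fast empty batch looks like spare capacity, so the batch size keeps growing even though no real work was measured.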
This MR adjusts the job efficiency for such batches, based on the number of affected rows, in order to avoid this problem.
The adjusted efficiency is capped at 0.95, so that jobs whose adjusted efficiency would be higher (this happens when very few rows, or no rows at all, were affected) do not cause the batch size to decrease.
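The adjustment can be sketched roughly as follows. This is a hedged illustration, not the code in this MR: the method name `adjusted_efficiency`, its keyword arguments, and the scaling formula (raw efficiency multiplied by `batch_size / affected_rows`) are assumptions. Only the 0.95 cap comes from the description above.

```ruby
# Hypothetical sketch of an affected-rows adjustment. The formula and all names
# are assumptions; only the 0.95 cap is taken from the MR description.
MAX_ADJUSTED_EFFICIENCY = 0.95

def adjusted_efficiency(duration_s:, interval_s:, batch_size:, affected_rows:)
  raw = duration_s.to_f / interval_s

  # The whole batch was affected, or the job is already slow: nothing to adjust.
  return raw if affected_rows >= batch_size || raw >= MAX_ADJUSTED_EFFICIENCY

  # No rows affected: report the cap so the batch size stays unchanged.
  return MAX_ADJUSTED_EFFICIENCY if affected_rows <= 0

  # Scale the measured efficiency up to what a fully affected batch would
  # have cost, but never beyond the cap.
  [raw * batch_size / affected_rows, MAX_ADJUSTED_EFFICIENCY].min
end

# A batch of 1_000 rows that touched nothing and finished in 0.5s of a 5s interval.
puts adjusted_efficiency(duration_s: 0.5, interval_s: 5, batch_size: 1_000, affected_rows: 0)
```

With the cap in place, a batch that did no work reports an efficiency at the upper end of the target, so an optimizer that compares efficiency against that target would leave the batch size as it is rather than grow or shrink it.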
Screenshots or screen recordings
The data for the charts below was generated with the following migration, which attempts to simulate the problem described in #351786.
module Gitlab
  module BackgroundMigration
    class DummyBbm < BaseJob
      include Gitlab::Database::DynamicModelHelpers

      def perform(start_id, end_id, batch_table, batch_column, sub_batch_size, pause_ms, rest)
        model = define_batchable_model(batch_table, connection: connection)

        batch_metrics.instrument_operation(:update_all) do
          case start_id
          when 100_000..3_000_000 # simulate batches with no work done
            sleep(0.5)
            0
          else
            sleep(2.5)
            end_id - start_id
          end
        end
      end

      def batch_metrics
        @batch_metrics ||= Gitlab::Database::BackgroundMigration::BatchMetrics.new
      end
    end
  end
end
This iterates over the following table:
create table to_migrate (
  id bigint primary key,
  type smallint
);
insert into to_migrate(id) select i from generate_series(1, 5000000) as i;
Before
We can see that batch size is constantly increasing:
After
Here we see that batch size is not changed for the batches that have done no work:
Here are the raw CSV files used to generate these charts:
How to set up and validate locally
- To execute the above migration, first apply the following changes:
diff --git a/lib/gitlab/database/background_migration/batched_migration_runner.rb b/lib/gitlab/database/background_migration/batched_migration_runner.rb
index 59ff9a9744f..a6d5c865acc 100644
--- a/lib/gitlab/database/background_migration/batched_migration_runner.rb
+++ b/lib/gitlab/database/background_migration/batched_migration_runner.rb
@@ -129,6 +129,7 @@ def run_migration_while(migration, status)
while migration.status_name == status
run_migration_job(migration)
+ migration.reload
migration.reload_last_job
end
end
diff --git a/lib/gitlab/database/migrations/batched_background_migration_helpers.rb b/lib/gitlab/database/migrations/batched_background_migration_helpers.rb
index 0261ade0fe7..3d38037c3fb 100644
--- a/lib/gitlab/database/migrations/batched_background_migration_helpers.rb
+++ b/lib/gitlab/database/migrations/batched_background_migration_helpers.rb
@@ -77,7 +77,7 @@ def queue_batched_background_migration( # rubocop:disable Metrics/ParameterLists
return
end
- job_interval = BATCH_MIN_DELAY if job_interval < BATCH_MIN_DELAY
+ # job_interval = BATCH_MIN_DELAY if job_interval < BATCH_MIN_DELAY
batch_max_value ||= connection.select_value(<<~SQL)
SELECT MAX(#{connection.quote_column_name(batch_column_name)})
- In the Rails console, execute:
helpers = ActiveRecord::Migration.new.extend(Gitlab::Database::MigrationHelpers)
helpers.queue_batched_background_migration(
  'DummyBbm',
  :to_migrate,
  :id,
  Time.now.to_i,
  job_interval: 5.seconds,
  batch_size: 1_000,
  sub_batch_size: 100
)
active_migration = Gitlab::Database::BackgroundMigration::BatchedMigration.find ? # Use the id of the migration created above
Gitlab::Database::BackgroundMigration::BatchedMigrationRunner.new(connection: helpers.connection).run_entire_migration(active_migration)
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.