Adjust sub-batch size for failed Batched Background Migration Jobs
Merged
requested to merge 377308-adjust-the-sub_batch_size-in-background-migrations-if-we-get-a-query-timeout-exception into master
Reduces the sub_batch_size of a BatchedMigrationJob when a timeout happens during sub-batch processing.
It rescues the following exceptions:
ActiveRecord::StatementTimeout
ActiveRecord::ConnectionTimeoutError
ActiveRecord::AdapterTimeout
ActiveRecord::LockWaitTimeout
ActiveRecord::QueryCanceled
Solves #377308 (closed)
Feature Flag Issue: #393556 (closed)
If a timeout happens while processing each_sub_batch, a Gitlab::Database::BackgroundMigration::SubBatchTimeoutError will be raised.
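The wrapping described above can be sketched without Rails so that it runs standalone. This is only an illustration under stated assumptions: `FakeStatementTimeout` and `FakeQueryCanceled` are hypothetical stand-ins for the ActiveRecord exceptions listed earlier, `SubBatchTimeoutError` is a local stand-in for the GitLab class, and `each_sub_batch_sketch` is a made-up helper, not the actual `each_sub_batch` implementation.

```ruby
# Hypothetical stand-ins for ActiveRecord::StatementTimeout and
# ActiveRecord::QueryCanceled, so the sketch runs without Rails.
class FakeStatementTimeout < StandardError; end
class FakeQueryCanceled < StandardError; end

# Local stand-in for Gitlab::Database::BackgroundMigration::SubBatchTimeoutError.
class SubBatchTimeoutError < StandardError; end

TIMEOUT_EXCEPTIONS = [FakeStatementTimeout, FakeQueryCanceled].freeze

# Yields each sub-batch to the block. Any timeout raised by the block is
# re-raised as SubBatchTimeoutError, so the caller only has to rescue one
# error class. Ruby keeps the original exception available via #cause.
def each_sub_batch_sketch(sub_batches)
  sub_batches.each do |sub_batch|
    yield sub_batch
  end
rescue *TIMEOUT_EXCEPTIONS => e
  raise SubBatchTimeoutError, "sub-batch timed out: #{e.message}"
end
```

Errors that are not in the timeout list pass through unchanged, so only timeouts would trigger the sub-batch size reduction in a design like this.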
This error is rescued by the migration wrapper and handled by BatchedJob#reduce_sub_batch_size!, which reduces the sub-batch size by 25%:
BatchedJob#sub_batch_size will never go lower than batch_size
BatchedJob#sub_batch_size will be reduced 2 times (about 44% in total) before the cycle is reset by BatchedJob#split_and_retry!
With BatchedJob#attempts reset to 0, the cycle starts over again.
BatchedMigrationJob to :failed
rails g post_deployment_migration AdjustSubBatchSizeOnTimeout
class AdjustSubBatchSizeOnTimeout < Gitlab::Database::Migration[2.1]
  MIGRATION = 'AdjustSubBatchSizeOnTimeout'
  TABLE_NAME = :issues
  BATCH_COLUMN = :id
  BATCH_SIZE = 500
  SUB_BATCH_SIZE = 150

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    queue_batched_background_migration(
      MIGRATION,
      TABLE_NAME,
      BATCH_COLUMN,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE,
      job_interval: 2.minutes
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, TABLE_NAME, BATCH_COLUMN, [])
  end
end
module Gitlab
  module BackgroundMigration
    class AdjustSubBatchSizeOnTimeout < BatchedMigrationJob
      operation_name :update_all
      feature_category :database

      def perform
        each_sub_batch do |_|
          Issue.transaction do
            # statement_timeout is in milliseconds, so the sleep below is canceled almost immediately.
            Issue.connection.execute 'SET statement_timeout = 10'
            issue = Issue.lock.find(1)
            Logger.new($stdout).info('Lock on Issue(1) for 10min.')
            # pg_sleep(600) would hold the lock for 10 minutes; the timeout above cancels it.
            issue.connection.execute('SELECT * FROM pg_sleep(600);')
          end
        end
      end
    end
  end
end
Run rails db:migrate. In the output, check for:
Caused by:
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
base_model = Gitlab::Database.database_base_models[:main]
migration = Gitlab::Database::BackgroundMigration::BatchedMigration.active_migration(connection: base_model.connection)
retriable_job = migration.batched_jobs.retriable.first
retriable_job.status
=> 2 #failed
retriable_job.sub_batch_size
=> 112 # 150 - 25% = 112.5, truncated to 112
migration_wrapper = Gitlab::Database::BackgroundMigration::BatchedMigrationWrapper.new(connection: base_model.connection)
migration_wrapper.perform(retriable_job)
retriable_job.status
=> 2 #failed
retriable_job.sub_batch_size
=> 84 # 112 - 25% = 84
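The sub-batch sizes in this walkthrough follow from applying the 25% reduction twice to the initial value of 150. The arithmetic can be checked with a small standalone sketch. `REDUCE_FACTOR` and `reduced_sub_batch_size` are hypothetical names used only for illustration, and truncating with `to_i` is an assumption based on 112.5 appearing as 112 above. The sketch also ignores any lower or upper bound the real method may apply.

```ruby
# Assumption: each timeout reduces the sub-batch size by 25%.
REDUCE_FACTOR = 0.75

# Returns the sub-batch size after one reduction, truncated to an integer
# (assumed from 112.5 being reported as 112).
def reduced_sub_batch_size(current)
  (current * REDUCE_FACTOR).to_i
end

initial = 150
first = reduced_sub_batch_size(initial) # 150 * 0.75 = 112.5, truncated
second = reduced_sub_batch_size(first)  # 112 * 0.75 = 84.0

# Overall reduction after two steps, as a rounded percentage.
total_reduction_percent = ((1 - (second / initial.to_f)) * 100).round

puts first                   # 112
puts second                  # 84
puts total_reduction_percent # 44
```

Two 25% reductions compound to 1 - 0.75² = 43.75%, which matches the "44%" figure quoted in the description.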
Related to #377308 (closed)