Adjust sub-batch size for failed Batched Background Migration Jobs
What does this MR do and why?
Overview
Reduces the sub_batch_size of a BatchedMigrationJob when a timeout happens during sub-batch processing.
It rescues the following exceptions:
ActiveRecord::StatementTimeout
ActiveRecord::ConnectionTimeoutError
ActiveRecord::AdapterTimeout
ActiveRecord::LockWaitTimeout
ActiveRecord::QueryCanceled
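The rescue-and-wrap step can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the MR's code. The StandInAR classes are hypothetical stand-ins for the ActiveRecord exceptions listed above, so the sketch runs without Rails. run_sub_batch is a hypothetical helper. Only the name SubBatchTimeoutError comes from this MR.

```ruby
# Hypothetical stand-ins for the ActiveRecord exceptions listed above, so the
# sketch runs without Rails. In GitLab these are the real ActiveRecord classes.
module StandInAR
  class StatementTimeout < StandardError; end
  class ConnectionTimeoutError < StandardError; end
  class AdapterTimeout < StandardError; end
  class LockWaitTimeout < StandardError; end
  class QueryCanceled < StandardError; end
end

# Hypothetical wrapper; only its name mirrors the SubBatchTimeoutError from this MR.
class SubBatchTimeoutError < StandardError; end

TIMEOUT_EXCEPTIONS = [
  StandInAR::StatementTimeout,
  StandInAR::ConnectionTimeoutError,
  StandInAR::AdapterTimeout,
  StandInAR::LockWaitTimeout,
  StandInAR::QueryCanceled
].freeze

# Hypothetical helper: runs one sub-batch. Any listed timeout is re-raised as
# SubBatchTimeoutError; every other error propagates unchanged.
def run_sub_batch
  yield
rescue *TIMEOUT_EXCEPTIONS => e
  # Raising inside `rescue` makes Ruby record `e` as the new error's #cause.
  raise SubBatchTimeoutError, "sub-batch timed out: #{e.class}"
end

begin
  run_sub_batch { raise StandInAR::QueryCanceled, 'canceling statement due to statement timeout' }
rescue SubBatchTimeoutError => error
  puts error.message     # prints "sub-batch timed out: StandInAR::QueryCanceled"
  puts error.cause.class # prints "StandInAR::QueryCanceled"
end
```

Relying on Ruby's built-in Exception#cause keeps the original timeout available to whoever rescues the wrapper, without storing it in a custom attribute.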
Solves #377308 (closed)
Feature Flag Issue: #393556 (closed)
Details
If a timeout happens while processing each_sub_batch, a Gitlab::Database::BackgroundMigration::SubBatchTimeoutError is raised.
This error is rescued by the migration wrapper and handled by BatchedJob#reduce_sub_batch_size!, which reduces the sub-batch size by 25%:
- BatchedJob#sub_batch_size will never go lower than batch_size
- BatchedJob#sub_batch_size will be reduced 2 times (roughly 44% in total) before the cycle is reset by BatchedJob#split_and_retry!
  - After BatchedJob#attempts is reset to 0, the cycle starts over again.
- The cycle happens while the state of the BatchedMigrationJob is changed to :failed
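The arithmetic of one reduction cycle can be checked with a small model. This sketch assumes each timeout keeps 75% of the current sub-batch size and truncates to an Integer. That assumption reproduces the 150 -> 112 -> 84 values from the validation steps below. REDUCE_FACTOR, REDUCTIONS_PER_CYCLE, reduce_sub_batch_size and reduction_cycle are hypothetical names, not GitLab's implementation.

```ruby
# Assumption: each timeout keeps 75% of the current sub-batch size and
# truncates to an Integer. This is a model, not GitLab's implementation.
REDUCE_FACTOR = 0.75
REDUCTIONS_PER_CYCLE = 2 # "reduced 2 times" before split_and_retry! resets the cycle

def reduce_sub_batch_size(size)
  (size * REDUCE_FACTOR).to_i
end

# Sub-batch sizes seen across one reduction cycle, starting from `initial`.
def reduction_cycle(initial, reductions = REDUCTIONS_PER_CYCLE)
  sizes = [initial]
  reductions.times { sizes << reduce_sub_batch_size(sizes.last) }
  sizes
end

sizes = reduction_cycle(150)
p sizes # prints [150, 112, 84]

# Two compounded 25% cuts: 1 - 0.75**2 = 0.4375 before truncation, 0.44 with it.
total_reduction = 1 - sizes.last.fdiv(sizes.first)
puts total_reduction.round(2) # prints 0.44
```

The model deliberately leaves out the batch_size boundary and the attempts reset done by split_and_retry!, since the MR text does not spell out their exact conditions.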
How to set up and validate locally
- Create a new background migration:
rails g post_deployment_migration AdjustSubBatchSizeOnTimeout
Example
class AdjustSubBatchSizeOnTimeout < Gitlab::Database::Migration[2.1]
  MIGRATION = 'AdjustSubBatchSizeOnTimeout'
  TABLE_NAME = :issues
  BATCH_COLUMN = :id
  BATCH_SIZE = 500
  SUB_BATCH_SIZE = 150

  restrict_gitlab_migration gitlab_schema: :gitlab_main

  def up
    queue_batched_background_migration(
      MIGRATION,
      TABLE_NAME,
      BATCH_COLUMN,
      batch_size: BATCH_SIZE,
      sub_batch_size: SUB_BATCH_SIZE,
      job_interval: 2.minutes
    )
  end

  def down
    delete_batched_background_migration(MIGRATION, TABLE_NAME, BATCH_COLUMN, [])
  end
end
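To see why a smaller sub_batch_size helps, the sketch below splits one batch of this example (BATCH_SIZE = 500, SUB_BATCH_SIZE = 150) into sub-batch ranges. It assumes contiguous IDs and that each sub-batch covers at most sub_batch_size IDs. sub_batch_ranges is a hypothetical helper, not GitLab's each_sub_batch, which walks the actual rows of the batching column.

```ruby
# Hypothetical helper (not GitLab's each_sub_batch): splits an inclusive ID
# range into consecutive sub-ranges of at most `sub_batch_size` IDs.
# Assumes contiguous IDs; real batching walks existing rows, so gaps change the ranges.
def sub_batch_ranges(start_id, end_id, sub_batch_size)
  (start_id..end_id).each_slice(sub_batch_size).map { |ids| ids.first..ids.last }
end

# One job of the example migration: BATCH_SIZE = 500, SUB_BATCH_SIZE = 150.
p sub_batch_ranges(1, 500, 150) # prints [1..150, 151..300, 301..450, 451..500]

# After one and two reductions, the same batch is covered by more, smaller sub-batches.
[150, 112, 84].each do |size|
  puts "sub_batch_size=#{size}: #{sub_batch_ranges(1, 500, size).size} sub-batches"
end
```

Under these assumptions the batch needs 4, 5 and 6 sub-batches respectively. Each statement touches fewer rows, which makes it less likely to hit statement_timeout.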
- Create a new class to process the migration:
Example
module Gitlab
  module BackgroundMigration
    class AdjustSubBatchSizeOnTimeout < BatchedMigrationJob
      operation_name :update_all
      feature_category :database

      def perform
        each_sub_batch do |_|
          Issue.transaction do
            # 10 ms statement timeout, so the pg_sleep below is canceled and
            # the sub-batch fails with a timeout.
            Issue.connection.execute 'SET statement_timeout = 10'
            issue = Issue.lock.find(1)
            Logger.new($stdout).info('Lock on Issue(1) for 10min.')
            issue.connection.execute('SELECT * FROM pg_sleep(600);')
          end
        end
      end
    end
  end
end
- Run rails db:migrate. In the output, check for:
Caused by:
PG::QueryCanceled: ERROR: canceling statement due to statement timeout
- Open the Rails console, find the first retriable job that was created, and check its sub_batch_size. It should be reduced by 25%:
base_model = Gitlab::Database.database_base_models[:main]
migration = Gitlab::Database::BackgroundMigration::BatchedMigration.active_migration(connection: base_model.connection)
retriable_job = migration.batched_jobs.retriable.first
retriable_job.status
=> 2 # failed
retriable_job.sub_batch_size
=> 112 # 150 - 25% = 112.5, truncated to 112
- Retry the failed job:
migration_wrapper = Gitlab::Database::BackgroundMigration::BatchedMigrationWrapper.new(connection: base_model.connection)
migration_wrapper.perform(retriable_job)
retriable_job.status
=> 2 # failed
retriable_job.sub_batch_size
=> 84 # 112 - 25% = 84
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.
Related to #377308 (closed)
Activity
changed milestone to %15.8
assigned to @l.rosa
- A deleted user
added backend databasereview pending labels
1 Warning 06e9befc: Commits that change 30 or more lines across at least 3 files should describe these changes in the commit body. For more information, take a look at our Commit message guidelines.
Reviewer roulette
Changes that require review have been detected!
Please refer to the table below for assigning reviewers and maintainers suggested by Danger in the specified category:
Category | Reviewer | Maintainer
backend | Sashi Kumar Kumaresan (@sashi_kumar) (UTC+0, 3 hours ahead of @l.rosa) | Terri Chu (@terrichu) (UTC-5, 2 hours behind @l.rosa)
database | Carla Drago (@carlad-gl) (UTC+1, 4 hours ahead of @l.rosa) | Andy Soiron (@Andysoiron) (UTC+1, 4 hours ahead of @l.rosa)
To spread load more evenly across eligible reviewers, Danger has picked a candidate for each review slot, based on their timezone. Feel free to override these selections if you think someone else would be better-suited or use the GitLab Review Workload Dashboard to find other available reviewers.
To read more on how to use the reviewer roulette, please take a look at the Engineering workflow and code review guidelines. Please consider assigning a reviewer or maintainer who is a domain expert in the area of the merge request.
Once you've decided who will review this merge request, assign them as a reviewer! Danger does not automatically notify them for you.
If needed, you can retry the danger-review job that generated this comment.
Generated by Danger
Edited by Ghost User
- Resolved by Leonardo da Rosa
- Resolved by Leonardo da Rosa
added 1 commit
- 3ca0c863 - Reduce sub-batch size for failed Batched Background Migration Jobs
changed milestone to %15.9
added missed:15.8 label
- Resolved by Leonardo da Rosa
Hey, @krasio!
What do you think about this approach?
- Resolved by Leonardo da Rosa
- Resolved by Leonardo da Rosa
added 1 commit
- 975964a3 - Re-raise the same exception who triggered the timeout
added 1965 commits
- 975964a3...6d1c17ea - 1957 commits from branch master
- de4c90d8 - Reduce sub-batch size for failed Batched Background Migration Jobs
- 19f03edc - Reduce sub-batch size for failed Batched Background Migration Jobs
- 1567fdc8 - Adds a new error wrapper class to give more context to timeout exceptions
- 2ad24d72 - Adds SQL tracker class to collect sql info within `sql.active_record`
- 20ae433f - Rescue `SubBatchTimeoutException` and forward error context
- 55d2e95d - When transition to failed, determine if should update
- 20baf32a - Instrument execution of sub batches and rescue expected
- 83f8894a - Re-raise the same exception who triggered the timeout
added 4 commits