Introduce delta_max for background migrations
What does this MR do?
This is one of the solutions for background migrations cause statement timeouts if there is a huge gap in a batch.
This MR does the following things.
- If the delta of min ID and max ID is very big, we split it by
BACKGROUND_MIGRATION_DELTA_MAX
- The argument
delta_max
can be controlled per background migrations.
Here is a step-by-step guide on how to perform background migrations successfully with this MR.
-
Check the distribution level of target rows (Example). If some deltas exceed 1 ~ 2 million, you should be careful that a sidekiq job which is assigned by the batch would take a long time, and in the worst case, it fails by statement timeouts, and the batch will never be migrated.
-
Calculate the total execution time of background migrations. Here is the script to check it.
deltas = [377590, 171130, 103575, ...]
sidekiq_job_count = 0
deltas.each do |delta|
sidekiq_job_count += (delta / BACKGROUND_MIGRATION_DELTA_MAX).ceil
end
puts "Your background migrations take #{sidekiq_job_count * 5} minutes"
\3. Make sure your queries in background migrations are performant enough with BACKGROUND_MIGRATION_DELTA_MAX
. For exmaple, if your base query is Ci::Build.where("artifacts_file <> ''")
, then Ci::Build.where("artifacts_file <> ''").where(id: 10000..510000)
should be fast enough not to fail by statements timeout(15sec).
Are there points in the code the reviewer needs to double check?
Why was this MR needed?
Screenshots (if relevant)
Does this MR meet the acceptance criteria?
-
Changelog entry added, if necessary -
Documentation created/updated -
API support added -
Tests added for this feature/bug - Conform by the code review guidelines
-
Has been reviewed by a Backend maintainer -
Has been reviewed by a Database specialist
-
-
End-to-end tests pass ( package-and-qa
manual pipeline job)