Skip to content

Introduce delta_max for background migrations

Shinya Maeda requested to merge background-migration-delta-splitter into master

What does this MR do?

This is one of the solutions for background migrations cause statement timeouts if there is a huge gap in a batch.

This MR does the following things.

  1. If the delta of min ID and max ID is very big, we split it by BACKGROUND_MIGRATION_DELTA_MAX
  2. The argument delta_max can be controlled per background migrations.

Here is a step-by-step guide on how to perform background migrations successfully with this MR.

  1. Check the distribution level of target rows (Example). If some deltas exceed 1 ~ 2 million, you should be careful that a sidekiq job which is assigned by the batch would take a long time, and in the worst case, it fails by statement timeouts, and the batch will never be migrated.

  2. Calculate the total execution time of background migrations. Here is the script to check it.

deltas = [377590, 171130, 103575, ...]

sidekiq_job_count = 0

deltas.each do |delta|
  sidekiq_job_count += (delta / BACKGROUND_MIGRATION_DELTA_MAX).ceil
end

puts "Your background migrations take #{sidekiq_job_count * 5} minutes"

\3. Make sure your queries in background migrations are performant enough with BACKGROUND_MIGRATION_DELTA_MAX. For exmaple, if your base query is Ci::Build.where("artifacts_file <> ''"), then Ci::Build.where("artifacts_file <> ''").where(id: 10000..510000) should be fast enough not to fail by statements timeout(15sec).

Are there points in the code the reviewer needs to double check?

Why was this MR needed?

Screenshots (if relevant)

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/48226

Edited by Shinya Maeda

Merge request reports