Skip to content

Makes BackfillWorkItemTypeIdOnIssues migration more efficient

Mario Celi requested to merge 338004-next-batch-not-null-strategy into master

What does this MR do and why?

By adding a new batching strategy, the migration will avoid scanning the entire issues table completely one type per issue type (5 times currently). If no records exist for a given issue type, no background jobs will be run.

Modifying an existing migration, so this change will only have effect on self hosted instances when 14.10 is released. We might consider updating the strategy class for the currently running migration on .com

Can't use let_it_be for batching strategy specs

Related to #338004 (closed)

DB Review

Not adding a work_item_type_id IS NULL condition on the batch bound queries as that would effectively need a filter on the index scans. Worst case scenario if all rows don't have a null value, the entire table would be filtered (we cannot add a not null condition to the temp index to allow hot updates). Batched migration will create batches even if the batch will produce no updates, but this will happen very few times as most of the records do have a null work_item_type_id.

By adding this change, we would no longer scan the entire issues table 5 times (once per each work item type). If a type has no issues, no batch will be scheduled.

Change Batch Class Migration

This migration will replace the batching strategy in environments where the batched migrations were already scheduled like Gitlab.com

I think it's work doing even if it only affects .com as it will save days (maybe even a week) in run time for issue types that should end in single batch instead of having to walk through the entire issues table.

UP

== 20220404194649 ReplaceWorkItemTypeBackfillNextBatchStrategy: migrating =====
== 20220404194649 ReplaceWorkItemTypeBackfillNextBatchStrategy: migrated (0.0135s)

DOWN

== 20220404194649 ReplaceWorkItemTypeBackfillNextBatchStrategy: reverting =====
== 20220404194649 ReplaceWorkItemTypeBackfillNextBatchStrategy: reverted (0.0103s)

Batch limit queries

For issue type

SELECT "issues"."id" FROM "issues" WHERE ("id" >= 1) AND "issues"."issue_type" = 0 ORDER BY "issues"."id" ASC LIMIT 1;

https://console.postgres.ai/shared/40c33d7b-988c-46e1-8ede-0ca46f7fb05f


SELECT "issues"."id" FROM "issues" WHERE ("id" >= 1) AND "issues"."issue_type" = 0 AND "issues"."id" >= 4 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 10000;

https://console.postgres.ai/shared/b9b57a11-da7f-493f-be62-450c8e09ee1c


SELECT MIN("id"), MAX("id") FROM "issues" WHERE ("id" >= 1) AND "issues"."issue_type" = 0 AND "issues"."id" >= 4 AND "issues"."id" < 14276;

https://console.postgres.ai/shared/24d0eb80-c349-4534-93b4-6a337e94c6c3

For incident type

SELECT "issues"."id" FROM "issues" WHERE ("id" >= 1) AND "issues"."issue_type" = 1 ORDER BY "issues"."id" ASC LIMIT 1;

https://console.postgres.ai/shared/f01983ae-7473-45e7-8153-84d524934540


SELECT "issues"."id" FROM "issues" WHERE ("id" >= 1) AND "issues"."issue_type" = 1 AND "issues"."id" >= 2949893 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 10000;

https://console.postgres.ai/shared/3cbd15aa-4d5d-4994-a7f8-d1d2c6b13d75


SELECT MIN("id"), MAX("id") FROM "issues" WHERE ("id" >= 1) AND "issues"."issue_type" = 1 AND "issues"."id" >= 2949893 AND "issues"."id" < 71647041;

https://console.postgres.ai/shared/1df6ab9f-2d3c-429c-bfdf-187aec566d4c

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Mario Celi

Merge request reports