Backfill work_item_type_id for all Issues
What does this MR do and why?
Associates ALL Issue
with a WorkItem::Type
. !67638 (merged) already associates new/updated issues with the work_item_types
table.
This background migration takes care of all issues created/updated before !67638 (merged)
Troubleshooting
- DRIs - backend @mcelicalderon
@digitalmoksha
, database @stomlinson @krasio - Disable the migration (alternatives):
- Pause this specific migration (5 different might exist, each with different args) -
Gitlab::Database::BackgroundMigration::BatchedMigration.where(job_class_name: 'BackfillWorkItemTypeIdForIssues').each(&:paused!)
- Disable automatic batch size optimization -
Feature.disable(:optimize_batched_migrations)
- Disable all batched background migrations -
Feature.disable(:execute_batched_migrations_on_schedule)
- Pause this specific migration (5 different might exist, each with different args) -
- Monitoring these migrations:
- For GitLab admins, check the status and manage batched background migrations - https://docs.gitlab.com/ee/update/index.html#check-the-status-of-batched-background-migrations
- Query the database for batched background migrations on
issues
table-- Get details for batched background migration for `issues` table SELECT * FROM batched_background_migrations WHERE table_name = 'issues'\gx -- Get count of batched background migration jobs by status for `issues` table SELECT batched_background_migrations.id, batched_background_migration_jobs.status, COUNT(*) FROM batched_background_migrations JOIN batched_background_migration_jobs ON batched_background_migrations.id = batched_background_migration_jobs.batched_background_migration_id WHERE table_name = 'issues' GROUP BY batched_background_migrations.id, batched_background_migration_jobs.status; -- Batched background migration progress for `issues` table (based on estimated total number of tuples) SELECT m.table_name, LEAST(100 * sum(j.batch_size) / pg_class.reltuples, 100) AS percentage_complete FROM batched_background_migrations m JOIN batched_background_migration_jobs j ON j.batched_background_migration_id = m.id JOIN pg_class ON pg_class.relname = m.table_name WHERE j.status = 3 AND m.table_name = 'issues' GROUP BY m.id, pg_class.reltuples;
- Sidekiq logs for all batched background migrations - https://log.gprd.gitlab.net/goto/fa6a8830-aa3f-11ec-bd7b-c108343628c3
- PostgreSQL slow query logs for all batched background migrations - https://log.gprd.gitlab.net/goto/28d04750-aa40-11ec-bd7b-c108343628c3
- Grafana dashboards
- PostgreSQL Tuple Statistics - https://dashboards.gitlab.net/d/000000167/postgresql-tuple-statistics?orgId=1&refresh=1m
- PostgreSQL Overview - https://dashboards.gitlab.net/d/000000144/postgresql-overview?orgId=1
- Thanos graph for updated tupples per batched background migration - https://thanos-query.ops.gitlab.net/graph?g0.expr=sum%20(rate(batched_migration_job_updated_tuples_total%7Benv%3D%22gprd%22%7D%5B5m%5D))%20by%20(migration_id)%20&g0.tab=0&g0.stacked=0&g0.range_input=1d&g0.max_source_resolution=0s&g0.deduplicate=1&g0.partial_response=0&g0.store_matches=%5B%5D
Migration Run Output
UP
== 20220315181125 RemoveIssuesWorkItemTypeIdIndex: migrating ==================
-- transaction_open?()
-> 0.0000s
-- indexes(:issues)
-> 0.0117s
-- execute("SET statement_timeout TO 0")
-> 0.0005s
-- remove_index(:issues, {:algorithm=>:concurrently, :name=>"index_issues_on_work_item_type_id"})
-> 0.0050s
-- execute("RESET statement_timeout")
-> 0.0005s
== 20220315181125 RemoveIssuesWorkItemTypeIdIndex: migrated (0.0233s) =========
== 20220315181130 AddTemporaryIssueTypeIndexForWorkItemTypes: migrating =======
-- transaction_open?()
-> 0.0000s
-- index_exists?(:issues, [:issue_type, :id], {:name=>"tmp_index_issues_on_issue_type_and_id", :algorithm=>:concurrently})
-> 0.0086s
-- add_index(:issues, [:issue_type, :id], {:name=>"tmp_index_issues_on_issue_type_and_id", :algorithm=>:concurrently})
-> 0.0042s
== 20220315181130 AddTemporaryIssueTypeIndexForWorkItemTypes: migrated (0.0151s)
== 20220315181136 BackfillWorkItemTypeIdOnIssues: migrating ===================
== 20220315181136 BackfillWorkItemTypeIdOnIssues: migrated (0.0807s) ==========
DOWN
== 20220315181136 BackfillWorkItemTypeIdOnIssues: reverting ===================
== 20220315181136 BackfillWorkItemTypeIdOnIssues: reverted (0.0158s) ==========
== 20220315181130 AddTemporaryIssueTypeIndexForWorkItemTypes: reverting =======
-- transaction_open?()
-> 0.0000s
-- indexes(:issues)
-> 0.0105s
-- execute("SET statement_timeout TO 0")
-> 0.0006s
-- remove_index(:issues, {:algorithm=>:concurrently, :name=>"tmp_index_issues_on_issue_type_and_id"})
-> 0.0067s
-- execute("RESET statement_timeout")
-> 0.0006s
== 20220315181130 AddTemporaryIssueTypeIndexForWorkItemTypes: reverted (0.0292s)
== 20220315181125 RemoveIssuesWorkItemTypeIdIndex: reverting ==================
-- transaction_open?()
-> 0.0000s
-- index_exists?(:issues, :work_item_type_id, {:name=>"index_issues_on_work_item_type_id", :algorithm=>:concurrently})
-> 0.0073s
-- add_index(:issues, :work_item_type_id, {:name=>"index_issues_on_work_item_type_id", :algorithm=>:concurrently})
-> 0.0128s
== 20220315181125 RemoveIssuesWorkItemTypeIdIndex: reverted (0.0232s) =========
Migration runtime details
Automated information on post_migrate runtime
Issue type Background migration (10k batch, aprox 44.0344s)
Queries for the first 2/100 batches explained
SELECT "issues"."id" FROM "issues" WHERE "issues"."id" BETWEEN 4 AND 14263 AND "issues"."issue_type" = 0 AND "issues"."work_item_type_id" IS NULL ORDER BY "issues"."id" ASC LIMIT 1
https://console.postgres.ai/shared/371b3623-4ad1-4ee0-972c-6397df64b1f3
SELECT "issues"."id" FROM "issues" WHERE "issues"."id" BETWEEN 4 AND 14263 AND "issues"."issue_type" = 0 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" >= 4 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 100
https://console.postgres.ai/shared/a4e412f7-1ccb-452d-99aa-d693f8b3dfd2
UPDATE "issues" SET "work_item_type_id" = 1, "lock_version" = COALESCE("lock_version", 0) + 1 WHERE "issues"."issue_type" = 0 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" BETWEEN 4 AND 170
https://console.postgres.ai/shared/f8776c22-cb06-418f-8c68-7c45785926cd
SELECT "issues"."id" FROM "issues" WHERE "issues"."id" BETWEEN 4 AND 14263 AND "issues"."issue_type" = 0 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" >= 170 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 100
https://console.postgres.ai/shared/e6ec7e6e-917f-4cd7-b7d1-bb2813c7603e
UPDATE "issues" SET "work_item_type_id" = 1, "lock_version" = COALESCE("lock_version", 0) + 1 WHERE "issues"."issue_type" = 0 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" BETWEEN 170 AND 822
https://console.postgres.ai/shared/09154274-7df3-4ee1-9e63-bb03b4428fc0
Aproximate timings
- 880.689ms for 2 batches. For 100 batches: 44.0344s
Incident type Background migration (10k batch aprox 191s)
Queries for the first 2/100 batches explained
SELECT "issues"."id" FROM "issues" WHERE "issues"."id" BETWEEN 2949893 AND 71490263 AND "issues"."issue_type" = 1 AND "issues"."work_item_type_id" IS NULL ORDER BY "issues"."id" ASC LIMIT 1
https://console.postgres.ai/shared/d27fc0fa-0210-4216-ad25-d3b26d4fe952
SELECT "issues"."id" FROM "issues" WHERE "issues"."id" BETWEEN 2949893 AND 71490263 AND "issues"."issue_type" = 1 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" >= 2949893 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 100
https://console.postgres.ai/shared/ec45aa4d-8df0-4f65-baba-85b6d3057146
UPDATE "issues" SET "work_item_type_id" = 2, "lock_version" = COALESCE("lock_version", 0) + 1 WHERE "issues"."issue_type" = 1 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" BETWEEN 2949893 AND 12838156
https://console.postgres.ai/shared/22666207-81ba-4ac7-b49d-bdeb8f194980
SELECT "issues"."id" FROM "issues" WHERE "issues"."id" BETWEEN 2949893 AND 71490263 AND "issues"."issue_type" = 1 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" >= 12838156 ORDER BY "issues"."id" ASC LIMIT 1 OFFSET 100
https://console.postgres.ai/shared/19d6c087-86d0-4981-9816-f05d059769f0
UPDATE "issues" SET "work_item_type_id" = 2, "lock_version" = COALESCE("lock_version", 0) + 1 WHERE "issues"."issue_type" = 1 AND "issues"."work_item_type_id" IS NULL AND "issues"."id" BETWEEN 12838157 AND 14788538
https://console.postgres.ai/shared/e7a15f2e-16e1-447e-9e14-0a5516ab43f7
Aproximate timings
- When using the temp index ~41.623ms per batch * 100 batches = ~4162.3 => ~4.162seconds
- When temp index is not used ~1.179mins per batch * 100 batches = ~117.9mins
Aproximate Total Runtime
Issue Type | # Records | # of 10k batches | Time per batch | Total Runtime (batches times interval 2min) |
---|---|---|---|---|
Issue | 81548341 | 8155 | 44s | 271h => 11.3d (8.49d if interval is 1.5 min) |
Incident | 59514 | 6 | 191s | 12min (might overlap if a job takes more than 2 min) |
Test Case | 3087 | 1 | NA | 2min |
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
-
I have evaluated the MR acceptance checklist for this MR.
Related to #338004 (closed)