Skip to content

Backfill cleanup schedules for old closed/merged MRs

Patrick Bajao requested to merge 245263-mr-refs-cleanup-old-mrs into master

What does this MR do?

Adds background migration that will create records corresponding records in merge_request_cleanup_schedules for merge requests that are closed/merged.

These records will be later on queried by a cron worker so they merge request refs of those MRs will be cleaned up accordingly.

Estimate

60888857 to insert batch_size = 10000 60888857 / 10000 = 6089 loops

Estimated times per batch:

  • 9s for select statement with 10000 items
  • 370ms for delete statement with 10000 items Total: ~10sec per batch

2 mins delay per loop (safe for the given total time per batch)

6089 * (120+10) = 791,570s or ~9 days

Migration

== 20201103110018 ScheduleMergeRequestCleanupSchedulesBackfill: migrating =====
-- transaction_open?()
   -> 0.0000s
-- index_exists?(:merge_requests, :id, {:name=>"merge_requests_state_id_temp_index", :where=>"state_id IN (2, 3)", :algorithm=>:concurrently})
   -> 0.0121s
-- execute("SET statement_timeout TO 0")
   -> 0.0002s
-- add_index(:merge_requests, :id, {:name=>"merge_requests_state_id_temp_index", :where=>"state_id IN (2, 3)", :algorithm=>:concurrently})
   -> 0.0200s
-- execute("RESET ALL")
   -> 0.0003s
== 20201103110018 ScheduleMergeRequestCleanupSchedulesBackfill: migrated (0.0671s) 

Query Plans

All queries tested on #database-lab.

For querying records to enqueue background migration jobs

Query used for enqueueing background migration jobs uses each_batch internally.

each_batch start

Sample:

SELECT "merge_requests"."id"
FROM "merge_requests"
WHERE merge_requests.state_id IN (2, 3)
ORDER BY "merge_requests"."id" ASC LIMIT 1

Plan: https://explain.depesz.com/s/VBho

each_batch stop

Sample:

SELECT "merge_requests"."id"
FROM "merge_requests"
WHERE merge_requests.state_id IN (2, 3) AND "merge_requests"."id" >= 1
ORDER BY "merge_requests"."id" ASC LIMIT 1 OFFSET 10000

Plan: https://explain.depesz.com/s/1SkT

each_batch min/max

Sample:

SELECT MIN(id), MAX(id)
FROM "merge_requests"
WHERE merge_requests.state_id IN (2, 3) AND "merge_requests"."id" >= 1 AND "merge_requests"."id" < 10001

Plan: https://explain.depesz.com/s/sEFw

Insert query

Inserts 10K records per batch.

Sample:

INSERT INTO merge_request_cleanup_schedules (merge_request_id, scheduled_at, created_at, updated_at)
SELECT merge_requests.id, COALESCE(metrics.merged_at, COALESCE(metrics.latest_closed_at, merge_requests.updated_at)) + interval ‘14 days’, NOW(), NOW()
FROM “merge_requests”
LEFT JOIN merge_request_metrics metrics ON metrics.merge_request_id = merge_requests.id
WHERE (merge_requests.state_id IN (2, 3)) AND “merge_requests”.“id” BETWEEN 1 AND 10000
ON CONFLICT (merge_request_id) DO NOTHING;

Plan: https://explain.depesz.com/s/xg7k

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • [-] Label as security and @ mention @gitlab-com/gl-security/appsec
  • [-] The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • [-] Security reports checked/validated by a reviewer from the AppSec team

Related to #245263 (closed)

Edited by Patrick Bajao

Merge request reports