Skip to content

Add BG migration to delete orphaned deployments

What does this MR do?

In the previous MR, we've added the FK constraint to stop creating orphaned deployments (FYI, orphaned deployments are the rows in the deployments table that have a value in environment_id column, but the corresponding environments row doesn't exist). This MR is the second step to delete the orphaned deployments. Due to the large number of targets, we'll use background migration. Here is the estimation:

  • Statistics
    • 6,250,683 rows to delete (Checked on db-lab, 22nd June 2021)
    • Total 163,863,216 rows in deployments table.
  • Estimated time for scheduling background migration jobs.
    • Step 1) 6.647 ms for calculating the beginning of the batches.
    • Step 2) 94.117 ms for finding a next batch.
    • Step 3) 1.028 ms for finding the start-id and end-id of a batch.
    • batch size = 100,000
    • Total batch count = 163,863,216 / 100,000 = 1,639 jobs
    • 1,639 * (Step 2 + Step 3) = 155,942 ms = 2.6 minutes to schedule jobs on post-deployment migration.
  • Estimated times per batch:
    • 1.905 s for a delete statement
    • 2 minutes delay per batch.
    • 1,639 batches * 2 min per batch = 2.27 days to finish executing all the scheduled jobs in Sidekiq.

Related #26229 (closed)

Queries for scheduling

1. Calculating the beginning of the batches

SELECT "deployments"."id" FROM "deployments" ORDER BY "deployments"."id" ASC LIMIT 1

Time: 6.647 ms
  - planning: 0.352 ms
  - execution: 6.295 ms
    - I/O read: 6.109 ms
    - I/O write: 0.000 ms

Plan: https://explain.depesz.com/s/kTcq

2. Finding a next batch

explain SELECT "deployments"."id" FROM "deployments" WHERE "deployments"."id" >= 30000000 ORDER BY "deployments"."id" ASC LIMIT 1 OFFSET 100000

Time: 94.117 ms
  - planning: 0.361 ms
  - execution: 93.756 ms
    - I/O read: 49.549 ms
    - I/O write: 0.000 ms

Plan: https://explain.depesz.com/s/QG55

3. Finding the start-id and end-id of a batch

SELECT MIN("deployments".id), MAX("deployments".id) FROM "deployments" WHERE "deployments"."id" >= 30000000 AND "deployments"."id" < 30100000

Time: 1.028 ms
  - planning: 0.648 ms
  - execution: 0.380 ms
    - I/O read: 0.064 ms
    - I/O write: 0.000 ms

Plan: https://explain.depesz.com/s/S60w

(NOTE: Step 2. and 3. are looped until it's scanned all rows)

Queries per batch

DELETE FROM "deployments" WHERE (NOT EXISTS (SELECT 1 FROM environments WHERE deployments.environment_id = environments.id)) AND "deployments"."id" BETWEEN 40000000 AND 40100000

Time: 1.905 s
  - planning: 0.573 ms
  - execution: 1.904 s
    - I/O read: 1.215 s
    - I/O write: 0.000 ms

Plan: https://explain.depesz.com/s/MP8W

Migration test locally

shinya@shinya-B550-VISION-D:~/workspace/thin-gdk/services/rails/src$ tre bin/rails db:migrate:redo VERSION=20210622141148
INFO: This script is a predefined script in devkitkat.
== 20210622141148 ScheduleDeleteOrphanedDeployments: reverting ================
== 20210622141148 ScheduleDeleteOrphanedDeployments: reverted (0.0016s) =======

== 20210622141148 ScheduleDeleteOrphanedDeployments: migrating ================
== 20210622141148 ScheduleDeleteOrphanedDeployments: migrated (0.0102s) =======

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Shinya Maeda

Merge request reports