Skip to content

Add nullify job for orphan runner_id columns of ci_builds

Furkan Ayhan requested to merge 30159-fix-data-on-ci-builds-runner-id into master

What does this MR do and why?

When a runner gets deleted, jobs that were run with it still have runner_id. This MR adds a background migration job to nullify old orphan runner_id columns of ci_builds.

There will be 3 steps (https://docs.gitlab.com/ee/development/database/add_foreign_key_to_existing_column.html);

  1. Add a NOT VALID foreign key constraint to the column to ensure GitLab doesn't create inconsistent records. <-- !80203 (merged)
  2. Add a data migration, to fix or clean up existing records. <-- You're here
  3. Validate the whole table by making the foreign key VALID. <-- will be in 14.10

This is the 2nd step.

Related comment: #30159 (comment 836146520)

Database

Up

== 20220223112304 ScheduleNullifyOrphanRunnerIdOnCiBuilds: migrating ==========
== 20220223112304 ScheduleNullifyOrphanRunnerIdOnCiBuilds: migrated (0.0513s) =

Down

== 20220223112304 ScheduleNullifyOrphanRunnerIdOnCiBuilds: reverting ==========
== 20220223112304 ScheduleNullifyOrphanRunnerIdOnCiBuilds: reverted (0.0000s) =

Update Query

UPDATE "ci_builds"
SET "runner_id" = NULL, "lock_version" = COALESCE("lock_version", 0) + 1
WHERE "ci_builds"."id" IN (
  SELECT "ci_builds"."id"
  FROM "ci_builds"
  LEFT OUTER JOIN ci_runners ON ci_runners.id = ci_builds.runner_id
  WHERE (ci_builds.runner_id IS NOT NULL AND ci_runners.id IS NULL)
    AND "ci_builds"."id" BETWEEN 1 AND 1000
)

Total Time

  • We have ~2B ci_builds rows.
  • With a 100k batch size, we'll have 20k batches.
  • (20k * 2 min interval) / 60 min = ~666 hours.
  • 666 hours / 24 = ~27 days.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #30159 (closed)

Edited by Furkan Ayhan

Merge request reports