Add foreign key constraint to partitions of ci_runner_machines_687967fa8a

note: 🚨🚨 Before merging this MR, we should verify that !172422 (merged) has completed the migration in workflowpost-deploy-db-production and that we no longer have orphan runner managers in the partitioned table. 🚨🚨

What does this MR do and why?

This MR recreates a foreign key constraint in ci_runner_machines_687967fa8a (follow-up issue for non-.com), which was removed following an S2 incident on .com. The incident happened because the FK was added before !166520 (merged) had been merged to complete the backfill of the target ci_runners_e59bb2812d table. The backfill in production completed on Wed 6th, so we can now add back the FK. We can't verify the FK at the same time, since even with the following integrity guarantees:

  • a fk_rails_666b61f04f constraint in ci_runner_machines which makes sure that records point to a valid ci_runners record.
  • a table_sync_trigger_bc3e7b56bd trigger in ci_runner_machines which syncs any changes from ci_runner_machines to the new ci_runner_machines_687967fa8a partitioned table.

there are 265 runner managers referencing 260 missing runners (from the new ci_runner_machines_687967fa8a table to ci_runners_e59bb2812d), even though those runners are present on ci_runners:

image

This MR also adds logic to ensure a copy of the runner exists in the sharded table whenever a runner manager is created, to avoid a relapse of the incident.

Changelog: added

References

Please include cross links to any resources that are relevant to this MR This will give reviewers and future readers helpful context to give an efficient review of the changes introduced.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

image

Registering a runner manager on an existing runner

  1. taking a newly created runner (never contacted):

    image

  2. ensuring the runner does not exist on the partitioned table:

    image

  3. calling the POST /runners/verify endpoint on it:

    image

  4. confirm that the runner manager appears correctly on GitLab:

    image

  5. the partitioned table runner was created:

    image

Updating an existing runner manager that doesn't have its runner on ci_runners_e59bb2812d

  1. we'll start by delete the runner from the partitioned table, to emulate an online runner which doesn't have it's record on the partitioned table, but gets contacted and needs to update the ci_runner_machines table:

    image

  2. the ci_runner_machines.contacted_at field gets updated, but there's no equivalent record is created in ci_runner_machines_687967fa8a (as I believe the sync trigger only updates records if they exist) and therefore, the FK constraint is not checked. As expected, a corresponding ci_runners_e59bb2812d doesn't get created:

    image image

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

n/a

Edited by Pedro Pombeiro

Merge request reports

Loading