Introduce service to fix duplicate CI pipeline iids across partitions
What does this MR do and why?
Context
From https://gitlab.com/gitlab-org/gitlab/-/issues/545167, we know that there is a workflow that sometimes causes duplicate pipeline IIDs to persist across different partitions (for the given project scope). Essentially what happens is that the InternalId record's last_value gets into an invalid state (last_value is reset to 0 or another value), which causes new pipelines to be created with IIDs starting from the wrong value.
We haven't pinpointed the exact root cause because it may be due to a specific race condition on project import which is difficult to reproduce (see https://gitlab.com/gitlab-org/gitlab/-/issues/545167#note_2802709422). As such, we are treating the symptom instead.
Previously, we tried fixing the problem by adding model-level validation in !208178 (closed); however we decided to pivot to this MR's approach to avoid adding pressure to the primary DB node during a Pipeline create transaction.
This MR
This MR is the first of two parts. It introduces a service class that fixes existing duplicates for a given project and iid.
In the next MR, this service will be called by a worker that subscribes to the PipelineCreatedEvent. The idea is that it will fix duplicates as they arise. Local validation steps will be provided in the next MR.
Note: Flushing the InternalID is what ultimately prevents duplicates going forward. So we don't expect that this service needs to fix existing duplicates very often.
References
- Resolves part of: https://gitlab.com/gitlab-org/gitlab/-/issues/545167+
Database query plans
Query that finds duplicates
Ci::Pipeline
.select(:id, :partition_id, :project_id, :iid)
.for_project(project_id)
.for_iid(iid)
.order_id_asc
SELECT
"p_ci_pipelines"."id", "p_ci_pipelines"."partition_id", "p_ci_pipelines"."project_id", "p_ci_pipelines"."iid"
FROM
"p_ci_pipelines"
WHERE
"p_ci_pipelines"."project_id" = 278964
AND "p_ci_pipelines"."iid" = 4766913
ORDER BY
"p_ci_pipelines"."id" ASC;
Query plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45031/commands/138221
- This query unavoidably has to scan every partition. Fortunately it's supported by the index on
(project_id, iid, partition_id). - There can only exist 1 duplicate-iid pipeline per partition, so we don't expect this query to fetch more than a few rows at most. In the majority of cases, it only fetches 2 records.
Update query
Ci::Pipeline
.in_partition(pipeline.partition_id)
.id_in(pipeline.id)
.update_all(iid: pipeline.iid)
UPDATE
"p_ci_pipelines"
SET
"iid" = 1234567,
"lock_version" = COALESCE("lock_version", 0) + 1
WHERE
"p_ci_pipelines"."partition_id" = 107
AND "p_ci_pipelines"."id" = 2115253424;
Query plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45031/commands/138219
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.