Skip to content

Introduce service to fix duplicate CI pipeline iids across partitions

What does this MR do and why?

Context

From https://gitlab.com/gitlab-org/gitlab/-/issues/545167, we know that there is a workflow that sometimes causes duplicate pipeline IIDs to persist across different partitions (for the given project scope). Essentially what happens is that the InternalId record's last_value gets into an invalid state (last_value is reset to 0 or another value), which causes new pipelines to be created with IIDs starting from the wrong value.

We haven't pinpointed the exact root cause because it may be due to a specific race condition on project import which is difficult to reproduce (see https://gitlab.com/gitlab-org/gitlab/-/issues/545167#note_2802709422). As such, we are treating the symptom instead.

Previously, we tried fixing the problem by adding model-level validation in !208178 (closed); however we decided to pivot to this MR's approach to avoid adding pressure to the primary DB node during a Pipeline create transaction.

This MR

This MR is the first of two parts. It introduces a service class that fixes existing duplicates for a given project and iid.

In the next MR, this service will be called by a worker that subscribes to the PipelineCreatedEvent. The idea is that it will fix duplicates as they arise. Local validation steps will be provided in the next MR.

Note: Flushing the InternalID is what ultimately prevents duplicates going forward. So we don't expect that this service needs to fix existing duplicates very often.

References

Database query plans

Query that finds duplicates

      Ci::Pipeline
        .select(:id, :partition_id, :project_id, :iid)
        .for_project(project_id)
        .for_iid(iid)
        .order_id_asc
SELECT
    "p_ci_pipelines"."id", "p_ci_pipelines"."partition_id", "p_ci_pipelines"."project_id", "p_ci_pipelines"."iid" 
FROM
    "p_ci_pipelines"
WHERE
    "p_ci_pipelines"."project_id" = 278964
    AND "p_ci_pipelines"."iid" = 4766913
ORDER BY
    "p_ci_pipelines"."id" ASC;

Query plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45031/commands/138221

  • This query unavoidably has to scan every partition. Fortunately it's supported by the index on (project_id, iid, partition_id).
  • There can only exist 1 duplicate-iid pipeline per partition, so we don't expect this query to fetch more than a few rows at most. In the majority of cases, it only fetches 2 records.

Update query

        Ci::Pipeline
          .in_partition(pipeline.partition_id)
          .id_in(pipeline.id)
          .update_all(iid: pipeline.iid)
UPDATE
    "p_ci_pipelines"
SET
    "iid" = 1234567,
    "lock_version" = COALESCE("lock_version", 0) + 1
WHERE
    "p_ci_pipelines"."partition_id" = 107
    AND "p_ci_pipelines"."id" = 2115253424;

Query plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45031/commands/138219

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Leaminn Ma

Merge request reports

Loading