High database query timings when canceling pending pipelines
This is coming from https://gitlab.com/gitlab-com/gl-infra/infrastructure/-/issues/12326#note_484146016, where a problematic query has been identified. The database group looked into this and this is a summary:
On GitLab.com, we observe high query timings related to canceling pending pipelines. During high traffic times, we can see query timings at about 400ms, but it can go up as high as 600ms.
In Thanos, we can see both the high query timings and their frequency (going up to 10/s):
SELECT "ci_pipelines".*
FROM "ci_pipelines"
WHERE "ci_pipelines"."project_id" = 1
AND ("ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12)
OR "ci_pipelines"."source" IS NULL)
AND "ci_pipelines"."ref" = 'some-ref'
AND "ci_pipelines"."id" NOT IN (
WITH RECURSIVE "base_and_descendants" AS (
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."id" = 35)
UNION
(SELECT "ci_pipelines".* FROM "ci_pipelines", "base_and_descendants", "ci_sources_pipelines"
WHERE "ci_sources_pipelines"."pipeline_id" = "ci_pipelines"."id"
AND "ci_sources_pipelines"."source_pipeline_id" = "base_and_descendants"."id"
AND "ci_sources_pipelines"."source_project_id" = "ci_sources_pipelines"."project_id")
)
SELECT "id" FROM "base_and_descendants" AS "ci_pipelines")
AND "ci_pipelines"."sha" != 'abc123'
AND ("ci_pipelines"."status" IN ('created','waiting_for_resource','preparing','pending','running','scheduled'))
AND (
NOT EXISTS (
SELECT "ci_builds".*
FROM "ci_builds"
INNER JOIN "ci_builds_metadata"
ON "ci_builds_metadata"."build_id" = "ci_builds"."id"
WHERE "ci_builds"."type" = 'Ci::Build'
AND (ci_builds.commit_id = ci_pipelines.id)
AND ("ci_builds"."status" IN ('running','success','failed'))
AND (ci_builds_metadata.id NOT IN (
SELECT "ci_builds_metadata"."id"
FROM "ci_builds_metadata"
WHERE (ci_builds_metadata.build_id = ci_builds.id)
AND "ci_builds_metadata"."interruptible" = TRUE)
)
)
)
ORDER BY "ci_pipelines"."id" ASC
LIMIT 1000
A query plan can be found here: https://explain.depesz.com/s/WREh#l15
We found that the following query seems to originate from here in CancelPendingPipelines
, which is called from Ci::CreatePipelineService
.
We have not identified a straight-forward solution for this. Please see #131 (comment 503034740) for discussion of changes we can apply to the data model to help this query.