Improve performance for Ci::UnlockPipelineService
What does this MR do and why?
Use CTE query to improve the performance for large size pipeline. We are seeing timeout in Ci:: (see the internal log)
All of these projects have pipelines that contains more than 150k external builds, and no matter if partition_id is included in the query or not, the Postgresql decides to use the primary key index because of the ORDER BY "p_ci_builds"."id" ASC added by each_batch.
This merge implements the CTE to force the query to use btree index starting with (commit_id, type) (e.g. p_ci_builds_commit_id_type_ref_idx).
Before
This query timed out in prod because the pipeline contained around 160k builds.
It took more than 2 hour and still hasn't finished on postgres.ai.
Click to expand the query
SELECT
"p_ci_builds"."id"
FROM
"p_ci_builds"
WHERE
"p_ci_builds"."type" = 'Ci::Build'
AND "p_ci_builds"."commit_id" = 1821178317
ORDER BY
"p_ci_builds"."id" ASC
LIMIT
1
Limit (cost=2.49..57853.39 rows=1 width=8)
-> Merge Append (cost=2.49..9772520628.12 rows=168926 width=8)
Sort Key: p_ci_builds.id
-> Index Scan using ci_builds_pkey on ci_builds p_ci_builds_1 (cost=0.58..6240720866.59 rows=1267 width=8)
Filter: (((type)::text = 'Ci::Build'::text) AND (commit_id = 1821178317))
-> Index Scan using ci_builds_101_pkey on ci_builds_101 p_ci_builds_2 (cost=0.58..1184144873.20 rows=284 width=8)
Filter: (((type)::text = 'Ci::Build'::text) AND (commit_id = 1821178317))
-> Index Scan using ci_builds_102_pkey on ci_builds_102 p_ci_builds_3 (cost=0.58..2146245904.04 rows=546 width=8)
Filter: (((type)::text = 'Ci::Build'::text) AND (commit_id = 1821178317))
-> Index Scan using ci_builds_103_pkey on ci_builds_103 p_ci_builds_4 (cost=0.57..201406175.29 rows=166828 width=8)
Filter: (((type)::text = 'Ci::Build'::text) AND (commit_id = 1821178317))
-> Index Scan using ci_builds_104_pkey on ci_builds_104 p_ci_builds_5 (cost=0.12..3.15 rows=1 width=8)
Filter: (((type)::text = 'Ci::Build'::text) AND (commit_id = 1821178317))
PID | QUERY | STATE | BACKEND TYPE | WAIT EVENT | WAIT EVENT TYPE | QUERY DURATION | STATE CHANGED AGO
------+--------------------------------+--------+----------------+--------------+-----------------+-----------------+--------------------
763 | EXPLAIN (ANALYZE, COSTS, | active | client backend | DataFileRead | IO | 02:21:28.301227 | 02:21:28.301228
| VERBOSE, BUFFERS, FORMAT JSON | | | | | |
| , SETTINGS TRUE, WAL) SELECT | | | | | |
| "p_ci_builds".... | | | | | |
After
-
For the above same pipeline, the queries generated by
each_batchare:-
First query to determine the lower id
Click to expand the query
WITH "cte_builds" AS MATERIALIZED ( SELECT "p_ci_builds"."id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 1821178317 ) SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds" ORDER BY "p_ci_builds"."id" ASC LIMIT 1;see query plan at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/39569/commands/121903
-
Second query to determine the upper id
Click to expand the query
WITH "cte_builds" AS MATERIALIZED ( SELECT "p_ci_builds"."id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 1821178317 ) SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds" WHERE "p_ci_builds"."id" >= 9827961769 ORDER BY "p_ci_builds"."id" ASC LIMIT 1 OFFSET 100;see query plan at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/39569/commands/121904
-
Third query to pluck the ids
Click to expand the query
WITH "cte_builds" AS MATERIALIZED ( SELECT "p_ci_builds"."id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 1821178317 ) SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds" WHERE "p_ci_builds"."id" >= 9827961769see query plan at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/39569/commands/121905
-
-
For the pipeline that has 10k regular builds, the queries generated by
each_batchare:-
First query to determine the lower id
Click to expand the query
WITH "cte_builds" AS MATERIALIZED ( SELECT "p_ci_builds"."id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 22117708 ) SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds" ORDER BY "p_ci_builds"."id" ASC LIMIT 1;see query plan at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/39569/commands/121906
-
Second query to determine the upper id
Click to expand the query
WITH "cte_builds" AS MATERIALIZED ( SELECT "p_ci_builds"."id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 22117708 ) SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds" WHERE "p_ci_builds"."id" >= 68607823 ORDER BY "p_ci_builds"."id" ASC LIMIT 1 OFFSET 100;see query plan at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/39569/commands/121907
-
Third query to pluck the ids
Click to expand the query
WITH "cte_builds" AS MATERIALIZED ( SELECT "p_ci_builds"."id" FROM "p_ci_builds" WHERE "p_ci_builds"."type" = 'Ci::Build' AND "p_ci_builds"."commit_id" = 22117708 ) SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds" WHERE "p_ci_builds"."id" >= 68607823 AND "p_ci_builds"."id" < 68607989see query plan at https://postgres.ai/console/gitlab/gitlab-production-ci/sessions/39569/commands/121908
-
References
Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #435737 (closed)