Possible Index suggestion or query rewriting
During our investigations on statements that are creating a cpu utilization of 70% the database,
@gerardo.herzig mentioned in the issue: gitlab-com/gl-infra&365 (comment 460166554) , the following suggestion:
We have spotted a query fairly common to see ranked in PgBadger reports:
SELECT
"ci_pipelines"."id"
FROM
"ci_pipelines"
WHERE
"ci_pipelines"."ci_ref_id" = ?
AND ("ci_pipelines"."source" IN (...)
OR "ci_pipelines"."source" IS NULL)
AND ("ci_pipelines"."status" IN (...))
ORDER BY
"ci_pipelines"."id" DESC
LIMIT ?;
Execution plan:
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..4532.35 rows=1 width=4) (actual time=1597.319..1597.319 rows=1 loops=1)
-> Index Scan Backward using ci_pipelines_pkey on ci_pipelines (cost=0.57..31255699.54 rows=6897 width=4) (actual time=1597.317..1597.317 rows=1 loops=1)
Filter: ((ci_ref_id = 2439091) AND ((status)::text = ANY ('{success,failed,canceled}'::text[])) AND ((source = ANY ('{1,2,3,4,5,6,7,8,10,11}'::integer[])) OR (source IS NULL)))
Rows Removed by Filter: 626373
Planning Time: 0.455 ms
Execution Time: 1597.361 ms
(6 rows)
And it is called quite frequently (see this metric)
In particular,
Rows Removed by Filter: 626373
Tells us that a better index should be used. Now I was expecting that this query should be using
"index_ci_pipelines_on_ci_ref_id" btree (ci_ref_id) WHERE ci_ref_id IS NOT NULL
-as it does in database-lab
-, but it prefers to use the ci_pipelines_pkey
to make order faster. It might require some rewriting or adding a new index, but it certanly a query to verify.