Skip to content

Optimize query for fetching CI pipelines for a merge request

When an MR with a large number of commits is opened, the number of pipelines is calculated for the latest 10,000 commits. In order to do that an SQL query is performed to check how many of ci_pipelines belong to these 10,000 commits via source_sha or sha:

   (81.4ms)  SELECT COUNT(*) FROM ((SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND
 ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND
 "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."source_sha" IN ('\x691d88b71d51786983b823207d876cee7c93f5d4',
 '\x06eb3d091d6f7fdba6559dc323098a1882a4acb4', '\x884c2fe2f77871558fb3e9404ab0c2f074e3a62e', 
'\x7c91a466db66725ae1d1facee3954f9424deea27', '\x90a188d85decedb847f13e51d3333ba64a5d09bb', 
'\x9599936cb2b62724668d3c708d346c7c44894442', '\x037096ef6c2c2f804302efb429bcd1eb5123b5e9', 
'\x97076881516edd0523841e679f8252928dedc167', '\x9246fd5707c18553ab2904ba2c37ed4ddfe1b3a9', 
'\x739f50c3b571d3566f9f21769357010f0989b19c', '\x34f5eb1b93b5c1e7d8ed8d578d8b94cd33d2dca3', 
'\xfeb571a1ddf41826b339bf5dd1984561622998bb', '\xc04b3d61e8487617346718150d6392c33bd0b1e7', 
'\x746f54787799ee5ea8595a8730d363bfd250ffab', '\x4230b4e4d38b19a2506f5266a81dc41db8ac1165', 
'\x0d9fb8d409a2e0d92d5e6b7c338a500ab9229a14', '\xbcd2458076512ad80c6e470d9434618f27dfec3c', 
'\xf74bd44d2a9171c51ec7c9a622d7628044791d9e', '\xddf2dcf7fdad69135cee590307773b255d2fe0b5', 
'\xb3b7d2c166a032ca11551e06664439d83f2caf59', '\x7e87990ecd8d1c1ac0a7a32661552a44c95b89d0', 
'\x558f41ddb4cae44f386790b6f9fbefb757656b6e', '\x67b0c419be85639f246154c96f131a2c1b0622c5', 
... etc

We should reconsider the logic of all_pipelines method, because currently it just performs SELECT IN query and passes an array as a param (which can contain 10,000 elements)

What about using a subquery instead?

   (2.7ms)  SELECT COUNT(*) FROM ((SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."source_sha" IN (SELECT  "merge_request_diff_commits"."sha" FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT  "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))
UNION ALL
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."sha" IN (SELECT  encode(sha, 'hex') FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT  "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))
UNION ALL
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND ("ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 11, 9) OR "ci_pipelines"."source" IS NULL) AND "ci_pipelines"."ref" = 'master' AND "ci_pipelines"."tag" = FALSE AND "ci_pipelines"."sha" IN (SELECT  encode(sha, 'hex') FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT  "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))) ci_pipelines

Related issue: #30507 (closed)

Edited by Igor Drozdov