Optimize query for fetching CI pipelines for a merge request
When an MR with a large number of commits is opened, the number of pipelines is calculated for the latest 10,000 commits. In order to do that an SQL query is performed to check how many of ci_pipelines belong to these 10,000 commits via source_sha or sha:
(81.4ms) SELECT COUNT(*) FROM ((SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND
("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND
"ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."source_sha" IN ('\x691d88b71d51786983b823207d876cee7c93f5d4',
'\x06eb3d091d6f7fdba6559dc323098a1882a4acb4', '\x884c2fe2f77871558fb3e9404ab0c2f074e3a62e',
'\x7c91a466db66725ae1d1facee3954f9424deea27', '\x90a188d85decedb847f13e51d3333ba64a5d09bb',
'\x9599936cb2b62724668d3c708d346c7c44894442', '\x037096ef6c2c2f804302efb429bcd1eb5123b5e9',
'\x97076881516edd0523841e679f8252928dedc167', '\x9246fd5707c18553ab2904ba2c37ed4ddfe1b3a9',
'\x739f50c3b571d3566f9f21769357010f0989b19c', '\x34f5eb1b93b5c1e7d8ed8d578d8b94cd33d2dca3',
'\xfeb571a1ddf41826b339bf5dd1984561622998bb', '\xc04b3d61e8487617346718150d6392c33bd0b1e7',
'\x746f54787799ee5ea8595a8730d363bfd250ffab', '\x4230b4e4d38b19a2506f5266a81dc41db8ac1165',
'\x0d9fb8d409a2e0d92d5e6b7c338a500ab9229a14', '\xbcd2458076512ad80c6e470d9434618f27dfec3c',
'\xf74bd44d2a9171c51ec7c9a622d7628044791d9e', '\xddf2dcf7fdad69135cee590307773b255d2fe0b5',
'\xb3b7d2c166a032ca11551e06664439d83f2caf59', '\x7e87990ecd8d1c1ac0a7a32661552a44c95b89d0',
'\x558f41ddb4cae44f386790b6f9fbefb757656b6e', '\x67b0c419be85639f246154c96f131a2c1b0622c5',
... etc
We should reconsider the logic of all_pipelines method, because currently it just performs SELECT IN query and passes an array as a param (which can contain 10,000 elements)
What about using a subquery instead?
(2.7ms) SELECT COUNT(*) FROM ((SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."source_sha" IN (SELECT "merge_request_diff_commits"."sha" FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))
UNION ALL
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."sha" IN (SELECT encode(sha, 'hex') FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))
UNION ALL
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND ("ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 11, 9) OR "ci_pipelines"."source" IS NULL) AND "ci_pipelines"."ref" = 'master' AND "ci_pipelines"."tag" = FALSE AND "ci_pipelines"."sha" IN (SELECT encode(sha, 'hex') FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))) ci_pipelines
Related issue: #30507 (closed)
Edited by Igor Drozdov