Optimize query for fetching CI pipelines for a merge request
When an MR with a large number of commits is opened, the number of pipelines is calculated for the latest 10,000
commits. In order to do that an SQL query is performed to check how many of ci_pipelines
belong to these 10,000
commits via source_sha
or sha
:
(81.4ms) SELECT COUNT(*) FROM ((SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND
("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND
"ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."source_sha" IN ('\x691d88b71d51786983b823207d876cee7c93f5d4',
'\x06eb3d091d6f7fdba6559dc323098a1882a4acb4', '\x884c2fe2f77871558fb3e9404ab0c2f074e3a62e',
'\x7c91a466db66725ae1d1facee3954f9424deea27', '\x90a188d85decedb847f13e51d3333ba64a5d09bb',
'\x9599936cb2b62724668d3c708d346c7c44894442', '\x037096ef6c2c2f804302efb429bcd1eb5123b5e9',
'\x97076881516edd0523841e679f8252928dedc167', '\x9246fd5707c18553ab2904ba2c37ed4ddfe1b3a9',
'\x739f50c3b571d3566f9f21769357010f0989b19c', '\x34f5eb1b93b5c1e7d8ed8d578d8b94cd33d2dca3',
'\xfeb571a1ddf41826b339bf5dd1984561622998bb', '\xc04b3d61e8487617346718150d6392c33bd0b1e7',
'\x746f54787799ee5ea8595a8730d363bfd250ffab', '\x4230b4e4d38b19a2506f5266a81dc41db8ac1165',
'\x0d9fb8d409a2e0d92d5e6b7c338a500ab9229a14', '\xbcd2458076512ad80c6e470d9434618f27dfec3c',
'\xf74bd44d2a9171c51ec7c9a622d7628044791d9e', '\xddf2dcf7fdad69135cee590307773b255d2fe0b5',
'\xb3b7d2c166a032ca11551e06664439d83f2caf59', '\x7e87990ecd8d1c1ac0a7a32661552a44c95b89d0',
'\x558f41ddb4cae44f386790b6f9fbefb757656b6e', '\x67b0c419be85639f246154c96f131a2c1b0622c5',
... etc
We should reconsider the logic of all_pipelines
method, because currently it just performs SELECT IN
query and passes an array as a param (which can contain 10,000
elements)
What about using a subquery instead?
(2.7ms) SELECT COUNT(*) FROM ((SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."source_sha" IN (SELECT "merge_request_diff_commits"."sha" FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))
UNION ALL
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND "ci_pipelines"."source" = 10 AND "ci_pipelines"."merge_request_id" = 817 AND "ci_pipelines"."sha" IN (SELECT encode(sha, 'hex') FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))
UNION ALL
(SELECT "ci_pipelines".* FROM "ci_pipelines" WHERE "ci_pipelines"."project_id" = 35 AND ("ci_pipelines"."config_source" IN (1, 2) OR "ci_pipelines"."config_source" IS NULL) AND ("ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 11, 9) OR "ci_pipelines"."source" IS NULL) AND "ci_pipelines"."ref" = 'master' AND "ci_pipelines"."tag" = FALSE AND "ci_pipelines"."sha" IN (SELECT encode(sha, 'hex') FROM "merge_request_diff_commits" WHERE "merge_request_diff_commits"."merge_request_diff_id" IN (SELECT "merge_request_diffs"."id" FROM "merge_request_diffs" WHERE "merge_request_diffs"."merge_request_id" = 817 ORDER BY "merge_request_diffs"."id" DESC LIMIT 100) LIMIT 10000))) ci_pipelines
Related issue: #30507 (closed)
Edited by Igor Drozdov