Move a field check into the query instead of the loop body (!168004) · Merge requests · GitLab.org / GitLab

What does this MR do and why?

The with_auto_merge_enabled scope on MergeRequest adds state_id and merge_when_pipeline_succeeds to the merge requests query. This is nontrivially more efficient than loading all merge requests, opened or closed, and checking the auto_merge_enabled? attribute method.

I've copied the following query analysis directly out of !167095 (merged) where we first proposed making this change, as part of a larger MR.

Why didn't we just leave these changes in that other MR?

This is a straightforward change we can ship immediately, with no operational considerations, right now. It is not completely clear to me how much benefit there is in doing this, but academically speaking it will be a benefit of some amount. We should still continue with !167095 (merged) to leverage our read-replicas, but there is no reason to not ship this specific improvement first.

Query Changes!

By moving auto_merge_enabled? out of the loop and adding logic to the query, we change the query sent to the database and so change the execution plan:

  SELECT "merge_requests".* FROM "merge_requests"
    WHERE "merge_requests"."source_project_id" = 278964 AND "merge_requests"."source_branch" = 'set-auto-merge-process-worker-to-sticky' AND (
      EXISTS (
        SELECT 1 FROM "merge_request_diffs" INNER JOIN "merge_request_diff_commits" ON "merge_request_diff_commits"."merge_request_diff_id" = "merge_request_diffs"."id"
          WHERE (merge_requests.latest_merge_request_diff_id = merge_request_diffs.id) AND "merge_request_diff_commits"."sha" = '604b2722cfa685354d03ce43aca49f77c3cfb37c'
      )
    )
+ AND "merge_requests"."state_id" = 1
+ AND "merge_requests"."merge_when_pipeline_succeeds" = TRUE

Before: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32112/commands/99182

After: https://console.postgres.ai/gitlab/gitlab-production-main/sessions/32112/commands/99183

My read of this is that it's good. My understanding of these plans is that the conceptual design of the query doesn't really change; the difference is that adding state = opened to the query changes our base merge_requests query plan to use the smaller partial index idx_merge_requests_on_source_project_and_branch_state_opened instead of the current full-table index index_merge_requests_on_source_project_id_and_source_branch.

Since these are both index scans, I would imagine the scan performance doesn't change very much, but the net result is that we ultimately return a smaller subset of records to join the rest of the query onto. That seems like a win to me.

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

How to set up and validate locally

No behavioral changes!

Edited Oct 03, 2024 by drew stachon

Move a field check into the query instead of the loop body

What does this MR do and why?

Why didn't we just leave these changes in that other MR?

Query Changes!

MR acceptance checklist

How to set up and validate locally

Merge request reports