Skip to content

Optimize the database query for pipelines for MRs

What does this MR do?

This MR improves the performance of the query for fetching pipelines for merge requests. Specifically,

  • Pipelines for merge requests doesn't need to be filtered by SHAs. There is an explicit relationship with a merge request through merge_request_id FK, so we can fetch the list of MRs by the id.
  • We fetch both pipelines for merged results and pipelines for merge requests in one query. As it's ORDER BY pipelines.id anyway, it doesn't make any difference.
  • Removing redundant source filtering.

This feature change is behind ci_pipelines_for_merge_request_finder_new_cte feature flag and disabled by default. We can easily fallback to the previous behavior when there is an edge case on the new query.

Before

  WITH "shas" AS (
      SELECT
          "merge_request_diff_commits"."sha"
      FROM
          "merge_request_diff_commits"
      WHERE
          "merge_request_diff_commits"."merge_request_diff_id" IN (
              SELECT
                  "merge_request_diffs"."id"
              FROM
                  "merge_request_diffs"
              WHERE
                  "merge_request_diffs"."merge_request_id" = 46282578
              ORDER BY
                  "merge_request_diffs"."id" DESC
              LIMIT
                  100
          )
      LIMIT
          10000
  )
  SELECT
      "ci_pipelines".*
  FROM
      (
          (
              SELECT
                  "ci_pipelines".*
              FROM
                  "ci_pipelines"
                  INNER JOIN "shas" ON "shas"."sha" = "ci_pipelines"."source_sha"
              WHERE
                  (
                      "ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 10, 11)
                      OR "ci_pipelines"."source" IS NULL
                  )
                  AND "ci_pipelines"."source" = 10
                  AND "ci_pipelines"."merge_request_id" = 46282578
                  AND "ci_pipelines"."project_id" IN (14278368, 278964)
          )
          UNION
              (
                  SELECT
                      "ci_pipelines".*
                  FROM
                      "ci_pipelines"
                      INNER JOIN "shas" ON encode("shas"."sha", 'hex') = "ci_pipelines"."sha"
                  WHERE
                      (
                          "ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 10, 11)
                          OR "ci_pipelines"."source" IS NULL
                      )
                      AND "ci_pipelines"."source" = 10
                      AND "ci_pipelines"."merge_request_id" = 46282578
                      AND "ci_pipelines"."project_id" IN (14278368, 278964)
              )
          UNION
              (
                  SELECT
                      "ci_pipelines".*
                  FROM
                      "ci_pipelines"
                      INNER JOIN "shas" ON encode("shas"."sha", 'hex') = "ci_pipelines"."sha"
                  WHERE
                      "ci_pipelines"."project_id" = 14278368
                      AND (
                          "ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 10, 11)
                          OR "ci_pipelines"."source" IS NULL
                      )
                      AND (
                          "ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12, 13)
                          OR "ci_pipelines"."source" IS NULL
                      )
                      AND "ci_pipelines"."ref" = '25486-batch-suggestions'
                      AND "ci_pipelines"."tag" = FALSE
              )
      ) ci_pipelines
  ORDER BY
      CASE ci_pipelines.source WHEN (10) THEN 0 ELSE 1 END,
      ci_pipelines.id DESC

Performance

Time: 13.019 ms
  - planning: 4.729 ms
  - execution: 8.290 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 889 (~6.90 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
 Sort  (cost=8024.05..8024.06 rows=3 width=1788) (actual time=7.767..7.788 rows=91 loops=1)
   Sort Key: (CASE ci_pipelines.source WHEN 10 THEN 0 ELSE 1 END), ci_pipelines.id DESC
   Sort Method: quicksort  Memory: 49kB
   Buffers: shared hit=889
   CTE shas
     ->  Limit  (cost=4.51..7270.79 rows=10000 width=21) (actual time=0.230..3.912 rows=1435 loops=1)
           Buffers: shared hit=605
           ->  Nested Loop  (cost=4.51..143905.30 rows=198039 width=21) (actual time=0.229..3.672 rows=1435 loops=1)
                 Buffers: shared hit=605
                 ->  HashAggregate  (cost=3.93..4.05 rows=12 width=4) (actual time=0.202..0.257 rows=92 loops=1)
                       Group Key: merge_request_diffs.id
                       Buffers: shared hit=91
                       ->  Limit  (cost=0.57..3.78 rows=12 width=4) (actual time=0.041..0.159 rows=92 loops=1)
                             Buffers: shared hit=91
                             ->  Index Only Scan using index_merge_request_diffs_on_merge_request_id_and_id on public.merge_request_diffs  (cost=0.57..3.78 rows=12 width=4) (actual time=0.041..0.145 rows=92 loops=1)
                                   Index Cond: (merge_request_diffs.merge_request_id = 46282578)
                                   Heap Fetches: 5
                                   Buffers: shared hit=91
                 ->  Index Scan using merge_request_diff_commits_pkey on public.merge_request_diff_commits  (cost=0.58..11826.74 rows=16503 width=25) (actual time=0.022..0.032 rows=16 loops=92)
                       Index Cond: (merge_request_diff_commits.merge_request_diff_id = merge_request_diffs.id)
                       Buffers: shared hit=514
   ->  Subquery Scan on ci_pipelines  (cost=753.16..753.23 rows=3 width=1788) (actual time=7.396..7.653 rows=91 loops=1)
         Buffers: shared hit=889
         ->  HashAggregate  (cost=753.16..753.19 rows=3 width=1784) (actual time=7.390..7.603 rows=91 loops=1)
               Group Key: ci_pipelines_1.id, ci_pipelines_1.ref, ci_pipelines_1.sha, ci_pipelines_1.before_sha, ci_pipelines_1.created_at, ci_pipelines_1.updated_at, ci_pipelines_1.tag, ci_pipelines_1.yaml_errors, ci_pipelines_1.committed_at, ci_pipelines_1.project_id, ci_pipelines_1.status, ci_pipelines_1.started_at, ci_pipelines_1.finished_at, ci_pipelines_1.duration, ci_pipelines_1.user_id, ci_pipelines_1.lock_version, ci_pipelines_1.auto_canceled_by_id, ci_pipelines_1.pipeline_schedule_id, ci_pipelines_1.source, ci_pipelines_1.protected, ci_pipelines_1.config_source, ci_pipelines_1.failure_reason, ci_pipelines_1.iid, ci_pipelines_1.merge_request_id, ci_pipelines_1.source_sha, ci_pipelines_1.target_sha, ci_pipelines_1.external_pull_request_id, ci_pipelines_1.ci_ref_id, ci_pipelines_1.locked
               Buffers: shared hit=889
               ->  Append  (cost=5.87..752.95 rows=3 width=1784) (actual time=0.824..6.705 rows=154 loops=1)
                     Buffers: shared hit=889
                     ->  Hash Join  (cost=5.87..243.38 rows=1 width=318) (actual time=0.542..0.545 rows=0 loops=1)
                           Hash Cond: (shas.sha = ci_pipelines_1.source_sha)
                           Buffers: shared hit=190
                           ->  CTE Scan on shas  (cost=0.00..200.00 rows=10000 width=32) (actual time=0.231..0.232 rows=1 loops=1)
                                 Buffers: shared hit=96
                           ->  Hash  (cost=5.86..5.86 rows=1 width=318) (actual time=0.291..0.293 rows=0 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                 Buffers: shared hit=94
                                 ->  Index Scan using index_ci_pipelines_on_merge_request_id on public.ci_pipelines ci_pipelines_1  (cost=0.43..5.86 rows=1 width=318) (actual time=0.025..0.282 rows=91 loops=1)
                                       Index Cond: (ci_pipelines_1.merge_request_id = 46282578)
                                       Filter: ((ci_pipelines_1.project_id = ANY ('{14278368,278964}'::integer[])) AND (ci_pipelines_1.source = 10) AND ((ci_pipelines_1.source = ANY ('{1,2,3,4,5,6,7,8,10,11}'::integer[])) OR (ci_pipelines_1.source IS NULL)))
                                       Rows Removed by Filter: 0
                                       Buffers: shared hit=94
                     ->  Hash Join  (cost=5.87..255.88 rows=1 width=318) (actual time=0.280..5.737 rows=154 loops=1)
                           Hash Cond: (encode(shas_1.sha, 'hex'::text) = (ci_pipelines_2.sha)::text)
                           Buffers: shared hit=603
                           ->  CTE Scan on shas shas_1  (cost=0.00..200.00 rows=10000 width=32) (actual time=0.001..4.442 rows=1435 loops=1)
                                 Buffers: shared hit=509
                           ->  Hash  (cost=5.86..5.86 rows=1 width=318) (actual time=0.242..0.243 rows=91 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 29kB
                                 Buffers: shared hit=94
                                 ->  Index Scan using index_ci_pipelines_on_merge_request_id on public.ci_pipelines ci_pipelines_2  (cost=0.43..5.86 rows=1 width=318) (actual time=0.019..0.158 rows=91 loops=1)
                                       Index Cond: (ci_pipelines_2.merge_request_id = 46282578)
                                       Filter: ((ci_pipelines_2.project_id = ANY ('{14278368,278964}'::integer[])) AND (ci_pipelines_2.source = 10) AND ((ci_pipelines_2.source = ANY ('{1,2,3,4,5,6,7,8,10,11}'::integer[])) OR (ci_pipelines_2.source IS NULL)))
                                       Rows Removed by Filter: 0
                                       Buffers: shared hit=94
                     ->  Hash Join  (cost=3.63..253.64 rows=1 width=318) (actual time=0.369..0.370 rows=0 loops=1)
                           Hash Cond: (encode(shas_2.sha, 'hex'::text) = (ci_pipelines_3.sha)::text)
                           Buffers: shared hit=96
                           ->  CTE Scan on shas shas_2  (cost=0.00..200.00 rows=10000 width=32) (actual time=0.001..0.001 rows=1 loops=1)
                           ->  Hash  (cost=3.62..3.62 rows=1 width=318) (actual time=0.314..0.315 rows=0 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                 Buffers: shared hit=96
                                 ->  Index Scan using index_ci_pipelines_on_project_idandrefandiddesc on public.ci_pipelines ci_pipelines_3  (cost=0.57..3.62 rows=1 width=318) (actual time=0.313..0.313 rows=0 loops=1)
                                       Index Cond: ((ci_pipelines_3.project_id = 14278368) AND ((ci_pipelines_3.ref)::text = '25486-batch-suggestions'::text))
                                       Filter: ((NOT ci_pipelines_3.tag) AND ((ci_pipelines_3.source = ANY ('{1,2,3,4,5,6,7,8,10,11}'::integer[])) OR (ci_pipelines_3.source IS NULL)) AND ((ci_pipelines_3.source = ANY ('{1,2,3,4,5,6,7,8,9,11,12,13}'::integer[])) OR (ci_pipelines_3.source IS NULL)))
                                       Rows Removed by Filter: 91
                                       Buffers: shared hit=96

After

Query

WITH "shas" AS (
    SELECT
        "merge_request_diff_commits"."sha"
    FROM
        "merge_request_diff_commits"
    WHERE
        "merge_request_diff_commits"."merge_request_diff_id" IN (
            SELECT
                "merge_request_diffs"."id"
            FROM
                "merge_request_diffs"
            WHERE
                "merge_request_diffs"."merge_request_id" = 46282578
            ORDER BY
                "merge_request_diffs"."id" DESC
            LIMIT
                100
        )
    LIMIT
        10000
)
SELECT
    "ci_pipelines".*
FROM
    (
        (
            SELECT
                "ci_pipelines".*
            FROM
                "ci_pipelines"
            WHERE
                "ci_pipelines"."source" = 10
                AND "ci_pipelines"."merge_request_id" = 46282578
                AND "ci_pipelines"."project_id" IN (14278368, 278964)
        )
        UNION
            (
                SELECT
                    "ci_pipelines".*
                FROM
                    "ci_pipelines"
                    INNER JOIN "shas" ON encode("shas"."sha", 'hex') = "ci_pipelines"."sha"
                WHERE
                    "ci_pipelines"."project_id" = 14278368
                    AND (
                        "ci_pipelines"."source" IN (1, 2, 3, 4, 5, 6, 7, 8, 11)
                        OR "ci_pipelines"."source" IS NULL
                    )
                    AND "ci_pipelines"."ref" = '25486-batch-suggestions'
                    AND "ci_pipelines"."tag" = FALSE
            )
    ) ci_pipelines
ORDER BY
    CASE ci_pipelines.source WHEN (10) THEN 0 ELSE 1 END,
    ci_pipelines.id DESC

Performance

Time: 5.589 ms
  - planning: 2.491 ms
  - execution: 3.098 ms
    - I/O read: 0.000 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 309 (~2.40 MiB) from the buffer pool
  - reads: 0 from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
 Sort  (cost=7530.31..7530.32 rows=2 width=1788) (actual time=2.674..2.692 rows=91 loops=1)
   Sort Key: (CASE ci_pipelines.source WHEN 10 THEN 0 ELSE 1 END), ci_pipelines.id DESC
   Sort Method: quicksort  Memory: 49kB
   Buffers: shared hit=309
   CTE shas
     ->  Limit  (cost=4.51..7270.79 rows=10000 width=21) (actual time=0.836..0.838 rows=1 loops=1)
           Buffers: shared hit=96
           ->  Nested Loop  (cost=4.51..143905.30 rows=198039 width=21) (actual time=0.834..0.837 rows=1 loops=1)
                 Buffers: shared hit=96
                 ->  HashAggregate  (cost=3.93..4.05 rows=12 width=4) (actual time=0.784..0.786 rows=1 loops=1)
                       Group Key: merge_request_diffs.id
                       Buffers: shared hit=91
                       ->  Limit  (cost=0.57..3.78 rows=12 width=4) (actual time=0.116..0.715 rows=92 loops=1)
                             Buffers: shared hit=91
                             ->  Index Only Scan using index_merge_request_diffs_on_merge_request_id_and_id on public.merge_request_diffs  (cost=0.57..3.78 rows=12 width=4) (actual time=0.115..0.695 rows=92 loops=1)
                                   Index Cond: (merge_request_diffs.merge_request_id = 46282578)
                                   Heap Fetches: 5
                                   Buffers: shared hit=91
                 ->  Index Scan using merge_request_diff_commits_pkey on public.merge_request_diff_commits  (cost=0.58..11826.74 rows=16503 width=25) (actual time=0.044..0.045 rows=1 loops=1)
                       Index Cond: (merge_request_diff_commits.merge_request_diff_id = merge_request_diffs.id)
                       Buffers: shared hit=5
   ->  Subquery Scan on ci_pipelines  (cost=259.33..259.51 rows=2 width=1788) (actual time=2.319..2.470 rows=91 loops=1)
         Buffers: shared hit=306
         ->  Unique  (cost=259.33..259.48 rows=2 width=1784) (actual time=2.314..2.402 rows=91 loops=1)
               Buffers: shared hit=306
               ->  Sort  (cost=259.33..259.34 rows=2 width=1784) (actual time=2.312..2.325 rows=91 loops=1)
                     Sort Key: ci_pipelines_1.id, ci_pipelines_1.ref, ci_pipelines_1.sha, ci_pipelines_1.before_sha, ci_pipelines_1.created_at, ci_pipelines_1.updated_at, ci_pipelines_1.tag, ci_pipelines_1.yaml_errors, ci_pipelines_1.committed_at, ci_pipelines_1.project_id, ci_pipelines_1.status, ci_pipelines_1.started_at, ci_pipelines_1.finished_at, ci_pipelines_1.duration, ci_pipelines_1.user_id, ci_pipelines_1.lock_version, ci_pipelines_1.auto_canceled_by_id, ci_pipelines_1.pipeline_schedule_id, ci_pipelines_1.source, ci_pipelines_1.protected, ci_pipelines_1.config_source, ci_pipelines_1.failure_reason, ci_pipelines_1.iid, ci_pipelines_1.merge_request_id, ci_pipelines_1.source_sha, ci_pipelines_1.target_sha, ci_pipelines_1.external_pull_request_id, ci_pipelines_1.ci_ref_id, ci_pipelines_1.locked
                     Sort Method: quicksort  Memory: 49kB
                     Buffers: shared hit=306
                     ->  Append  (cost=0.43..259.32 rows=2 width=1784) (actual time=0.026..2.067 rows=91 loops=1)
                           Buffers: shared hit=286
                           ->  Index Scan using index_ci_pipelines_on_merge_request_id on public.ci_pipelines ci_pipelines_1  (cost=0.43..5.67 rows=1 width=318) (actual time=0.025..0.933 rows=91 loops=1)
                                 Index Cond: (ci_pipelines_1.merge_request_id = 46282578)
                                 Filter: ((ci_pipelines_1.project_id = ANY ('{14278368,278964}'::integer[])) AND (ci_pipelines_1.source = 10))
                                 Rows Removed by Filter: 0
                                 Buffers: shared hit=94
                           ->  Hash Join  (cost=3.61..253.62 rows=1 width=318) (actual time=1.109..1.110 rows=0 loops=1)
                                 Hash Cond: (encode(shas.sha, 'hex'::text) = (ci_pipelines_2.sha)::text)
                                 Buffers: shared hit=192
                                 ->  CTE Scan on shas  (cost=0.00..200.00 rows=10000 width=32) (actual time=0.839..0.839 rows=1 loops=1)
                                       Buffers: shared hit=96
                                 ->  Hash  (cost=3.60..3.60 rows=1 width=318) (actual time=0.242..0.243 rows=0 loops=1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                       Buffers: shared hit=96
                                       ->  Index Scan using index_ci_pipelines_on_project_idandrefandiddesc on public.ci_pipelines ci_pipelines_2  (cost=0.57..3.60 rows=1 width=318) (actual time=0.242..0.242 rows=0 loops=1)
                                             Index Cond: ((ci_pipelines_2.project_id = 14278368) AND ((ci_pipelines_2.ref)::text = '25486-batch-suggestions'::text))
                                             Filter: ((NOT ci_pipelines_2.tag) AND ((ci_pipelines_2.source = ANY ('{1,2,3,4,5,6,7,8,11}'::integer[])) OR (ci_pipelines_2.source IS NULL)))
                                             Rows Removed by Filter: 91
                                             Buffers: shared hit=96

Screenshots (strongly suggested)

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by Mayra Cabrera

Merge request reports