Skip to content

Fix query performance in PipelinesFinder

Krasimir Angelov requested to merge 37745-optimize-pipelines-finder into master

What does this MR do?

This MR updates PipelinesFinder add additional ci_pipelines.project_id = ? clause that change the execution to use index_ci_pipelines_on_project_id_and_id_desc instead of ci_pipelines_pkey which speeds up the query a lot.

This discussion has more details - !21092 (comment 271877475).

Before: https://gitlab.slack.com/archives/CLJMDRD8C/p1578533549033400

Time: 23.592 min
  - planning: 3.290 ms
  - execution: 23.592 min
    - I/O read: 22.850 min
    - I/O write: 2.233 s

Shared buffers:
  - hits: 20985209 (~160.10 GiB) from the buffer pool
  - reads: 1749670 (~13.30 GiB) from the OS file cache, including disk I/O
  - dirtied: 80949 (~632.40 MiB)
  - writes: 13739 (~107.30 MiB)
Execution plan (visual)
Limit  (cost=33.70..40.44 rows=20 width=300) (actual time=1415521.507..1415521.513 rows=1 loops=1)
   Buffers: shared hit=20985209 read=1749670 dirtied=80949 written=13739
   I/O Timings: read=1370983.931 write=2232.629
   ->  Merge Join  (cost=33.70..16884209.35 rows=50132882 width=300) (actual time=1415521.470..1415521.474 rows=1 loops=1)
         Buffers: shared hit=20985209 read=1749670 dirtied=80949 written=13739
         I/O Timings: read=1370983.931 write=2232.629
         ->  Index Scan using ci_pipelines_pkey on public.ci_pipelines  (cost=0.57..16633508.53 rows=100265765 width=300) (actual time=4.558..1412506.541 rows=22909553 loops=1)
               Buffers: shared hit=20985193 read=1749662 dirtied=80949 written=13739
               I/O Timings: read=1370970.369 write=2232.629
         ->  Sort  (cost=33.13..33.68 rows=218 width=4) (actual time=13.805..13.806 rows=1 loops=1)
               Sort Key: latest_ids.id DESC
               Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=16 read=8
               I/O Timings: read=13.562
               ->  Subquery Scan  (cost=20.31..24.67 rows=218 width=4) (actual time=13.727..13.727 rows=1 loops=1)
                     Buffers: shared hit=16 read=8
                     I/O Timings: read=13.562
                     ->  HashAggregate  (cost=20.31..22.49 rows=218 width=17) (actual time=13.725..13.725 rows=1 loops=1)
                           Group Key: ci_pipelines_1.ref
                           Buffers: shared hit=16 read=8
                           I/O Timings: read=13.562
                           ->  Index Only Scan using index_ci_pipelines_on_project_idandrefandiddesc on public.ci_pipelines ci_pipelines_1  (cost=0.57..19.22 rows=218 width=17) (actual time=11.005..13.701 rows=25 loops=1)
                                 Index Cond: ((ci_pipelines_1.project_id = 14074169) AND (ci_pipelines_1.ref = ANY ('{aaa,bbb,ccc,ddd,master}'::text[])))
                                 Heap Fetches: 0
                                 Buffers: shared hit=16 read=8
                                 I/O Timings: read=13.562

After: https://gitlab.slack.com/archives/CLJMDRD8C/p1580261341451800

Time: 28.085 ms
  - planning: 3.579 ms
  - execution: 24.506 ms
    - I/O read: 23.764 ms
    - I/O write: 0.000 ms

Shared buffers:
  - hits: 16 (~128.00 KiB) from the buffer pool
  - reads: 15 (~120.00 KiB) from the OS file cache, including disk I/O
  - dirtied: 0
  - writes: 0
Execution plan (visual)
Limit  (cost=43.23..104.37 rows=20 width=299) (actual time=22.942..24.311 rows=1 loops=1)
   Buffers: shared hit=16 read=15
   I/O Timings: read=23.764
   ->  Merge Join  (cost=43.23..746.31 rows=230 width=299) (actual time=22.908..24.276 rows=1 loops=1)
         Buffers: shared hit=16 read=15
         I/O Timings: read=23.764
         ->  Index Scan using index_ci_pipelines_on_project_id_and_id_desc on public.ci_pipelines  (cost=0.57..701.36 rows=459 width=299) (actual time=9.420..11.983 rows=3 loops=1)
               Index Cond: (ci_pipelines.project_id = 14074169)
               Buffers: shared read=7
               I/O Timings: read=11.814
         ->  Sort  (cost=42.67..43.23 rows=226 width=4) (actual time=12.257..12.258 rows=1 loops=1)
               Sort Key: latest_ids.id DESC
               Sort Method: quicksort  Memory: 25kB
               Buffers: shared hit=16 read=8
               I/O Timings: read=11.950
               ->  Subquery Scan  (cost=29.31..33.83 rows=226 width=4) (actual time=12.146..12.146 rows=1 loops=1)
                     Buffers: shared hit=16 read=8
                     I/O Timings: read=11.950
                     ->  HashAggregate  (cost=29.31..31.57 rows=226 width=17) (actual time=12.145..12.145 rows=1 loops=1)
                           Group Key: ci_pipelines_1.ref
                           Buffers: shared hit=16 read=8
                           I/O Timings: read=11.950
                           ->  Index Only Scan using index_ci_pipelines_on_project_idandrefandiddesc on public.ci_pipelines ci_pipelines_1  (cost=0.57..28.18 rows=226 width=17) (actual time=8.607..12.118 rows=25 loops=1)
                                 Index Cond: ((ci_pipelines_1.project_id = 14074169) AND (ci_pipelines_1.ref = ANY ('{aaa,bbb,ccc,ddd,master}'::}'::text[])))
                                 Heap Fetches: 0
                                 Buffers: shared hit=16 read=8
                                 I/O Timings: read=11.950

Screenshots

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team

Related to #37745

Edited by Krasimir Angelov

Merge request reports