Apply partition pruning to MergeRequest#head_pipeline for pipeline lookup queries

What does this MR do and why?

Adds partition pruning to merge_request.head_pipeline lookups on p_ci_pipelines.

Problem: The belongs_to :head_pipeline association on MergeRequest queries p_ci_pipelines by id only, without a partition_id filter. This causes full cross-partition scans, leading to database lock contention

Query Plan

Before (scans 10 partitions, ~50ms):

explain SELECT "p_ci_pipelines".* FROM "p_ci_pipelines" WHERE "p_ci_pipelines"."id" = 1234 LIMIT 1;

Limit  (cost=0.58..3.75 rows=1 width=532) (actual time=50.113..50.122 rows=0 loops=1)
   Buffers: shared hit=3 read=36
   ->  Append  (cost=0.58..32.31 rows=10 width=532) (actual time=50.110..50.119 rows=0 loops=1)
         Buffers: shared hit=3 read=36
         ->  Index Scan using ci_pipelines_pkey on gitlab_partitions_dynamic.ci_pipelines p_ci_pipelines_1  (cost=0.58..3.59 rows=1 width=395) (actual time=7.077..7.077 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_1.id = 1234)
               Buffers: shared hit=3 read=4
         ->  Index Scan using ci_pipelines_103_pkey on gitlab_partitions_dynamic.ci_pipelines_103 p_ci_pipelines_2  (cost=0.56..3.58 rows=1 width=396) (actual time=5.418..5.419 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_2.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_104_pkey on gitlab_partitions_dynamic.ci_pipelines_104 p_ci_pipelines_3  (cost=0.57..3.58 rows=1 width=392) (actual time=3.849..3.849 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_3.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_105_pkey on gitlab_partitions_dynamic.ci_pipelines_105 p_ci_pipelines_4  (cost=0.56..3.58 rows=1 width=381) (actual time=16.666..16.667 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_4.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_106_pkey on gitlab_partitions_dynamic.ci_pipelines_106 p_ci_pipelines_5  (cost=0.56..3.58 rows=1 width=395) (actual time=3.103..3.104 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_5.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_107_pkey on gitlab_partitions_dynamic.ci_pipelines_107 p_ci_pipelines_6  (cost=0.57..3.59 rows=1 width=393) (actual time=3.243..3.243 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_6.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_108_pkey on gitlab_partitions_dynamic.ci_pipelines_108 p_ci_pipelines_7  (cost=0.56..3.58 rows=1 width=394) (actual time=3.132..3.132 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_7.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_109_pkey on gitlab_partitions_dynamic.ci_pipelines_109 p_ci_pipelines_8  (cost=0.56..3.58 rows=1 width=385) (actual time=4.046..4.046 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_8.id = 1234)
               Buffers: shared read=4
         ->  Index Scan using ci_pipelines_110_pkey on gitlab_partitions_dynamic.ci_pipelines_110 p_ci_pipelines_9  (cost=0.56..3.58 rows=1 width=378) (actual time=3.533..3.534 rows=0 loops=1)
               Index Cond: (p_ci_pipelines_9.id = 1234)
               Buffers: shared read=4
         ->  Seq Scan on gitlab_partitions_dynamic.ci_pipelines_111 p_ci_pipelines_10  (cost=0.00..0.00 rows=1 width=1816) (actual time=0.022..0.022 rows=0 loops=1)
               Filter: (p_ci_pipelines_10.id = 1234)
               Rows Removed by Filter: 0
Settings: jit = 'off', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '338688MB', random_page_cost = '1.5'

After (single partition, ~0.02ms):

explain SELECT "p_ci_pipelines".* FROM "p_ci_pipelines" WHERE "p_ci_pipelines"."partition_id" = 100 AND "p_ci_pipelines"."id" = 1234 LIMIT 1;

Limit  (cost=0.58..3.60 rows=1 width=395) (actual time=0.022..0.023 rows=0 loops=1)
   Buffers: shared hit=4
   ->  Index Scan using ci_pipelines_pkey on gitlab_partitions_dynamic.ci_pipelines p_ci_pipelines  (cost=0.58..3.60 rows=1 width=395) (actual time=0.021..0.021 rows=0 loops=1)
         Index Cond: ((p_ci_pipelines.id = 1234) AND (p_ci_pipelines.partition_id = 100))
         Buffers: shared hit=4
Settings: random_page_cost = '1.5', jit = 'off', seq_page_cost = '4', work_mem = '100MB', effective_cache_size = '338688MB'

Relationship to existing partition pruning work

  • PartitionableFinder (Try to find pipeline/jobs in the current partition (!227689 - merged)): The ci_partitionable_finder feature flag (rollout issue: #593873) patches Ci::Pipeline.find_by_id to try the current partition first. This covers the Sidekiq worker path. It does not cover belongs_to association loading, which bypasses find_by_id entirely.
  • Database-native solution in discussion(CI partition pruning: cache record_id → partiti... (#593701)): We have an ongoing discussion on a two-layer partition_id resolution (override table + ID range) will eventually replace PartitionableFinder with deterministic lookups. Currently scoped to p_ci_builds only, with p_ci_pipelines as follow-up. Like PartitionableFinder, it operates on find_by_id calls and not on belongs_to association loading.
  • This MR addresses the belongs_to :head_pipeline association loading path, which is a fundamentally different path that PartitionableFinder cannot reach.

Related to #594572 (closed)

Edited by Madhusudan Vaishnao

Merge request reports

Loading