Fix N+1 queries for downstream pipelines
What does this MR do and why?
Fix N+1 queries for downstream pipelines
Preload pipeline_metadata, retryable_builds, and project associations in triggered_pipelines_with_preloads.
Changelog: fixed
✨ AI generated description ✨
This change improves the performance and functionality of a feature that displays information about related pipelines (automated processes that run code tests and deployments).
The first part optimizes database queries by preloading additional related data upfront, which prevents multiple separate database calls and makes the system faster when displaying pipeline information.
The second part expands the information that can be requested about downstream pipelines (pipelines triggered by other pipelines) through the API. It adds fields like pipeline ID, file path, status details (whether it's running, failed, succeeded, etc.), information about the job that triggered it, and details about the project it belongs to.
Together, these changes make it possible to display more comprehensive pipeline information while ensuring the system performs well even when showing data about many related pipelines at once.
Query Changes
This MR modifies the triggered_pipelines_with_preloads method to add preloading for:
-
pipeline_metadata- for thenamefield -
retryable_builds- for theretryablefield -
project: [:route, { namespace: :route }]- forproject.nameandproject.fullPath
Before (N+1 queries)
Baseline with 1 downstream pipeline: 39 queries
With 3 downstream pipelines: 43 queries (+4)
Example N+1 queries that scale with downstream count:
-- Retryable builds check (1 query per downstream pipeline)
SELECT 1 AS one FROM "p_ci_builds"
WHERE "p_ci_builds"."type" = 'Ci::Build'
AND "p_ci_builds"."commit_id" = 527
AND "p_ci_builds"."partition_id" = 100
AND ("p_ci_builds"."retried" = FALSE OR "p_ci_builds"."retried" IS NULL)
AND ("p_ci_builds"."status" IN ('failed', 'canceled', 'canceling'))
LIMIT 1
-- Pipeline metadata (1 query per downstream pipeline)
SELECT "ci_pipeline_metadata".* FROM "ci_pipeline_metadata"
WHERE "ci_pipeline_metadata"."pipeline_id" = 527
-- Source job lookup (1 query per downstream pipeline)
SELECT "p_ci_builds".* FROM "p_ci_builds"
INNER JOIN "ci_sources_pipelines" ON "p_ci_builds"."id" = "ci_sources_pipelines"."source_job_id"
WHERE "ci_sources_pipelines"."pipeline_id" = 527
LIMIT 1
After (with preloading)
Constant query count regardless of downstream pipeline count
The preloading changes these to batch queries:
-- Single batch query for all pipeline metadata
SELECT "ci_pipeline_metadata".* FROM "ci_pipeline_metadata"
WHERE "ci_pipeline_metadata"."pipeline_id" IN (524, 525, 527)
-- Single batch query for retryable builds
SELECT "p_ci_builds".* FROM "p_ci_builds"
WHERE "p_ci_builds"."commit_id" IN (524, 525, 527)
AND "p_ci_builds"."type" = 'Ci::Build'
AND ("p_ci_builds"."retried" = FALSE OR "p_ci_builds"."retried" IS NULL)
AND ("p_ci_builds"."status" IN ('failed', 'canceled', 'canceling'))
-- Single batch query for projects
SELECT "projects".* FROM "projects"
WHERE "projects"."id" IN (101, 102, 103)
-- Single batch query for routes
SELECT "routes".* FROM "routes"
WHERE "routes"."source_id" IN (101, 102, 103)
AND "routes"."source_type" = 'Project'
-- Single batch query for namespaces
SELECT "namespaces".* FROM "namespaces"
WHERE "namespaces"."id" IN (201, 202, 203)
Query Plans
Pipeline Metadata
Before (N+1 - executed once per downstream pipeline):
Index Scan using ci_pipeline_metadata_pkey on ci_pipeline_metadata (cost=0.15..2.17 rows=1 width=60) (actual time=0.013..0.013 rows=0 loops=1)
Index Cond: (pipeline_id = 527)
Planning Time: 0.267 ms
Execution Time: 0.063 ms
After (batched - executed once for all downstream pipelines):
Index Scan using ci_pipeline_metadata_pkey on ci_pipeline_metadata (cost=0.15..4.50 rows=3 width=60) (actual time=0.008..0.008 rows=0 loops=1)
Index Cond: (pipeline_id = ANY ('{524,525,527}'::bigint[]))
Planning Time: 0.766 ms
Execution Time: 0.015 ms
Performance improvement: 3 queries × 0.063ms = 0.189ms → 1 query × 0.015ms = 12.6× faster
Retryable Builds
After (batched - executed once for all downstream pipelines):
Append (cost=0.00..0.01 rows=3 width=612) (actual time=0.013..0.013 rows=0 loops=1)
-> Seq Scan on ci_builds p_ci_builds_1 (cost=0.00..0.00 rows=1 width=612) (actual time=0.008..0.008 rows=0 loops=1)
Filter: (((NOT retried) OR (retried IS NULL)) AND ((type)::text = 'Ci::Build'::text) AND (commit_id = ANY ('{524,525,527}'::bigint[])) AND ((status)::text = ANY ('{failed,canceled,canceling}'::text[])))
-> Seq Scan on ci_builds_101 p_ci_builds_2 (cost=0.00..0.00 rows=1 width=612) (actual time=0.003..0.003 rows=0 loops=1)
Filter: (((NOT retried) OR (retried IS NULL)) AND ((type)::text = 'Ci::Build'::text) AND (commit_id = ANY ('{524,525,527}'::bigint[])) AND ((status)::text = ANY ('{failed,canceled,canceling}'::text[])))
-> Seq Scan on ci_builds_102 p_ci_builds_3 (cost=0.00..0.00 rows=1 width=612) (actual time=0.001..0.001 rows=0 loops=1)
Filter: (((NOT retried) OR (retried IS NULL)) AND ((type)::text = 'Ci::Build'::text) AND (commit_id = ANY ('{524,525,527}'::bigint[])) AND ((status)::text = ANY ('{failed,canceled,canceling}'::text[])))
Planning Time: 39.000 ms
Execution Time: 0.047 ms
Performance Summary
| Query Type | Before (3 downstreams) | After (batched) | Improvement |
|---|---|---|---|
| Pipeline metadata | 3 queries × 0.063ms = 0.189ms | 1 query × 0.015ms | 12.6× faster |
| Retryable builds | 3 separate queries | 1 query × 0.047ms | Eliminates N+1 |
| Project/Routes | Multiple N+1 queries | Single batched queries | O(N) → O(1) |
| Scaling behavior | Linear O(N) | Constant O(1) | No degradation with more pipelines |
References
Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Related to #549069 (closed)