Fix statement timeouts on p_ci_job_artifacts during pipeline deletion
## What does this MR do and why?
Fixes statement timeouts when deleting pipelines with many builds/job artifacts.
The `Ci::DestroyPipelineService` path calls `pipeline.destroy!`, which triggers `use_fast_destroy :job_artifacts`. This calls `each_batch` on `pipeline.job_artifacts` — a `has_many ..., through: :builds` association — generating a JOIN between `p_ci_job_artifacts` and `p_ci_builds`. The `LIMIT 1 ORDER BY p_ci_job_artifacts.id` pattern from `each_batch` tricks the PostgreSQL planner into scanning via `p_ci_job_artifacts_pkey` instead of using `p_ci_builds_commit_id_status_type_idx`, causing full table scans and statement timeouts.
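The batching mechanics behind this are easiest to see outside SQL. The sketch below simulates, over an in-memory array, the three queries `each_batch` issues per batch (find the start ID, find the boundary via `OFFSET`, then yield the bounded range). It is a plain-Ruby illustration of the query pattern, not the ActiveRecord implementation; `each_batch_sim` is a made-up name.

```ruby
# Illustrative simulation of each_batch's keyset batching over sorted IDs.
# Each step mirrors one of the SQL queries Rails generates:
def each_batch_sim(ids, of: 100)
  sorted = ids.sort
  start = sorted.first                                  # SELECT id ... ORDER BY id ASC LIMIT 1
  while start
    remaining = sorted.select { |id| id >= start }
    stop = remaining[of]                                # ... WHERE id >= :start ORDER BY id LIMIT 1 OFFSET :of
    batch = stop ? remaining.take_while { |id| id < stop } : remaining
    yield batch                                         # ... WHERE id >= :start AND id < :stop
    start = stop
  end
end

batch_sizes = []
each_batch_sim((1..250).to_a, of: 100) { |b| batch_sizes << b.size }
batch_sizes # => [100, 100, 50]
```

The `ORDER BY id ASC LIMIT 1` in the first step is exactly the clause that, combined with the JOIN, biases the planner toward the primary-key index.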
A Dedicated customer experiences a 100% failure rate on pipeline deletion. Pipelines can have 44K+ builds (based on data queried in ClickHouse production).
## Why `FastDestroyAll` / `use_fast_destroy` doesn't work here
The `use_fast_destroy :job_artifacts` pattern calls `perform_fast_destroy(pipeline.job_artifacts)`, which delegates to `Ci::JobArtifact.begin_fast_destroy`. Inside `begin_fast_destroy`, the `DestroyAssociationsService` calls `each_batch` on the `has_many :through` relation. This is where the problematic JOIN query is generated — the planner picks the wrong index because `each_batch` adds `ORDER BY id ASC LIMIT 1`, which biases toward scanning the primary key index.
The fundamental issue is that `FastDestroyAll` operates on a single flat relation and has no concept of batching through an intermediate association. It receives `pipeline.job_artifacts` (the through-association) and calls `each_batch` directly on it, which always generates the JOIN.
## Why two-level batching is needed
A single flat CTE-based relation (wrapping all build IDs in one CTE) doesn't scale either: for a pipeline with 44K builds, the CTE materializes all build IDs and the planner executes a nested loop join (44K loops), taking ~18.5s — still timing out.
The solution requires two-level batching:

- Level 1 (builds): Batch pipeline builds in groups of 100 using `Pipeline#builds_with_cte` — a CTE-wrapped relation that forces the planner to use `p_ci_builds_commit_id_status_type_idx`
- Level 2 (artifacts): For each build batch, query `Ci::JobArtifact.where(job_id: [batch_ids], partition_id: ...)` and batch-destroy artifacts using the existing `DestroyAssociationsService`

This keeps each query operating on a small, bounded set.
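The two levels above can be sketched in plain Ruby over in-memory data, with `each_slice` standing in for the batched SQL queries. All names here (`destroy_in_two_levels`, the batch-size constants) are illustrative, not the actual service API.

```ruby
BUILD_BATCH    = 3  # Level 1 batch size (100 in the real MR)
ARTIFACT_BATCH = 2  # Level 2 batch size

# artifacts is a hash of { artifact_id => job_id }, standing in for the
# p_ci_job_artifacts rows; build_ids stands in for the CTE-batched builds.
def destroy_in_two_levels(build_ids, artifacts)
  destroyed = 0
  build_ids.sort.each_slice(BUILD_BATCH) do |batch_ids|           # Level 1: bounded build batch
    # Level 2: only artifacts for this build batch (WHERE job_id IN (...))
    scoped = artifacts.select { |_id, job_id| batch_ids.include?(job_id) }
    scoped.keys.sort.each_slice(ARTIFACT_BATCH) do |artifact_ids| # bounded artifact batch
      artifact_ids.each { |id| artifacts.delete(id) }
      destroyed += artifact_ids.size
    end
  end
  destroyed
end

artifacts = { 1 => 10, 2 => 10, 3 => 11, 4 => 12, 5 => 13 }
destroy_in_two_levels([10, 11, 12, 13], artifacts) # => 5; artifacts is now empty
```

No single query ever touches more than one build batch's worth of rows, which is the bounded-work property the MR relies on.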
## Approach
- Introduce `Ci::Pipelines::DestroyAssociationsService` — a new service that orchestrates two-level batched destruction. It iterates `Pipeline#builds_with_cte` in batches of 100, and for each build batch delegates artifact destruction to `Ci::JobArtifacts::DestroyAssociationsService`. A single `DestroyAssociationsService` instance accumulates statistics across all build batches, so `finalize_fast_destroy` can be called once with a single service object.
- Replace `use_fast_destroy :job_artifacts` with a `before_destroy` callback — `Pipeline#destroy_job_artifact_associations` calls the new service and defers statistics finalization via `run_after_commit`, preserving the same transactional safety as `FastDestroyAll`.
- Refactor `Ci::JobArtifacts::DestroyAssociationsService` — move the relation from the constructor to `destroy_records(relation)` as a parameter, enabling the same service instance to be reused across multiple build batches while accumulating statistics.
- Add `Pipeline#builds_with_cte` — wraps the builds query in a materialized CTE that forces the planner to use `p_ci_builds_commit_id_status_type_idx`. This is also reused by `Ci::UnlockPipelineService`, removing its duplicated CTE construction.
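The single-accumulator design in the first bullet can be sketched as follows. One artifact-destruction object tallies statistics across every build batch, so the finalization step runs exactly once afterwards; the class and method names below are hypothetical stand-ins, not GitLab's actual implementation.

```ruby
# Hedged sketch: one accumulator instance reused across all build batches.
class ArtifactDestroyerSketch
  def initialize
    @stats = Hash.new(0)
  end

  # Called once per build batch with that batch's artifact records.
  def destroy_records(records)
    records.each { |record| @stats[record[:file_type]] += 1 }
  end

  # Called exactly once after all batches (the real code defers this step
  # via run_after_commit / finalize_fast_destroy).
  def accumulated_stats
    @stats.dup
  end
end

destroyer = ArtifactDestroyerSketch.new
destroyer.destroy_records([{ file_type: :trace }, { file_type: :archive }])
destroyer.destroy_records([{ file_type: :trace }])
destroyer.accumulated_stats # => { trace: 2, archive: 1 }
```

Passing the records into `destroy_records` per call, rather than fixing a relation in the constructor, is what lets one instance serve every batch — the same motivation as the refactor in the third bullet.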
## What stays unchanged
- `Ci::JobArtifact.begin_fast_destroy` / `finalize_fast_destroy` — the `FastDestroyAll` contract on the model is preserved
- `Projects::DestroyService#destroy_orphaned_ci_job_artifacts!` — uses `begin_fast_destroy` on a plain `Ci::JobArtifact.for_project(project)` scope (no through-association, no planner issue), so it continues to work unchanged
- `FastDestroyAll` concern — not modified
## References
- Closes #582836
- Related: `Ci::UnlockPipelineService` uses the same CTE pattern (introduced in !147878 (merged))
## Queries
### OLD query (problematic — from `pipeline.job_artifacts.each_batch`)

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
INNER JOIN "p_ci_builds" ON "p_ci_job_artifacts"."job_id" = "p_ci_builds"."id"
WHERE "p_ci_builds"."type" = 'Ci::Build'
  AND "p_ci_builds"."commit_id" = :pipeline_id
  AND "p_ci_builds"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1 OFFSET 100;
```
The planner picks `p_ci_job_artifacts_pkey` to satisfy the `ORDER BY id LIMIT 1` and does a full scan through artifacts trying to find 100 rows that match the JOIN condition.
### NEW queries (two-level batching)

#### Level 1: Batch builds via CTE — find start

```sql
WITH "cte_builds" AS MATERIALIZED (
  SELECT "p_ci_builds"."id" FROM "p_ci_builds"
  WHERE "p_ci_builds"."type" = 'Ci::Build'
    AND "p_ci_builds"."commit_id" = :pipeline_id
    AND "p_ci_builds"."partition_id" = :partition_id
    AND "p_ci_builds"."status" IN (
      'created', 'waiting_for_resource', 'preparing', 'waiting_for_callback',
      'pending', 'running', 'failed', 'success', 'canceled', 'canceling',
      'skipped', 'manual', 'scheduled'
    )
)
SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds"
ORDER BY "p_ci_builds"."id" ASC
LIMIT 1;
```
#### Level 1: Batch builds via CTE — find boundary

```sql
WITH "cte_builds" AS MATERIALIZED (
  SELECT "p_ci_builds"."id" FROM "p_ci_builds"
  WHERE "p_ci_builds"."type" = 'Ci::Build'
    AND "p_ci_builds"."commit_id" = :pipeline_id
    AND "p_ci_builds"."partition_id" = :partition_id
    AND "p_ci_builds"."status" IN (
      'created', 'waiting_for_resource', 'preparing', 'waiting_for_callback',
      'pending', 'running', 'failed', 'success', 'canceled', 'canceling',
      'skipped', 'manual', 'scheduled'
    )
)
SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds"
WHERE "p_ci_builds"."id" >= :start_id
ORDER BY "p_ci_builds"."id" ASC
LIMIT 1 OFFSET 100;
```
#### Level 1: Fetch build batch IDs (pluck)

```sql
WITH "cte_builds" AS MATERIALIZED (
  SELECT "p_ci_builds"."id" FROM "p_ci_builds"
  WHERE "p_ci_builds"."type" = 'Ci::Build'
    AND "p_ci_builds"."commit_id" = :pipeline_id
    AND "p_ci_builds"."partition_id" = :partition_id
    AND "p_ci_builds"."status" IN (
      'created', 'waiting_for_resource', 'preparing', 'waiting_for_callback',
      'pending', 'running', 'failed', 'success', 'canceled', 'canceling',
      'skipped', 'manual', 'scheduled'
    )
)
SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds"
WHERE "p_ci_builds"."id" >= :start_id
  AND "p_ci_builds"."id" < :stop_id;
```
#### Level 2: Batch artifacts per build batch — find start

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
WHERE "p_ci_job_artifacts"."job_id" IN (:build_id_1, :build_id_2, ..., :build_id_100)
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1;
```
#### Level 2: Batch artifacts per build batch — find boundary

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
WHERE "p_ci_job_artifacts"."job_id" IN (:build_id_1, :build_id_2, ..., :build_id_100)
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."id" >= :start_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1 OFFSET 100;
```
#### Level 2: Fetch artifact batch records

```sql
SELECT "p_ci_job_artifacts".* FROM "p_ci_job_artifacts"
WHERE "p_ci_job_artifacts"."job_id" IN (:build_id_1, :build_id_2, ..., :build_id_100)
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."id" >= :start_id
  AND "p_ci_job_artifacts"."id" < :stop_id;
```
## Query Plans
### OLD Approach (JOIN via `has_many :job_artifacts, through: :builds`)
#### Find first batch start ID

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
INNER JOIN "p_ci_builds" ON "p_ci_job_artifacts"."job_id" = "p_ci_builds"."id"
WHERE "p_ci_builds"."type" = 'Ci::Build'
  AND "p_ci_builds"."commit_id" = :pipeline_id
  AND "p_ci_builds"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49328/commands/147202
#### Find batch boundary (stop ID)

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
INNER JOIN "p_ci_builds" ON "p_ci_job_artifacts"."job_id" = "p_ci_builds"."id"
WHERE "p_ci_builds"."type" = 'Ci::Build'
  AND "p_ci_builds"."commit_id" = :pipeline_id
  AND "p_ci_builds"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."id" >= :start_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1 OFFSET 100;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49328/commands/147203
#### Fetch batch records

```sql
SELECT "p_ci_job_artifacts".* FROM "p_ci_job_artifacts"
INNER JOIN "p_ci_builds" ON "p_ci_job_artifacts"."job_id" = "p_ci_builds"."id"
WHERE "p_ci_builds"."type" = 'Ci::Build'
  AND "p_ci_builds"."commit_id" = :pipeline_id
  AND "p_ci_builds"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."id" >= :start_id
  AND "p_ci_job_artifacts"."id" < :stop_id;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49328/commands/147205
### NEW Approach (two-level batching: CTE builds → artifacts)
#### Level 1: Batch builds — find start

```sql
WITH "cte_builds" AS MATERIALIZED (
  SELECT "p_ci_builds"."id" FROM "p_ci_builds"
  WHERE "p_ci_builds"."type" = 'Ci::Build'
    AND "p_ci_builds"."commit_id" = :pipeline_id
    AND "p_ci_builds"."partition_id" = :partition_id
    AND "p_ci_builds"."status" IN (
      'created', 'waiting_for_resource', 'preparing', 'waiting_for_callback',
      'pending', 'running', 'failed', 'success', 'canceled', 'canceling',
      'skipped', 'manual', 'scheduled'
    )
)
SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds"
ORDER BY "p_ci_builds"."id" ASC
LIMIT 1;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49578/commands/147792
#### Level 1: Batch builds — find boundary

```sql
WITH "cte_builds" AS MATERIALIZED (
  SELECT "p_ci_builds"."id" FROM "p_ci_builds"
  WHERE "p_ci_builds"."type" = 'Ci::Build'
    AND "p_ci_builds"."commit_id" = :pipeline_id
    AND "p_ci_builds"."partition_id" = :partition_id
    AND "p_ci_builds"."status" IN (
      'created', 'waiting_for_resource', 'preparing', 'waiting_for_callback',
      'pending', 'running', 'failed', 'success', 'canceled', 'canceling',
      'skipped', 'manual', 'scheduled'
    )
)
SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds"
WHERE "p_ci_builds"."id" >= :start_id
ORDER BY "p_ci_builds"."id" ASC
LIMIT 1 OFFSET 100;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49578/commands/147826
#### Level 1: Batch builds — query batch

```sql
WITH "cte_builds" AS MATERIALIZED (
  SELECT "p_ci_builds"."id" FROM "p_ci_builds"
  WHERE "p_ci_builds"."type" = 'Ci::Build'
    AND "p_ci_builds"."commit_id" = :pipeline_id
    AND "p_ci_builds"."partition_id" = :partition_id
    AND "p_ci_builds"."status" IN (
      'created', 'waiting_for_resource', 'preparing', 'waiting_for_callback',
      'pending', 'running', 'failed', 'success', 'canceled', 'canceling',
      'skipped', 'manual', 'scheduled'
    )
)
SELECT "p_ci_builds"."id" FROM "cte_builds" AS "p_ci_builds"
WHERE "p_ci_builds"."id" >= :start_id
ORDER BY "p_ci_builds"."id" ASC
LIMIT 100 OFFSET 100;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49578/commands/147827
#### Level 2: Batch artifacts — find start

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
WHERE "p_ci_job_artifacts"."job_id" IN (:build_id_1, :build_id_2, ..., :build_id_100)
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49578/commands/147828
#### Level 2: Batch artifacts — find boundary

```sql
SELECT "p_ci_job_artifacts"."id" FROM "p_ci_job_artifacts"
WHERE "p_ci_job_artifacts"."job_id" IN (:build_id_1, :build_id_2, ..., :build_id_100)
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."id" >= :start_id
ORDER BY "p_ci_job_artifacts"."id" ASC
LIMIT 1 OFFSET 100;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49578/commands/147830
#### Level 2: Fetch artifact batch records

```sql
SELECT "p_ci_job_artifacts".* FROM "p_ci_job_artifacts"
WHERE "p_ci_job_artifacts"."job_id" IN (:build_id_1, :build_id_2, ..., :build_id_100)
  AND "p_ci_job_artifacts"."partition_id" = :partition_id
  AND "p_ci_job_artifacts"."id" >= :start_id
  AND "p_ci_job_artifacts"."id" < :stop_id;
```

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/49578/commands/147832
## How to set up and validate locally
### Run the specs

```shell
# New pipeline destroy associations service
bundle exec rspec spec/services/ci/pipelines/destroy_associations_service_spec.rb

# DestroyAssociationsService (refactored API)
bundle exec rspec spec/services/ci/job_artifacts/destroy_associations_service_spec.rb

# Pipeline model specs (builds_with_cte + destroy_job_artifact_associations)
bundle exec rspec spec/models/ci/pipeline_spec.rb -e 'builds_with_cte' -e 'destroy_job_artifact_associations'

# Full pipeline destruction integration test
bundle exec rspec spec/services/ci/destroy_pipeline_service_spec.rb

# UnlockPipelineService (uses shared builds_with_cte)
bundle exec rspec spec/services/ci/unlock_pipeline_service_spec.rb
```
## MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.