Introduce Pipelines::AutoCleanupService class

What does this MR do and why?

Introduces Pipelines::AutoCleanupService to fix statement timeouts in Ci::DestroyOldPipelinesWorker by optimizing pipeline deletion queries.

The current implementation causes PG::QueryCanceled errors due to inefficient queries when deleting old pipelines. This service addresses the performance issue by:

  • Splitting queries by pipeline status to leverage database indexes efficiently (completed statuses queried individually, non-completed statuses grouped together)
  • Using keyset pagination with created_at and id ordering to avoid expensive merge operations
  • Implementing per-status caching to track deletion progress and resume from the last processed timestamp
  • Filtering protected and locked pipelines based on feature flags

Note: This service is not yet integrated with Ci::DestroyOldPipelinesWorker. Will be added in the next MR.

This approach ensures queries complete within statement timeout limits while maintaining deletion progress across worker executions.

References

#572278 (closed)

Screenshots or screen recordings

N/A

How to set up and validate locally

  1. Create a project with old pipelines across different statuses
  2. Configure ci_delete_pipelines_in_seconds retention policy
  3. Execute Ci::DestroyOldPipelinesService.new(project: project).execute
  4. Verify pipelines older than the retention period are deleted
  5. Check Redis cache contains per-status timestamps for resumption

1. Setup - Create Project and Pipelines

# Create a project with 90-day retention period
timestamp = Time.now.to_i
unique_path = "test-pipeline-delete-#{timestamp}"

namespace = FactoryBot.create(:namespace, path: unique_path)


project = FactoryBot.create(
  :project,
  name: unique_path,
  path: unique_path,
  namespace: namespace,
  ci_delete_pipelines_in_seconds: 90.days.to_i
)


# Create ancient pipelines (1 year old) - should be deleted
ancient_success = FactoryBot.create_list(:ci_pipeline, 10, 
  project: project, 
  status: 'success', 
  created_at: 1.year.ago,
  locked: :unlocked
)

ancient_failed = FactoryBot.create_list(:ci_pipeline, 10, 
  project: project, 
  status: 'failed', 
  created_at: 1.year.ago,
  locked: :unlocked
)

ancient_running = FactoryBot.create_list(:ci_pipeline, 5, 
  project: project, 
  status: 'running', 
  created_at: 1.year.ago,
  locked: :unlocked
)

# Create old pipelines (6 months old) - should be deleted
old_success = FactoryBot.create_list(:ci_pipeline, 10, 
  project: project, 
  status: 'success', 
  created_at: 6.months.ago,
  locked: :unlocked
)

old_failed = FactoryBot.create_list(:ci_pipeline, 10, 
  project: project, 
  status: 'failed', 
  created_at: 6.months.ago,
  locked: :unlocked
)

# Create recent pipelines (1 week old) - should NOT be deleted
recent_success = FactoryBot.create_list(:ci_pipeline, 10, 
  project: project, 
  status: 'success', 
  created_at: 1.week.ago,
  locked: :unlocked
)

recent_failed = FactoryBot.create_list(:ci_pipeline, 10, 
  project: project, 
  status: 'failed', 
  created_at: 1.week.ago,
  locked: :unlocked
)

2. Verify Pipeline Counts Before

# Check total counts
project.all_pipelines.count
# => Should be 65

# Check by status
project.all_pipelines.where(status: 'success').count
# => Should be 30

project.all_pipelines.where(status: 'failed').count
# => Should be 30

# Check by age
cutoff = 90.days.ago
project.all_pipelines.where('created_at < ?', cutoff).count
# => Should be 45 (ancient + old)

project.all_pipelines.where('created_at >= ?', cutoff).count
# => Should be 20 (recent)

3. Run the Service

service = Ci::Pipelines::AutoCleanupService.new(project: project)
result = service.execute

# Check the result
result.success?
# => true

result.payload[:destroyed_pipelines_size]
# => Should be 45

4. Verify Pipeline Counts After

# Total should decrease by 45
project.reload.all_pipelines.count
# => Should be 20

# Only recent pipelines should remain
project.all_pipelines.where(status: 'success').count
# => Should be 10

project.all_pipelines.where(status: 'failed').count
# => Should be 10

# Verify all remaining pipelines are recent
project.all_pipelines.where('created_at < ?', 90.days.ago).count
# => Should be 0

5. Check Cache

# Read the cache
cache = Ci::RetentionPolicies::PipelineDeletionCutoffCache.new(project: project).read

# Inspect cache values
cache.keys
# => ["success", "failed", "canceled", "skipped", "manual", "other"]

cache.values.map(&:to_date)
# Should show dates around 6 months ago (the most recent deleted pipelines)
# => [Tue, 13 May 2025, Tue, 13 May 2025, Fri, 15 Aug 2025, Fri, 15 Aug 2025, Fri, 15 Aug 2025, Wed, 13 Nov 2024]

6. Test with Protected/Locked Pipelines

# Create protected pipelines that should be skipped
protected_ancient = FactoryBot.create_list(:ci_pipeline, 5,
  :protected,
  project: project,
  status: 'success',
  created_at: 1.year.ago,
  locked: :unlocked
)

# Create locked pipelines that should be skipped
locked_ancient = FactoryBot.create_list(:ci_pipeline, 5,
  :artifacts_locked,
  project: project,
  status: 'success',
  created_at: 1.year.ago
)

# Run service again
before_count = project.reload.all_pipelines.count
service = Ci::Pipelines::AutoCleanupService.new(project: project)
service.execute
after_count = project.reload.all_pipelines.count

# Verify protected/locked pipelines were NOT deleted
before_count - after_count
# => Should be 0 (or minimal if there were other unprotected/unlocked old pipelines)

# Verify protected/locked pipelines still exist
Ci::Pipeline.where(id: protected_ancient.map(&:id)).count
# => Should be 5

Ci::Pipeline.where(id: locked_ancient.map(&:id)).count
# => Should be 5

7. Clean Up

# Clean up test data when done
project.destroy!
namespace.destroy!

Complete Testing Snippet:

Here's a simple snippet to copy-paste in rails console:

# === Setup ===
timestamp = Time.now.to_i
unique_path = "test-pipeline-delete-#{timestamp}"

namespace = FactoryBot.create(:namespace, path: unique_path)


project = FactoryBot.create(
  :project,
  name: unique_path,
  path: unique_path,
  namespace: namespace,
  ci_delete_pipelines_in_seconds: 90.days.to_i
)

# Create pipelines at different ages
ancient = FactoryBot.create_list(:ci_pipeline, 20, project: project, status: 'success', created_at: 1.year.ago, locked: :unlocked)
old = FactoryBot.create_list(:ci_pipeline, 15, project: project, status: 'failed', created_at: 6.months.ago, locked: :unlocked)
recent = FactoryBot.create_list(:ci_pipeline, 10, project: project, status: 'success', created_at: 1.week.ago, locked: :unlocked)

puts "Before: #{project.all_pipelines.count} pipelines"
# => 45

# === Execute Service ===
service = Ci::Pipelines::AutoCleanupService.new(project: project)
result = service.execute

puts "Deleted: #{result.payload[:destroyed_pipelines_size]} pipelines"
# => 35

puts "After: #{project.reload.all_pipelines.count} pipelines"
# => 10 (only recent remain)

# === Verify Cache ===
cache = Ci::RetentionPolicies::PipelineDeletionCutoffCache.new(project: project).read
puts "Cache timestamps: #{cache.values.map { |t| t.strftime('%Y-%m-%d') }.uniq}"

# === Cleanup ===
project.destroy!
namespace.destroy!

Database Query Plans

Query Plans

With Cutoff Timestamp

Status: success

SELECT "p_ci_pipelines".*
FROM "p_ci_pipelines"
WHERE "p_ci_pipelines"."project_id" = 278964
  AND "p_ci_pipelines"."status" = 'success'
  AND "p_ci_pipelines"."created_at" < '2025-10-15 00:00:00'
  AND "p_ci_pipelines"."created_at" >= '2025-09-14 00:00:00'
ORDER BY "p_ci_pipelines"."created_at" ASC, "p_ci_pipelines"."id" ASC
LIMIT 50;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45482/commands/139371

Status: failed

SELECT "p_ci_pipelines".* 
FROM "p_ci_pipelines" 
WHERE "p_ci_pipelines"."project_id" = 278964 
  AND "p_ci_pipelines"."status" = 'failed'
  AND "p_ci_pipelines"."created_at" < '2025-10-15 00:00:00'
  AND "p_ci_pipelines"."created_at" >= '2025-09-14 00:00:00'
ORDER BY "p_ci_pipelines"."created_at" ASC, "p_ci_pipelines"."id" ASC 
LIMIT 50;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45482/commands/139373

Status: non-completed

SELECT "p_ci_pipelines".* 
FROM "p_ci_pipelines" 
WHERE "p_ci_pipelines"."project_id" = 278964 
  AND "p_ci_pipelines"."status" IN ('created', 'waiting_for_resource', 'preparing', 'waiting_for_callback', 'pending', 'running', 'canceling', 'scheduled')
  AND "p_ci_pipelines"."created_at" < '2025-10-15 00:00:00'
  AND "p_ci_pipelines"."created_at" >= '2025-08-14 00:00:00'
ORDER BY "p_ci_pipelines"."created_at" ASC, "p_ci_pipelines"."id" ASC 
LIMIT 50;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45482/commands/139374

Without Cutoff Timestamp

Status: success

SELECT "p_ci_pipelines".* 
FROM "p_ci_pipelines" 
WHERE "p_ci_pipelines"."project_id" = 278964 
  AND "p_ci_pipelines"."status" = 'success'
  AND "p_ci_pipelines"."created_at" < '2025-08-15 00:00:00'
ORDER BY "p_ci_pipelines"."created_at" ASC, "p_ci_pipelines"."id" ASC 
LIMIT 50;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45482/commands/139378

Status: failed

SELECT "p_ci_pipelines".* 
FROM "p_ci_pipelines" 
WHERE "p_ci_pipelines"."project_id" = 278964 
  AND "p_ci_pipelines"."status" = 'failed'
  AND "p_ci_pipelines"."created_at" < '2025-08-15 00:00:00'
ORDER BY "p_ci_pipelines"."created_at" ASC, "p_ci_pipelines"."id" ASC 
LIMIT 50;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45482/commands/139379

Status: non-completed

SELECT "p_ci_pipelines".* 
FROM "p_ci_pipelines" 
WHERE "p_ci_pipelines"."project_id" = 278964 
  AND "p_ci_pipelines"."status" IN ('created', 'waiting_for_resource', 'preparing', 'waiting_for_callback', 'pending', 'running', 'canceling', 'scheduled')
  AND "p_ci_pipelines"."created_at" < '2025-08-15 00:00:00'
ORDER BY "p_ci_pipelines"."created_at" ASC, "p_ci_pipelines"."id" ASC 
LIMIT 50;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/45482/commands/139380

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Narendran

Merge request reports

Loading