Backfill CiFinishedBuilds CH Table

What does this MR do and why?

  • Adds a batched background migration to backfill CI finished builds data to ClickHouse.
  • Currently, the ci_finished_builds ClickHouse table is populated in real-time when builds finish via Ci::BuildFinishedWorker.
  • However, as we added new columns, historical data for such columns is missing.
  • This MR backfills the last 180 days of finished builds by creating sync events in p_ci_finished_build_ch_sync_events, which will then be processed by the existing ClickHouse::DataIngestion::CiFinishedBuildsSyncService.
  • Yes, it creates a duplicate entry in the Clickhouse table, but it is deduplicated in Clickhouse using versioning. Read more: https://clickhouse.com/docs/guides/replacing-merge-tree

Implementation Details:

  1. Queue Migration (db/post_migrate/20260204083514_queue_backfill_ci_finished_builds_to_click_house.rb):
    • Finds the (approximate) starting id by querying the first successful build created within the last 180 days (order by created_at)
    • Uses index p_ci_builds_status_created_at_project_id_idx for efficient lookup
    • Sets backfill_in_progress! flag to enable deduplicated queries during backfill (refer !221683 (merged))
    • Batches by id (primary key) for reliable iteration
  2. EE Migration (ee/lib/ee/gitlab/background_migration/backfill_ci_finished_builds_to_click_house.rb):
    • Filters builds by type = 'Ci::Build', status IN ('success', 'failed', 'canceled'), and finished_at IS NOT NULL
    • Uses upsert_all with unique_by: [:build_id, :partition] to handle duplicates gracefully

References

#585179 (closed)

How to set up and validate locally

Prerequisites

  1. Ensure ClickHouse is running in your GDK
  2. Ensure the p_ci_finished_build_ch_sync_events table exists

Step 1: Create test builds

# In rails console

namespace = FactoryBot.create(:namespace, path: "test-namespace-#{Time.now.to_i}")
project = FactoryBot.create(:project, name: "test-project-#{Time.now.to_i}", namespace: namespace)
pipeline = FactoryBot.create(:ci_pipeline, project: project)

# Create builds with different statuses and ages
old_build = FactoryBot.create(:ci_build, :success, pipeline: pipeline, created_at: 200.days.ago, finished_at: 200.days.ago)
recent_success = FactoryBot.create(:ci_build, :success, pipeline: pipeline, created_at: 100.days.ago, finished_at: 100.days.ago)
recent_failed = FactoryBot.create(:ci_build, :failed, pipeline: pipeline, created_at: 90.days.ago, finished_at: 90.days.ago)
recent_canceled = FactoryBot.create(:ci_build, :canceled, pipeline: pipeline, created_at: 80.days.ago, finished_at: 80.days.ago)
running_build = FactoryBot.create(:ci_build, :running, pipeline: pipeline, created_at: 10.days.ago)
bridge = FactoryBot.create(:ci_bridge, :success, pipeline: pipeline, created_at: 50.days.ago, finished_at: 50.days.ago)


puts "Created builds:"
puts "  old_build (200 days ago, outside backfill period): #{old_build.id}"
puts "  recent_success (100 days ago): #{recent_success.id}"
puts "  recent_failed (90 days ago): #{recent_failed.id}"
puts "  recent_canceled (80 days ago): #{recent_canceled.id}"
puts "  running_build (10 days ago, not finished): #{running_build.id}"
puts "  bridge (50 days ago, not Ci::Build): #{bridge.id}"

Step 2: Run the migration

# In terminal, run the migration

bundle exec rake db:migrate
You should see output like:
ci: == [advisory_lock_connection] object_id: 134440, pg_backend_pid: 22794
ci: == 20260204083514 QueueBackfillCiFinishedBuildsToClickHouse: migrating ========
ci: -- execute("SELECT id\nFROM p_ci_builds\nWHERE type = 'Ci::Build'\n  AND status = 'success'\n  AND created_at >= '2025-08-10T00:00:00Z'\nORDER BY created_at ASC\nLIMIT 1\n")
ci:    -> 0.0130s
ci: -- Starting backfill from id: 100814
ci: == 20260204083514 QueueBackfillCiFinishedBuildsToClickHouse: migrated (0.1343s)

ci: == [advisory_lock_connection] object_id: 134440, pg_backend_pid: 22794

Step 3: Verify migration was scheduled

# In rails console
connection =  Ci::ApplicationRecord.connection

Gitlab::Database::SharedModel.using_connection(connection) do
  migration = Gitlab::Database::BackgroundMigration::BatchedMigration.find_by(
    job_class_name: 'BackfillCiFinishedBuildsToClickHouse'
  )
  puts "Migration scheduled: #{migration.present?}"
  puts "Status: #{migration&.status}"
  puts "Min value: #{migration&.min_value}"
  puts "Max value: #{migration&.max_value}"
end

Step 4: Execute a batch manually (Ideally this is not needed in local as we'd have less number of records)

# In rails console

# verify if the migration is completed already:

migration.status == 3 ? '✓ finished' : '✗ NOT finished'

# Get sync events count before
before_count = Ci::FinishedBuildChSyncEvent.count
puts "Sync events before: #{before_count}"

# Execute one batch
Gitlab::Database::BackgroundMigration::BatchedMigrationRunner.new(connection: Ci::ApplicationRecord.connection).run_migration_job(migration)

# Get sync events count after
after_count = Ci::FinishedBuildChSyncEvent.count
puts "Sync events after: #{after_count}"
puts "New sync events created: #{after_count - before_count}"

Step 5: Verify sync events were created correctly

# In rails console

# Check sync events for builds that SHOULD be synced
[recent_success, recent_failed, recent_canceled].each do |build|
  event = Ci::FinishedBuildChSyncEvent.find_by(build_id: build.id)
  status = event.present? ? '✓ created' : '✗ NOT created'
  puts "Build #{build.id} (#{build.status}): #{status}"
end

# Verify builds that should NOT have sync events
puts "\nBuilds that should be skipped:"
[
  [old_build, 'outside 180-day period'],
  [running_build, 'not finished'],
  [bridge, 'Ci::Bridge not Ci::Build']
].each do |build, reason|
  event = Ci::FinishedBuildChSyncEvent.find_by(build_id: build.id)
  status = event.present? ? '✗ INCORRECTLY created' : '✓ correctly skipped'
  puts "Build #{build.id} (#{reason}): #{status}"
end; nil

Step 6: Verify backfill_in_progress flag

# In rails console
flag_set = ClickHouse::MigrationSupport::CiFinishedBuildsConsistencyHelper.backfill_in_progress?
puts "Backfill in progress flag: #{flag_set}"

Step 7: Cleanup

# In rails console - Clean up test data
pipeline.builds.destroy_all
pipeline.destroy!
project.destroy!
namespace.destroy!

# Remove the batched migration (for re-testing) or run rollback
Gitlab::Database::BackgroundMigration::BatchedMigration.find_by(
job_class_name: 'BackfillCiFinishedBuildsToClickHouse'
)&.destroy!

# Clear the backfill flag
ClickHouse::MigrationSupport::CiFinishedBuildsConsistencyHelper.clear_backfill_in_progress!

Rollback

# In terminal
VERSION=20260204083514 bundle exec rails db:migrate:down:ci

Query Plans:

1. FIND MIN_ID QUERY

Query:

SELECT id
FROM p_ci_builds
WHERE type = 'Ci::Build'
  AND status = 'success'
  AND created_at >= NOW() - INTERVAL '180 days'
ORDER BY created_at ASC
LIMIT 1;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144970

2. BATCHING QUERY (find batch end - BATCH_SIZE=1000)

Query:

SELECT id
FROM p_ci_builds
WHERE id >= 10971620999
ORDER BY id ASC
LIMIT 1 OFFSET 1000;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144971

3. SUB-BATCH BOUNDARY QUERY (SUB_BATCH_SIZE=100)

Query:

SELECT id
FROM p_ci_builds
WHERE id >= 10971620999
  AND id < 10971622072
ORDER BY id ASC
LIMIT 1 OFFSET 100;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144974

4. FILTER AND FETCH SUB-BATCH DATA

Query:

SELECT id, project_id, finished_at
FROM p_ci_builds
WHERE id >= 10971620999
  AND id < 10971621126
  AND type = 'Ci::Build'
  AND status IN ('success', 'failed', 'canceled')
  AND finished_at IS NOT NULL;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144975

5. UPSERT QUERY

Query:

INSERT INTO p_ci_finished_build_ch_sync_events (build_id, project_id, build_finished_at, processed)
VALUES
  (1000001, 100, '2025-01-15 10:00:00', false),
  (1000002, 100, '2025-01-15 10:00:01', false),
  (1000003, 100, '2025-01-15 10:00:02', false),
  (1000004, 100, '2025-01-15 10:00:03', false),
  (1000005, 100, '2025-01-15 10:00:04', false)
ON CONFLICT (build_id, partition)
DO UPDATE SET
  project_id = EXCLUDED.project_id,
  build_finished_at = EXCLUDED.build_finished_at,
  processed = EXCLUDED.processed
RETURNING build_id, partition;

Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144976

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Narendran

Merge request reports

Loading