Backfill CiFinishedBuilds CH Table
What does this MR do and why?
- Adds a batched background migration to backfill CI finished builds data to ClickHouse.
- Currently, the ci_finished_builds ClickHouse table is populated in real-time when builds finish via Ci::BuildFinishedWorker.
- However, as we added new columns, historical data for such columns is missing.
- This MR backfills the last 180 days of finished builds by creating sync events in
p_ci_finished_build_ch_sync_events, which will then be processed by the existingClickHouse::DataIngestion::CiFinishedBuildsSyncService. - Yes, it creates a duplicate entry in the Clickhouse table, but it is deduplicated in Clickhouse using versioning. Read more: https://clickhouse.com/docs/guides/replacing-merge-tree
Implementation Details:
- Queue Migration (db/post_migrate/20260204083514_queue_backfill_ci_finished_builds_to_click_house.rb):
- Finds the (approximate) starting id by querying the first successful build created within the last 180 days (order by created_at)
- Uses index
p_ci_builds_status_created_at_project_id_idxfor efficient lookup - Sets
backfill_in_progress!flag to enable deduplicated queries during backfill (refer !221683 (merged)) - Batches by id (primary key) for reliable iteration
- EE Migration (ee/lib/ee/gitlab/background_migration/backfill_ci_finished_builds_to_click_house.rb):
- Filters builds by
type = 'Ci::Build',status IN ('success', 'failed', 'canceled'), andfinished_at IS NOT NULL - Uses upsert_all with unique_by: [:build_id, :partition] to handle duplicates gracefully
- Filters builds by
References
How to set up and validate locally
Prerequisites
- Ensure ClickHouse is running in your GDK
- Ensure the
p_ci_finished_build_ch_sync_eventstable exists
Step 1: Create test builds
# In rails console
namespace = FactoryBot.create(:namespace, path: "test-namespace-#{Time.now.to_i}")
project = FactoryBot.create(:project, name: "test-project-#{Time.now.to_i}", namespace: namespace)
pipeline = FactoryBot.create(:ci_pipeline, project: project)
# Create builds with different statuses and ages
old_build = FactoryBot.create(:ci_build, :success, pipeline: pipeline, created_at: 200.days.ago, finished_at: 200.days.ago)
recent_success = FactoryBot.create(:ci_build, :success, pipeline: pipeline, created_at: 100.days.ago, finished_at: 100.days.ago)
recent_failed = FactoryBot.create(:ci_build, :failed, pipeline: pipeline, created_at: 90.days.ago, finished_at: 90.days.ago)
recent_canceled = FactoryBot.create(:ci_build, :canceled, pipeline: pipeline, created_at: 80.days.ago, finished_at: 80.days.ago)
running_build = FactoryBot.create(:ci_build, :running, pipeline: pipeline, created_at: 10.days.ago)
bridge = FactoryBot.create(:ci_bridge, :success, pipeline: pipeline, created_at: 50.days.ago, finished_at: 50.days.ago)
puts "Created builds:"
puts " old_build (200 days ago, outside backfill period): #{old_build.id}"
puts " recent_success (100 days ago): #{recent_success.id}"
puts " recent_failed (90 days ago): #{recent_failed.id}"
puts " recent_canceled (80 days ago): #{recent_canceled.id}"
puts " running_build (10 days ago, not finished): #{running_build.id}"
puts " bridge (50 days ago, not Ci::Build): #{bridge.id}"
Step 2: Run the migration
# In terminal, run the migration
bundle exec rake db:migrate
You should see output like:
ci: == [advisory_lock_connection] object_id: 134440, pg_backend_pid: 22794
ci: == 20260204083514 QueueBackfillCiFinishedBuildsToClickHouse: migrating ========
ci: -- execute("SELECT id\nFROM p_ci_builds\nWHERE type = 'Ci::Build'\n AND status = 'success'\n AND created_at >= '2025-08-10T00:00:00Z'\nORDER BY created_at ASC\nLIMIT 1\n")
ci: -> 0.0130s
ci: -- Starting backfill from id: 100814
ci: == 20260204083514 QueueBackfillCiFinishedBuildsToClickHouse: migrated (0.1343s)
ci: == [advisory_lock_connection] object_id: 134440, pg_backend_pid: 22794
Step 3: Verify migration was scheduled
# In rails console
connection = Ci::ApplicationRecord.connection
Gitlab::Database::SharedModel.using_connection(connection) do
migration = Gitlab::Database::BackgroundMigration::BatchedMigration.find_by(
job_class_name: 'BackfillCiFinishedBuildsToClickHouse'
)
puts "Migration scheduled: #{migration.present?}"
puts "Status: #{migration&.status}"
puts "Min value: #{migration&.min_value}"
puts "Max value: #{migration&.max_value}"
end
Step 4: Execute a batch manually (Ideally this is not needed in local as we'd have less number of records)
# In rails console
# verify if the migration is completed already:
migration.status == 3 ? '✓ finished' : '✗ NOT finished'
# Get sync events count before
before_count = Ci::FinishedBuildChSyncEvent.count
puts "Sync events before: #{before_count}"
# Execute one batch
Gitlab::Database::BackgroundMigration::BatchedMigrationRunner.new(connection: Ci::ApplicationRecord.connection).run_migration_job(migration)
# Get sync events count after
after_count = Ci::FinishedBuildChSyncEvent.count
puts "Sync events after: #{after_count}"
puts "New sync events created: #{after_count - before_count}"
Step 5: Verify sync events were created correctly
# In rails console
# Check sync events for builds that SHOULD be synced
[recent_success, recent_failed, recent_canceled].each do |build|
event = Ci::FinishedBuildChSyncEvent.find_by(build_id: build.id)
status = event.present? ? '✓ created' : '✗ NOT created'
puts "Build #{build.id} (#{build.status}): #{status}"
end
# Verify builds that should NOT have sync events
puts "\nBuilds that should be skipped:"
[
[old_build, 'outside 180-day period'],
[running_build, 'not finished'],
[bridge, 'Ci::Bridge not Ci::Build']
].each do |build, reason|
event = Ci::FinishedBuildChSyncEvent.find_by(build_id: build.id)
status = event.present? ? '✗ INCORRECTLY created' : '✓ correctly skipped'
puts "Build #{build.id} (#{reason}): #{status}"
end; nil
Step 6: Verify backfill_in_progress flag
# In rails console
flag_set = ClickHouse::MigrationSupport::CiFinishedBuildsConsistencyHelper.backfill_in_progress?
puts "Backfill in progress flag: #{flag_set}"
Step 7: Cleanup
# In rails console - Clean up test data
pipeline.builds.destroy_all
pipeline.destroy!
project.destroy!
namespace.destroy!
# Remove the batched migration (for re-testing) or run rollback
Gitlab::Database::BackgroundMigration::BatchedMigration.find_by(
job_class_name: 'BackfillCiFinishedBuildsToClickHouse'
)&.destroy!
# Clear the backfill flag
ClickHouse::MigrationSupport::CiFinishedBuildsConsistencyHelper.clear_backfill_in_progress!
Rollback
# In terminal
VERSION=20260204083514 bundle exec rails db:migrate:down:ci
Query Plans:
1. FIND MIN_ID QUERY
Query:
SELECT id
FROM p_ci_builds
WHERE type = 'Ci::Build'
AND status = 'success'
AND created_at >= NOW() - INTERVAL '180 days'
ORDER BY created_at ASC
LIMIT 1;
Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144970
2. BATCHING QUERY (find batch end - BATCH_SIZE=1000)
Query:
SELECT id
FROM p_ci_builds
WHERE id >= 10971620999
ORDER BY id ASC
LIMIT 1 OFFSET 1000;
Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144971
3. SUB-BATCH BOUNDARY QUERY (SUB_BATCH_SIZE=100)
Query:
SELECT id
FROM p_ci_builds
WHERE id >= 10971620999
AND id < 10971622072
ORDER BY id ASC
LIMIT 1 OFFSET 100;
Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144974
4. FILTER AND FETCH SUB-BATCH DATA
Query:
SELECT id, project_id, finished_at
FROM p_ci_builds
WHERE id >= 10971620999
AND id < 10971621126
AND type = 'Ci::Build'
AND status IN ('success', 'failed', 'canceled')
AND finished_at IS NOT NULL;
Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144975
5. UPSERT QUERY
Query:
INSERT INTO p_ci_finished_build_ch_sync_events (build_id, project_id, build_finished_at, processed)
VALUES
(1000001, 100, '2025-01-15 10:00:00', false),
(1000002, 100, '2025-01-15 10:00:01', false),
(1000003, 100, '2025-01-15 10:00:02', false),
(1000004, 100, '2025-01-15 10:00:03', false),
(1000005, 100, '2025-01-15 10:00:04', false)
ON CONFLICT (build_id, partition)
DO UPDATE SET
project_id = EXCLUDED.project_id,
build_finished_at = EXCLUDED.build_finished_at,
processed = EXCLUDED.processed
RETURNING build_id, partition;
Query Plan: https://console.postgres.ai/gitlab/gitlab-production-ci/sessions/48132/commands/144976
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.