Backfill ci_finished_builds with stage_name and other required fields

Summary

  • Backfill stage_name and the other columns listed under Scope in the ci_finished_builds ClickHouse table, for all records from the past 180 days (about 6 months).
  • This will enable users to analyze historical job metrics grouped by stage immediately upon feature release, rather than waiting for data to accumulate.

Context

The stage_name column has been added to the ci_finished_builds ClickHouse table and is now being synced for new builds. However, historical records (prior to the sync implementation) lack this field, creating a data gap that would negatively impact user experience and dashboard adoption.

Scope

Columns Included

  • stage_name
  • namespace_path
  • group_name
  • failure_reason
  • manual
  • allow_failure
  • user_id
  • artifact_filename
  • artifact_size
  • retries_count
  • runner_tags
  • job_definition_id

Columns Excluded (Deferred)

  • tags - Excluded to reduce complexity and ensure backfill can complete within the milestone timeline. May be considered in a future iteration.

Rollout Strategy

GitLab.com

  • Target: Start running migration in 18.9
  • The backfill will be initiated on .com and run until completion
  • Estimated duration: TBD (will be determined via database-testing CI job)

Self-Managed

  • Migration to be added by 18.9 at the latest (before 18.11 required stop)
  • Finalization in 19.0 after .com backfill completes
  • Reference: Batched Background Migrations documentation

Feature Enablement

The "group by stage name" feature will be enabled only after the batched background migration completes:

  • GitLab.com: Feature flag enabled once migration finishes
  • Self-managed: Follows the standard finalization process at a required stop

This approach ensures users don't see incomplete data for older jobs.

Note: The feature flag for "group by stage name" has not been implemented yet.

Implementation Approach (Draft)

Use BatchedBackgroundMigration to backfill stage_name (along with the other in-scope columns) for all ci_finished_builds records with finished_at in the last 180 days. The migration should:

  1. Join ci_builds with ci_stages to fetch the stage_name
  2. Update records using the ReplacingMergeTree versioning mechanism (version, deleted columns)
  3. Read from a Postgres replica to minimize load on the production primary
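
The steps above can be sketched as a small, self-contained Python model. This is only an illustration of the intended data flow, not GitLab code: the FinishedBuild class, the backfill_rows and merged_view helpers, and the in-memory lookup dictionaries are all hypothetical. It also assumes that rows are deduplicated by id alone (the real table's sorting key may contain more columns), that a missing stage_name is stored as an empty string, and that rows which already have a stage_name are skipped.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone

BACKFILL_WINDOW_DAYS = 180  # window from the issue description


@dataclass(frozen=True)
class FinishedBuild:
    """Hypothetical, simplified stand-in for a ci_finished_builds row."""
    id: int
    finished_at: datetime
    stage_name: str
    version: datetime      # ReplacingMergeTree version column
    deleted: bool = False  # ReplacingMergeTree deleted column


def backfill_rows(existing, stage_id_by_build, stage_name_by_id, now):
    """Build new row versions for in-window rows that lack a stage_name.

    The two dictionaries stand in for the ci_builds -> ci_stages join.
    """
    cutoff = now - timedelta(days=BACKFILL_WINDOW_DAYS)
    new_rows = []
    for row in existing:
        # Assumption: skip rows outside the window or already populated.
        if row.finished_at < cutoff or row.stage_name:
            continue
        name = stage_name_by_id.get(stage_id_by_build.get(row.id))
        if name is None:
            continue  # no matching stage found; leave the row untouched
        # Insert a copy with a newer version instead of updating in place.
        new_rows.append(replace(row, stage_name=name, version=now))
    return new_rows


def merged_view(rows):
    """Emulate the merged result: highest version per id wins
    (the later row wins on a tie), and deleted rows are dropped."""
    latest = {}
    for row in rows:
        current = latest.get(row.id)
        if current is None or row.version >= current.version:
            latest[row.id] = row
    return sorted((r for r in latest.values() if not r.deleted),
                  key=lambda r: r.id)


now = datetime(2026, 1, 29, tzinfo=timezone.utc)
existing = [
    FinishedBuild(1, now - timedelta(days=10), "", now - timedelta(days=10)),
    FinishedBuild(2, now - timedelta(days=200), "", now - timedelta(days=200)),
    FinishedBuild(3, now - timedelta(days=5), "test", now - timedelta(days=5)),
]
stage_id_by_build = {1: 100, 2: 100, 3: 101}
stage_name_by_id = {100: "build", 101: "test"}

new_rows = backfill_rows(existing, stage_id_by_build, stage_name_by_id, now)
view = merged_view(existing + new_rows)
print([(r.id, r.stage_name) for r in view])
# → [(1, 'build'), (2, ''), (3, 'test')]
```

In this model, build 1 receives its stage name through a newer row version, build 2 is left alone because it finished more than 180 days ago, and build 3 is unchanged because it already has a value. The original row for build 1 is never modified: the backfill only inserts a row with a higher version, and the versioning mechanism resolves which one is visible.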

Related Issues and MRs

  • Parent: #580441 (closed) - Sync stage_name to ClickHouse for job analytics grouping
  • Related: #464713 - Backfill root_namespace_id in ci_finished_builds
  • Related: Support `stage_name` in `CiJobAnalytics` GraphQ... (!217156 - merged)
  • Related: Sync `stage_name` to `ci_finished_builds` click... (!217043 - merged)
  • Related: Add `stage_name` to `ci_finished_builds` ClickH... (!216825 - merged)
  • DevAnalytics Observer (CH importer): https://gitlab.com/gitlab-org/quality/observer/-/blob/main/app/services/transformers/build_event.rb?ref_type=heads
  • Observability team OpenTelemetry exporter: https://gitlab.com/gitlab-org/gitlab/blob/608553b90d1fbe443da0c585785c211762cefa83/app/services/ci/observability/export_service.rb#L83-86
Edited Jan 29, 2026 by Narendran