Backfill `ci_finished_builds` with `stage_name` and other required fields
## Summary

- Backfill the `ci_finished_builds` ClickHouse table with `stage_name` and the other columns listed under Scope, for all records from the past 180 days (6 months).
- This will let users analyze historical job metrics grouped by stage as soon as the feature is released, rather than waiting for data to accumulate.

## Context

The `stage_name` column has been added to the `ci_finished_builds` ClickHouse table and is now synced for new builds. However, historical records (created before the sync was implemented) lack this field. This data gap would hurt the user experience and slow dashboard adoption.

## Scope

### Columns Included

- `stage_name`
- `namespace_path`
- `group_name`
- `failure_reason`
- `manual`
- `allow_failure`
- `user_id`
- `artifact_filename`
- `artifact_size`
- `retries_count`
- `runner_tags`
- `job_definition_id`

### Columns Excluded (Deferred)

- `tags`: excluded to reduce complexity and ensure the backfill can complete within the milestone timeline. It may be considered in a future iteration.

## Rollout Strategy

### GitLab.com

- Target: start running the migration in 18.9.
- The backfill will be initiated on .com and run until completion.
- Estimated duration: TBD (to be determined via the database-testing CI job).

### Self-Managed

- The migration will be added in 18.9 at the latest (before the 18.11 required stop).
- Finalization in 19.0, after the .com backfill completes.
- Reference: [Batched Background Migrations documentation](https://docs.gitlab.com/development/database/batched_background_migrations/#finalize-a-batched-background-migration)

### Feature Enablement

The "group by stage name" feature will be enabled **only after the batched background migration completes**:

- **GitLab.com**: the feature flag is enabled once the migration finishes.
- **Self-managed**: follows the standard finalization process at a required stop.

This ensures users don't see incomplete data for older jobs.
_Note: the `group by stage name` feature flag has not been implemented yet._

## Implementation Approach (Draft)

Use a `BatchedBackgroundMigration` to backfill `stage_name` for all `ci_finished_builds` records with `finished_at` in the last 180 days. The migration should:

1. Join `ci_builds` with `ci_stages` to fetch the `stage_name`.
2. Update records using the `ReplacingMergeTree` versioning mechanism (`version` and `deleted` columns).
3. Run on a replica to minimize impact on the production Postgres database.

## Related Issues and MRs

* Parent: #580441 - Sync stage_name to ClickHouse for job analytics grouping
* Related: #464713 - Backfill root_namespace_id in ci_finished_builds
* Related: !217156+
* Related: !217043+
* Related: !216825+
* DevAnalytics Observer (CH importer): https://gitlab.com/gitlab-org/quality/observer/-/blob/main/app/services/transformers/build_event.rb?ref_type=heads
* Observability team OpenTelemetry exporter: https://gitlab.com/gitlab-org/gitlab/blob/608553b90d1fbe443da0c585785c211762cefa83/app/services/ci/observability/export_service.rb#L83-86
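The Implementation Approach steps can be illustrated with a small, in-memory Ruby sketch. It is not the migration itself: `Build`, `FinishedBuildRow`, `backfill_rows`, and `replacing_merge` are hypothetical names, rows are reduced to a few columns, the sorting key is simplified to `id`, and `version` is assumed to be an integer. It shows (1) joining builds to stages by `stage_id` within the 180-day window, and (2) how a `ReplacingMergeTree` table with `version` and `deleted` columns resolves the re-inserted rows for queries using `FINAL`: for each sorting key the highest `version` wins (the most recently inserted row on a tie), and a winning row flagged as deleted is dropped.

```ruby
# Sketch only: hypothetical, simplified row shapes. The real ci_builds,
# ci_stages, and ci_finished_builds schemas have many more columns.
Build = Struct.new(:id, :stage_id, :finished_at, keyword_init: true)
FinishedBuildRow = Struct.new(:id, :stage_name, :finished_at, :version, :deleted, keyword_init: true)

SECONDS_PER_DAY = 86_400
BACKFILL_WINDOW_DAYS = 180

# Step 1: join builds to stages by stage_id, keeping only builds that
# finished within the backfill window. Each result is a new row version.
def backfill_rows(builds, stage_names_by_id, now:, version:)
  cutoff = now - BACKFILL_WINDOW_DAYS * SECONDS_PER_DAY
  builds.select { |build| build.finished_at >= cutoff }.map do |build|
    FinishedBuildRow.new(
      id: build.id,
      stage_name: stage_names_by_id[build.stage_id], # nil if the stage is missing
      finished_at: build.finished_at,
      version: version, # assumed integer; must exceed the existing row's version
      deleted: false
    )
  end
end

# Step 2: emulate how ReplacingMergeTree resolves rows sharing a sorting key
# (simplified here to :id) as seen by a query using FINAL: the highest
# version wins, the last-inserted row breaks ties, and a winning row flagged
# as deleted is dropped from the result.
def replacing_merge(rows)
  rows.each_with_index.to_a
      .group_by { |row, _index| row.id }
      .values
      .map { |candidates| candidates.max_by { |row, index| [row.version, index] }.first }
      .reject(&:deleted)
      .sort_by(&:id)
end

# Usage: two builds whose existing rows predate the stage_name sync.
now = Time.utc(2026, 2, 1)
stage_names = { 10 => "build", 20 => "test" }

existing = [
  FinishedBuildRow.new(id: 1, stage_name: nil, finished_at: now - SECONDS_PER_DAY, version: 1, deleted: false),
  FinishedBuildRow.new(id: 2, stage_name: nil, finished_at: now - 200 * SECONDS_PER_DAY, version: 1, deleted: false)
]
builds = [
  Build.new(id: 1, stage_id: 10, finished_at: now - SECONDS_PER_DAY),
  Build.new(id: 2, stage_id: 20, finished_at: now - 200 * SECONDS_PER_DAY) # outside the 180-day window
]

merged = replacing_merge(existing + backfill_rows(builds, stage_names, now: now, version: 2))
p merged.map { |row| [row.id, row.stage_name] }
# prints [[1, "build"], [2, nil]]
```

Because the backfill inserts new row versions instead of issuing `ALTER TABLE ... UPDATE` mutations, queries that do not use `FINAL` may see both the old and the new version of a row until ClickHouse merges the parts in the background.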