Create background migration to copy builds_metadata records to new tables

What does this MR do and why?

This MR creates a background migration that copies CI builds metadata from the existing p_ci_builds_metadata table into new, deduplicated tables as part of the CI data normalization effort.

What it does:

  1. Creates a batched background migration (MoveCiBuildsMetadata) that processes p_ci_builds_metadata records in batches across all partitions

  2. Migrates data to multiple target tables:

    • Creates job definitions in p_ci_job_definitions with deduplicated configuration data
    • Creates job definition instances in p_ci_job_definition_instances to link jobs to their definitions
    • Updates p_ci_builds with metadata fields (timeout, exit_code, debug_trace_enabled, etc.)
    • Updates p_ci_job_artifacts with artifact configuration (exposed_as, exposed_paths)
    • Copies environment data to the job_environments table

  3. Deduplicates job definitions by computing a checksum of each job's configuration, so identical configurations are stored only once

  4. Runs only on GitLab.com; it should be reintroduced for self-managed instances in 18.7, after we validate that it works as intended on GitLab.com
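The checksum-based deduplication in step 3 can be illustrated with a small, self-contained Ruby sketch. It is not the migration's actual code: the split of metadata keys into shared configuration (DEFINITION_KEYS) and per-build values (BUILD_KEYS), the use of SHA-256 over key-sorted JSON as the checksum, and the helper names (canonicalize, config_checksum, split_metadata) are all assumptions made for illustration.

```ruby
require 'digest'
require 'json'

# Assumed split (illustrative only): keys shared by identical jobs become the
# job definition; the remaining keys stay on the individual build.
DEFINITION_KEYS = %i[config_options config_variables interruptible].freeze
BUILD_KEYS = %i[timeout exit_code debug_trace_enabled].freeze

# Recursively stringify and sort hash keys so logically equal configs
# always serialize to the same JSON string.
def canonicalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, canonicalize(v)] }.sort.to_h
  when Array then value.map { |v| canonicalize(v) }
  else value
  end
end

# Hypothetical checksum: SHA-256 of the canonical JSON form.
def config_checksum(config)
  Digest::SHA256.hexdigest(JSON.generate(canonicalize(config)))
end

# Builds one definition per distinct checksum, one instance per build,
# and the per-build attributes that would be written back to the build row.
def split_metadata(rows)
  definitions = {}   # checksum => shared config
  instances = []     # links each build to its definition
  build_updates = {} # build_id => per-build attributes
  rows.each do |row|
    config = row.slice(*DEFINITION_KEYS)
    checksum = config_checksum(config)
    definitions[checksum] ||= config
    instances << { build_id: row[:build_id], checksum: checksum }
    build_updates[row[:build_id]] = row.slice(*BUILD_KEYS)
  end
  { definitions: definitions, instances: instances, build_updates: build_updates }
end

rows = [
  { build_id: 1, config_options: { script: ['rspec'], image: 'ruby' },
    config_variables: [], interruptible: true, timeout: 3600, exit_code: 0, debug_trace_enabled: false },
  { build_id: 2, config_options: { image: 'ruby', script: ['rspec'] },
    config_variables: [], interruptible: true, timeout: 3600, exit_code: 1, debug_trace_enabled: false },
  { build_id: 3, config_options: { script: ['rubocop'] },
    config_variables: [], interruptible: false, timeout: 600, exit_code: 0, debug_trace_enabled: true }
]

result = split_metadata(rows)
puts result[:definitions].size # => 2
puts result[:instances].size   # => 3
```

Builds 1 and 2 differ only in hash key order and in their per-build exit_code, so their configurations canonicalize to the same checksum and share one definition, while every build still gets its own instance row.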

Why this is needed:

  • Database normalization: The current p_ci_builds_metadata table stores redundant configuration data that can be deduplicated
  • Performance improvement: Storing each distinct configuration once in p_ci_job_definitions, and linking jobs to it through p_ci_job_definition_instances, reduces storage overhead and improves query performance
  • Scalability: The new structure better supports GitLab's growing CI workload by reducing data duplication
  • Data integrity: Centralizes job configuration management and reduces inconsistencies

Migration approach:

  • Uses partition-aware batching to handle the large partitioned p_ci_builds_metadata table efficiently
  • Processes each partition separately with configurable batch sizes (1000 records per batch, 100 per sub-batch)
  • Validates data and handles edge cases such as missing tags or run steps
  • Uses INSERT ... ON CONFLICT DO NOTHING so that retried or concurrently running batches skip rows that already exist instead of failing or creating duplicates
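The batching and idempotency described above can be modelled in plain Ruby. This is a minimal in-memory sketch, not the migration's code: partitions are arrays of row hashes, the target table is a Hash keyed by an assumed (partition_id, id) conflict key, the partition IDs are hypothetical, and insert_on_conflict_do_nothing is a hypothetical helper that mimics PostgreSQL's ON CONFLICT DO NOTHING. Only the batch sizes come from this MR.

```ruby
# Sizes quoted in the MR description.
BATCH_SIZE = 1_000
SUB_BATCH_SIZE = 100

# Hypothetical helper mimicking PostgreSQL's INSERT ... ON CONFLICT DO NOTHING:
# rows whose (partition_id, id) key already exists are skipped, not re-inserted.
# Returns the number of rows actually inserted.
def insert_on_conflict_do_nothing(table, rows)
  rows.count do |row|
    key = [row[:partition_id], row[:id]]
    next false if table.key?(key)

    table[key] = row
    true
  end
end

# Processes each partition on its own, splitting it into batches and each
# batch into sub-batches, in ascending id order.
def migrate(partitions, target)
  inserted = 0
  partitions.each_value do |rows|
    rows.sort_by { |row| row[:id] }.each_slice(BATCH_SIZE) do |batch|
      batch.each_slice(SUB_BATCH_SIZE) do |sub_batch|
        inserted += insert_on_conflict_do_nothing(target, sub_batch)
      end
    end
  end
  inserted
end

# Two hypothetical partitions with 2,500 and 500 rows.
partitions = {
  100 => (1..2_500).map { |id| { partition_id: 100, id: id } },
  101 => (2_501..3_000).map { |id| { partition_id: 101, id: id } }
}

target = {}
first_run = migrate(partitions, target)
second_run = migrate(partitions, target) # re-running skips every existing row
puts first_run  # => 3000
puts second_run # => 0
```

Because the second run inserts nothing, a batch that is retried after a failure, or processed twice, leaves the target unchanged. In PostgreSQL this protection only applies to conflicts on a unique index or constraint, so the target tables would need one covering the deduplication key.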

This migration is part of the broader CI data architecture improvements tracked in issue #552069 (closed).

Changelog: other

References

Related to #552069 (closed)

How to set up and validate locally

  1. Ensure you have CI builds metadata in your local database
  2. Run the migration: rails db:migrate
  3. In a Rails console, check that the background migration is queued: Gitlab::Database::BackgroundMigration::BatchedMigration.where(job_class_name: 'MoveCiBuildsMetadata')
  4. Monitor migration progress in the Admin area under Monitoring > Background migrations

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Marius Bobin