Create background migration to copy builds_metadata records to new tables
What does this MR do and why?
This MR creates a background migration to migrate CI builds metadata from the existing p_ci_builds_metadata table into new deduplicated tables as part of the CI data normalization effort.
What it does:
- Creates a batched background migration (`MoveCiBuildsMetadata`) that processes `p_ci_builds_metadata` records in batches across all partitions
- Migrates data to multiple target tables:
  - Creates job definitions in `p_ci_job_definitions` with deduplicated configuration data
  - Creates job definition instances in `p_ci_job_definition_instances` to link jobs to their definitions
  - Updates `p_ci_builds` with metadata fields (timeout, exit_code, debug_trace_enabled, etc.)
  - Updates `p_ci_job_artifacts` with artifact configuration (exposed_as, exposed_paths)
  - Copies environment data to the `job_environments` table
- Handles data deduplication by computing checksums of job configurations to avoid storing duplicate definitions
- Runs only on .com. It should be reintroduced in 18.7 for self-managed after we validate that it works as intended on .com
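The checksum-based deduplication described above can be sketched roughly as follows. This is an illustrative, in-memory approximation and not the code in this MR: the helper names (`canonicalize`, `config_checksum`, `deduplicate`), the choice of SHA-256 over canonical JSON, and the hash and array stand-ins for the definitions and instances tables are all assumptions.

```ruby
require 'digest'
require 'json'

# Hypothetical helper: recursively sort hash keys so that logically equal
# configurations serialize to the same JSON string.
def canonicalize(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .to_h { |key, val| [key.to_s, canonicalize(val)] }
  when Array
    value.map { |item| canonicalize(item) }
  else
    value
  end
end

# Hypothetical checksum: SHA-256 over the canonical JSON form (an assumption,
# the real migration may hash different fields or use another digest).
def config_checksum(config)
  Digest::SHA256.hexdigest(JSON.generate(canonicalize(config)))
end

# In-memory stand-ins for the definitions and definition-instances tables.
def deduplicate(jobs)
  definitions = {} # checksum => config, stored once per distinct config
  instances = []   # [job_id, checksum], one link per job

  jobs.each do |job|
    checksum = config_checksum(job[:config])
    definitions[checksum] ||= job[:config]
    instances << [job[:id], checksum]
  end

  [definitions, instances]
end

jobs = [
  { id: 1, config: { script: ['rspec'], image: 'ruby:3.2' } },
  { id: 2, config: { image: 'ruby:3.2', script: ['rspec'] } }, # same config, different key order
  { id: 3, config: { script: ['rubocop'], image: 'ruby:3.2' } }
]

definitions, instances = deduplicate(jobs)
puts definitions.size # 2 distinct definitions
puts instances.size   # 3 job-to-definition links
```

Canonicalizing before hashing is what lets jobs 1 and 2 share one definition even though their keys arrive in a different order.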
Why this is needed:
- Database normalization: The current `p_ci_builds_metadata` table stores redundant configuration data that can be deduplicated
- Performance improvement: Separating job definitions from instances reduces storage overhead and improves query performance
- Scalability: The new structure better supports GitLab's growing CI workload by reducing data duplication
- Data integrity: Centralizes job configuration management and reduces inconsistencies
Migration approach:
- Uses partition-aware batching to handle the large partitioned `p_ci_builds_metadata` table efficiently
- Processes each partition separately with configurable batch sizes (1000 records per batch, 100 per sub-batch)
- Includes comprehensive data validation and handles edge cases like missing tags or run steps
- Uses `INSERT ... ON CONFLICT DO NOTHING` for safe concurrent execution
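To illustrate why batch and sub-batch slicing combined with conflict-ignoring inserts makes reruns safe, here is a minimal in-memory sketch. It does not touch a database or use GitLab's batched migration framework: `FakeTable`, `insert_ignoring_conflicts`, and `migrate_partition` are hypothetical names, and the hash-based table only mimics the effect of `INSERT ... ON CONFLICT DO NOTHING` on a unique key.

```ruby
BATCH_SIZE = 1000     # records per batch, as stated in the list above
SUB_BATCH_SIZE = 100  # records per sub-batch

# Hypothetical in-memory target table keyed by a unique column. Inserting an
# existing key is a no-op, mimicking INSERT ... ON CONFLICT DO NOTHING.
class FakeTable
  attr_reader :rows

  def initialize
    @rows = {}
  end

  # Returns true if the row was inserted, false if the key already existed.
  def insert_ignoring_conflicts(key, row)
    return false if @rows.key?(key)

    @rows[key] = row
    true
  end
end

# Walks one partition's ids in batches, then sub-batches, and returns the
# number of rows actually inserted.
def migrate_partition(ids, table)
  inserted = 0

  ids.each_slice(BATCH_SIZE) do |batch|
    batch.each_slice(SUB_BATCH_SIZE) do |sub_batch|
      sub_batch.each do |id|
        inserted += 1 if table.insert_ignoring_conflicts(id, { build_id: id })
      end
    end
  end

  inserted
end

table = FakeTable.new
first_run = migrate_partition((1..2500).to_a, table)
second_run = migrate_partition((1..2500).to_a, table) # simulated retry

puts first_run  # 2500
puts second_run # 0
```

Because a repeated insert is a no-op, a retried or concurrently executed batch leaves the target table unchanged instead of raising on duplicates.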
This migration is part of the broader CI data architecture improvements tracked in issue #552069 (closed).
Changelog: other
References
Related to #552069 (closed)
Screenshots or screen recordings
| Before | After |
|---|---|
How to set up and validate locally
- Ensure you have CI builds metadata in your local database
- Run the migration: `rails db:migrate`
- Check that the background migration is queued: `Gitlab::Database::BackgroundMigration::BatchedMigration.where(job_class_name: 'MoveCiBuildsMetadata')`
- Monitor migration progress in the admin area under Background Migrations
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.