Add backfill migration to populate organization_id for all existing records
Overview
Create and execute a background migration to backfill the organization_id column for all existing records in the keys table.
Context
The organization_id column was added to the keys table in !208105 (merged), but existing records have NULL values. Before we can add a NOT NULL constraint (#577246), we need to populate this field for all existing SSH keys and deploy keys.
Requirements
1. Create Background Migration
Create a batched background migration that:
- Iterates through all records in the keys table where organization_id IS NULL
- For each key, derives the organization_id from the associated user or project
- Updates the record with the correct organization_id
- Handles errors gracefully and logs any issues
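The loop described above can be sketched in plain Ruby. This is only an illustration of the control flow (filter on NULL, derive, update, log and continue on error), not the actual migration class: the Key struct, backfill_in_batches and the resolver callable are hypothetical names, and an in-memory array stands in for the keys table.

```ruby
require "logger"

# Hypothetical in-memory stand-in for rows of the keys table.
Key = Struct.new(:id, :user_id, :organization_id, keyword_init: true)

# Walks the rows in id order, batch_size at a time, and fills in
# organization_id for rows where it is still nil. `resolver` is any callable
# that maps a key to an organization id, or nil when none can be derived.
def backfill_in_batches(keys, batch_size:, resolver:, logger: Logger.new($stderr))
  stats = { updated: 0, skipped: 0, failed: 0 }

  keys.sort_by(&:id).each_slice(batch_size) do |batch|
    batch.each do |key|
      # Same filter as "organization_id IS NULL": populated rows are left alone.
      next unless key.organization_id.nil?

      begin
        organization_id = resolver.call(key)
        if organization_id.nil?
          stats[:skipped] += 1
          logger.warn("key #{key.id}: no organization could be derived")
        else
          key.organization_id = organization_id
          stats[:updated] += 1
        end
      rescue StandardError => e
        # One bad row should not abort the rest of the batch.
        stats[:failed] += 1
        logger.error("key #{key.id}: #{e.class}: #{e.message}")
      end
    end
  end

  stats
end
```

The returned counters give a cheap way to spot rows that were skipped or failed without reading the log line by line.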
2. Derive organization_id Logic
For SSH keys:
- Get organization_id from keys.user_id -> users.organization_id
- Handle cases where user might not have an organization
For deploy keys:
- Get organization_id from the associated project(s)
- For shared deploy keys (multiple projects), use the organization from the first project
- Handle edge cases appropriately
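A minimal sketch of the derivation rules above, using plain hashes instead of database tables. Several things here are assumptions rather than confirmed details: deploy keys are told apart by a type value of "DeployKey", "first project" is taken to mean the lowest project id, and the method and parameter names are hypothetical.

```ruby
# Returns the organization id for an SSH key, or nil when the user is missing
# or has no organization.
def organization_for_ssh_key(key, users)
  user = users[key[:user_id]]
  user && user[:organization_id]
end

# Returns the organization id for a deploy key, taken from the first linked
# project that still exists. "First" is assumed to mean the lowest project id.
def organization_for_deploy_key(key, project_ids_by_key, projects)
  project_ids = project_ids_by_key.fetch(key[:id], [])
  first_project = project_ids.sort.filter_map { |id| projects[id] }.first
  first_project && first_project[:organization_id]
end

# Dispatches on the key type (assumed to be "DeployKey" for deploy keys).
def derive_organization_id(key, users:, project_ids_by_key:, projects:)
  if key[:type] == "DeployKey"
    organization_for_deploy_key(key, project_ids_by_key, projects)
  else
    organization_for_ssh_key(key, users)
  end
end
```

Returning nil for every unresolved case keeps the decision about orphaned or ambiguous keys in one place, so those records can be reported instead of being given a guessed value.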
3. Migration Strategy
- Use Gitlab::Database::BackgroundMigration::BatchedMigration
- Process records in batches (e.g., 1000-5000 records per batch)
- Set appropriate batch size and interval to avoid database load
- Monitor migration progress and performance
- Plan for potential rollback if issues arise
4. Handle Edge Cases
- Keys with no associated user (orphaned records)
- Deploy keys associated with multiple projects in different organizations
- Keys where the user or project no longer exists
- Document how each edge case is handled
5. Validation and Monitoring
- Add monitoring to track migration progress
- Validate data integrity after backfill
- Check for any records that couldn't be backfilled
- Generate report of any problematic records
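The post-backfill check could look roughly like the following. It is a sketch over in-memory rows, assuming each row is a hash with id, type and organization_id; backfill_report is a hypothetical helper, and rows without a type are labeled "Key" purely for the report.

```ruby
# Summarizes how many rows are populated and lists the ids that are still
# missing organization_id, grouped by key type.
def backfill_report(keys)
  missing = keys.select { |key| key[:organization_id].nil? }
  populated = keys.size - missing.size
  coverage = keys.empty? ? 100.0 : (populated * 100.0 / keys.size).round(2)

  {
    total: keys.size,
    missing: missing.size,
    coverage_percent: coverage,
    missing_ids_by_type: missing
      .group_by { |key| key[:type] || "Key" }
      .transform_values { |rows| rows.map { |row| row[:id] } }
  }
end
```

A report shaped like this covers both the "100% populated or documented exceptions" criterion and the list of problematic records.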
Implementation Steps
- Create the background migration class
- Add migration file to queue the background migration
- Test the migration on staging environment
- Monitor the migration execution on GitLab.com
- Validate results and handle any edge cases
- Document any records that couldn't be backfilled
Acceptance Criteria
- Background migration created and tested
- Migration queued and running on production
- 100% of records have organization_id populated (or documented exceptions)
- Data integrity validated
- Edge cases handled and documented
- Migration performance is acceptable (no production impact)
- Monitoring in place to track progress
- Report generated for any problematic records
Performance Considerations
- Batch size should be tuned to avoid database load
- Consider running during off-peak hours
- Monitor query performance and adjust if needed
- Estimated time to completion should be calculated
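For the time estimate, a rough calculation is the number of batches multiplied by the interval between batches. The helper below is a hypothetical sketch that assumes one batch is processed per interval and ignores retries, throttling and batch size adjustments, so it is better read as a planning figure than a prediction.

```ruby
# Rough duration estimate: number of batches times the delay between batches.
def estimated_duration_seconds(total_rows:, batch_size:, interval_seconds:)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  # Integer ceiling division, so a partial final batch still counts.
  batches = (total_rows + batch_size - 1) / batch_size
  batches * interval_seconds
end
```

For example, with an assumed 10,000 rows, a batch size of 1,000 and a 120 second interval, this gives 10 batches and 1,200 seconds. The row count has to come from the real table before the figure means anything.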
References
- Parent Epic: &19679
- Database changes: #577242 (closed) (completed via !208105 (merged))
- Application updates: #577243 (SSH keys), #577244 (deploy keys)
- Depends on: #577243 and #577244 should be completed first
- Blocks: #577246 (NOT NULL constraint)
- Investigation: #553463 (closed)