
Add backfill migration to populate organization_id for all existing records


Overview

Create and execute a background migration to backfill the organization_id column for all existing records in the keys table.

Context

The organization_id column was added to the keys table in !208105 (merged), but existing records have NULL values. Before we can add a NOT NULL constraint (#577246), we need to populate this field for all existing SSH keys and deploy keys.

Requirements

1. Create Background Migration

Create a batched background migration that:

  • Iterates through all records in the keys table where organization_id IS NULL
  • For each key, derives the organization_id from the associated user or project
  • Updates the record with the correct organization_id
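  • Handles errors gracefully: a record whose organization_id cannot be derived, or whose update fails, is logged with its ID instead of failing the whole batch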
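
A minimal plain-Ruby sketch of the loop described above, meant only for reasoning about the control flow before writing the real job. It models rows with a hypothetical in-memory `KeyRow` struct instead of GitLab's batched migration framework, and `backfill_organization_ids`, `derive`, and `logger` are illustrative names. Rows whose organization_id cannot be derived stay NULL and their IDs are returned for reporting.

```ruby
# Hypothetical in-memory stand-in for a row in the keys table.
KeyRow = Struct.new(:id, :user_id, :organization_id, keyword_init: true)

# Backfills organization_id on rows where it is nil, batch_size rows at a time,
# in primary-key order. `derive` is any callable that takes a row and returns
# an organization_id or nil. Returns the IDs of rows that could not be backfilled.
def backfill_organization_ids(rows, batch_size:, derive:, logger: nil)
  unresolved = []
  pending = rows.select { |row| row.organization_id.nil? }.sort_by(&:id)

  pending.each_slice(batch_size) do |batch|
    batch.each do |row|
      begin
        org_id = derive.call(row)
        if org_id
          row.organization_id = org_id
        else
          # Leave the row NULL and record it instead of failing the batch.
          unresolved << row.id
          logger&.call("no organization_id derivable for key #{row.id}")
        end
      rescue StandardError => e
        # A failure on one row is logged and reported, not re-raised.
        unresolved << row.id
        logger&.call("error backfilling key #{row.id}: #{e.message}")
      end
    end
  end

  unresolved
end
```

In the actual migration, each batch would more likely be a single set-based UPDATE rather than a per-row loop; the sketch only shows which rows are touched and how failures are collected.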

2. Derive organization_id Logic

For SSH keys:

  • Get organization_id from keys.user_id -> users.organization_id
  • Handle cases where user might not have an organization
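
The SSH key rule (keys.user_id -> users.organization_id) can be sketched as a pure function that also says why a key could not be resolved. Here `user_orgs` is a hypothetical in-memory stand-in for users.id => users.organization_id, and the reason symbols are illustrative labels, not names used in GitLab.

```ruby
# Result of resolving an SSH key: organization_id is nil whenever reason is set.
SshKeyResolution = Struct.new(:organization_id, :reason)

# user_orgs is a hypothetical stand-in for users.id => users.organization_id.
def ssh_key_organization_id(user_id, user_orgs)
  # Orphaned key: no associated user at all.
  return SshKeyResolution.new(nil, :no_user) if user_id.nil?
  # The key references a user that no longer exists.
  return SshKeyResolution.new(nil, :user_missing) unless user_orgs.key?(user_id)

  org_id = user_orgs[user_id]
  # The user exists but has no organization.
  return SshKeyResolution.new(nil, :user_without_organization) if org_id.nil?

  SshKeyResolution.new(org_id, nil)
end
```

Keeping the reason alongside the result makes it straightforward to feed unresolved keys into the report of problematic records.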

For deploy keys:

  • Get organization_id from the associated project(s)
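  • For shared deploy keys (multiple projects), use the organization of the first associated project, defining "first" deterministically (for example, the lowest project ID) so the result does not depend on row order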
  • Handle the remaining edge cases as described under Handle Edge Cases
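
A sketch of the deploy key rule, under stated assumptions: `project_ids` stands in for the projects a deploy key is linked to, and `project_orgs` for a mapping of project ID to organization ID (assumed here to come from the projects table). Both are hypothetical in-memory inputs. The sketch takes "first project" to mean the lowest project ID among projects that still exist, and flags keys whose projects span more than one organization so they can be documented.

```ruby
# organization_id is nil when no linked project resolves; conflict is true when
# the linked projects belong to more than one organization.
DeployKeyResolution = Struct.new(:organization_id, :conflict)

# project_ids: projects the deploy key is linked to (hypothetical input).
# project_orgs: hypothetical stand-in for project id => organization_id.
def deploy_key_organization_id(project_ids, project_orgs)
  # Keep only projects that still exist and have an organization, lowest ID first.
  candidates = project_ids.select { |id| project_orgs[id] }.uniq.sort
  return DeployKeyResolution.new(nil, false) if candidates.empty?

  org_ids = candidates.map { |id| project_orgs[id] }.uniq
  # Assumption: the "first project" is the one with the lowest ID.
  DeployKeyResolution.new(project_orgs[candidates.first], org_ids.size > 1)
end
```

Returning the conflict flag rather than raising lets the backfill still assign an organization while recording multi-organization keys for review.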

3. Migration Strategy

  • Use Gitlab::Database::BackgroundMigration::BatchedMigration
  • Process records in batches (e.g., 1000-5000 records per batch)
  • Set an appropriate batch size and interval to avoid excessive database load
  • Monitor migration progress and performance
  • Plan for potential rollback if issues arise
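
To see how a batch size in the 1000-5000 range partitions the table, the sketch below splits primary keys into contiguous ID ranges, walking the table in ascending ID order. It is a plain-Ruby model, not the framework's implementation, and `batch_ranges` is a hypothetical helper.

```ruby
# Splits primary keys into [first_id, last_id] ranges of at most batch_size rows,
# walking the table in ascending ID order.
def batch_ranges(ids, batch_size)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  ids.sort.each_slice(batch_size).map { |slice| [slice.first, slice.last] }
end
```

For example, 12,000 IDs with a batch size of 5,000 yield three ranges, the last covering only 2,000 rows. Because each range is bounded by IDs, a paused or failed batch can be retried on exactly the same rows.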

4. Handle Edge Cases

  • Keys with no associated user (orphaned records)
  • Deploy keys associated with multiple projects in different organizations
  • Keys where the user or project no longer exists
  • Document how each edge case is handled
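
One way to make the documentation step concrete is to label each key with the edge case it falls into and group key IDs by label. The record shape `EdgeCaseKey`, its `:ssh`/`:deploy` type field, and the label symbols are hypothetical; `project_orgs` stands in for a mapping of project ID to organization ID.

```ruby
# Hypothetical record shape; type is :ssh or :deploy.
EdgeCaseKey = Struct.new(:id, :type, :user_id, :project_ids, keyword_init: true)

# Labels a key with the edge case it falls into, or :ok if none applies.
# existing_user_ids: any collection of user IDs that still exist (responds to include?).
# project_orgs: hypothetical stand-in for project id => organization_id.
def classify_edge_case(key, existing_user_ids, project_orgs)
  case key.type
  when :ssh
    return :orphaned_no_user if key.user_id.nil?
    return :user_deleted unless existing_user_ids.include?(key.user_id)
  when :deploy
    live = key.project_ids.select { |id| project_orgs.key?(id) }
    return :no_existing_projects if live.empty?
    return :multi_org_deploy_key if live.map { |id| project_orgs[id] }.uniq.size > 1
  else
    raise ArgumentError, "unknown key type: #{key.type.inspect}"
  end

  :ok
end

# Groups key IDs by edge-case label, ready to paste into the issue as documentation.
def edge_case_report(keys, existing_user_ids, project_orgs)
  keys.group_by { |key| classify_edge_case(key, existing_user_ids, project_orgs) }
      .transform_values { |group| group.map(&:id) }
end
```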

5. Validation and Monitoring

  • Add monitoring to track migration progress
  • Validate data integrity after backfill
  • Check for any records that couldn't be backfilled
  • Generate report of any problematic records
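
A validation pass can be as simple as counting rows that are still NULL and listing their IDs; against the database this corresponds to a query such as `SELECT id FROM keys WHERE organization_id IS NULL`. The sketch below computes the same report over hypothetical in-memory rows (`ValidationRow`), including the populated percentage that the 100% acceptance criterion refers to.

```ruby
# Hypothetical in-memory stand-in for a row in the keys table after the backfill.
ValidationRow = Struct.new(:id, :organization_id)

# Summarizes how many rows still lack organization_id and which ones.
def validation_report(rows)
  missing_ids = rows.select { |row| row.organization_id.nil? }.map(&:id).sort
  total = rows.size
  populated_pct =
    if total.zero?
      100.0
    else
      ((total - missing_ids.size) * 100.0 / total).round(2)
    end

  { total: total, missing_count: missing_ids.size, missing_ids: missing_ids, populated_pct: populated_pct }
end
```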

Implementation Steps

  1. Create the background migration class
  2. Add migration file to queue the background migration
  3. Test the migration on the staging environment
  4. Monitor the migration execution on GitLab.com
  5. Validate results and handle any edge cases
  6. Document any records that couldn't be backfilled

Acceptance Criteria

  • Background migration created and tested
  • Migration queued and running on production
  • 100% of records have organization_id populated (or documented exceptions)
  • Data integrity validated
  • Edge cases handled and documented
  • Migration performance is acceptable (no production impact)
  • Monitoring in place to track progress
  • Report generated for any problematic records

Performance Considerations

  • Batch size should be tuned to avoid excessive database load
  • Consider running during off-peak hours
  • Monitor query performance and adjust if needed
  • Estimated time to completion should be calculated
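
A rough estimate of time to completion multiplies the number of batches by the time each batch takes plus the pause between batches. This assumes batches run one after another at a steady rate; the function name and the figures below are illustrative, not measurements from GitLab.com.

```ruby
# Rough time-to-completion estimate for a batched backfill, assuming batches run
# sequentially with a fixed pause (interval) between them.
def estimated_completion_seconds(total_rows:, batch_size:, seconds_per_batch:, interval_seconds:)
  raise ArgumentError, "batch_size must be positive" unless batch_size.positive?

  batches = (total_rows.to_f / batch_size).ceil
  batches * (seconds_per_batch + interval_seconds)
end
```

For example, 1,000,000 rows in batches of 5,000 is 200 batches; at 1.5 s per batch plus a 2 s interval, the estimate is 700 s, just under 12 minutes. The per-batch duration is the uncertain input, so it is best taken from batches observed on staging.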


Edited by 🤖 GitLab Bot 🤖