Repopulate tags table from ci_runner_taggings after organization migration

Summary

As discussed in #470872 (comment 2382060671), the tag_id field in ci_runner_taggings is not migrated during organization migration. It must instead be reconstructed after the migration so that tags are deduplicated cell-locally.

Background

The agreed-upon design for Runner/Builds/Tags includes:

  1. ci_runner_taggings with both tag_name and tag_id fields
  2. During an organization move (OrgMoving), tag_id is not migrated; it is reconstructed later so that tags are deduplicated cell-locally
  3. tags is retained as a cell-local acceleration table
  4. ci_pending_builds will upsert tag names into tags and continue using tag_ids[]
  5. /jobs/request will use ci_runner_taggings.tag_id to match ci_pending_builds.tag_ids[]
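
Items 4 and 5 can be illustrated with a small in-memory sketch. It is not GitLab code: the dictionary stands in for the cell-local tags table, the helper names upsert_tag and runner_matches are hypothetical, and the matching rule (a runner must carry every tag a build asks for) is an assumption about what "match" means here.

```python
# Hypothetical in-memory stand-in for the cell-local tags table: name -> id.
tags = {}


def upsert_tag(name):
    """Return the cell-local id for a tag name, creating the row if missing."""
    if name not in tags:
        tags[name] = len(tags) + 1
    return tags[name]


def runner_matches(runner_tag_ids, build_tag_ids):
    """Assumed rule: the runner must carry every tag the build asks for."""
    return set(build_tag_ids) <= set(runner_tag_ids)


# A pending build upserts its tag names and stores the resulting ids (item 4).
pending_build_tag_ids = [upsert_tag(n) for n in ["docker", "linux"]]

# Runner taggings resolve to ids from the same cell-local table (item 5).
runner_a_tag_ids = [upsert_tag(n) for n in ["docker", "linux", "gpu"]]
runner_b_tag_ids = [upsert_tag("docker")]
```

Because both sides resolve names through the same cell-local table, the comparison works on integer ids only, which is why tag_id has to be valid again after a move.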

Note: For builds, tag information is being deduplicated into the p_ci_job_definitions table (currently undergoing backfill), so this issue focuses specifically on runner tags.

Problem

After organization migration:

  • The ci_runner_taggings_project_type and ci_runner_taggings_group_type tables will have tag_name populated, but tag_id will be NULL. This conflicts with the NOT NULL constraint currently defined on tag_id (context)
  • The cell-local tags table needs to be repopulated with deduplicated tag entries
  • The tag_id references in ci_runner_taggings need to be reconstructed to point to the new cell-local tags entries
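
The repopulation described above could be sketched as a two-step backfill. This is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module instead of PostgreSQL, a single simplified ci_runner_taggings table instead of the partitioned project_type and group_type tables, a hypothetical runner_id column, and a nullable tag_id. The function name repopulate_tags is also hypothetical.

```python
import sqlite3


def repopulate_tags(conn):
    """Rebuild tags from tagging names, then fill in the missing tag_id values."""
    with conn:  # run both statements in one transaction
        # Step 1: insert each distinct tag name once; existing names are skipped.
        conn.execute(
            "INSERT OR IGNORE INTO tags (name) "
            "SELECT DISTINCT tag_name FROM ci_runner_taggings "
            "WHERE tag_id IS NULL"
        )
        # Step 2: point every tagging without a tag_id at the matching tags row.
        conn.execute(
            "UPDATE ci_runner_taggings "
            "SET tag_id = (SELECT id FROM tags "
            "              WHERE tags.name = ci_runner_taggings.tag_name) "
            "WHERE tag_id IS NULL"
        )


# Simplified, assumed schema: tag_id is nullable, as it would be after a move.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE ci_runner_taggings (
    id INTEGER PRIMARY KEY,
    runner_id INTEGER NOT NULL,
    tag_name TEXT NOT NULL,
    tag_id INTEGER
);
-- 'docker' already exists in the destination cell; 'linux' does not.
INSERT INTO tags (name) VALUES ('docker');
INSERT INTO ci_runner_taggings (runner_id, tag_name) VALUES
    (1, 'docker'), (1, 'linux'), (2, 'linux');
""")

repopulate_tags(conn)
```

The unique constraint on tags.name is what makes the first step deduplicate: running the function a second time changes nothing. A production version would presumably process rows in batches rather than in a single statement.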

Tasks

  • Determine the approach for repopulating tags and reconstructing tag_id references (e.g., batch processing, self-healing trigger, or other mechanism)
  • Design the solution to handle deduplication correctly across the cell
  • Implement the chosen approach
  • Test with realistic migration scenarios
  • Drop the NOT NULL constraint on ci_runner_taggings.tag_id if a trigger can't be used to populate the value

Considerations

Potential approaches mentioned in discussions:

Edited by Marius Bobin