Repopulate tags table from ci_runner_taggings after organization migration
Summary
As discussed in #470872 (comment 2382060671), the tag_id field in ci_runner_taggings is not migrated during organization migration. Instead, it needs to be reconstructed after the migration so that tags are deduplicated cell-locally.
Background
The agreed-upon design for Runner/Builds/Tags includes:
- ci_runner_taggings with both tag_name and tag_id fields
- When OrgMoving, we do not migrate tag_id - this will be reconstructed later to deduplicate cell-local
- tags is retained as a cell-local acceleration table
- ci_pending_builds will upsert tag names into tags and continue using tag_ids[]
- /jobs/request will use ci_runner_taggings.tag_id to match ci_pending_builds.tag_ids[]
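To illustrate why tag_id has to be rebuilt against the destination cell's tags table, here is a minimal sketch of the matching step. It assumes that a runner may pick a build when every tag id on the build is also present on the runner (subset semantics). That rule, the function name, and the example ids are assumptions for illustration and are not taken from this issue. Handling of untagged builds is not modeled.

```python
def runner_can_pick(runner_tag_ids, build_tag_ids):
    """Return True if every build tag id is also one of the runner's tag ids.

    Assumed subset semantics; an empty build_tag_ids list always matches here.
    """
    return set(build_tag_ids) <= set(runner_tag_ids)


if __name__ == "__main__":
    # Hypothetical ids: the same tag name maps to different ids in each cell.
    source_cell_tags = {"docker": 10}
    destination_cell_tags = {"docker": 3}

    # The build upserts its tag name into the destination cell's tags table.
    build_tag_ids = [destination_cell_tags["docker"]]

    # A tag_id copied from the source cell no longer matches the build.
    print(runner_can_pick([source_cell_tags["docker"]], build_tag_ids))

    # A tag_id reconstructed in the destination cell matches again.
    print(runner_can_pick([destination_cell_tags["docker"]], build_tag_ids))
```

Under these assumptions, matching works only when both sides resolve tag names through the same cell-local tags table.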
Note: For builds, tag information is being deduplicated into the p_ci_job_definitions table (currently undergoing backfill), so this issue focuses specifically on runner tags.
Problem
After organization migration:
- The ci_runner_taggings_project_type and ci_runner_taggings_group_type tables will have tag_name populated but tag_id will be NULL. There is currently a NOT NULL constraint there (context)
- The cell-local tags table needs to be repopulated with deduplicated tag entries
- The tag_id references in ci_runner_taggings need to be reconstructed to point to the new cell-local tags entries
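The repopulation described above could look roughly like the following batch sketch. It uses Python's sqlite3 module as a stand-in for PostgreSQL, and it assumes a simplified schema: a tags table with a unique name column and a single ci_runner_taggings table with a nullable tag_id. The function name, the batch size, and the schema are hypothetical and not the actual GitLab implementation.

```python
import sqlite3


def repopulate_tags(conn, batch_size=500):
    """Backfill ci_runner_taggings.tag_id from tag_name, one batch at a time.

    Returns the number of tagging rows that were updated.
    """
    updated = 0
    while True:
        rows = conn.execute(
            "SELECT id, tag_name FROM ci_runner_taggings "
            "WHERE tag_id IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return updated

        # Deduplicate the tag names seen in this batch.
        names = sorted({name for _, name in rows})

        # Relies on the assumed UNIQUE constraint on tags.name:
        # names that already exist in this cell are left untouched.
        conn.executemany(
            "INSERT OR IGNORE INTO tags (name) VALUES (?)",
            [(name,) for name in names],
        )

        # Look up the cell-local id for every name in the batch.
        placeholders = ",".join("?" * len(names))
        name_to_id = dict(
            conn.execute(
                f"SELECT name, id FROM tags WHERE name IN ({placeholders})",
                names,
            ).fetchall()
        )

        # Point each tagging row at its cell-local tags entry.
        conn.executemany(
            "UPDATE ci_runner_taggings SET tag_id = ? WHERE id = ?",
            [(name_to_id[name], row_id) for row_id, name in rows],
        )
        conn.commit()
        updated += len(rows)
```

Because each batch only selects rows where tag_id is still NULL, the function can be re-run safely after an interruption. This sketch only works if the NOT NULL constraint on tag_id is relaxed while the backfill runs.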
Tasks
- Determine the approach for repopulating tags and reconstructing tag_id references (e.g., batch processing, self-healing trigger, or other mechanism)
- Design the solution to handle deduplication correctly across the cell
- Implement the chosen approach
- Test with realistic migration scenarios
- Drop the NOT NULL constraint on ci_runner_taggings.tag_id if a trigger can't be used to populate the value.
Considerations
Potential approaches mentioned in discussions:
- Self-healing using a database trigger (&18601 (comment 2926182696))
- Batch processing after migration
- Other mechanisms to be explored
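As a rough illustration of the self-healing trigger idea, the sketch below uses SQLite through Python's sqlite3 module as a stand-in. The schema, the trigger name fill_tag_id, and the helper connect_with_trigger are hypothetical. SQLite only allows this as an AFTER INSERT trigger on a nullable tag_id column; a PostgreSQL version would be written differently (for example as a BEFORE INSERT trigger that assigns the value), so this is not a drop-in design.

```python
import sqlite3

# Simplified, assumed schema: one taggings table, tag_id nullable.
SCHEMA = """
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE ci_runner_taggings (
    id INTEGER PRIMARY KEY,
    runner_id INTEGER NOT NULL,
    tag_name TEXT NOT NULL,
    tag_id INTEGER
);
"""

# When a row arrives without tag_id, make sure the tag name exists in the
# cell-local tags table, then point the new row at it.
TRIGGER = """
CREATE TRIGGER fill_tag_id
AFTER INSERT ON ci_runner_taggings
FOR EACH ROW
WHEN NEW.tag_id IS NULL
BEGIN
    INSERT INTO tags (name)
    SELECT NEW.tag_name
    WHERE NOT EXISTS (SELECT 1 FROM tags WHERE name = NEW.tag_name);

    UPDATE ci_runner_taggings
    SET tag_id = (SELECT id FROM tags WHERE name = NEW.tag_name)
    WHERE id = NEW.id;
END;
"""


def connect_with_trigger():
    """Create an in-memory database with the schema and trigger installed."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA + TRIGGER)
    return conn


if __name__ == "__main__":
    conn = connect_with_trigger()
    conn.execute(
        "INSERT INTO ci_runner_taggings (runner_id, tag_name) VALUES (1, 'docker')"
    )
    print(conn.execute("SELECT tag_name, tag_id FROM ci_runner_taggings").fetchall())
```

With a trigger like this, rows written by the migration without tag_id are filled in as they arrive, which is the case where the NOT NULL constraint might be kept. Batch processing would instead need the constraint relaxed until the backfill completes.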
Related Issues
Edited by Marius Bobin