Investigate PG::UntranslatableCharacter errors in MoveCiBuildsMetadata background migration
Problem
The MoveCiBuildsMetadata background migration (IDs: 3000518, 3000519, 3000520) is failing with PG::UntranslatableCharacter errors when attempting to insert data containing null bytes (\u0000) in JSON fields.
Error Details
ActiveRecord::StatementInvalid: PG::UntranslatableCharacter: ERROR: unsupported Unicode escape sequence
DETAIL: \u0000 cannot be converted to text.
CONTEXT: JSON data, line 1: ...
The errors occur when trying to move ci_builds_metadata records that contain null bytes in their options JSON field. PostgreSQL's text type cannot represent the null character, so the \u0000 escape is rejected when the JSON value is converted for storage.
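To illustrate the failure mode, here is a minimal Ruby sketch (not the migration's actual code) showing how a NUL character inside an options value ends up as a \u0000 escape once serialized to JSON, and how a payload could be checked for it. The sample payload and the helper name contains_null_byte? are hypothetical.

```ruby
require 'json'

# Hypothetical helper: true if any string nested in `value`
# (hash keys included) contains a NUL character.
def contains_null_byte?(value)
  case value
  when String then value.include?("\u0000")
  when Array  then value.any? { |item| contains_null_byte?(item) }
  when Hash   then value.any? { |k, v| contains_null_byte?(k) || contains_null_byte?(v) }
  else false
  end
end

# Made-up payload resembling a legacy options value.
legacy_options = { 'script' => ["echo done\u0000"], 'retry' => 2 }

# The JSON generator emits the NUL as the six-character escape \u0000,
# which is the sequence PostgreSQL reports in the error above.
puts JSON.generate(legacy_options)
puts contains_null_byte?(legacy_options)
```

A check like this only walks data already loaded into Ruby; counting affected rows across the whole table would need a database-side query instead.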
Affected Records
- Migration ID 3000518 has at least 5 failed job attempts
- Records affected: 3256151, 3877142, 3849312, 3768000, and others
Investigation Needed
- Determine the root cause of null bytes in the ci_builds_metadata.options field
- Identify how many records are affected across all three migrations
- Decide on remediation strategy:
  - Clean null bytes before migration
  - Skip affected records
  - Handle in the migration code itself
- Implement fix and re-run migrations
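As a rough sketch of the "clean null bytes" and "handle in the migration code" options, the following Ruby helper recursively removes NUL characters from a parsed options structure before it is serialized. The name strip_null_bytes and the sample data are hypothetical; this is not taken from the migration, and whether silently dropping the bytes is acceptable is a decision for the investigation.

```ruby
require 'json'

# Hypothetical helper: returns a copy of `value` with NUL characters
# removed from every nested string, including hash keys.
def strip_null_bytes(value)
  case value
  when String then value.delete("\u0000")
  when Array  then value.map { |item| strip_null_bytes(item) }
  when Hash   then value.to_h { |k, v| [strip_null_bytes(k), strip_null_bytes(v)] }
  else value # numbers, booleans, nil pass through unchanged
  end
end

# Made-up payload resembling a legacy options value.
legacy_options = { 'script' => ["echo a\u0000b"], 'retry' => 2 }
cleaned = strip_null_bytes(legacy_options)

# The serialized result no longer contains the \u0000 escape.
puts JSON.generate(cleaned)
```

The helper returns new objects rather than mutating its input, so the original value stays available for logging or comparison.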
Related
Related to epic #18271 (CI storage optimization - Phase 3: remove legacy data)
Workaround
These errors are caused by very old jobs that still use the legacy storage format; on .com the storage format was changed 5 years ago. The pipeline archival feature can be used to skip migrating this old data: https://docs.gitlab.com/update/versions/gitlab_18_changes/#controlling-the-scope-for-jobs-processing-data
On the admin page, set the archival value to something less than 5 years, say 4 years, and execute the migration using the CLI: https://docs.gitlab.com/update/background_migrations/#execute-a-migration