Fix Encoding::CompatibilityError in YAML loader for binary input
What does this MR do and why?
Reintroduces the UTF-8 BOM stripping behavior in lib/gitlab/config/loader/yaml.rb behind the ci_yaml_loader_strip_bom feature flag, with a guard that only runs strip_bom when the input is valid UTF-8.
Background
The previous attempt to add BOM stripping, !235243 (merged), called strip_bom unconditionally. The helper compares the input against the UTF-8 BOM sequence, which raises Encoding::CompatibilityError when the input is ASCII-8BIT and contains non-ASCII bytes.
HTTParty returns response bodies as ASCII-8BIT when the upstream Content-Type is binary (such as binary/octet-stream). Encoding::CompatibilityError is neither a Psych::Exception nor an ArgumentError, so the exception escaped the rescue Psych::Exception / rescue ArgumentError blocks in the YAML loader and was caught later by the generic rescue StandardError in lib/gitlab/ci/pipeline/chain/config/process.rb, producing the masked Undefined error (correlation_id) reported in request-for-help#4756. !235243 has since been reverted.
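The failure mode can be reproduced in plain Ruby. The snippet below is a minimal sketch: naive_strip_bom is a hypothetical stand-in for the unguarded helper (the real strip_bom may be implemented differently), and the binary string simulates a response body for a binary Content-Type.

```ruby
# Hypothetical stand-in for the unguarded helper; not the actual GitLab code.
UTF8_BOM = "\uFEFF"

def naive_strip_bom(content)
  # Comparing against a UTF-8 prefix forces an encoding compatibility check.
  content.start_with?(UTF8_BOM) ? content[1..] : content
end

# Simulated remote include body: ASCII-8BIT with non-ASCII bytes.
binary_body = "name: caf\u00e9\n".b

begin
  naive_strip_bom(binary_body)
rescue Encoding::CompatibilityError => e
  puts "raised #{e.class}"
end

# The error descends from StandardError, not ArgumentError, which is why
# only a generic rescue StandardError catches it.
puts Encoding::CompatibilityError.ancestors.include?(ArgumentError)
puts Encoding::CompatibilityError.ancestors.include?(StandardError)
```

An ASCII-8BIT string that contains only ASCII bytes does not trigger the error, which is consistent with the problem surfacing only for bodies with non-ASCII content.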
Approach
- Only call strip_bom when the input is valid UTF-8. Non-UTF-8 files cannot contain a UTF-8 BOM anyway, so skipping the call for them is correct and avoids both Encoding::CompatibilityError and any mojibake risk from forced encoding.
- Gated behind the ci_yaml_loader_strip_bom feature flag (gitlab_com_derisk, default_enabled: false) for safe rollout.
- Spec coverage for:
  - UTF-8 input with a BOM (flag enabled): BOM stripped, parses correctly.
  - Customer scenario, ASCII-8BIT with valid UTF-8 byte sequences (e.g., binary/octet-stream remote include): no encoding error.
  - Exotic encodings (Windows-1252, ISO-8859-1, Shift_JIS, ASCII-8BIT with non-UTF-8 high bytes): no Encoding::CompatibilityError and no silent mojibake.
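The guard described above could look roughly like the following. This is a sketch under assumptions, not the code in this MR: prepare_yaml and strip_bom_enabled? are hypothetical names, the flag is modeled as a plain hash instead of a real feature flag check, and "valid UTF-8" is interpreted here as "tagged as UTF-8 and valid_encoding? is true".

```ruby
BOM = "\uFEFF"

# Hypothetical stand-in for the feature flag lookup.
def strip_bom_enabled?(flags)
  flags.fetch(:ci_yaml_loader_strip_bom, false)
end

def strip_bom(content)
  content.delete_prefix(BOM)
end

# Returns the content with a leading BOM removed, but only when the flag is
# on and the string is tagged as UTF-8 and valid. Everything else passes
# through untouched: no force_encoding, so no mojibake and no encoding error.
def prepare_yaml(content, flags = {})
  return content unless strip_bom_enabled?(flags)
  return content unless content.encoding == Encoding::UTF_8 && content.valid_encoding?

  strip_bom(content)
end

flags = { ci_yaml_loader_strip_bom: true }
p prepare_yaml("\uFEFFjob: {}", flags)            # BOM removed
p prepare_yaml("name: caf\u00e9".b, flags).encoding # stays ASCII-8BIT
```

With this interpretation, an ASCII-8BIT body is returned as-is even if its bytes happen to be valid UTF-8, which matches the "no encoding error" expectation in the customer scenario without reinterpreting the bytes.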
References
- Customer report: https://gitlab.com/gitlab-com/request-for-help/-/issues/4756
- Reverted MR that introduced the regression: !235243 (merged)
- Rollout issue: #600119
How to set up and validate locally
- Find or set up a remote URL that serves a non-empty YAML file with non-ASCII bytes and a non-UTF-8 Content-Type (e.g., binary/octet-stream). Example used during investigation: https://<example>.gitlab-ci.yml.
- In a project, create .gitlab-ci.yml:
    include:
      - remote: 'https://<example>.gitlab-ci.yml'
- Trigger ciLint (GraphQL mutation ciLint or the Pipeline Editor) with the feature flag enabled.
- Validation should succeed (or return a meaningful validation error) instead of Undefined error (correlation_id).
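If no suitable remote URL is at hand, a throwaway local server can stand in for the first step. The following is a minimal sketch using only Ruby's socket library; serve_binary_yaml is a hypothetical helper, the address and port in the usage comment are arbitrary, and whether a GitLab instance may fetch from a local address depends on its outbound request settings.

```ruby
require "socket"

# Answers a fixed number of HTTP requests with the given body, always using
# a binary Content-Type so the client sees the same header as in the report.
def serve_binary_yaml(server, body, requests: 1)
  requests.times do
    client = server.accept

    # Read and discard the request line and headers up to the blank line.
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end

    client.write(
      "HTTP/1.1 200 OK\r\n" \
      "Content-Type: binary/octet-stream\r\n" \
      "Content-Length: #{body.bytesize}\r\n" \
      "Connection: close\r\n\r\n"
    )
    client.write(body)
    client.close
  end
end

# Usage (blocks until the given number of requests has been served):
#   yaml = "job:\n  script: echo \"caf\u00e9\"\n"
#   serve_binary_yaml(TCPServer.new("127.0.0.1", 8080), yaml, requests: 10)
```

The body deliberately contains a non-ASCII character, since an ASCII-only body would not exercise the encoding path described in the background.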
MR acceptance checklist
Evaluate this MR against the MR acceptance checklist.