Fix Encoding::CompatibilityError in YAML loader for binary input

What does this MR do and why?

Reintroduces the UTF-8 BOM stripping behavior in lib/gitlab/config/loader/yaml.rb behind the ci_yaml_loader_strip_bom feature flag, with a guard that only runs strip_bom when the input is valid UTF-8.

Background

The previous attempt to add BOM stripping (!235243, merged and since reverted) called strip_bom unconditionally. The helper compares the input against the UTF-8 BOM sequence, which raises Encoding::CompatibilityError when the input is ASCII-8BIT and contains non-ASCII bytes.

HTTParty returns response bodies as ASCII-8BIT when the upstream Content-Type is binary (such as binary/octet-stream). Because Encoding::CompatibilityError is neither a Psych::Exception nor an ArgumentError, it escaped the rescue Psych::Exception / rescue ArgumentError blocks in the YAML loader. It was caught later by the generic rescue StandardError in lib/gitlab/ci/pipeline/chain/config/process.rb, which produced the masked Undefined error (correlation_id) reported in request-for-help#4756.
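The failure mode can be reproduced in plain Ruby. This is an illustration under assumptions, not the reverted code: naive_strip_bom is a hypothetical stand-in built on String#delete_prefix, and the body string only simulates an ASCII-8BIT response without calling HTTParty.

```ruby
# Hypothetical reproduction: naive_strip_bom is an illustrative stand-in,
# not the helper from !235243.
UTF8_BOM = "\uFEFF"

def naive_strip_bom(content)
  # delete_prefix checks encoding compatibility before comparing bytes.
  content.delete_prefix(UTF8_BOM)
end

# Simulated body tagged ASCII-8BIT (as for binary/octet-stream) with non-ASCII bytes.
body = "name: t\u00e9st\n".b

begin
  naive_strip_bom(body)
rescue Encoding::CompatibilityError => e
  puts "#{e.class}: #{e.message}"
end

# ASCII-only binary input is encoding-compatible with UTF-8, so this does not raise.
puts naive_strip_bom("name: test\n".b).inspect
```

The ASCII-only case passing explains why the bug only surfaced for remote includes whose content contained non-ASCII bytes.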

Approach

  • Only call strip_bom when the input is valid UTF-8. Input that is not valid UTF-8 is not a UTF-8 document, so there is no UTF-8 BOM to strip. Skipping the call for it is correct and avoids both Encoding::CompatibilityError and the mojibake risk of force-encoding it to UTF-8.
  • Gated behind the ci_yaml_loader_strip_bom feature flag (gitlab_com_derisk, default_enabled: false) for safe rollout.
  • Spec coverage for:
    • UTF-8 input with a BOM (flag enabled): BOM stripped, parses correctly.
    • Customer scenario (ASCII-8BIT with valid UTF-8 byte sequences, e.g., a binary/octet-stream remote include): no encoding error.
    • Exotic encodings (Windows-1252, ISO-8859-1, Shift_JIS, ASCII-8BIT with non-UTF-8 high bytes): no Encoding::CompatibilityError and no silent mojibake.
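A minimal sketch of the guard, under stated assumptions: strip_bom_if_utf8 and the strip_bom_enabled keyword (standing in for the ci_yaml_loader_strip_bom flag check) are hypothetical names. "Valid UTF-8" is interpreted here as "the bytes decode as UTF-8", checked on a re-tagged copy. The real check in lib/gitlab/config/loader/yaml.rb may differ, for example by also requiring the string's declared encoding to be UTF-8. The sample inputs mirror the spec scenarios listed above.

```ruby
# Hypothetical sketch: strip_bom_if_utf8 and strip_bom_enabled are illustrative
# names, not the code in lib/gitlab/config/loader/yaml.rb.
UTF8_BOM = "\uFEFF"

# strip_bom_enabled stands in for the ci_yaml_loader_strip_bom flag check.
def strip_bom_if_utf8(content, strip_bom_enabled:)
  return content unless strip_bom_enabled

  # Re-tag a copy (same bytes) as UTF-8; leave input that is not valid UTF-8 alone.
  as_utf8 = content.dup.force_encoding(Encoding::UTF_8)
  return content unless as_utf8.valid_encoding?

  # Both operands are now UTF-8, so this comparison cannot raise
  # Encoding::CompatibilityError. Return the original object if no BOM is found.
  as_utf8.start_with?(UTF8_BOM) ? as_utf8.delete_prefix(UTF8_BOM) : content
end

# Inputs modelled on the spec scenarios; the YAML content itself is made up.
samples = {
  "UTF-8 with BOM" => "\uFEFFstages: [test]\n",
  "ASCII-8BIT, valid UTF-8 bytes" => "name: t\u00e9st\n".b,
  "Windows-1252" => "name: t\u00e9st\n".encode(Encoding::Windows_1252),
  "ISO-8859-1" => "name: t\u00e9st\n".encode(Encoding::ISO_8859_1),
  "Shift_JIS" => "name: \u30c6\u30b9\u30c8\n".encode(Encoding::Shift_JIS),
  "ASCII-8BIT, non-UTF-8 high bytes" => "name: \xFF\xFE\n".b
}

samples.each do |label, input|
  output = strip_bom_if_utf8(input, strip_bom_enabled: true)
  puts "#{label}: #{input.encoding} -> #{output.encoding} (#{output.bytesize} bytes)"
end
```

Returning the original object whenever no BOM is removed keeps BOM-less input, including the ASCII-8BIT customer case, in its original encoding, so this sketch never force-encodes text it did not need to touch.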

How to set up and validate locally

  1. Find or set up a remote URL that serves a non-empty YAML file containing non-ASCII bytes with a binary Content-Type (e.g., binary/octet-stream), so that HTTParty returns the body as ASCII-8BIT. Example used during investigation: https://<example>.gitlab-ci.yml.
  2. In a project, create .gitlab-ci.yml:
    include:
      - remote: 'https://<example>.gitlab-ci.yml'
  3. Trigger ciLint (GraphQL mutation ciLint or the Pipeline Editor) with the feature flag enabled.
  4. Validation should succeed (or return a meaningful validation error) instead of Undefined error (correlation_id).

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist.

Edited by Rajendra Kadam
