Encoding::CompatibilityError in Rouge lexer when highlighting diffs with ASCII-8BIT content
## Summary

An `Encoding::CompatibilityError` occurs when viewing commits or merge requests containing files with ASCII-8BIT encoded content (such as PDFs diffed as text). The error is triggered when Rouge's lexer guesser attempts to match UTF-8 regular expressions against ASCII-8BIT strings.

**Sentry Error**: https://new-sentry.gitlab.net/organizations/gitlab/issues/3179853

## Error Details

```
Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  from rouge/guessers/util.rb:13:in `sub'
  from rouge/guessers/modeline.rb:32:in `filter'
  from rouge/lexer.rb:185:in `guess'
  from lib/gitlab/highlight.rb:39:in `lexer'
```

## Root Cause

1. Rapid Diffs calls `whitespace_only?` to determine rendering, which triggers syntax highlighting
2. `Gitlab::Highlight#lexer` calls `Rouge::Lexer.guess(source: @blob_content)`
3. `@blob_content` may be ASCII-8BIT encoded from Gitaly for files with binary-like content
4. Rouge's modeline guesser uses UTF-8 regexps, causing the encoding error

### Example

Commit with PDF metadata changes: https://gitlab.com/pawel-kow/documentation/-/commit/0180f15cb14f44b0217c308bba91f9c6af0349e2

## Proposed Fix

In `lib/gitlab/highlight.rb`, encode content to UTF-8 before passing to Rouge:

```ruby
def lexer
  @lexer ||= custom_language || begin
    source = @blob_content.to_s.dup.force_encoding(Encoding::UTF_8)
    source = source.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless source.valid_encoding?
    Rouge::Lexer.guess(filename: @blob_name, source: source).new
  rescue Rouge::Guesser::Ambiguous => e
    e.alternatives.min_by(&:tag)
  end
end
```
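
The failure mode and the effect of the proposed conversion can be reproduced in plain Ruby, without GitLab or Rouge. The following is a minimal sketch under stated assumptions: `BOM_PATTERN` is a hypothetical stand-in for whatever UTF-8 regexp the guesser applies (not copied from Rouge), and `to_valid_utf8` and `raises_compatibility_error?` are illustrative helper names that mirror the conversion above rather than existing methods.

```ruby
# Hypothetical stand-in for a guesser regexp. The \u escape makes this a
# fixed-encoding UTF-8 regexp, which is what triggers the error.
BOM_PATTERN = /\A\uFEFF/

# Hypothetical helper mirroring the conversion in the proposed fix:
# retag a copy as UTF-8, then replace invalid bytes only if needed.
def to_valid_utf8(content)
  source = content.to_s.dup.force_encoding(Encoding::UTF_8)
  return source if source.valid_encoding?

  source.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end

# Returns true if applying the pattern raises the error from the report.
def raises_compatibility_error?(string, pattern)
  string.sub(pattern, '')
  false
rescue Encoding::CompatibilityError
  true
end

# ASCII-8BIT content with non-ASCII bytes, similar to a PDF header.
binary = "%PDF-1.4\n%\xE2\xE3\xCF\xD3\n".b

puts raises_compatibility_error?(binary, BOM_PATTERN)                # true
puts raises_compatibility_error?(to_valid_utf8(binary), BOM_PATTERN) # false
```

Note that an ASCII-8BIT string containing only 7-bit bytes does not raise, which may explain why the error appears only for some binary-like files.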