Encoding::CompatibilityError in Rouge lexer when highlighting diffs with ASCII-8BIT content
## Summary
An `Encoding::CompatibilityError` is raised when viewing commits or merge requests that contain files whose content is tagged as ASCII-8BIT (such as PDFs diffed as text). The error is triggered when Rouge's lexer guesser attempts to match UTF-8 regular expressions against ASCII-8BIT strings.
**Sentry Error**: https://new-sentry.gitlab.net/organizations/gitlab/issues/3179853
## Error Details
```
Encoding::CompatibilityError: incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
from rouge/guessers/util.rb:13:in `sub'
from rouge/guessers/modeline.rb:32:in `filter'
from rouge/lexer.rb:185:in `guess'
from lib/gitlab/highlight.rb:39:in `lexer'
```
## Root Cause
1. Rapid Diffs calls `whitespace_only?` to determine rendering, which triggers syntax highlighting
2. `Gitlab::Highlight#lexer` calls `Rouge::Lexer.guess(source: @blob_content)`
3. `@blob_content` may arrive from Gitaly tagged as ASCII-8BIT for files with binary-like content
4. Rouge's modeline guesser uses UTF-8 regexps, so matching them against the ASCII-8BIT string raises `Encoding::CompatibilityError`
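
The encoding mismatch in steps 3 and 4 can be reproduced outside GitLab. This is a minimal sketch: `bom_re` is a stand-in for a fixed-encoding UTF-8 regexp, not Rouge's actual pattern, and `try_sub` and the sample bytes are hypothetical.

```ruby
# Stand-in for a UTF-8 regexp (assumption, not Rouge's code).
# The \u escape makes the regexp fixed-encoding UTF-8.
bom_re = /\A\u{FEFF}/

# Bytes resembling the start of a PDF, tagged ASCII-8BIT via String#b.
binary = "%PDF-1.4\n\xE2\xE3\xCF\xD3\n".b

# Hypothetical helper: attempt the substitution and report what happened.
def try_sub(str, re)
  str.sub(re, '')
  :ok
rescue Encoding::CompatibilityError
  :compatibility_error
end

# An ASCII-only binary string is still compatible with a UTF-8 regexp.
ascii_result = try_sub("plain ascii".b, bom_re)
# A binary string containing bytes above 0x7F is not.
binary_result = try_sub(binary, bom_re)

puts "ASCII-only binary string: #{ascii_result}"
puts "binary string with high bytes: #{binary_result}"
```

The first call should succeed and the second should hit the rescue, which would explain why only blobs containing non-ASCII bytes trigger the error.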
### Example
Commit with PDF metadata changes: https://gitlab.com/pawel-kow/documentation/-/commit/0180f15cb14f44b0217c308bba91f9c6af0349e2
## Proposed Fix
In `lib/gitlab/highlight.rb`, re-tag the content as UTF-8 and replace any invalid byte sequences before passing it to Rouge:
```ruby
def lexer
  @lexer ||= custom_language || begin
    # Re-tag a copy of the blob as UTF-8 so Rouge's UTF-8 regexps can match it.
    source = @blob_content.to_s.dup.force_encoding(Encoding::UTF_8)
    # Replace byte sequences that are not valid UTF-8.
    source = source.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless source.valid_encoding?
    Rouge::Lexer.guess(filename: @blob_name, source: source).new
  rescue Rouge::Guesser::Ambiguous => e
    e.alternatives.min_by(&:tag)
  end
end
```
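
The sanitizing step can be checked in isolation. This is a sketch under assumptions: `utf8_for_guessing` is a hypothetical helper that copies the two sanitizing lines from the fix, and the regexp is again a stand-in for Rouge's UTF-8 patterns.

```ruby
# Hypothetical helper containing only the sanitizing lines of the fix.
def utf8_for_guessing(content)
  # Work on a copy so the caller's string keeps its original encoding tag.
  source = content.to_s.dup.force_encoding(Encoding::UTF_8)
  # Replace invalid byte sequences only when the re-tagged string needs it.
  source = source.encode(Encoding::UTF_8, invalid: :replace, undef: :replace) unless source.valid_encoding?
  source
end

# Sample bytes tagged ASCII-8BIT, similar to a PDF diffed as text.
binary = "%PDF-1.4\n\xE2\xE3\xCF\xD3\n".b
clean = utf8_for_guessing(binary)

# A fixed-encoding UTF-8 regexp (stand-in) can now be applied without raising.
stripped = clean.sub(/\A\u{FEFF}/, '')

puts clean.encoding
puts clean.valid_encoding?
puts binary.encoding
```

Because the helper calls `dup` first, the original blob content is left untouched and only the copy passed to the guesser is altered. `String#scrub` may be a simpler way to perform the replacement, though that is a suggestion rather than part of the proposed patch.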