Encoding::UndefinedConversionError in highlight_cache.rb
<!---
Please read this!
Before opening a new issue, make sure to search for keywords in the issues
filtered by the "regression" or "bug" label:
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=regression
- https://gitlab.com/gitlab-org/gitlab/issues?label_name%5B%5D=bug
and verify the issue you're about to submit isn't a duplicate.
--->
### Summary
DiffFile lines are ASCII 8BIT encoded and can fail to convert to UTF-8 if there's an unsupported character:
Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8)
MR !19917 implemented redis cache and part of the cache process is to call `to_json` on the diff highlight. This causes the conversion error to get thrown.
```ruby
def write_to_redis_hash(hash)
Gitlab::Redis::Cache.with do |redis|
redis.pipelined do
hash.each do |diff_file_id, highlighted_diff_lines_hash|
redis.hset(key, diff_file_id, highlighted_diff_lines_hash.to_json)
end
```
### Steps to reproduce
create a git diff that causes a line with an unsupported UTF-8 character to get highlighted and submit it as a MR.
Loading the MR diff view should error
### Example Project
Only have one internally
### What is the current *bug* behavior?
Merge request diff view fails to load.
### What is the expected *correct* behavior?
Merge request diff view should load.
### Relevant logs and/or screenshots
```
Processing by Projects::MergeRequests::DiffsController#show as JSON
Parameters: {"w"=>"0", "namespace_id"=>"<snip>", "project_id"=>"<snip>", "id"=>"1"}
Completed 500 Internal Server Error in 1986ms (ActiveRecord: 22.9ms | Elasticsearch: 0.0ms)
Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8):
lib/gitlab/diff/highlight_cache.rb:94:in `block (3 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:93:in `each'
lib/gitlab/diff/highlight_cache.rb:93:in `block (2 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:92:in `block in write_to_redis_hash'
lib/gitlab/redis/wrapper.rb:19:in `block in with'
lib/gitlab/redis/wrapper.rb:19:in `with'
lib/gitlab/diff/highlight_cache.rb:91:in `write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:43:in `write_if_empty'
lib/gitlab/diff/file_collection/merge_request_diff_base.rb:31:in `write_cache'
app/controllers/projects/merge_requests/diffs_controller.rb:57:in `render_diffs'
app/controllers/projects/merge_requests/diffs_controller.rb:13:in `show'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:467:in `set_session_storage'
lib/gitlab/i18n.rb:55:in `with_locale'
lib/gitlab/i18n.rb:61:in `with_user_locale'
app/controllers/application_controller.rb:461:in `set_locale'
lib/gitlab/application_context.rb:18:in `with_context'
app/controllers/application_controller.rb:453:in `set_current_context'
lib/gitlab/error_tracking.rb:34:in `with_context'
app/controllers/application_controller.rb:545:in `sentry_context'
lib/gitlab/middleware/rails_queue_duration.rb:27:in `call'
lib/gitlab/metrics/rack_middleware.rb:17:in `block in call'
lib/gitlab/metrics/transaction.rb:62:in `run'
lib/gitlab/metrics/rack_middleware.rb:17:in `call'
lib/gitlab/request_profiler/middleware.rb:17:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:13:in `call'
lib/gitlab/middleware/multipart.rb:117:in `call'
lib/gitlab/middleware/read_only/controller.rb:52:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:23:in `call'
config/initializers/fix_local_cache_middleware.rb:9:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:49:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
#<Gitlab::Diff::Line:0x00007f057161a878 @index=1, @type=nil, @text=" \x00/\x00/\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00#\x00l\x00a\x00n\x00g\x00u\x00a\x00g\x00e\x00 \x00f\x00r\x00-\x00F\x00R\x00 \x00 \x00\"\x00l\x00'\x00o\x00p\x00t\x00i\x00o\x00n\x00 \x00d\x00e\x00 \x00b\x00o\x00t\x00t\x00e\x00 \x00d\x00e\x00 \x00D\x00\xE9\x00b\x00u\x00t\x00\"\x00\r\x00", @new_pos=70, @old_pos=70, @parent_file=#<Gitlab::Diff::File:0x00007f0591b6f588 ...>, @rich_text=" <span id=\"LC70\" class=\"line\" lang=\"plaintext\">// #language fr-FR \"l'option de botte de Début\"</span>\n", @line_code="1e622865eb43ead29516428ad50ae36cf1102f95_70_70">,
```
### Possible fixes
The JSON conversion is necessary, so one idea is to just encode the data stored in the hash prior to cache insertion and unencode it after it's pulled out
issue