Encoding::UndefinedConversionError in highlight_cache.rb
### Summary

DiffFile lines are ASCII-8BIT encoded, and converting them to UTF-8 fails when a line contains a byte that has no UTF-8 mapping: `Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8)`.

MR !19917 implemented a Redis cache, and part of the caching process is to call `to_json` on the diff highlight. That call is what raises the conversion error.

```ruby
def write_to_redis_hash(hash)
  Gitlab::Redis::Cache.with do |redis|
    redis.pipelined do
      hash.each do |diff_file_id, highlighted_diff_lines_hash|
        # to_json raises here when a line holds bytes that cannot be converted to UTF-8
        redis.hset(key, diff_file_id, highlighted_diff_lines_hash.to_json)
      end
    end
  end
end
```

### Steps to reproduce

Create a git diff in which a highlighted line contains a character that cannot be converted to UTF-8, and submit it as an MR. Loading the MR diff view should then raise the error.

### Example Project

Only an internal project is available.

### What is the current *bug* behavior?

Merge request diff view fails to load.

### What is the expected *correct* behavior?

Merge request diff view should load.

### Relevant logs and/or screenshots

```
Processing by Projects::MergeRequests::DiffsController#show as JSON
  Parameters: {"w"=>"0", "namespace_id"=>"<snip>", "project_id"=>"<snip>", "id"=>"1"}
Completed 500 Internal Server Error in 1986ms (ActiveRecord: 22.9ms | Elasticsearch: 0.0ms)

Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8):

lib/gitlab/diff/highlight_cache.rb:94:in `block (3 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:93:in `each'
lib/gitlab/diff/highlight_cache.rb:93:in `block (2 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:92:in `block in write_to_redis_hash'
lib/gitlab/redis/wrapper.rb:19:in `block in with'
lib/gitlab/redis/wrapper.rb:19:in `with'
lib/gitlab/diff/highlight_cache.rb:91:in `write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:43:in `write_if_empty'
lib/gitlab/diff/file_collection/merge_request_diff_base.rb:31:in `write_cache'
app/controllers/projects/merge_requests/diffs_controller.rb:57:in `render_diffs'
app/controllers/projects/merge_requests/diffs_controller.rb:13:in `show'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:467:in `set_session_storage'
lib/gitlab/i18n.rb:55:in `with_locale'
lib/gitlab/i18n.rb:61:in `with_user_locale'
app/controllers/application_controller.rb:461:in `set_locale'
lib/gitlab/application_context.rb:18:in `with_context'
app/controllers/application_controller.rb:453:in `set_current_context'
lib/gitlab/error_tracking.rb:34:in `with_context'
app/controllers/application_controller.rb:545:in `sentry_context'
lib/gitlab/middleware/rails_queue_duration.rb:27:in `call'
lib/gitlab/metrics/rack_middleware.rb:17:in `block in call'
lib/gitlab/metrics/transaction.rb:62:in `run'
lib/gitlab/metrics/rack_middleware.rb:17:in `call'
lib/gitlab/request_profiler/middleware.rb:17:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:13:in `call'
lib/gitlab/middleware/multipart.rb:117:in `call'
lib/gitlab/middleware/read_only/controller.rb:52:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:23:in `call'
config/initializers/fix_local_cache_middleware.rb:9:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:49:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'

#<Gitlab::Diff::Line:0x00007f057161a878 @index=1, @type=nil, @text=" \x00/\x00/\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00#\x00l\x00a\x00n\x00g\x00u\x00a\x00g\x00e\x00 \x00f\x00r\x00-\x00F\x00R\x00 \x00 \x00\"\x00l\x00'\x00o\x00p\x00t\x00i\x00o\x00n\x00 \x00d\x00e\x00 \x00b\x00o\x00t\x00t\x00e\x00 \x00d\x00e\x00 \x00D\x00\xE9\x00b\x00u\x00t\x00\"\x00\r\x00", @new_pos=70, @old_pos=70, @parent_file=#<Gitlab::Diff::File:0x00007f0591b6f588 ...>, @rich_text=" <span id=\"LC70\" class=\"line\" lang=\"plaintext\">// #language fr-FR \"l'option de botte de Début\"</span>\n", @line_code="1e622865eb43ead29516428ad50ae36cf1102f95_70_70">,
```

### Possible fixes

The JSON conversion is necessary, so one idea is to encode the data stored in the hash before it is inserted into the cache and decode it after it is read back out.
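
To illustrate the encode-before-caching idea, here is a minimal standalone sketch, not the actual GitLab change. It assumes Base64 as the intermediate encoding (one option among several), and the `BinarySafeCache` module with its `dump_lines` and `load_lines` methods is a hypothetical name invented for this example. The sample bytes are a short excerpt of the `@text` shown in the log.

```ruby
require 'json'

# Hypothetical helper, for illustration only (not part of GitLab).
module BinarySafeCache
  module_function

  # Base64-encode each raw line so the JSON payload contains only ASCII.
  # 'm0' is the strict Base64 directive (no line breaks).
  def dump_lines(lines)
    JSON.generate(lines.map { |line| [line].pack('m0') })
  end

  # Reverse of dump_lines: parse the JSON array and decode each entry
  # back to its original bytes (returned as ASCII-8BIT strings).
  def load_lines(payload)
    JSON.parse(payload).map { |encoded| encoded.unpack1('m0') }
  end
end

# A few bytes from the failing line, tagged as binary (ASCII-8BIT).
raw_line = "D\x00\xE9\x00b\x00".b

# Direct transcoding to UTF-8 fails, as in the reported error.
begin
  raw_line.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  puts "direct conversion fails: #{e.class}"
end

# Round trip through the Base64-wrapped JSON payload.
payload  = BinarySafeCache.dump_lines([raw_line])
restored = BinarySafeCache.load_lines(payload)
puts restored == [raw_line]
```

The trade-off of this approach is a larger cached payload (Base64 adds roughly a third) in exchange for a byte-exact round trip. Transcoding or scrubbing the text to UTF-8 before caching would be an alternative, but it would alter the stored bytes.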