Encoding::UndefinedConversionError in highlight_cache.rb
Summary
DiffFile lines are ASCII 8BIT encoded and can fail to convert to UTF-8 if there's an unsupported character:
Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8)
MR !19917 (merged) implemented redis cache and part of the cache process is to call to_json
on the diff highlight. This causes the conversion error to get thrown.
def write_to_redis_hash(hash)
Gitlab::Redis::Cache.with do |redis|
redis.pipelined do
hash.each do |diff_file_id, highlighted_diff_lines_hash|
redis.hset(key, diff_file_id, highlighted_diff_lines_hash.to_json)
end
Steps to reproduce
create a git diff that causes a line with an unsupported UTF-8 character to get highlighted and submit it as a MR.
Loading the MR diff view should error
Example Project
Only have one internally
What is the current bug behavior?
Merge request diff view fails to load.
What is the expected correct behavior?
Merge request diff view should load.
Relevant logs and/or screenshots
Processing by Projects::MergeRequests::DiffsController#show as JSON
Parameters: {"w"=>"0", "namespace_id"=>"<snip>", "project_id"=>"<snip>", "id"=>"1"}
Completed 500 Internal Server Error in 1986ms (ActiveRecord: 22.9ms | Elasticsearch: 0.0ms)
Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8):
lib/gitlab/diff/highlight_cache.rb:94:in `block (3 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:93:in `each'
lib/gitlab/diff/highlight_cache.rb:93:in `block (2 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:92:in `block in write_to_redis_hash'
lib/gitlab/redis/wrapper.rb:19:in `block in with'
lib/gitlab/redis/wrapper.rb:19:in `with'
lib/gitlab/diff/highlight_cache.rb:91:in `write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:43:in `write_if_empty'
lib/gitlab/diff/file_collection/merge_request_diff_base.rb:31:in `write_cache'
app/controllers/projects/merge_requests/diffs_controller.rb:57:in `render_diffs'
app/controllers/projects/merge_requests/diffs_controller.rb:13:in `show'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:467:in `set_session_storage'
lib/gitlab/i18n.rb:55:in `with_locale'
lib/gitlab/i18n.rb:61:in `with_user_locale'
app/controllers/application_controller.rb:461:in `set_locale'
lib/gitlab/application_context.rb:18:in `with_context'
app/controllers/application_controller.rb:453:in `set_current_context'
lib/gitlab/error_tracking.rb:34:in `with_context'
app/controllers/application_controller.rb:545:in `sentry_context'
lib/gitlab/middleware/rails_queue_duration.rb:27:in `call'
lib/gitlab/metrics/rack_middleware.rb:17:in `block in call'
lib/gitlab/metrics/transaction.rb:62:in `run'
lib/gitlab/metrics/rack_middleware.rb:17:in `call'
lib/gitlab/request_profiler/middleware.rb:17:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:13:in `call'
lib/gitlab/middleware/multipart.rb:117:in `call'
lib/gitlab/middleware/read_only/controller.rb:52:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:23:in `call'
config/initializers/fix_local_cache_middleware.rb:9:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:49:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'
#<Gitlab::Diff::Line:0x00007f057161a878 @index=1, @type=nil, @text=" \x00/\x00/\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00#\x00l\x00a\x00n\x00g\x00u\x00a\x00g\x00e\x00 \x00f\x00r\x00-\x00F\x00R\x00 \x00 \x00\"\x00l\x00'\x00o\x00p\x00t\x00i\x00o\x00n\x00 \x00d\x00e\x00 \x00b\x00o\x00t\x00t\x00e\x00 \x00d\x00e\x00 \x00D\x00\xE9\x00b\x00u\x00t\x00\"\x00\r\x00", @new_pos=70, @old_pos=70, @parent_file=#<Gitlab::Diff::File:0x00007f0591b6f588 ...>, @rich_text=" <span id=\"LC70\" class=\"line\" lang=\"plaintext\">// #language fr-FR \"l'option de botte de Début\"</span>\n", @line_code="1e622865eb43ead29516428ad50ae36cf1102f95_70_70">,
Possible fixes
The JSON conversion is necessary, so one idea is to just encode the data stored in the hash prior to cache insertion and unencode it after it's pulled out
Edited by Vincent Fazio