Skip to content

Encoding::UndefinedConversionError in highlight_cache.rb

Summary

DiffFile lines are ASCII 8BIT encoded and can fail to convert to UTF-8 if there's an unsupported character:

Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8)

MR !19917 (merged) implemented redis cache and part of the cache process is to call to_json on the diff highlight. This causes the conversion error to get thrown.

      def write_to_redis_hash(hash)
        Gitlab::Redis::Cache.with do |redis|
          redis.pipelined do
            hash.each do |diff_file_id, highlighted_diff_lines_hash|
              redis.hset(key, diff_file_id, highlighted_diff_lines_hash.to_json)
            end

Steps to reproduce

create a git diff that causes a line with an unsupported UTF-8 character to get highlighted and submit it as a MR.

Loading the MR diff view should error

Example Project

Only have one internally

What is the current bug behavior?

Merge request diff view fails to load.

What is the expected correct behavior?

Merge request diff view should load.

Relevant logs and/or screenshots

Processing by Projects::MergeRequests::DiffsController#show as JSON
  Parameters: {"w"=>"0", "namespace_id"=>"<snip>", "project_id"=>"<snip>", "id"=>"1"}
Completed 500 Internal Server Error in 1986ms (ActiveRecord: 22.9ms | Elasticsearch: 0.0ms)

Encoding::UndefinedConversionError ("\xE9" from ASCII-8BIT to UTF-8):

lib/gitlab/diff/highlight_cache.rb:94:in `block (3 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:93:in `each'
lib/gitlab/diff/highlight_cache.rb:93:in `block (2 levels) in write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:92:in `block in write_to_redis_hash'
lib/gitlab/redis/wrapper.rb:19:in `block in with'
lib/gitlab/redis/wrapper.rb:19:in `with'
lib/gitlab/diff/highlight_cache.rb:91:in `write_to_redis_hash'
lib/gitlab/diff/highlight_cache.rb:43:in `write_if_empty'
lib/gitlab/diff/file_collection/merge_request_diff_base.rb:31:in `write_cache'
app/controllers/projects/merge_requests/diffs_controller.rb:57:in `render_diffs'
app/controllers/projects/merge_requests/diffs_controller.rb:13:in `show'
lib/gitlab/session.rb:11:in `with_session'
app/controllers/application_controller.rb:467:in `set_session_storage'
lib/gitlab/i18n.rb:55:in `with_locale'
lib/gitlab/i18n.rb:61:in `with_user_locale'
app/controllers/application_controller.rb:461:in `set_locale'
lib/gitlab/application_context.rb:18:in `with_context'
app/controllers/application_controller.rb:453:in `set_current_context'
lib/gitlab/error_tracking.rb:34:in `with_context'
app/controllers/application_controller.rb:545:in `sentry_context'
lib/gitlab/middleware/rails_queue_duration.rb:27:in `call'
lib/gitlab/metrics/rack_middleware.rb:17:in `block in call'
lib/gitlab/metrics/transaction.rb:62:in `run'
lib/gitlab/metrics/rack_middleware.rb:17:in `call'
lib/gitlab/request_profiler/middleware.rb:17:in `call'
lib/gitlab/middleware/go.rb:20:in `call'
lib/gitlab/etag_caching/middleware.rb:13:in `call'
lib/gitlab/middleware/multipart.rb:117:in `call'
lib/gitlab/middleware/read_only/controller.rb:52:in `call'
lib/gitlab/middleware/read_only.rb:18:in `call'
lib/gitlab/middleware/basic_health_check.rb:25:in `call'
lib/gitlab/middleware/request_context.rb:23:in `call'
config/initializers/fix_local_cache_middleware.rb:9:in `call'
lib/gitlab/metrics/requests_rack_middleware.rb:49:in `call'
lib/gitlab/middleware/release_env.rb:12:in `call'



#<Gitlab::Diff::Line:0x00007f057161a878 @index=1, @type=nil, @text=" \x00/\x00/\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00#\x00l\x00a\x00n\x00g\x00u\x00a\x00g\x00e\x00 \x00f\x00r\x00-\x00F\x00R\x00 \x00 \x00\"\x00l\x00'\x00o\x00p\x00t\x00i\x00o\x00n\x00 \x00d\x00e\x00 \x00b\x00o\x00t\x00t\x00e\x00 \x00d\x00e\x00 \x00D\x00\xE9\x00b\x00u\x00t\x00\"\x00\r\x00", @new_pos=70, @old_pos=70, @parent_file=#<Gitlab::Diff::File:0x00007f0591b6f588 ...>, @rich_text=" <span id=\"LC70\" class=\"line\" lang=\"plaintext\">//                                     #language fr-FR  \"l'option de botte de Début\"</span>\n", @line_code="1e622865eb43ead29516428ad50ae36cf1102f95_70_70">, 

Possible fixes

The JSON conversion is necessary, so one idea is to just encode the data stored in the hash prior to cache insertion and unencode it after it's pulled out

Edited by Vincent Fazio