
Fix UTF-8 conversion in reply parser

What does this MR do and why?

For #364329 (closed)

Screenshot: Screen_Shot_2022-06-04_at_17.50.40 (source: https://log.gprd.gitlab.net/goto/2530c000-e3f4-11ec-aade-19e9974a7229)

Encoding::UndefinedConversionError: "\xEF" from ASCII-8BIT to UTF-8
  lib/gitlab/email/reply_parser.rb:36:in `encode'
    encoded_body = body.force_encoding(encoding).encode("UTF-8")
  lib/gitlab/email/reply_parser.rb:36:in `execute'
    encoded_body = body.force_encoding(encoding).encode("UTF-8")
  lib/gitlab/email/handler/reply_processing.rb:47:in `process_message'
    message, stripped_text = ReplyParser.new(mail, **kwargs).execute
  lib/gitlab/email/handler/reply_processing.rb:39:in `message_including_reply_or_only_quotes'
    @message_including_reply_or_only_quotes ||= process_message(trim_reply: false, allow_only_quotes: true)
  lib/gitlab/email/handler/service_desk_handler.rb:143:in `message_including_template'
    description = message_including_reply_or_only_quotes
...
(97 additional frame(s) were not displayed)

This is a classic issue when dealing with emails. We already handled a UTF-8 problem in https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1725, but that fix used a different method. In the email parser, we enforce UTF-8 with Ruby's built-in encoding utilities:

        encoded_body = body.force_encoding(encoding).encode("UTF-8")
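The failure can be reproduced outside GitLab with plain Ruby. The snippet below is a minimal sketch, not the parser code: `naive_encode` is a hypothetical stand-in for the line above, and it assumes the mail part is tagged as binary (ASCII-8BIT), as the error message suggests.

```ruby
# Hypothetical stand-in for the parser line; not the actual GitLab method.
def naive_encode(body, encoding)
  # dup so the caller's string is not mutated by force_encoding
  body.dup.force_encoding(encoding).encode("UTF-8")
end

# Assumption: raw bytes of a UTF-8 byte order mark plus ASCII text,
# tagged as binary, which is how an undeclared charset may arrive.
body = "\xEF\xBB\xBFhello".b

error = nil
begin
  naive_encode(body, "ASCII-8BIT")
rescue Encoding::UndefinedConversionError => e
  # Binary bytes above 0x7F have no defined mapping to UTF-8,
  # so the transcoder gives up at the first one (0xEF here).
  error = e
end

puts error.class
```

The bytes are valid UTF-8 all along; the conversion fails only because the string is labelled as binary before transcoding.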

In the other issue, we used Gitlab::EncodingHelper. This helper handles UTF-8 conversion better: it tries to detect the source encoding before converting, so it can convert a wider range of inputs. A linked comment explains the problem in more detail.

The solution is simple: switching to Gitlab::EncodingHelper solves the problem.
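To illustrate the detect-before-convert idea, here is a rough stdlib-only sketch. It is not Gitlab::EncodingHelper and does not reproduce its detection logic; `to_utf8_sketch` and `CANDIDATE_ENCODINGS` are hypothetical names, and the candidate list is an assumption chosen for the example.

```ruby
# Hypothetical illustration only; NOT the Gitlab::EncodingHelper implementation.
# Assumption: try UTF-8 first, then Windows-1252 as a common legacy fallback.
CANDIDATE_ENCODINGS = [Encoding::UTF_8, Encoding::Windows_1252].freeze

def to_utf8_sketch(bytes)
  CANDIDATE_ENCODINGS.each do |candidate|
    # Relabel a copy of the bytes; no transcoding happens yet.
    text = bytes.dup.force_encoding(candidate)
    next unless text.valid_encoding?

    begin
      return text.encode(Encoding::UTF_8)
    rescue EncodingError
      # Transcoding from this candidate failed; try the next one.
      next
    end
  end

  # Last resort: treat the input as binary and drop unconvertible bytes.
  bytes.dup.force_encoding(Encoding::BINARY)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "")
end

bom_body    = to_utf8_sketch("\xEF\xBB\xBFhello".b) # bytes that are already valid UTF-8
legacy_body = to_utf8_sketch("caf\xE9".b)           # 0xE9 is "é" in Windows-1252
puts legacy_body
```

The point of the sketch is the ordering: check whether the bytes are valid in a guessed encoding first, and only then transcode, instead of trusting a declared encoding and raising on the first unexpected byte.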

How to set up and validate locally

I added some tests to cover this fix. Before the patch, the test raises the following exception. After the patch, the test is green:

Screen_Shot_2022-06-05_at_01.19.36

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Quang-Minh Nguyen
