Fix UTF-8 conversion in reply parser
What does this MR do and why?
For #364329 (closed)
Source: https://log.gprd.gitlab.net/goto/2530c000-e3f4-11ec-aade-19e9974a7229
Encoding::UndefinedConversionError: "\xEF" from ASCII-8BIT to UTF-8
lib/gitlab/email/reply_parser.rb:36:in `encode'
encoded_body = body.force_encoding(encoding).encode("UTF-8")
lib/gitlab/email/reply_parser.rb:36:in `execute'
encoded_body = body.force_encoding(encoding).encode("UTF-8")
lib/gitlab/email/handler/reply_processing.rb:47:in `process_message'
message, stripped_text = ReplyParser.new(mail, **kwargs).execute
lib/gitlab/email/handler/reply_processing.rb:39:in `message_including_reply_or_only_quotes'
@message_including_reply_or_only_quotes ||= process_message(trim_reply: false, allow_only_quotes: true)
lib/gitlab/email/handler/service_desk_handler.rb:143:in `message_including_template'
description = message_including_reply_or_only_quotes
...
(97 additional frame(s) were not displayed)
This is a classic issue when dealing with emails. We already handled a UTF-8 problem in https://gitlab.com/gitlab-com/gl-infra/scalability/-/issues/1725, but that issue used a different method. In the email parser, we enforce UTF-8 with Ruby's default encoding utilities:
encoded_body = body.force_encoding(encoding).encode("UTF-8")
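The failure can be reproduced outside GitLab with plain Ruby. This is a minimal sketch: `naive_to_utf8` is a hypothetical wrapper around the line above, and the sample bytes and the `ASCII-8BIT` label are assumptions chosen to match the error in the stack trace.

```ruby
# Hypothetical helper wrapping the failing call from reply_parser.rb.
# `dup` keeps the caller's string untouched, since force_encoding mutates.
def naive_to_utf8(body, encoding)
  body.dup.force_encoding(encoding).encode("UTF-8")
end

# Made-up sample: a binary string containing a byte above 0x7F.
sample = "caf\xEF".b

begin
  naive_to_utf8(sample, "ASCII-8BIT")
rescue Encoding::UndefinedConversionError => e
  # ASCII-8BIT has no mapping to UTF-8 for bytes above 0x7F, so encode raises.
  puts "conversion failed: #{e.class}"
end
```

Pure ASCII input converts without error, which is why the problem only shows up for some emails.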
In the other issue, we use Gitlab::EncodingHelper, which handles UTF-8 conversion better: it tries to detect the source encoding before converting, so it can convert a wider range of inputs. This comment explains the problem in more detail.
The solution is simple: switching to Gitlab::EncodingHelper solves the problem.
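To illustrate the idea of inspecting the bytes instead of trusting the declared charset, here is a simplified plain-Ruby stand-in. It is not the implementation of Gitlab::EncodingHelper; `tolerant_to_utf8` is a hypothetical name, and the only "detection" it does is checking whether the bytes are already valid UTF-8.

```ruby
# Hypothetical, simplified stand-in for a tolerant UTF-8 conversion.
# Assumption: the real helper does more thorough encoding detection.
def tolerant_to_utf8(body)
  # Reinterpret the raw bytes as UTF-8 without transcoding them.
  utf8 = body.dup.force_encoding(Encoding::UTF_8)

  # If the bytes already form valid UTF-8, nothing else is needed.
  return utf8 if utf8.valid_encoding?

  # Otherwise replace invalid byte sequences with U+FFFD instead of raising.
  utf8.scrub
end

puts tolerant_to_utf8("caf\xC3\xA9".b)                  # valid UTF-8 bytes are kept
puts tolerant_to_utf8("caf\xEF".b).valid_encoding?      # invalid byte is replaced
```

The trade-off in this sketch is that undecodable bytes are replaced rather than recovered, which keeps the reply parser from raising on malformed mail bodies.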
How to set up and validate locally
I added some tests to cover this fix. Before the patch, the test raises the following exception. Afterward, the test is green:
MR acceptance checklist
This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.
- I have evaluated the MR acceptance checklist for this MR.