add parsing support for incoming html email

jneen requested to merge bugfix/html-only-mail into master

What does this MR do?

Fixes #18388 (closed) by adding support for parsing HTML email

Are there points in the code the reviewer needs to double check?

The new class, Gitlab::Email::HTMLParser, needs to translate the HTML content to text and also delete quoted replies, since they are not necessarily in a format that EmailReplyParser can catch. The solution I found that should work for any HTML-formatted email is to remove all <table> and <blockquote> elements. Literal <table> text (to be interpreted by markdown) should already be encoded as e.g. &lt;table&gt;. The only failure mode is an actual HTML table in the content itself, which we wouldn't be able to support easily anyway.

The html2text gem traverses the HTML tree and outputs text, emitting markdown for HTML links and images.
