Revamp reply email parsing
This is an attempt to summarize all the issues with reply email parsing in one place, making it easier to find clues.
All related issues
- HTML emails: gitlab-ce#2847, gitlab-ce#3357, gitlab-ce#15545, gitlab-ce#18388, gitlab-ce#23340
- Inline/bottom reply support: gitlab-ce#3020, gitlab-ce#14805, gitlab-ce#20514
- Strip signatures: gitlab-ce#3061, gitlab-ce#14786
- Ignore auto-generated emails: gitlab-ce#18548
- Incident: https://gitlab.com/gitlab-com/infrastructure/issues/1#note_17599430 and gitlab-ce#24003
- Incident: https://0xacab.org/riseup/0xacab/issues/11
Challenges
- Different email clients (e.g. gitlab-ce#18388)
- Different languages
- HTML emails
- Auto-generated emails
- Signatures
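Two of these challenges, auto-generated emails and signatures, have at least partial conventions to lean on. The following Python sketch is a minimal illustration under that assumption, not GitLab's implementation. It treats a message as auto-generated when it carries an RFC 3834 `Auto-Submitted` header with any value other than `no`, or a `Precedence` of `bulk`, `junk` or `list` (a widespread but non-standard convention). It then cuts the body at the conventional `-- ` signature delimiter. The function names `is_auto_generated` and `strip_signature` are hypothetical.

```python
import email
from email.message import Message

# Precedence values commonly set by mailing lists and autoresponders.
# This header is a widespread convention, not a standard; treat it as a heuristic.
BULK_PRECEDENCE = {"bulk", "junk", "list"}


def is_auto_generated(msg: Message) -> bool:
    """Return True if the message looks machine-generated."""
    # RFC 3834: any Auto-Submitted value other than "no" marks an automatic message.
    # Ignore optional parameters after ";" and compare case-insensitively.
    auto = (msg.get("Auto-Submitted") or "no").split(";")[0].strip().lower()
    if auto != "no":
        return True
    precedence = (msg.get("Precedence") or "").strip().lower()
    return precedence in BULK_PRECEDENCE


def strip_signature(body: str) -> str:
    """Cut the body at the last conventional signature delimiter, if any."""
    lines = body.splitlines()
    # Scan from the bottom, since signatures are normally appended last.
    for i in range(len(lines) - 1, -1, -1):
        # The delimiter is "-- " (dash, dash, space); many clients drop the space.
        if lines[i].rstrip() == "--":
            return "\n".join(lines[:i]).rstrip()
    return body.rstrip()


if __name__ == "__main__":
    raw = "Auto-Submitted: auto-replied\nSubject: Out of office\n\nI am away."
    print(is_auto_generated(email.message_from_string(raw)))  # True
    print(strip_signature("Looks good to me.\n-- \nJane"))  # Looks good to me.
```

Neither check is sufficient on its own. Many signatures (e.g. "Sent from my iPhone") have no delimiter, and not every autoresponder sets these headers. This is why the issues above still call for more heuristic handling.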
Suggested solutions
- Leave markers in outgoing emails that we can recognize later in replies (I think Discourse does this, as do many support ticket systems)
- Maintain a list of the quoting formats different email clients use (e.g. some clients use `|` for quoting)
- Don't use Markdown, just plain text (GitHub does this, but the results can still be quite poor. Here's an example of the woes)
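The marker and quote-format ideas can be combined into one line-based filter. This is a hedged Python sketch, not how any of the reference implementations works. `REPLY_MARKER`, `extract_reply` and the English-only `On ... wrote:` pattern are assumptions made for illustration. Quoted lines are skipped rather than treated as the end of the reply, so inline and bottom replies survive. The marker only ends the reply when it appears unquoted, which happens when a client copied the original without a quote prefix.

```python
import re

# Hypothetical marker placed near the top of outgoing notification emails.
# The wording is an illustration only, not an existing GitLab string.
REPLY_MARKER = "Reply above this line"

# Quote prefixes: ">" is the usual one; some clients use "|".
QUOTE_PREFIXES = (">", "|")

# English-only attribution line, e.g. "On Mon, Jan 2, 2017, Jane wrote:".
# Other languages and clients word this differently.
ATTRIBUTION_RE = re.compile(r"^On\b.*\bwrote:$")


def extract_reply(body: str) -> str:
    """Heuristically keep only the newly written lines of a plain-text reply."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        # Quoted lines are dropped, but scanning continues: this preserves
        # inline and bottom replies written between or after quote blocks.
        if stripped.startswith(QUOTE_PREFIXES):
            continue
        # An unquoted copy of our own marker means the client quoted the
        # original without a prefix; everything from here on is old text.
        if REPLY_MARKER in stripped:
            break
        # The attribution line itself carries no reply content.
        if ATTRIBUTION_RE.match(stripped):
            continue
        kept.append(line)
    return "\n".join(kept).strip()


if __name__ == "__main__":
    inline = (
        "On Mon, Jan 2, 2017, Jane wrote:\n"
        "> Question one?\nAnswer one.\n"
        "> Question two?\nAnswer two."
    )
    print(extract_reply(inline))  # prints "Answer one." and "Answer two." on separate lines
```

Treating `|` as a quote prefix collides with Markdown tables, and `>` collides with Markdown blockquotes. This is one argument for the plain-text suggestion. Gmail also sometimes wraps a long attribution line across two lines, which this single-line pattern would miss.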
Reference implementations
- https://github.com/github/email_reply_parser
- https://github.com/discourse/email_reply_trimmer
- https://github.com/discourse/discourse/commits/master/lib/email/receiver.rb
Stalled efforts
- https://gitlab.com/gitlab-org/gitlab-ce/commits/adopt-email_reply_trimmer (failed build: https://gitlab.com/gitlab-org/gitlab-ce/pipelines/3869613)