Message canonicalization in the case of UTF-8

Content-Transfer-Encoding is used to make the message 7-bit clean

This should probably not be recommended for the following reasons:

  1. 8BITMIME is fairly widely supported, so there is little practical need for it
  2. MUAs may choose legacy options themselves, but the spec should not recommend that they do so
  3. New standards should not recommend legacy encodings
  4. Implementations having two separate code branches for UTF-8/8-bit and 7-bit is a source of vulnerabilities.

If anything, base64 encoding the body could be recommended, but a recommendation about Content-Transfer-Encoding might not be necessary at all.

It might be safest to simply assume that 8BITMIME is supported and thus avoid many of these encoding issues altogether. Fundamentally, the spec should try to bring the number of potential sources of (encoding) confusion as close to zero as possible.

An MTA or any other message relay service that observes a message with Content-Type multipart/mixed that is a single part MUST NOT alter the content of this message body in any way, including, but not limited to, changing the content transfer encoding of the body part or any of its encapsulated body parts.

I'm also unsure whether saying that the Content-Transfer-Encoding should not be changed would achieve anything. The relays that change it today won't implement the recommendation. The ones that don't change it don't have to do anything, as they likely already let both 7bit and 8bit pass through just fine.

If any line begins with the string "From ", either the Quoted-Printable or Base64 MIME encoding MUST be applied, and if Quoted-Printable is used, at least one of the characters in the string "From " MUST be encoded

It should be mentioned that in an address this encoding may only be applied to the display name (or a comment), never to the addr-spec itself, preferably with an example in some appendix. There have been implementations out in the wild that have done something else, thinking it's okay, but it's not okay. Encoding the local part is NOT standardized or defined.

From: Exämple <exämple@exämple.com> - ✔️

From: =?UTF-8?Q?Ex=C3=A4mple?= <exämple@exämple.com> - ✔️

From: =?UTF-8?Q?Ex=C3=A4mple?= <exämple@xn--exmple-cua.com> - ✔️

From: =?UTF-8?Q?Ex=C3=A4mple_=3Cex=C3=A4mple=40ex=C3=A4mple=2Ecom=3E?= - ❌

From: =?UTF-8?Q?Ex=C3=A4mple?= <=?UTF-8?Q?ex=C3=A4mple?=@exämple.com> - ❌

While the domain can be represented using the A-label, the local part cannot. The first example is the nicest to look at and should be preferred, especially considering that the other valid examples still need 8BITMIME anyway.
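To illustrate the valid legacy form (the third example above), here is a minimal Python sketch. The function name `legacy_from_header` is made up for this comment, and Python's built-in `idna` codec implements IDNA 2003, which may differ from what the spec ends up referencing. Only the display name becomes an encoded-word and only the domain becomes an A-label; the local part is left untouched because there is no standardized way to encode it.

```python
from email.header import Header


def legacy_from_header(display_name: str, addr_spec: str) -> str:
    """Build a From value in the 'legacy-friendly' form.

    Hypothetical helper for illustration only:
    - the display name is turned into an RFC 2047 encoded-word (UTF-8),
    - the domain is converted to its A-label,
    - the local part is deliberately left as-is.
    """
    local, _, domain = addr_spec.rpartition("@")
    # Built-in codec, IDNA 2003; an assumption, not something the spec mandates.
    a_label_domain = domain.encode("idna").decode("ascii")
    # Header picks Q or B encoding itself; both carry the same UTF-8 bytes.
    encoded_name = Header(display_name, charset="utf-8").encode()
    return f"{encoded_name} <{local}@{a_label_domain}>"
```

Calling `legacy_from_header("Exämple", "exämple@exämple.com")` yields an ASCII-only display name and domain, but the result as a whole is still not ASCII because of the local part.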

Quoted-Printable or Base64 MIME encoding MUST be applied

This section should also specify that if anything is encoded, the underlying charset has to be UTF-8. (See the following section, where that's kinda important.)

Everything else, especially byte sequences that are not valid UTF-8, must be rejected (not just ignored). Not only would this prevent exploiting any hash collisions (in signature systems that still use SHA-1 somewhere), it would also prevent any potential mismatch between what is displayed to the user and what is taken as input to the signature algorithm. Such differences can occur either because of poor string handling or because UI elements only display valid UTF-8 codepoints.
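A small Python sketch of the difference between rejecting and ignoring, assuming the header bytes are available before any decoding happens. The name `decode_strict_utf8` is hypothetical.

```python
def decode_strict_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8 and reject anything that is not valid UTF-8.

    Hypothetical helper: a lenient decoder (errors="replace" or "ignore")
    would instead hand the UI a string that no longer matches the signed bytes.
    """
    try:
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(f"invalid UTF-8 at byte offset {exc.start}") from exc


# Latin-1 encoded "Exämple": 0xE4 is not a valid UTF-8 sequence here.
latin1_bytes = b"Ex\xe4mple"

# What a lenient display path might show, while the signature covers latin1_bytes:
shown_with_replace = latin1_bytes.decode("utf-8", errors="replace")
shown_with_ignore = latin1_bytes.decode("utf-8", errors="ignore")
```

With the lenient modes, two different byte strings can end up looking identical to the user, which is exactly the mismatch that strict rejection avoids.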

[...] the top-level subpart has a From header field, and its addr-spec matches the addr-spec in the message's From header field

All of the above kind of boils down to making it actually possible to compare whether these addresses match. But there should be an extra section that explicitly mandates:

a) that A-labels and U-labels of the same domain must be considered equivalent

b) that QP-encoded, base64-encoded and plain UTF-8 of the same header must be considered equivalent
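A rough Python sketch of what a) and b) could mean in practice. All function names here are hypothetical, the built-in `idna` codec (IDNA 2003) is an assumption, and the local part is compared byte-for-byte because nothing above defines any normalization for it.

```python
from email.header import decode_header, make_header


def canonical_domain(domain: str) -> str:
    """Map a U-label or A-label domain to a lowercase A-label form."""
    # ASCII labels pass through the codec unchanged, so lowercase afterwards.
    return domain.encode("idna").decode("ascii").lower()


def same_addr_spec(a: str, b: str) -> bool:
    """a) Compare addr-specs, treating A-labels and U-labels as equivalent."""
    local_a, _, domain_a = a.rpartition("@")
    local_b, _, domain_b = b.rpartition("@")
    # Local parts are compared as-is (no case folding, no normalization).
    return local_a == local_b and canonical_domain(domain_a) == canonical_domain(domain_b)


def decoded_text(value: str) -> str:
    """b) Decode RFC 2047 encoded-words so QP, base64 and plain UTF-8 compare equal.

    Intended for a display name on its own, not for a whole header line.
    """
    return str(make_header(decode_header(value)))
```

For example, `same_addr_spec("exämple@exämple.com", "exämple@xn--exmple-cua.com")` is true, and `decoded_text` returns the same string for `=?UTF-8?Q?Ex=C3=A4mple?=`, `=?UTF-8?B?RXjDpG1wbGU=?=` and plain `Exämple`.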

TL;DR: It should be a core design principle that, for anything that is not pure ASCII, the default encoding must be UTF-8 and the default representation should be 8-bit, not QP/B64. If compatibility with legacy systems is needed, any QP/B64 representation should still use UTF-8. Different representations of the same thing should be mandated to be treated as equivalent and to work as expected.

Edited by Taavi Eomäe