Mbox export produces incorrect MIME messages
I maintain a tool that programmatically fetches and parses messages from Mailman archives, and I noticed a few issues with mbox exports from HyperKitty:
- HTML texts are provided as attachments rather than as an alternative to the plaintext: Currently, HyperKitty produces a multipart/mixed document for each message that looks like this:

    multipart/mixed
      - text/plain
      - [text/html, Content-Disposition: attachment; filename="attachment.html"]
      - [attachments...]

  This structure causes different problems in multiple clients I tried, including Thunderbird, Windows Mail, and Squeak, because the formatted text is not displayed inside the reader but just provided as a separate attachment. Instead, HyperKitty should produce a multipart/alternative document for the text according to RFC 1341:

    The multipart/alternative type is syntactically identical to multipart/mixed, but the semantics are different. In particular, each of the parts is an "alternative" version of the same information. User agents should recognize that the content of the various parts are interchangeable. The user agent should either choose the "best" type based on the user's environment and preferences, or offer the user the available alternatives.

  So, the entire message could look like this instead:

    multipart/mixed
      - multipart/alternative
        - text/plain
        - [text/html, Content-Disposition: attachment; filename="attachment.html"]
      - [attachments...]
- Inline documents are not linked correctly: Both text/html and sometimes text/plain messages may contain inline documents (mostly, inline images). To correctly describe these documents, the relevant document inside the message should have a nested multipart/related structure as follows according to RFC 2387/RFC 2392:

    multipart/related
      - text/plain | text/html
      - [image/png, Content-Disposition: inline; filename="image.png", Content-ID: <foo>]
      - [image/jpeg, Content-Disposition: inline; filename="image.jpeg", Content-ID: <bar>]
      - ...

  Inside the text message, the documents may then be referenced using <img src="cid:foo"> in HTML or [cid:foo] in plaintext.

  Currently, HyperKitty does not even provide these CIDs, making it impossible to reconstruct inline images correctly even when I post-process the downloaded files.
- HTML documents do not specify an explicit charset: It seems that all messages are encoded as UTF-8, but not all clients assume that as a default. It would be helpful to always provide charset=utf-8 for these documents.
- Email obfuscation (@ -> (at)) breaks contents: While messages that contain @ such as this one look fine in the web interface, they are broken in the mbox export. Besides worsening readability, this breaks mail addresses that people might want to reply to and URLs to other conversations on HyperKitty that people might want to open in the browser (see the example message). In some but not all cases, only the HTML version is affected. Is this obfuscation really necessary, in particular in this simple form and when it is not applied on the website itself?
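To illustrate the structure proposed in the first three points, here is a minimal sketch using Python's standard `email` package. It is not HyperKitty's export code, just an example of the nesting I would expect. The addresses, file names, placeholder image bytes, and the helper name `build_example_message` are made up for this example.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Placeholder bytes (only the PNG signature), standing in for real image data.
PNG_PLACEHOLDER = b"\x89PNG\r\n\x1a\n"


def build_example_message() -> EmailMessage:
    """Build mixed > alternative > (plain, related > (html, image)) + attachment."""
    # Content-ID for the inline image, including the angle brackets.
    cid = make_msgid(domain="example.org")
    cid_ref = cid[1:-1]  # cid: URLs reference the ID without the brackets

    msg = EmailMessage()
    msg["From"] = "alice@example.org"
    msg["To"] = "list@example.org"
    msg["Subject"] = "Example with inline image"

    # Plain-text version, with an explicit charset.
    msg.set_content(f"Hello!\n[cid:{cid_ref}]\n", charset="utf-8")

    # HTML version as an alternative; this turns msg into multipart/alternative.
    msg.add_alternative(
        f'<p>Hello!</p><img src="cid:{cid_ref}">',
        subtype="html",
        charset="utf-8",
    )

    # Wrap the HTML part in multipart/related and link the image via Content-ID.
    html_part = msg.get_payload()[1]
    html_part.add_related(
        PNG_PLACEHOLDER,
        maintype="image",
        subtype="png",
        cid=cid,
        disposition="inline",  # a filename alone would default to "attachment"
        filename="image.png",
    )

    # A regular attachment; this wraps everything in multipart/mixed.
    msg.add_attachment(
        b"example attachment",
        maintype="application",
        subtype="octet-stream",
        filename="data.bin",
    )
    return msg
```

Printing `build_example_message().as_string()` shows the resulting MIME tree. In the clients mentioned above, a message nested like this should presumably display the HTML text and its inline image in the reader instead of listing them as attachments.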
Thank you for delivering this great service! It would be awesome if these issues could be solved. I'm available for any questions!