Subject prefix triggers sending of nonexistent character sets big5_tw / eucgb2312_cn
My site recently observed the symptoms described in https://mail.python.org/archives/list/mailman-users@python.org/message/JOPBF4LYTK6XHDWU776XAFHZSSM3AAJ6/, in which the message was rejected with a 550 for "Decoding of header Subject failed."
Upon investigation, I found that the problem appears when a subject prefix is applied to a Subject header that combines base64 and quoted-printable encoded-words in certain character sets. Namely:
from mailman.testing.helpers import (specialized_message_from_string as message_from_string)
from mailman.config import config
msg = message_from_string("""\
Subject: =?big5?B?aGVsbG8gxMfD/bXYCg==
=?big5?Q?_Hello_World?=
""")
# `m` is the mailing list object (assumed here to be the one bound by `mailman shell -l <list>`)
m.subject_prefix = "[Oopsie] "
config.handlers['subject-prefix'].process(m, msg, {})
print("Subject:", msg['subject'].encode())
Result: Subject: [Oopsie] =?big5_tw?b?aGVsbG8gxMfD/bXY?=
You can also try with
Subject: =?gb2312?B?xOO6w8rAvec=
=?gb2312?Q?_Hello_World?=
resulting in Subject: [Oopsie] =?eucgb2312_cn?b?xOO6w8rAvec=?=
(the same ridiculous encoding noted by Steve on mailman-users).
The catch, of course, is that neither big5_tw nor eucgb2312_cn is an IANA-registered character set. Yet this is precisely what Mailman puts on the wire. This shouldn't happen, and I assume it is why these messages are being rejected with 550 CAT.InvalidContent.Exception. I believe this may be related to the CODEC_MAP in stdlib email/charset.py.
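To make the CODEC_MAP suspicion easier to check, here is a minimal stdlib-only sketch, assuming CPython's email/charset.py as currently shipped. It needs no Mailman at all. The helper name `describe` is mine and purely illustrative. It reports, for each charset, the MIME charset name, the Python codec name that CODEC_MAP substitutes, and the canonical codec that alias resolves to.

```python
import codecs
from email.charset import CODEC_MAP, Charset


def describe(charset_name):
    """Return (MIME charset, Python codec name, canonical codec name).

    `describe` is a hypothetical helper for this report, not a stdlib
    or Mailman function.
    """
    cs = Charset(charset_name)
    codec = cs.output_codec  # taken from CODEC_MAP when an entry exists
    canonical = codecs.lookup(codec).name if codec else None
    return cs.get_output_charset(), codec, canonical


if __name__ == "__main__":
    for name in ("big5", "gb2312"):
        print(name, "->", CODEC_MAP.get(name), describe(name))
```

If this is indeed the source, the names appearing in the encoded-words are Python codec aliases rather than MIME charset names, which would explain why Python can decode them while a receiving MTA cannot. That remains my guess, not a confirmed diagnosis.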