Mailman rewrites the Cc header incorrectly if there is a CRLF within the display name
Detailed discussion on the mailing list: https://lists.mailman3.org/archives/list/mailman-users@mailman3.org/thread/L3XFBLEITYXBLL233ULKCX2QAB6KDLVQ/
I noticed that if I send an email to a list with a Cc header folded across multiple lines, and the opening and closing double quotes of a display name end up on different lines, the email addresses are rewritten incorrectly.
For example,
CC: "ABC DEF
(XYZ)" <name@example.com>
It will become the following after going through the mailing list:
CC: XYZ <"ABC DEF"@mail.server.name>
I have observed that Outlook folds the Cc header this way quite often when there are more than four email addresses, since our organization's email addresses are normally in this format: "Firstname Lastname (Dept)" <user@example.com>
According to RFC 5322, this folding should be invalid. As Mark Sapiro explained:
Given an address of the form
display-name (comment) <address>
in a header, it can be folded either between the display-name and the (comment) or between the (comment) and the <address>, but if display-name (comment) is quoted as in
"display-name (comment)" <address>
then "display-name (comment)" is a phrase within which folding is not allowed, and the header can only be folded between "display-name (comment)" and the <address>.
Further study by Mark Sapiro:
The issue occurs in mailman/handlers/avoid_duplicates.py. That module will rewrite the Cc: header after possibly deleting some of the entries.
That module calls msg.get_all('cc') to get the original Ccs and calls email.utils.getaddresses() on the result to get a list of all the (name, address) pairs.
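A minimal sketch of that read, filter and rewrite step using only the standard library follows. The helper name rewrite_cc and its drop parameter are hypothetical, and rebuilding the header with formataddr() is an assumption; this is not the actual avoid_duplicates.py code.

```python
from email import message_from_string
from email.utils import formataddr, getaddresses


def rewrite_cc(msg, drop):
    """Hypothetical approximation of avoid_duplicates: drop some Cc entries, rebuild the rest."""
    # Collect every Cc header value and split it into (name, address) pairs.
    pairs = getaddresses(msg.get_all('cc', []))
    # Keep only entries whose address is not in the (lower-cased) drop set.
    kept = [(name, addr) for name, addr in pairs if addr.lower() not in drop]
    # Replace the original Cc header(s) with a single re-serialised header.
    del msg['cc']
    if kept:
        msg['Cc'] = ', '.join(formataddr(pair) for pair in kept)
    return kept


raw = 'From: a@example.com\nCc: Alice <alice@example.com>, Bob <bob@example.com>\n\nbody\n'
msg = message_from_string(raw)
print(rewrite_cc(msg, {'bob@example.com'}))  # [('Alice', 'alice@example.com')]
print(msg['cc'])                             # Alice <alice@example.com>
```

The point of the sketch is that whatever pairs getaddresses() returns are what get written back, so a mis-parsed pair is written back mis-parsed.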
In our case, msg.get_all('cc') returns one of two things. If the message came via LMTP with CRLF line endings, it returns
['"ABC DEF\r\n (XYZ)" <mark@example.com>']
If the message came from mailman inject with LF line endings, it returns
['"ABC DEF\n (XYZ)" <mark@example.com>']
email.utils.getaddresses() then parses the items in that list (with the same parser that email.utils.parseaddr() uses) to make a list (of one, in our case) of (name, address) pairs. The result differs depending on the line endings, which explains why it works with mailman inject but not with LMTP.
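The two get_all('cc') results can be reproduced with the standard library parser, assuming Mailman's message objects keep folded header values the same way the default compat32 policy does. cc_values is a hypothetical helper that builds the same message with either line ending.

```python
from email import message_from_string


def cc_values(line_ending):
    """Parse a tiny message whose Cc display name is folded inside the quotes."""
    lines = [
        'From: sender@example.com',
        'CC: "ABC DEF',                # fold falls inside the quoted display name
        ' (XYZ)" <mark@example.com>',  # continuation line starts with a space
        '',                            # blank line ends the header block
        'body',
        '',
    ]
    # The default (compat32) policy keeps the folding line ending inside the value.
    return message_from_string(line_ending.join(lines)).get_all('cc')


print(cc_values('\r\n'))  # ['"ABC DEF\r\n (XYZ)" <mark@example.com>']
print(cc_values('\n'))    # ['"ABC DEF\n (XYZ)" <mark@example.com>']
```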
With LF line endings, the result is
[('ABC DEF\n (XYZ)', 'mark@example.com')]
but with CRLF line endings, the result is
[('XYZ', 'ABC DEF')]
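Feeding those two strings straight to getaddresses() reproduces both results. In CPython's address parser a carriage return ends a quoted string, so with CRLF the quote closes after ABC DEF and (XYZ) is read as a comment; the leftover text becomes an unterminated quoted string and is discarded. With no angle-bracketed address found, the comment becomes the name and the first phrase becomes the address. The last line assumes the rewritten header is built with formataddr(); under that assumption it yields XYZ <ABC DEF>, which would be consistent with the bad header shown earlier once a domain is presumably added to the bare local part.

```python
from email.utils import formataddr, getaddresses

crlf_value = '"ABC DEF\r\n (XYZ)" <mark@example.com>'
lf_value = '"ABC DEF\n (XYZ)" <mark@example.com>'

# LF: the quoted string survives, newline and all, and the address is found.
print(getaddresses([lf_value]))    # [('ABC DEF\n (XYZ)', 'mark@example.com')]

# CRLF: '\r' terminates the quoted string early, so the pair comes out scrambled.
print(getaddresses([crlf_value]))  # [('XYZ', 'ABC DEF')]

# Re-serialising the scrambled pair (assumes the rewrite uses formataddr()).
print(formataddr(getaddresses([crlf_value])[0]))  # XYZ <ABC DEF>
```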
If the display name is not quoted, so the fold falls outside any quoted string, as in
ABC DEF (XYZ) <mark@example.com>
then the result is
('ABC DEF (XYZ)', 'mark@example.com')
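A quick check of the unquoted case, assuming the fold falls between DEF and (XYZ), gives the same pair for both line endings, because outside a quoted string the parser treats CR and LF as ordinary folding whitespace.

```python
from email.utils import getaddresses

for ending in ('\r\n', '\n'):
    # Fold placed between the bare display name and the comment.
    value = f'ABC DEF{ending} (XYZ) <mark@example.com>'
    # Both iterations print [('ABC DEF (XYZ)', 'mark@example.com')] after the ending's repr.
    print(repr(ending), getaddresses([value]))
```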
To resolve the issue, we have submitted a ticket to Microsoft requesting a fix in Outlook. However, this will probably take a long time, and Microsoft may not fix it at all. A workaround in mailman/handlers/avoid_duplicates.py is to remove the CRLF and parse the Cc header directly instead of using email.utils.getaddresses().
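One possible shape for such a workaround is sketched below as a variation, not as the attached patch or the change merged in !652: unfold the value first (RFC 5322 unfolding removes any CRLF that is immediately followed by whitespace) and only then call getaddresses(). The names unfold and parse_cc are hypothetical, and bare LF folds are treated the same way to cover messages injected with LF endings.

```python
import re
from email.utils import getaddresses

# A line break (CRLF or bare LF) immediately followed by a space or tab is a fold.
_FOLD_RE = re.compile(r'\r?\n(?=[ \t])')


def unfold(value):
    """Remove folding line breaks, keeping the whitespace that follows them."""
    return _FOLD_RE.sub('', value)


def parse_cc(values):
    """Hypothetical replacement for getaddresses(msg.get_all('cc', []))."""
    return getaddresses([unfold(v) for v in values])


print(parse_cc(['"ABC DEF\r\n (XYZ)" <mark@example.com>']))
# [('ABC DEF (XYZ)', 'mark@example.com')]
```

Unfolding before parsing leaves quoting and comment handling to getaddresses() instead of writing a new header parser.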
Sample code provided by Mark Sapiro:
avoid_duplicates_patch.txt (Note: this patch is no good. See this post and !652 (merged).)