Uncaught UnicodeDecodeError exception fix
The UnicodeDecodeError occurs in the mailbox module, used in hyperkitty_import.py, when the sender of a given message cannot be decoded using the ascii codec.
Problematic code in the mailbox module for Python >=3.11:
    def get_message(self, key):
        """Return a Message representation or raise a KeyError."""
        start, stop = self._lookup(key)
        self._file.seek(start)
        # This line throws UnicodeDecodeError if the sender's address cannot be decoded using ascii
        from_line = self._file.readline().replace(linesep, b'').decode('ascii')
        string = self._file.read(stop - self._file.tell())
        msg = self._message_factory(string.replace(linesep, b'\n'))
        msg.set_unixfrom(from_line)
        msg.set_from(from_line[5:])
        return msg
And for Python <3.11:
    def get_message(self, key):
        """Return a Message representation or raise a KeyError."""
        start, stop = self._lookup(key)
        self._file.seek(start)
        from_line = self._file.readline().replace(linesep, b'')
        string = self._file.read(stop - self._file.tell())
        msg = self._message_factory(string.replace(linesep, b'\n'))
        msg.set_from(from_line[5:].decode('ascii'))  # This line throws UnicodeDecodeError
        return msg
As a result of the exception, the whole import is interrupted and none of the remaining emails are imported. What I would expect is that when such an exception occurs, the specific email that caused it is skipped and the import continues. This is what I propose in this PR. It is in a way a duplicate of !615 (merged), but it addresses the problem in a different way.
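For illustration, the skip-and-continue behaviour described above could look roughly like the sketch below. It is not the code from this PR: the helper name import_skipping_undecodable and the handle callback are hypothetical, and the only mailbox calls assumed are keys() and get_message(key). The loop goes over keys rather than over the mailbox itself because, as far as I can tell, the mailbox iterator is a generator, so an exception raised while it fetches one message would end the iteration.

```python
import logging

logger = logging.getLogger(__name__)


def import_skipping_undecodable(mbox, handle):
    """Pass each message in mbox to handle(), skipping undecodable ones.

    Hypothetical helper for illustration only, not HyperKitty's code.
    mbox is anything with keys() and get_message(key), for example a
    mailbox.mbox instance. Returns the list of keys that were skipped.
    """
    skipped = []
    for key in mbox.keys():
        try:
            # The ascii decoding of the "From " line happens in here.
            message = mbox.get_message(key)
        except UnicodeDecodeError as exc:
            # Log and skip only this message; the loop carries on.
            logger.warning("Skipping message %r: %s", key, exc)
            skipped.append(key)
            continue
        handle(message)
    return skipped
```

With a real archive this would be called as import_skipping_undecodable(mailbox.mbox(path), handle), where path and handle come from the importing code. Returning the skipped keys lets the caller report how many emails were left out.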