Uncaught UnicodeDecodeError exception fix

The UnicodeDecodeError occurs in the mailbox module used by hyperkitty_import.py when the sender of a given message cannot be decoded using the ascii codec.

Problematic code in mailbox module for Python >=3.11:

def get_message(self, key):
        """Return a Message representation or raise a KeyError."""
        start, stop = self._lookup(key)
        self._file.seek(start)
        from_line = self._file.readline().replace(linesep, b'').decode('ascii') # This line throws UnicodeDecodeError if the sender's address cannot be decoded using ascii
        string = self._file.read(stop - self._file.tell())
        msg = self._message_factory(string.replace(linesep, b'\n'))
        msg.set_unixfrom(from_line)
        msg.set_from(from_line[5:])
        return msg

And for Python <3.11:

def get_message(self, key):
        """Return a Message representation or raise a KeyError."""
        start, stop = self._lookup(key)
        self._file.seek(start)
        from_line = self._file.readline().replace(linesep, b'')
        string = self._file.read(stop - self._file.tell())
        msg = self._message_factory(string.replace(linesep, b'\n'))
        msg.set_from(from_line[5:].decode('ascii')) # This line throws UnicodeDecodeError
        return msg

As a result of the exception, the whole import is interrupted and none of the remaining emails are imported. I would expect that when such an exception occurs, the specific email that caused it is skipped and the import continues. That is what I propose in this PR. It is in a way a duplicate of !615 (merged), but it addresses the problem in a different way.
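
To illustrate the skip-and-continue behaviour described above, here is a minimal standalone sketch. It is not the actual change made to hyperkitty_import.py: the helper names `read_skipping_undecodable` and `demo` are hypothetical, and the demo builds a small throwaway mbox file whose second "From " line contains non-ASCII bytes. It relies only on the standard `mailbox` API (`mailbox.mbox`, `keys()`, `get_message()`).

```python
import mailbox
import os
import tempfile


def read_skipping_undecodable(path):
    """Return (messages, skipped_keys) for the mbox file at `path`.

    Hypothetical helper: a message whose "From " line cannot be decoded
    as ASCII is skipped instead of aborting the whole loop.
    """
    mbox = mailbox.mbox(path, create=False)
    messages, skipped = [], []
    try:
        for key in mbox.keys():
            try:
                messages.append(mbox.get_message(key))
            except UnicodeDecodeError:
                # Only this message is affected; remember its key and go on.
                skipped.append(key)
    finally:
        mbox.close()
    return messages, skipped


def demo():
    """Build a three-message mbox; the second has a non-ASCII sender."""
    sep = os.linesep.encode("ascii")
    lines = [
        b"From good@example.com Mon Jan  1 00:00:00 2024",
        b"Subject: ok",
        b"",
        b"first body",
        b"",
        b"From b\xc3\xa4d@example.com Mon Jan  1 00:00:00 2024",
        b"Subject: broken",
        b"",
        b"second body",
        b"",
        b"From other@example.com Mon Jan  1 00:00:00 2024",
        b"Subject: also ok",
        b"",
        b"third body",
        b"",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.mbox")
        with open(path, "wb") as fh:
            fh.write(sep.join(lines))
        messages, skipped = read_skipping_undecodable(path)
    return [m["Subject"] for m in messages], skipped
```

Calling `demo()` returns the subjects of the messages that could be read together with the keys of the skipped ones, so the messages after the broken one are still processed. In a real import it would probably make sense to log each skipped key so that the affected emails can be inspected afterwards.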
