Unable to import mbox downloaded from another instance of HyperKitty

Hi!

I assumed that an email archive downloaded from HyperKitty with the Download button would be healthy, right?

We downloaded the list archive from a remote Mailman 3 server running these versions:

  • HyperKitty version 1.3.3
  • Mailman Core Version - GNU Mailman 3.3.1 (Tom Sawyer)
  • Mailman Core API Version - 3.1
  • Mailman Core Python Version - 3.7.3 (default, Dec 20 2019, 18:57:59) [GCC 8.3.0]

When we try to import it into our own instance, running

  • HyperKitty version 1.3.5

we get the following exception:

Importing from mbox file /opt/mailman-web-data/lambeth_lmc@lists.lmc.org.uk-2022-01.mbox to lambeth_lmc@lists.lmc.org.uk
97%
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.8/site-packages/hyperkitty/management/commands/hyperkitty_import.py", line 381, in handle
    importer.from_mbox(mbfile, report_name)
  File "/usr/lib/python3.8/site-packages/hyperkitty/management/commands/hyperkitty_import.py", line 170, in from_mbox
    for msg in mbox:
  File "/usr/lib/python3.8/mailbox.py", line 109, in itervalues
    value = self[key]
  File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
    msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
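The last frame is the step that fails: Python's `mailbox.mbox` treats every line that begins with `From ` as a message separator and decodes the rest of that line as ASCII. A minimal sketch that mirrors only that one call from the traceback (not the library's full code) shows how a single non-ASCII byte triggers the error. The sample lines are made up; `decode_from_line` is a hypothetical helper, and the "curly apostrophe in a body line" input is our guess at what byte 0xe2 could be, not something confirmed in the archive.

```python
def decode_from_line(from_line: bytes) -> str:
    """Mirror the call in the traceback's last frame:
    from_line[5:].decode('ascii')."""
    # Strip the 5-byte b"From " prefix and decode what remains as ASCII.
    return from_line[5:].decode("ascii")


# A well-formed mbox separator line: envelope sender and date, pure ASCII.
good = b"From alice@example.org Fri Jan  7 10:54:49 2022"
print(decode_from_line(good))

# Hypothetical line starting with "From " that contains a UTF-8 curly
# apostrophe (U+2019, bytes e2 80 99) at position 9 after the prefix.
bad = "From our chair\u2019s note".encode("utf-8")
try:
    decode_from_line(bad)
except UnicodeDecodeError as exc:
    # Same wording as the traceback: byte 0xe2 in position 9.
    print(exc)
```

If the offending line in the real archive turns out to be ordinary body text rather than a `From sender date` envelope line, that would point to an unescaped `From ` line inside a message body (again, an assumption to verify).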

Running the check_hk_import.py script gives this instead:

Failed to retrieve message number 7320
    'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
Processed 7487 messages.

I'm not sure how to locate the offending message number 7320 in a 1.7 GB mbox file.
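One way to find it without loading the whole file is to stream it in binary and apply the same separator rule `mailbox.mbox` uses (any line starting with `From `), reporting each separator line that is not pure ASCII together with its 0-based message index and byte offset. This is a sketch: the function name `find_bad_separators` and the script name are hypothetical, and whether check_hk_import.py counts messages from 0 or 1 is an assumption we have not checked, which is why the script prints every non-ASCII separator rather than relying on the number 7320.

```python
import sys


def find_bad_separators(path: str):
    """Yield (message_index, byte_offset, line) for each mbox separator line
    (a line starting with b"From ", the rule mailbox.mbox applies) whose text
    after "From " contains non-ASCII bytes."""
    index = -1  # 0-based key that mailbox.mbox would assign to the message
    offset = 0  # byte offset of the current line within the file
    with open(path, "rb") as fh:
        for line in fh:  # streams line by line, so a 1.7 GB file is fine
            if line.startswith(b"From "):
                index += 1
                # bytes.isascii() is False exactly when .decode('ascii') fails
                if not line[5:].isascii():
                    yield index, offset, line.rstrip(b"\r\n")
            offset += len(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    for idx, pos, raw in find_bad_separators(sys.argv[1]):
        print(f"message index {idx} at byte offset {pos}: {raw!r}")
```

Saved as, say, `find_bad_from.py`, it would be run as `python3 find_bad_from.py /path/to/archive.mbox`. To look at the surrounding text, `tail -c +$((OFFSET + 1)) /path/to/archive.mbox | head -n 20` starts output at that byte (`tail -c +N` counts bytes from 1).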

Another worrying issue is that about 67 messages were imported with a None subject and a None sender, and dated with the time of the import:

(screenshot: Screenshot_2022-01-07_at_10.54.49)

This is also odd, because we are importing from another HyperKitty instance and would expect the archive to be clean already.
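One possible explanation, which we have not confirmed, is that some message bodies contain lines beginning with `From ` that were never escaped as `>From `. A reader that splits on every such line would cut those messages in two, and the second fragment would have no headers at all, leaving nothing to take a subject, sender, or date from. A sketch to test that guess: split the file on `From ` lines (the `mailbox.mbox` rule), parse only the headers of each piece with the standard library's `email.parser.BytesHeaderParser`, and list pieces missing `Subject`, `From`, or `Date`. The helper names are hypothetical, and the link between missing headers and the None/import-time entries in HyperKitty is an assumption.

```python
import sys
from email.parser import BytesHeaderParser


def iter_raw_messages(path: str):
    """Split an mbox on lines starting with b"From " (mailbox.mbox's rule) and
    yield (message_index, byte_offset, raw_bytes_without_separator_line)."""
    index, start, chunk = -1, 0, []
    offset = 0
    with open(path, "rb") as fh:
        for line in fh:
            if line.startswith(b"From "):
                if index >= 0:
                    yield index, start, b"".join(chunk)
                index, start, chunk = index + 1, offset, []
            elif index >= 0:
                chunk.append(line)  # holds only one message in memory
            offset += len(line)
    if index >= 0:
        yield index, start, b"".join(chunk)


def find_headerless(path: str, required=("Subject", "From", "Date")):
    """Yield (message_index, byte_offset, missing_header_names) for pieces
    that lack any of the required headers."""
    parser = BytesHeaderParser()  # parses headers only, skips the body
    for idx, pos, raw in iter_raw_messages(path):
        headers = parser.parsebytes(raw)
        # Message.__getitem__ returns None for an absent header.
        missing = [name for name in required if headers[name] is None]
        if missing:
            yield idx, pos, missing


if __name__ == "__main__" and len(sys.argv) > 1:
    for idx, pos, missing in find_headerless(sys.argv[1]):
        print(f"message index {idx} at byte offset {pos}: missing {missing}")
```

If the reported offsets land on lines that read like quoted body text, that would support the unescaped-`From ` theory; if they are genuine messages without headers, the cause lies elsewhere.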

Any help would be appreciated!

Edited by Danil Smirnov