Unable to import mbox downloaded from another instance of HyperKitty
Hi!
I assumed that an email archive downloaded from HyperKitty using the Download button would be healthy, right?
We've downloaded the list archive from a remote Mailman 3 server running these versions:
- HyperKitty version 1.3.3
- Mailman Core Version - GNU Mailman 3.3.1 (Tom Sawyer)
- Mailman Core API Version - 3.1
- Mailman Core Python Version - 3.7.3 (default, Dec 20 2019, 18:57:59) [GCC 8.3.0]
and when we try to import it into our instance running
- HyperKitty version 1.3.5
we get an exception:
Importing from mbox file /opt/mailman-web-data/lambeth_lmc@lists.lmc.org.uk-2022-01.mbox to lambeth_lmc@lists.lmc.org.uk
97%
Traceback (most recent call last):
  File "manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 401, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python3.8/site-packages/django/core/management/__init__.py", line 395, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 330, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python3.8/site-packages/django/core/management/base.py", line 371, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python3.8/site-packages/hyperkitty/management/commands/hyperkitty_import.py", line 381, in handle
    importer.from_mbox(mbfile, report_name)
  File "/usr/lib/python3.8/site-packages/hyperkitty/management/commands/hyperkitty_import.py", line 170, in from_mbox
    for msg in mbox:
  File "/usr/lib/python3.8/mailbox.py", line 109, in itervalues
    value = self[key]
  File "/usr/lib/python3.8/mailbox.py", line 73, in __getitem__
    return self.get_message(key)
  File "/usr/lib/python3.8/mailbox.py", line 781, in get_message
    msg.set_from(from_line[5:].decode('ascii'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
Running the check_hk_import.py script gives this:
Failed to retrieve message number 7320
'ascii' codec can't decode byte 0xe2 in position 9: ordinal not in range(128)
Processed 7487 messages.
I'm not sure how to find the offending message number 7320 in a 1.7G mbox file...
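One way to narrow this down, as a rough sketch only: the traceback shows mailbox.py failing while decoding the text after "From " on a separator line, so it may be enough to scan the file for lines that start with "From " and contain non-ASCII bytes. The function name find_non_ascii_from_lines is made up for illustration, and the assumption that every such line is a candidate (including body lines that merely begin with "From " and were not escaped) has not been checked against this archive.

```python
def find_non_ascii_from_lines(path):
    """Yield (line_number, byte_offset, raw_line) for every line that starts
    with b"From " and cannot be decoded as ASCII.

    Hypothetical helper: it only mimics the decode step seen in the traceback
    and does not use the mailbox module itself.
    """
    offset = 0  # byte offset of the start of the current line
    with open(path, "rb") as fh:  # binary mode, so no decoding happens on read
        for lineno, line in enumerate(fh, start=1):
            if line.startswith(b"From "):
                try:
                    line.decode("ascii")
                except UnicodeDecodeError:
                    # Strip the line ending so the result is easy to print
                    yield lineno, offset, line.rstrip(b"\r\n")
            offset += len(line)
```

Calling it on the mbox path from the import command and printing each result should give line numbers and byte offsets that can be opened in an editor or with a pager. The file is read line by line, so the 1.7G size should not need to fit in memory.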
Another upsetting issue is that ~67 messages have been imported with a None subject and a None sender, and dated with the time of import.
This is also quite weird, because we are importing from another HK instance and would expect everything in the archive to be sorted out already...
Asking for help!
Edited by Danil Smirnov
