hyperkitty_import command leaves archives public until import finishes
At Wikimedia we are working on migrating from Mailman2 to Mailman3. We discovered that during the archive import process (hyperkitty_import), the archives are entirely public until the import finishes (anywhere from minutes to an hour).
I reproduced this in our testing environment and believe I've identified the cause.
Steps to reproduce:
Create a list in Mailman2 and set the archives to be private.
Import the list in Mailman3:
mailman create test@example.org
mailman import21 test@example.org /var/...config.pck
Inspect the mailman3.mailinglist database table; archive_policy for that list should be 1 (private).
Inspect the mailman3_web.hyperkitty_mailinglist database table; the list should not be present yet. At this point the hyperkitty web UI says the list does not exist.
Import the archive into hyperkitty (I created a dummy mbox with tens of thousands of emails so it wouldn't finish immediately):
mailman-web hyperkitty_import -l test@example.org test.mbox
Once the progress marker appears, the list archives are publicly visible on the web, despite the archive policy being set to private.
If you inspect the mailman3_web.hyperkitty_mailinglist table, the list will be there with archive_policy set to 2 (public). Once the import finishes, it'll be synced with mailman and change to the correct setting of 1.
Likely cause:
hyperkitty_import does not explicitly create an entry in the hyperkitty_mailinglist table. The entry is only created when the hyperkitty.lib.incoming.add_to_list() function is called on the first message:
    mlist = MailingList.objects.get_or_create(name=list_name)[0]
    if not getattr(settings, "HYPERKITTY_BATCH_MODE", False):
        update_from_mailman(mlist.name)
    mlist.save()
Because batch mode is set during the import, update_from_mailman() is skipped and the newly created MailingList object keeps its default values, which include a public archive_policy.
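To make the suspected mechanism easier to follow, here is a minimal stand-alone sketch in plain Python. It is a toy model, not HyperKitty code: the MailingList dataclass, the ROWS and CORE_POLICY dicts, and the batch_mode parameter are all illustrative stand-ins for the Django model, the hyperkitty_mailinglist table, Mailman core's setting, and HYPERKITTY_BATCH_MODE. The values 1 and 2 are the private and public values quoted above.

```python
from dataclasses import dataclass

PRIVATE, PUBLIC = 1, 2  # archive_policy values as quoted in this report


@dataclass
class MailingList:
    # Toy stand-in for the HyperKitty model, not the real Django class.
    name: str
    archive_policy: int = PUBLIC  # the default described in this report


# Stand-in for what Mailman core holds for the imported list.
CORE_POLICY = {"test@example.org": PRIVATE}
# Stand-in for the hyperkitty_mailinglist table.
ROWS = {}


def add_to_list(list_name, batch_mode):
    # Mimics get_or_create: reuse the row if present, else create it with defaults.
    mlist = ROWS.setdefault(list_name, MailingList(list_name))
    if not batch_mode:
        # Stand-in for update_from_mailman(): copy the policy from core.
        mlist.archive_policy = CORE_POLICY[list_name]
    return mlist


# First message of an import, with batch mode on: no sync happens.
during_import = add_to_list("test@example.org", batch_mode=True)
print(during_import.archive_policy)
```

In this model the row only picks up the private setting once something calls add_to_list() without batch mode, which matches the window we observed during the import.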
Suggested remediation:
hyperkitty_import should explicitly create and insert the MailingList object and sync it with mailman before beginning the import so that the correct archive_policy is used.
MailingList.archive_policy should default to private. Given that in all cases update_from_mailman() should be immediately called, it shouldn't actually make a difference, except in cases like this where the list object isn't synced right away.
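Both suggestions can be illustrated with a small stand-alone sketch. Again this is a toy model in plain Python under stated assumptions, not a patch: sync_from_core() is a hypothetical stand-in for update_from_mailman(), import_archive() stands in for the hyperkitty_import loop, and the dicts replace the real database tables.

```python
from dataclasses import dataclass

PRIVATE, PUBLIC = 1, 2  # archive_policy values as quoted in this report


@dataclass
class MailingList:
    # Toy stand-in for the HyperKitty model, not the real Django class.
    name: str
    archive_policy: int = PRIVATE  # second suggestion: default to private


# Stand-in for the settings held by Mailman core.
CORE_POLICY = {"test@example.org": PRIVATE, "open@example.org": PUBLIC}
# Stand-in for the hyperkitty_mailinglist table.
ROWS = {}


def sync_from_core(mlist):
    # Hypothetical stand-in for update_from_mailman().
    mlist.archive_policy = CORE_POLICY[mlist.name]


def import_archive(list_name, messages):
    # First suggestion: create the row and sync it before any message is stored.
    mlist = ROWS.setdefault(list_name, MailingList(list_name))
    sync_from_core(mlist)
    policy_seen = []
    for _ in messages:
        # Batch mode: no per-message sync, so record what a web visitor
        # would see while the import is still running.
        policy_seen.append(ROWS[list_name].archive_policy)
    return policy_seen


print(import_archive("test@example.org", range(3)))
```

With the sync done up front, the policy visible during the import is the one from core for every message, and the private default only matters if the sync is skipped or fails.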
Credit: Amir Sarabadani and Kunal Mehta.