mailman2_download: partial downloads

For some of the larger archive files, the download fails with:

Traceback (most recent call last):
  File "/usr/bin/django-admin", line 5, in <module>
    management.execute_from_command_line()
  File "/usr/lib/python2.7/site-packages/django/core/management/__init__.py", line 354, in execute_from_command_line
    utility.execute()
  File "/usr/lib/python2.7/site-packages/django/core/management/__init__.py", line 346, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/usr/lib/python2.7/site-packages/django/core/management/base.py", line 394, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/usr/lib/python2.7/site-packages/django/core/management/base.py", line 445, in execute
    output = self.handle(*args, **options)
  File "/usr/lib/python2.7/site-packages/hyperkitty/management/commands/mailman2_download_fixed.py", line 145, in handle
    [options], options["start"], MONTHS))
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 250, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib64/python2.7/multiprocessing/pool.py", line 554, in get
    raise self._value
httplib.IncompleteRead: IncompleteRead(8105 bytes read)

wget confirms the server behavior:

$ wget https://www.redhat.com/archives/rdo-list/2016-October.txt.gz
--2017-08-01 14:47:41--  https://www.redhat.com/archives/rdo-list/2016-October.txt.gz
Resolving www.redhat.com (www.redhat.com)... 2600:1415:11:4ad::d44, 2600:1415:11:4a7::d44, 104.98.16.99
Connecting to www.redhat.com (www.redhat.com)|2600:1415:11:4ad::d44|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 3573 (3.5K) [application/x-gzip]
Saving to: ‘2016-October.txt.gz’

2016-October.txt.g  99%[============> ]   3.49K  --.-KB/s    in 0s      

2017-08-01 14:47:44 (53.1 MB/s) - Connection closed at byte 3572. Retrying.

--2017-08-01 14:47:45--  (try: 2)  https://www.redhat.com/archives/rdo-list/2016-October.txt.gz
Connecting to www.redhat.com (www.redhat.com)|2600:1415:11:4ad::d44|:443... connected.
HTTP request sent, awaiting response... 206 Partial Content
Length: 885509 (865K), 881937 (861K) remaining [application/x-gzip]
Saving to: ‘2016-October.txt.gz’

2016-October.txt.g 100%[=============>] 864.75K   145KB/s    in 5.9s    

2017-08-01 14:47:53 (145 KB/s) - ‘2016-October.txt.gz’ saved [885509/885509]

I found another person with the same problem here: https://stackoverflow.com/questions/14149100/incompleteread-using-httplib

The httplib monkey patch suggested there does not work (anymore?). Downgrading to HTTP 1.0 with the following (borrowed from the same page) works:

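import httplib
# Workaround: make httplib speak HTTP/1.0. These are class attributes,
# so this affects every connection made by the process.
httplib.HTTPConnection._http_vsn = 10
httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'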
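
Another option that might be worth trying is to do what wget does in the log above: keep the bytes that did arrive and request the rest with a Range header. The following is only a minimal sketch and has not been tried against HyperKitty. `read_with_resume` and the `fetch(offset)` callable are hypothetical names, not part of HyperKitty or httplib, and the approach assumes the server honours Range requests (the 206 response above suggests it does) and that the partial bytes are valid.

```python
try:
    import httplib  # Python 2, as in the traceback above
except ImportError:
    import http.client as httplib  # same module under its Python 3 name


def read_with_resume(fetch, max_attempts=5):
    """Collect a response body, resuming after IncompleteRead.

    `fetch(offset)` is a hypothetical callable: it must request the
    resource starting at byte `offset` (for example by sending a
    "Range: bytes=<offset>-" header) and return only the bytes from
    that offset onwards.
    """
    data = b""
    for _ in range(max_attempts):
        try:
            return data + fetch(len(data))
        except httplib.IncompleteRead as exc:
            # Keep what was received and ask for the remainder next time.
            data += exc.partial
    raise IOError("still incomplete after %d attempts" % max_attempts)
```

If the server ignored the Range header and answered with a full 200 response, the appended data would be wrong, so a real `fetch` would need to check for a 206 status before returning.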

I have no idea how to fix this properly, so I cannot suggest a PR.