BUG: `--s3-use-multiprocessing` results in `TypeError: cannot use a string pattern on a bytes-like object`

Summary

When using the boto backend with --s3-use-multiprocessing, the upload crashes with:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/duplicity/backends/_boto_multi.py", line 223, in _upload
    mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback,
  File "/usr/lib/python3/dist-packages/boto/s3/multipart.py", line 257, in upload_part_from_file
    key.set_contents_from_file(fp, headers=headers, replace=replace,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 1307, in set_contents_from_file
    self.send_file(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 760, in send_file
    self._send_file_internal(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 932, in _send_file_internal
    self.content_type = mimetypes.guess_type(self.path)[0]
  File "/usr/lib/python3.8/mimetypes.py", line 292, in guess_type
    return _db.guess_type(url, strict)
  File "/usr/lib/python3.8/mimetypes.py", line 117, in guess_type
    scheme, url = urllib.parse._splittype(url)
  File "/usr/lib/python3.8/urllib/parse.py", line 1008, in _splittype
    match = _typeprog.match(url)
TypeError: cannot use a string pattern on a bytes-like object

This is because mimetypes.guess_type expects a string, but we're giving it a bytes object. Perhaps this changed from Python 2 to 3, but as far back as I can trace it, the temp file name we're using here is in bytes form (I'd guess the tempfile generator is spitting it out that way).
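A minimal sketch of the mismatch (the file name below is hypothetical, not the actual temp name duplicity generates): on the Python versions in this report, passing a bytes path to mimetypes.guess_type hits the str-only regex in urllib.parse, while os.fsdecode turns the bytes name back into a str that guess_type handles fine.

```python
import mimetypes
import os

# Hypothetical bytes file name, standing in for what the tempfile machinery produces
name_bytes = b"/tmp/duplicity-tmp.txt"

try:
    mimetypes.guess_type(name_bytes)  # TypeError on the Pythons in this report
except TypeError as exc:
    print("bytes name:", exc)

# os.fsdecode is a no-op for str and decodes a bytes name
name_str = os.fsdecode(name_bytes)
print(mimetypes.guess_type(name_str))
```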

Environment

  • Arch Linux, rolling
  • Python 3.10.4
  • duplicity 0.8.22, via community/duplicity 0.8.22-1

Command line:

BUCKET="s3://us-southeast-1.linodeobjects.com/[name redacted]"

  duplicity --verbosity notice \
         --encrypt-key "$enc_key" --sign-key "$sign_key" \
         --full-if-older-than 30D --num-retries 5 \
         --allow-source-mismatch --volsize 100 \
         --s3-use-new-style --s3-use-multiprocessing \
         --asynchronous-upload \
         --s3-multipart-max-procs 8 --s3-multipart-chunk-size 15 \
         --s3-endpoint-url "https://us-southeast-1.linodeobjects.com" \
         --progress --allow-source-mismatch \
         --exclude-if-present .nobackup \
         --archive-dir /root/.cache/duplicity \
         --log-file /var/log/duplicity.log \
         --exclude '**rdiff-backup-data' \
         --name home --exclude-filelist exclude-home.list /home $BUCKET/home

Steps to reproduce

It's 100% reproducible for me with the boto backend, since --s3-use-multiprocessing triggers the use of _boto_multi...

What is the current bug behaviour?

Dumps the stack trace above over and over (and falls back to a non-multipart upload).

What is the expected correct behaviour?

Should perform the multipart upload as expected.

Relevant logs and/or screenshots

None (see backtrace above).

Possible fixes

Since FileChunkIO only seems to exist for --s3-use-multiprocessing, we can just patch it so that during its __init__ the name is converted back into a string using os.fsdecode... PR to follow once I can figure out how to do that on GitLab.

https://gitlab.com/duplicity/duplicity/-/blob/master/duplicity/backends/_boto_multi.py#L220

This would also be easy to fix in boto, but it seems to be no longer maintained.
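As a rough illustration of the idea (a sketch, not the actual patch; the class name is hypothetical), a thin wrapper that exposes a decoded .name would be enough, since boto copies fp.name into self.path before calling mimetypes.guess_type:

```python
import os


class StrNameFile:
    """Hypothetical sketch: wrap a file object whose .name may be bytes
    and expose a str .name instead, so boto's
    mimetypes.guess_type(self.path) call receives a string.
    The proposed real fix does the equivalent inside
    FileChunkIO.__init__ via os.fsdecode."""

    def __init__(self, fileobj):
        self._fileobj = fileobj
        # os.fsdecode is a no-op for str and decodes a bytes name
        self.name = os.fsdecode(fileobj.name)

    def __getattr__(self, attr):
        # Delegate everything else (read, seek, close, ...) unchanged
        return getattr(self._fileobj, attr)
```

Wrapping each chunk before handing it to mp.upload_part_from_file would then keep boto's content-type sniffing on a str path.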

Edited by Josh Goebel