BUG: `--s3-use-multiprocessing` results in `TypeError: cannot use a string pattern on a bytes-like object`
I have:
- ([x] when completed)
- searched https://gitlab.com/duplicity/duplicity/-/issues for similar issues. Didn't see anything.
- searched https://bugs.launchpad.net/duplicity for similar issues. YES, it was reported before: https://bugs.launchpad.net/ubuntu/+source/duplicity/+bug/1930640
- tested that this issue still occurs on the latest stable snap... I can't see why it would not, I found and fixed the [still present] bug.
- ideally, tested that this issue still occurs on the latest edge snap, if you can test without risking your data. Please include the snap version output: installed: x.xx.xx (xx)
Summary
When using the boto backend with `--s3-use-multiprocessing`, the upload crashes with:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/duplicity/backends/_boto_multi.py", line 223, in _upload
    mp.upload_part_from_file(fd, offset + 1, cb=_upload_callback,
  File "/usr/lib/python3/dist-packages/boto/s3/multipart.py", line 257, in upload_part_from_file
    key.set_contents_from_file(fp, headers=headers, replace=replace,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 1307, in set_contents_from_file
    self.send_file(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 760, in send_file
    self._send_file_internal(fp, headers=headers, cb=cb, num_cb=num_cb,
  File "/usr/lib/python3/dist-packages/boto/s3/key.py", line 932, in _send_file_internal
    self.content_type = mimetypes.guess_type(self.path)[0]
  File "/usr/lib/python3.8/mimetypes.py", line 292, in guess_type
    return _db.guess_type(url, strict)
  File "/usr/lib/python3.8/mimetypes.py", line 117, in guess_type
    scheme, url = urllib.parse._splittype(url)
  File "/usr/lib/python3.8/urllib/parse.py", line 1008, in _splittype
    match = _typeprog.match(url)
TypeError: cannot use a string pattern on a bytes-like object
This is because `mimetypes.guess_type` expects a string, but we're giving it a bytes object. Perhaps this changed from Python 2 to 3, but as far back as I can trace it, the temp file name that we're using here is in bytes form (I'd guess the tempfile generator is spitting it out that way).
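The type mismatch can be illustrated without boto. This is only a minimal sketch, assuming the temp file name arrives as bytes as described above; `guess_content_type` and the example path are hypothetical and not part of duplicity or boto. It normalizes the path with `os.fsdecode` before handing it to `mimetypes.guess_type`:

```python
import mimetypes
import os


def guess_content_type(path):
    """Return the guessed MIME type for a path given as str or bytes.

    Hypothetical helper for illustration only.
    """
    # os.fsdecode turns bytes into str using the filesystem encoding
    # and returns str input unchanged.
    text_path = os.fsdecode(path)
    return mimetypes.guess_type(text_path)[0]


# Simulate a temp file name that was handed around in bytes form.
bytes_name = os.fsencode("/tmp/duplicity-example/volume.txt")
print(type(bytes_name).__name__, "->", type(os.fsdecode(bytes_name)).__name__)
print(guess_content_type(bytes_name))
```

The helper accepts either form, so callers that already pass a `str` path are unaffected.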
Environment
- Arch Linux, rolling
- Python 3.10.4
- duplicity 0.8.22, via community/duplicity 0.8.22-1
Command line:
BUCKET="s3://us-southeast-1.linodeobjects.com/[name redacted]"
duplicity --verbosity notice \
--encrypt-key "$enc_key" --sign-key "$sign_key" \
--full-if-older-than 30D --num-retries 5 \
--allow-source-mismatch --volsize 100 \
--s3-use-new-style --s3-use-multiprocessing \
--asynchronous-upload \
--s3-multipart-max-procs 8 --s3-multipart-chunk-size 15 \
--s3-endpoint-url "https://us-southeast-1.linodeobjects.com" \
--progress --allow-source-mismatch \
--exclude-if-present .nobackup \
--archive-dir /root/.cache/duplicity \
--log-file /var/log/duplicity.log \
--exclude '**rdiff-backup-data' \
--name home --exclude-filelist exclude-home.list /home $BUCKET/home
Steps to reproduce
It's 100% reproducible for me with the boto backend, since `--s3-use-multiprocessing` triggers the use of `_boto_multi`.
What is the current bug behaviour?
Dumps a stack trace over and over as shown above (and falls back to non-multi-upload).
What is the expected correct behaviour?
Should do multi-upload as expected.
Relevant logs and/or screenshots
None (see backtrace above).
Possible fixes
Since `FileChunkIO` only seems to exist for `--s3-use-multiprocessing`, we can just patch it so that during its `__init__` the name is converted back into a string using `os.fsdecode`. PR to follow once I can figure out how to do that on GitLab.
https://gitlab.com/duplicity/duplicity/-/blob/master/duplicity/backends/_boto_multi.py#L220
This would also be easy to fix in boto, but it seems to be no longer maintained.
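A rough sketch of the `__init__` idea, not the actual patch: `StrNameFileIO` is a hypothetical stand-in built directly on `io.FileIO`, so it does not reproduce the offset and size handling of the real `FileChunkIO`. It only shows the name being decoded before the file is opened, so that anything reading `fd.name` later gets a `str`:

```python
import io
import os
import tempfile


class StrNameFileIO(io.FileIO):
    """Hypothetical stand-in for a chunk reader; not the real FileChunkIO."""

    def __init__(self, name, mode="r", closefd=True):
        # Decode a bytes path up front so the .name attribute of the
        # file object is always str.
        if isinstance(name, bytes):
            name = os.fsdecode(name)
        super().__init__(name, mode, closefd)


# Create a throwaway file and reopen it through a bytes path.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    tmp.write(b"payload")

bytes_path = os.fsencode(tmp.name)
with StrNameFileIO(bytes_path) as fd:
    name_type = type(fd.name).__name__
    data = fd.read()
os.remove(tmp.name)

print(name_type, data)
```

Decoding at construction time keeps the change local to the multiprocessing upload path and leaves boto itself untouched.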