volume missing if --asynchronous-upload and put fails
Summary
Follow up of [Duplicity-talk] Strange exception during a restore with 1.2.1 from AWS
Volumes are missing if put
gets interrupted and --asynchronous-upload
used without a proper error message.
Environment
Linux
2.0.2
PYTHONPATH=. bin/duplicity full --encrypt-key=xxxxxxxxxxx --volsize 1 --backend-retry-delay 5 --num-retries 2 --asynchronous-upload --verbosity n /bin file:///tmp/back2
Steps to reproduce
Change duplicity/backends/localbackend.py
to simulate a issue with one volume:
def _put(self, source_path, remote_filename):
target_path = self.remote_pathdir.append(remote_filename)
source_path.setdata()
source_size = source_path.getsize()
progress.report_transfer(0, source_size)
if b"vol2" in remote_filename:
time.sleep(10)
raise FileNotFoundError("Forcing Issue - THL")
target_path.writefileobj(source_path.open("rb"))
progress.report_transfer(source_size, source_size)
This is failing if the 2nd volume get uploaded (this is just a simulation I see the same behavior if I block HTTPS request with a proxy for the boto3 backend, but local files makes it easier to reproduce)
run
PYTHONPATH=. bin/duplicity full --encrypt-key=xxxxxxxxxxx --volsize 1 --backend-retry-delay 5 --num-retries 2 --asynchronous-upload --verbosity n /bin
file:///tmp/back2
--volsize 1 --backend-retry-delay 5 --num-retries 2
are only set for faster failure, but can be left out. Simply make sure more than 3 volumes get backuped.
Command output will be:
Local and Remote metadata are synchronized, no sync needed.
Last full backup date: none
Attempt of put Nr. 1 failed. FileNotFoundError: Forcing Issue - THL
Giving up after 2 attempts. FileNotFoundError: Forcing Issue - THL
and the target path looks:
drwxr-xr-x 1 vscode vscode 288 Aug 17 18:22 .
drwxrwxrwt 1 root root 20886 Aug 17 18:22 ..
-rw-r--r-- 1 vscode vscode 1043038 Aug 17 18:22 duplicity-full.20230817T182222Z.vol1.difftar.gpg
-rw-r--r-- 1 vscode vscode 1026317 Aug 17 18:22 duplicity-full.20230817T182222Z.vol3.difftar.gpg
-rw-r--r-- 1 vscode vscode 601246 Aug 17 18:22 duplicity-full.20230817T182222Z.vol4.difftar.gpg
The Programm does not return. It runs forever. And need to be killed.
Remove the error simulation code from put
def _put(self, source_path, remote_filename):
target_path = self.remote_pathdir.append(remote_filename)
source_path.setdata()
source_size = source_path.getsize()
progress.report_transfer(0, source_size)
target_path.writefileobj(source_path.open("rb"))
progress.report_transfer(source_size, source_size)
Then run the command again.
$ PYTHONPATH=. bin/duplicity full --encrypt-key=2BB5729A5CFB3C45 --volsize 1 --backend-retry-delay 5 --num-retries 2 --asynchronous-upload --verbosity n /bin file:///tmp/back2
Local and Remote metadata are synchronized, no sync needed.
Last full backup left a partial set, restarting.
Last full backup date: Thu Aug 17 18:43:15 2023
RESTART: Volumes 3 to 4 failed to upload before termination.
Restarting backup at volume 3.
Restarting after volume 2, file mount, block 0
--------------[ Backup Statistics ]--------------
StartTime 1692297923.02 (Thu Aug 17 18:45:23 2023)
EndTime 1692297923.28 (Thu Aug 17 18:45:23 2023)
ElapsedTime 0.26 (0.26 seconds)
SourceFiles 96
SourceFileSize 7899934 (7.53 MB)
NewFiles 96
NewFileSize 7899934 (7.53 MB)
DeletedFiles 0
ChangedFiles 0
ChangedFileSize 0 (0 bytes)
ChangedDeltaSize 0 (0 bytes)
DeltaEntries 96
RawDeltaSize 7899769 (7.53 MB)
TotalDestinationSizeChange 1627592 (1.55 MB)
Errors 0
-------------------------------------------------
It will finalize the backup, but assuming Volume 2 was backuped successfully.
$ ls -la /tmp/back2/
total 2796
drwxr-xr-x 1 vscode vscode 482 Aug 17 18:45 .
drwxrwxrwt 1 root root 20886 Aug 17 18:45 ..
-rw-r--r-- 1 vscode vscode 180501 Aug 17 18:45 duplicity-full-signatures.20230817T184315Z.sigtar.gpg
-rw-r--r-- 1 vscode vscode 1109 Aug 17 18:45 duplicity-full.20230817T184315Z.manifest.gpg
-rw-r--r-- 1 vscode vscode 1043039 Aug 17 18:43 duplicity-full.20230817T184315Z.vol1.difftar.gpg
-rw-r--r-- 1 vscode vscode 1026346 Aug 17 18:45 duplicity-full.20230817T184315Z.vol3.difftar.gpg
-rw-r--r-- 1 vscode vscode 601246 Aug 17 18:45 duplicity-full.20230817T184315Z.vol4.difftar.gpg
Everything is there, but volume 2 is missing.
The only indicator that something went wrong is the not returning first duplicity run. But think of an backup job in a crontab and the server gets rebooted, between both jobs, this issue doesn't get recognized.
What is the current bug behaviour?
Backup is missing one volume
What is the expected correct behaviour?
Backup should fail as soon as the failed put of volume 2 is recognized.
(If --asynchronous-upload
is not used this is the correct behavior)
Possible fixes
The common exception handling in the FatalError method (https://gitlab.com/poggenpower/duplicity/-/blob/7b62e3b9c47d8a8db1fbea913226b66d357beeb4/duplicity/log.py#L344), that stops duplicity typically, is catched in a side thread, but not in the main threat therefore the whole program doesn't stop.
Exception catched:
Exception has occurred: SystemExit
50
File "/workspaces/duplicity/duplicity/backend.py", line 368, in inner_retry
return fn(self, *args)
^^^^^^^^^^^^^^^
File "/workspaces/duplicity/duplicity/backend.py", line 544, in put
self.__do_put(source_path, remote_filename)
File "/workspaces/duplicity/duplicity/backend.py", line 530, in __do_put
self.backend._put(source_path, remote_filename)
File "/workspaces/duplicity/duplicity/backends/localbackend.py", line 67, in _put
raise FileNotFoundError("Forcing Issue - THL")
FileNotFoundError: Forcing Issue - THL
During handling of the above exception, another exception occurred:
File "/workspaces/duplicity/duplicity/log.py", line 348, in FatalError
sys.exit(code)
File "/workspaces/duplicity/duplicity/backend.py", line 401, in inner_retry
log.FatalError(_("Giving up after %s attempts. %s: %s")
File "/workspaces/duplicity/duplicity/dup_main.py", line 331, in put
backend.put(tdp, dest_filename)
File "/workspaces/duplicity/duplicity/dup_main.py", line 452, in <lambda>
vol_num: put(tdp, dest_filename, vol_num),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/workspaces/duplicity/duplicity/asyncscheduler.py", line 176, in <lambda>
(waiter, caller) = async_split(lambda: fn(*params))
^^^^^^^^^^^
File "/workspaces/duplicity/duplicity/dup_threading.py", line 152, in caller
value = fn()
^^^^
File "/workspaces/duplicity/duplicity/asyncscheduler.py", line 234, in __execute_caller
succeeded, waiter = caller()
^^^^^^^^
File "/workspaces/duplicity/duplicity/asyncscheduler.py", line 219, in trampoline
self.__execute_caller(caller)
SystemExit: 50
The main thread is spinning endless cycels in:
it seems that the failed execution is not recognized.
I tried to get my head into the threading code, but got lost. I did some threaded python programs is the past, but was more relying on the higher level APIs like Queues etc. and even then keeping track of all thread states was a mess. Therefore I feel not comfortable to work on a fix.
As the asyncronos-upload feature is flagged as experimental, the issue is not totally undetectable I think it has not a high priority.