Gitlab-backup not failing when there are problems with the backup
Summary
Currently I use gitlab-backup with a wrapper script to back up my GitLab instance. The script calls gitlab-backup and does a few other things, such as syncing the backup to various places and cleaning up the directory. It reads the return code from gitlab-backup and pipes the output to a log file.
Recently I discovered a problem: the version of pg_dump used by gitlab-backup (13) was not the same as the version of our Postgres cluster (15), so the database backup was failing. That in itself was not the issue. The problem was that the backup would fail the database portion and happily carry on with the rest of the backup. The failure was logged, but it was lost in the thousands of lines of other output and went unnoticed.
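A wrapper script could catch this kind of mismatch before the backup starts. The following is only a minimal sketch, assuming PostgreSQL 10 or later (where the first number in the version string is the major version). The helper names `major_version`, `versions_match`, and `local_pg_dump_version` are hypothetical, and the `pg_dump` path must point at whichever binary gitlab-backup actually uses on your installation.

```python
import re
import subprocess


def major_version(text):
    """Return the first integer in a version string, e.g. 13 for
    'pg_dump (PostgreSQL) 13.14'. Assumes PostgreSQL 10 or later."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return int(match.group(0))


def versions_match(pg_dump_output, server_version):
    """Compare the major version reported by pg_dump with the server's."""
    return major_version(pg_dump_output) == major_version(server_version)


def local_pg_dump_version(pg_dump="pg_dump"):
    """Run `pg_dump --version` and return its output.
    The default path is an assumption; pass the binary GitLab uses."""
    result = subprocess.run(
        [pg_dump, "--version"],
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

A wrapper could call `versions_match(local_pg_dump_version(), server_version)` and abort before running the backup when it returns `False`. How the server version string is obtained (for example from `SHOW server_version`) is left to the wrapper.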
As far as I can see there are 2 issues:
- gitlab-backup can fail at backing up a portion of the backup (like the database) and still return 0, showing the backup ran successfully. This behavior obscures the problem from the administrator, as it seems that the entire process ran correctly. At the very least the return code should be 1, if not the entire backup process immediately failing and having to be explicitly restarted by the admin to ignore the database.
- The output of the backup is too noisy. I am aware that CRON=1 can be appended to the command to make it less noisy, but this should be the default, as there are likely very few instances where an administrator would be interested in the entire output, and it can easily obscure an actual issue from being noticed.
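Until the exit code can be trusted, a wrapper can work around both issues by scanning the captured output itself. This is a rough sketch, not a fix for gitlab-backup. The function names and the patterns in `ERROR_PATTERNS` are assumptions, and the patterns should be adjusted to the messages your version actually prints.

```python
import re
import subprocess
import sys

# Hypothetical patterns; tune these to the messages seen in your own logs.
ERROR_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"\berror\b", r"\bfailed\b", r"version mismatch")
]


def find_problem_lines(output):
    """Return the output lines that match any of the error patterns."""
    return [
        line
        for line in output.splitlines()
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


def run_backup(command):
    """Run the backup command and return (exit_code, problem_lines).

    A non-zero exit code from the command is passed through. An exit
    code of 0 is turned into 1 if the output contains suspicious lines.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # scan stderr together with stdout
        text=True,
    )
    problems = find_problem_lines(result.stdout)
    if result.returncode != 0:
        return result.returncode, problems
    return (1 if problems else 0), problems
```

For example, `code, problems = run_backup(["gitlab-backup", "create", "CRON=1"])` followed by `sys.exit(code)` would make the wrapper fail whenever a matching line appears, even if gitlab-backup itself returned 0. Pattern matching like this can produce false positives, so it is a stopgap rather than a substitute for a correct exit code.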
Steps to reproduce
- Create a mismatch between pg_dump used by gitlab and the postgres cluster used by the database
- Run gitlab-backup
- Observe the fact that it looks like the backup completed correctly despite the fact that the database was not backed up
Example Project
N/A Self-hosted
What is the current bug behavior?
gitlab-backup can miss an important portion of the backup and still complete 'successfully'.
What is the expected correct behavior?
gitlab-backup should be less noisy by default in its logging and should return an exit code of 1 if any of the components fails.