`gitlab-backup create` fails silently with no error logged when the upload to remote storage fails
Summary
When `gitlab-backup create` is run and fails, the backup still completes and creates a backup tar as though all went well. This leaves behind an incomplete backup in a bad state, and restoring from it will likely cause issues. In this issue, we focus on an upload to remote storage that fails. Manually running the backup rake tasks sometimes prints an error, but on Omnibus instances nothing is recorded in `backup_json.log`.
Steps to reproduce
- Set up a remote storage bucket (an AWS S3 bucket or a GCS bucket).
- In the `gitlab.rb` file, enable backup to remote storage using either AWS or GCS. AWS example (a GCS configuration is analogous):

```ruby
gitlab_rails['backup_upload_connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-west-1',
  # 'aws_access_key_id' => 'AKIAKIAKI',
  # 'aws_secret_access_key' => 'secret123'
  # If using an IAM Profile, don't configure aws_access_key_id & aws_secret_access_key
  'use_iam_profile' => true
}
gitlab_rails['backup_upload_remote_directory'] = 'my.s3.bucket'
```

- Run `gitlab-ctl reconfigure`.
- Back up GitLab: `sudo gitlab-backup create`
- While the backup is running, collect SOS logs to analyze `backup_json.log` (a quick check of whether the archive actually reached the bucket is sketched after this list):

```shell
curl https://gitlab.com/gitlab-com/support/toolbox/gitlabsos/raw/master/gitlabsos.rb | sudo /opt/gitlab/embedded/bin/ruby
```
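To confirm whether the backup archive actually reached the bucket, one option is to list the bucket from the Rails console. This is a minimal sketch assuming the AWS settings from the `gitlab.rb` example above and the fog-aws gem bundled with GitLab; `my.s3.bucket` is the example bucket name, not a real one.

```ruby
# Run inside `sudo gitlab-rails console`. Connection settings mirror
# gitlab_rails['backup_upload_connection'] from the example above.
require 'fog/aws'

connection = Fog::Storage.new(
  provider: 'AWS',
  region: 'eu-west-1',
  use_iam_profile: true # or supply aws_access_key_id / aws_secret_access_key
)

bucket = connection.directories.get('my.s3.bucket') # example bucket name
if bucket.nil?
  puts 'Bucket not found or not accessible'
else
  bucket.files.each { |file| puts "#{file.key} (#{file.content_length} bytes)" }
end
```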
What is the current bug behavior?
- When a backup upload to remote storage fails, `backup_json.log` does not record any error; the upload fails silently. This makes it hard to detect that the upload was unsuccessful, and why.
{"severity":"INFO","time":"2022-09-14T14:02:33.399Z","correlation_id":null,"message":"Creating backup archive: 1663164000_2022_09_14_15.2.2-ee_gitlab_backup.tar ... "}
{"severity":"INFO","time":"2022-09-14T14:02:55.694Z","correlation_id":null,"message":"Creating backup archive: 1663164000_2022_09_14_15.2.2-ee_gitlab_backup.tar ... done"}
{"severity":"INFO","time":"2022-09-14T14:02:55.697Z","correlation_id":null,"message":"Uploading backup archive to remote storage REDACTED ... "}
{"severity":"INFO","time":"2022-09-14T14:02:55.713Z","correlation_id":null,"message":"Deleting tar staging files ... "}
{"severity":"INFO","time":"2022-09-14T14:02:55.713Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/backup_information.yml"}
{"severity":"INFO","time":"2022-09-14T14:02:55.714Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/db"}
{"severity":"INFO","time":"2022-09-14T14:02:55.714Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/repositories"}
{"severity":"INFO","time":"2022-09-14T14:02:55.720Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/uploads.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.720Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/builds.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.721Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/artifacts.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.721Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/pages.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.721Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/lfs.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.722Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/terraform_state.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.722Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/registry.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.890Z","correlation_id":null,"message":"Cleaning up /var/opt/gitlab/backups/packages.tar.gz"}
{"severity":"INFO","time":"2022-09-14T14:02:55.891Z","correlation_id":null,"message":"Deleting tar staging files ... done"}
{"severity":"INFO","time":"2022-09-14T14:02:55.891Z","correlation_id":null,"message":"Deleting backups/tmp ... "}
{"severity":"INFO","time":"2022-09-14T14:02:55.891Z","correlation_id":null,"message":"Deleting backups/tmp ... done"}
- The rake task output also does not print `... failed` or `... done` for the upload step; the error only appears at the very end:
```
2022-09-14 14:02:33 +0000 -- Creating backup archive: 1663164000_2022_09_14_15.2.2-ee_gitlab_backup.tar ...
2022-09-14 14:02:55 +0000 -- Creating backup archive: 1663164000_2022_09_14_15.2.2-ee_gitlab_backup.tar ... done
2022-09-14 14:02:55 +0000 -- Uploading backup archive to remote storage REDACTED ...
2022-09-14 14:02:55 +0000 -- Deleting tar staging files ...
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/backup_information.yml
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/db
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/repositories
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/uploads.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/builds.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/artifacts.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/pages.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/lfs.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/terraform_state.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/registry.tar.gz
2022-09-14 14:02:55 +0000 -- Cleaning up /var/opt/gitlab/backups/packages.tar.gz
2022-09-14 14:02:55 +0000 -- Deleting tar staging files ... done
2022-09-14 14:02:55 +0000 -- Deleting backups/tmp ...
2022-09-14 14:02:55 +0000 -- Deleting backups/tmp ... done
rake aborted!
ArgumentError: Missing required arguments: aws_access_key_id, aws_secret_access_key
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:520:in `connect_to_remote_directory'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:323:in `upload'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:222:in `run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:46:in `create'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:12:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => gitlab:backup:create
(See full trace by running task with --trace)
```
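For context on where that ArgumentError comes from: `connect_to_remote_directory` builds a Fog connection from the `backup_upload_connection` settings, and fog-aws raises before any upload happens if neither credentials nor `use_iam_profile` are supplied. The following sketch (hypothetical settings hash, not the actual `Backup::Manager` code) reproduces just that error from the Rails console:

```ruby
# Reproduces the ArgumentError from the stack trace above: fog-aws refuses to build
# a storage connection when no credentials and no IAM profile are configured.
require 'fog/aws'

settings = { provider: 'AWS', region: 'eu-west-1' } # no keys, no use_iam_profile

begin
  Fog::Storage.new(settings)
rescue ArgumentError => e
  puts e.message # => "Missing required arguments: aws_access_key_id, aws_secret_access_key"
end
```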
What is the expected correct behavior?
- Have the log indicate `... failed` or `... skipped` when the rake task raises an error for an unsuccessful upload to remote storage, for example:

```
2022-09-14 14:02:55 +0000 -- Uploading backup archive to remote storage REDACTED ... failed
```
Possible fixes
The backup log stops at https://gitlab.com/gitlab-org/gitlab/-/blob/v14.10.5-ee/lib/backup/manager.rb#L279:
puts_time "Uploading backup archive to remote storage #{remote_directory} ... ".color(:blue)
...
The check ought to reach #L291, which would log `... done` or `... failed`, but the exception is raised earlier (while connecting to the remote directory), so this branch is never executed:
```ruby
  if upload
    if upload.respond_to?(:encryption) && upload.encryption
      puts_time "Uploading backup archive to remote storage #{remote_directory} ... ".color(:blue) + "done (encrypted with #{upload.encryption})".color(:green)
    else
      puts_time "Uploading backup archive to remote storage #{remote_directory} ... ".color(:blue) + "done".color(:green)
    end
  else
    puts_time "Uploading backup to #{remote_directory} failed".color(:red)
    raise Backup::Error, 'Backup failed'
  end
end
```
The most likely fix lies in ensuring that a failed upload to remote storage, including an exception raised while connecting to the remote directory, still reaches the logging at #L291 so the failure is recorded correctly.
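One possible shape for such a fix, shown here as a self-contained sketch of the "log the failure, then re-raise" pattern rather than the actual `Backup::Manager` implementation (method names below are illustrative):

```ruby
# Illustrative only: emit the "... failed" line before the exception propagates, so the
# log records the failure even though the rake task still aborts with the original error.
def puts_time(message)
  puts "#{Time.now.utc.strftime('%Y-%m-%d %H:%M:%S %z')} -- #{message}"
end

def upload_backup(remote_directory)
  puts_time "Uploading backup archive to remote storage #{remote_directory} ... "

  # Stand-in for connect_to_remote_directory / directory.files.create, which is where
  # the ArgumentError in the stack trace above is raised.
  raise ArgumentError, 'Missing required arguments: aws_access_key_id, aws_secret_access_key'
rescue StandardError => e
  puts_time "Uploading backup archive to remote storage #{remote_directory} ... failed (#{e.class}: #{e.message})"
  raise
end

begin
  upload_backup('my.s3.bucket')
rescue StandardError
  # rake would abort here; the log above already contains the "... failed" line
end
```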