Backup upload timeouts prevent cleanup and fill disk space
Summary
If GitLab is configured to upload backups to S3-compatible storage (a DigitalOcean Space in our case), the backup process, which runs as a nightly cron job, occasionally fails with a timeout (which does not seem to be configurable). The instance is configured to keep only 2 backups locally. When the upload fails, the cleanup of old local backups is not executed afterwards. If the timeouts occur several days in a row, say over a weekend, the backup job keeps creating huge backup files in /var/opt/gitlab/backups without ever cleaning them up. Sooner or later this can fill up the disk, resulting in a broken GitLab instance.
Steps to reproduce
Not sure how to reproduce this reliably, but using an S3 storage that responds slowly enough to trigger the timeouts should do it.
What is the current bug behavior?
Failed uploads prevent the cleanup of local backups, which can result in "no space left on device" errors.
What is the expected correct behavior?
The upload and cleanup should be separated, so that a failed upload does not prevent the cleanup entirely. Also, the backup task (which creates the local backup) could refuse to create a backup if that is expected to fill up the remaining disk space (roughly estimated from the size of the last backup?).
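The free-space pre-check suggested here could look roughly like the following sketch. It is not GitLab code: the method names, the `safety_factor` parameter, and the use of `df -Pk` to read free space are assumptions made for illustration, and the `*_gitlab_backup.tar` pattern is assumed to match the local archive names.

```ruby
# Hypothetical helper: free bytes on the filesystem holding `dir`,
# read from POSIX `df -Pk` (4th column = available 1024-byte blocks).
def free_bytes(dir)
  output = IO.popen(['df', '-Pk', dir], &:read)
  output.lines.last.split[3].to_i * 1024
end

# Hypothetical pre-check: use the newest existing archive as a rough
# estimate of the next backup's size and require some headroom on top.
def enough_space_for_backup?(backup_dir, available_bytes, safety_factor: 1.5)
  archives = Dir.glob(File.join(backup_dir, '*_gitlab_backup.tar'))
  return true if archives.empty? # nothing to estimate from, so do not block

  last_size = File.size(archives.max_by { |path| File.mtime(path) })
  available_bytes >= last_size * safety_factor
end
```

A cron wrapper could call `enough_space_for_backup?(dir, free_bytes(dir))` and skip the backup with a warning when it returns false, instead of filling the disk.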
The uploader should also be aware of backups that are not yet fully uploaded and try to upload or resume them again, and the cleanup should only delete backups that are fully uploaded.
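To make the requested behavior concrete, here is a minimal sketch of an upload step and a cleanup step that are decoupled. It is not how `Backup::Manager` works today: the `BackupRotation` class, the `.uploaded` marker files, and the `uploader` callable are all hypothetical, and it assumes that archive names sort chronologically because they start with a timestamp.

```ruby
require 'fileutils'

# Hypothetical sketch, not GitLab's Backup::Manager.
# `uploader` is any callable that raises on failure (e.g. on a timeout).
class BackupRotation
  MARKER_SUFFIX = '.uploaded' # assumed convention for "fully uploaded"

  def initialize(dir, keep:, uploader:)
    @dir = dir
    @keep = keep
    @uploader = uploader
  end

  # Upload first, then always clean up. Returns the archives that failed.
  def run
    failed = upload_pending
    cleanup # runs even if some uploads failed
    failed
  end

  # Retry every archive without a marker, oldest first.
  def upload_pending
    archives.reject { |path| uploaded?(path) }.each_with_object([]) do |path, failed|
      begin
        @uploader.call(path)
        FileUtils.touch(path + MARKER_SUFFIX)
      rescue StandardError => e
        warn "upload of #{File.basename(path)} failed: #{e.message}"
        failed << path
      end
    end
  end

  # Keep the newest @keep archives; delete older ones only if uploaded.
  def cleanup
    stale = archives.first([archives.size - @keep, 0].max)
    stale.select { |path| uploaded?(path) }.each do |path|
      FileUtils.rm_f([path, path + MARKER_SUFFIX])
    end
  end

  private

  # Assumes names start with a timestamp, so a name sort is oldest first.
  def archives
    Dir.glob(File.join(@dir, '*_gitlab_backup.tar')).sort
  end

  def uploaded?(path)
    File.exist?(path + MARKER_SUFFIX)
  end
end
```

With this split, a timeout on one night only leaves that night's archive behind. The next run retries it, and older archives that already reached remote storage are still removed. Archives that never upload successfully would still accumulate, so a free-space check remains useful alongside it.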
Relevant logs and/or screenshots
Output of the cron job creating the backup with `/opt/gitlab/bin/gitlab-rake gitlab:backup:create CRON=1 SKIP=registry,artifacts,builds,uploads`:
rake aborted!
Excon::Error::Timeout: write timeout reached
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:50:in `upload'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:35:in `block in pack'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:19:in `chdir'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:19:in `pack'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:20:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `'

Caused by:
OpenSSL::SSL::SSLErrorWaitWritable: write would block
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:50:in `upload'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:35:in `block in pack'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:19:in `chdir'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:19:in `pack'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:20:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `'
Tasks: TOP => gitlab:backup:create
Results of GitLab environment info
System information
System:
Proxy: no
Current User: git
Using RVM: no
Ruby Version: 2.6.3p62
Gem Version: 2.7.9
Bundler Version: 1.17.3
Rake Version: 12.3.2
Redis Version: 3.2.12
Git Version: 2.21.0
Sidekiq Version: 5.2.7
Go Version: unknown

GitLab information
Version: 12.1.6-ee
Revision: d05ee0a9c12
Directory: /opt/gitlab/embedded/service/gitlab-rails
DB Adapter: PostgreSQL
DB Version: 10.7
URL: https://git.seda.digital
HTTP Clone URL: https://git.seda.digital/some-group/some-project.git
SSH Clone URL: git@git.seda.digital:some-group/some-project.git
Elasticsearch: no
Geo: no
Using LDAP: no
Using Omniauth: yes
Omniauth Providers:

GitLab Shell
Version: 9.3.0
Repository storage paths:
- default: /var/opt/gitlab/git-data/repositories
GitLab Shell path: /opt/gitlab/embedded/service/gitlab-shell
Git: /opt/gitlab/embedded/bin/git
Results of GitLab application Check
Checking GitLab subtasks ...

Checking GitLab Shell ...
GitLab Shell: ... GitLab Shell version >= 9.3.0 ? ... OK (9.3.0)
Running /opt/gitlab/embedded/service/gitlab-shell/bin/check
Check GitLab API access: OK
Redis available via internal API: OK
Access to /var/opt/gitlab/.ssh/authorized_keys: OK
gitlab-shell self-check successful
Checking GitLab Shell ... Finished

Checking Gitaly ...
Gitaly: ... default ... OK
Checking Gitaly ... Finished

Checking Sidekiq ...
Sidekiq: ... Running? ... yes
Number of Sidekiq processes ... 1
Checking Sidekiq ... Finished

Checking Incoming Email ...
Incoming Email: ... Reply by email is disabled in config/gitlab.yml
Checking Incoming Email ... Finished

Checking LDAP ...
LDAP: ... LDAP is disabled in config/gitlab.yml
Checking LDAP ... Finished

Checking GitLab App ...
Git configured correctly? ... yes
Database config exists? ... yes
All migrations up? ... yes
Database contains orphaned GroupMembers? ... no
GitLab config exists? ... yes
GitLab config up to date? ... yes
Log directory writable? ... yes
Tmp directory writable? ... yes
Uploads directory exists? ... yes
Uploads directory has correct permissions? ... yes
Uploads directory tmp has correct permissions? ... yes
Init script exists? ... skipped (omnibus-gitlab has no init script)
Init script up-to-date? ... skipped (omnibus-gitlab has no init script)
Projects have namespace: ...
Redis version >= 2.8.0? ... yes
Ruby version >= 2.5.3 ? ... yes (2.6.3)
Git version >= 2.21.0 ? ... yes (2.21.0)
Git user has default SSH configuration? ... yes
Active users: ... 10
Elasticsearch version 5.6 - 6.x? ... skipped (elasticsearch is disabled)
Checking GitLab App ... Finished

Checking GitLab subtasks ... Finished