Fail loud when the database backup failed

Chiller Dragon requested to merge ChillerDragon/gitlab:mr_backup_error into master

What does this MR do and why?

Fail loud when the database backup failed

Related to #411497 (closed)

Currently the backup command exits with code 0 and creates a backup file including all repositories even if the connection to the database failed and no database dump was created.

Creating a backup is commonly an automated process, so it would help backup health monitoring to see from the exit code whether the backup was created successfully. It also makes little sense to continue backing up all repository data if the backup cannot be restored anyway because the database dump is missing. Doing so only takes time and creates a misleadingly large backup file; misleading because the backup is incomplete.
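
As a rough illustration of the monitoring use case, here is a minimal sketch of a wrapper that decides success purely from the exit status. The method name `backup_succeeded?` and the constant `BACKUP_COMMAND` are hypothetical and not part of GitLab; only the `gitlab-rake gitlab:backup:create` command comes from this MR.

```ruby
# Hypothetical monitoring helper, not part of GitLab.
# The command is the one used throughout this MR description.
BACKUP_COMMAND = %w[gitlab-rake gitlab:backup:create].freeze

# Runs the given command (program plus arguments) and returns true only
# when the process exited with status 0.
def backup_succeeded?(command)
  # Kernel#system returns true on exit status 0, false on a non-zero
  # status, and nil when the command could not be started at all.
  system(*command) == true
end

# Possible usage in a cron-driven script (left as a comment so that
# loading this file does not start a backup):
#
#   abort 'backup FAILED' unless backup_succeeded?(BACKUP_COMMAND)
```

A check like this is only meaningful once the backup command returns a non-zero status when the database dump fails, which is the behavior this MR introduces.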

There is a small note at the beginning of the backup log that the dump failed. That note never shows a [FAILED] from this method:

def report_success(success)
  if success
    progress.puts '[DONE]'.color(:green)
  else
    progress.puts '[FAILED]'.color(:red)
  end
end

because the raise is triggered before report_success is called:

raise DatabaseBackupError.new(config, db_file_name) unless success

report_success(success)
progress.flush
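
The effect of that ordering can be reproduced in isolation. The following is a simplified sketch, not the code from this MR: `SketchDumpError`, `raise_then_report`, `report_then_raise`, and `output_on_failure` are hypothetical names, and `report_success` is reduced to plain text without color. It assumes that reporting before raising is one way to get the [FAILED] marker printed.

```ruby
require 'stringio'

# Hypothetical stand-in for the real error class, for illustration only.
class SketchDumpError < StandardError; end

# Simplified version of the report_success method quoted above (no color).
def report_success(progress, success)
  progress.puts(success ? '[DONE]' : '[FAILED]')
end

# Order from the snippet above: the raise runs first, so nothing is
# reported when the dump failed.
def raise_then_report(progress, success)
  raise SketchDumpError, 'dump failed' unless success

  report_success(progress, success)
  progress.flush
end

# Alternative order: report first, then raise, so the marker is printed.
def report_then_raise(progress, success)
  report_success(progress, success)
  progress.flush

  raise SketchDumpError, 'dump failed' unless success
end

# Calls one of the two methods with success = false and returns whatever
# was written to the progress stream.
def output_on_failure(method_name)
  progress = StringIO.new
  begin
    send(method_name, progress, false)
  rescue SketchDumpError
    # Swallowed here only so the captured output can be inspected.
  end
  progress.string
end

puts output_on_failure(:raise_then_report).inspect # prints ""
puts output_on_failure(:report_then_raise).inspect # prints "[FAILED]\n"
```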

The command below no longer silently catches the dump error; it fails fast and loud, which guarantees that a backup command that passes has created a functional backup.

sudo -u git -H bundle exec rake gitlab:backup:create RAILS_ENV=production

Screenshots or screen recordings

It is not a visual change. You see the difference when running the backup command and looking at the output log and exit code. I put the full before and after logs further down in a spoiler.

Before: gitlab-rake gitlab:backup:create exit code 0 on db error
After: gitlab-rake gitlab:backup:create exit code 1 on db error

How to set up and validate locally

The issue I ran into was an external database with a version mismatch. GitLab runs fine, but the backup fails because pg_dump is the wrong version. Using these two docker compose files you can reproduce a failing database backup.
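
pg_dump reports this kind of mismatch with a line such as `pg_dump: error: server version: 14.7; pg_dump version: 13.11` (taken from the logs in this description). As a small sketch, that line can be parsed to see which side is newer. The helper names `pg_dump_version_mismatch` and `server_newer_than_pg_dump?` are hypothetical, and the pattern is only known to fit the message shown here.

```ruby
# Pattern for the mismatch line printed by pg_dump, for example:
#   pg_dump: error: server version: 14.7; pg_dump version: 13.11
MISMATCH_PATTERN =
  /server version: (?<server>\d+(?:\.\d+)*)[^;]*; pg_dump version: (?<client>\d+(?:\.\d+)*)/

# Returns { server: "14.7", client: "13.11" } when the text contains a
# mismatch line, or nil when it does not.
def pg_dump_version_mismatch(text)
  match = MISMATCH_PATTERN.match(text)
  return nil unless match

  { server: match[:server], client: match[:client] }
end

# True when the server's major version is newer than pg_dump's.
# String#to_i stops at the first dot, so "14.7".to_i is 14.
def server_newer_than_pg_dump?(versions)
  versions[:server].to_i > versions[:client].to_i
end

line = 'pg_dump: error: server version: 14.7; pg_dump version: 13.11'
versions = pg_dump_version_mismatch(line)
puts "server #{versions[:server]}, pg_dump #{versions[:client]}" if versions
```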

My setup might be more complicated than yours, so you can also just run the backup command on a GitLab instance with a broken database, then apply my commit and see how it changes the outcome.

But if you want to reproduce my testing setup, here it is.

$ docker --version
Docker version 24.0.2, build cb74dfcd85
$ docker-compose --version
Docker Compose version 2.18.1

# gitlab-db/docker-compose.yml
version: '3'

services:
  gitlab-db14:
    # the pg_dump shipped with gitlab is older than postgres 14
    # this is why the backup will fail
    image: postgres:14.7-alpine
    volumes:
      - ./db14:/var/lib/postgresql/data:rw
    environment:
      - POSTGRES_DB=gitlabhq_production
      - POSTGRES_USER=gitlab
      - POSTGRES_PASSWORD=secure_pg_pass123
    ports:
      - 5432:5432
    expose:
      - 5432
    healthcheck:
      test: ["CMD-SHELL", "PGPASSWORD=secure_pg_pass123 psql -h gitlab-db14 -U gitlab gitlabhq_production -c '\\l'"]
      interval: 5s
      timeout: 5s
      retries: 3
    networks:
      - gitlab-databases

networks:
  gitlab-databases:

# gitlab/docker-compose.yml
version: '3.6'
services:
  web:
    image: 'gitlab/gitlab-ce:16.1.0-ce.0'
    restart: always
    hostname: 'test-gitlab.mydomain.com'
    environment:
      GITLAB_OMNIBUS_CONFIG: |
        external_url 'https://test-gitlab.mydomain.com'
        nginx['ssl_certificate'] = "/etc/gitlab/ssl/test-gitlab.mydomain.com.cer"
        nginx['ssl_certificate_key'] = "/etc/gitlab/ssl/test-gitlab.mydomain.com.key"
        letsencrypt['enable'] = false
        gitlab_rails['manage_backup_path'] = false
        gitlab_rails['backup_path'] = '/backups'

        postgresql['enable'] = false
        gitlab_rails['db_adapter'] = 'postgresql'
        gitlab_rails['db_encoding'] = 'utf8'
        gitlab_rails['db_host'] = 'gitlab-db14'
        gitlab_rails['db_port'] = 5432
        gitlab_rails['db_username'] = 'gitlab'
        gitlab_rails['db_password'] = "secure_pg_pass123"
    extra_hosts:
      - "test-gitlab.mydomain.com:127.0.0.1"
    ports:
      - '192.168.178.29:22:22'
      - '192.168.178.29:80:80'
      - '192.168.178.29:443:443'
    volumes:
      - './data/backups:/backups'
      - './data/logs:/var/log/gitlab'
      - './data/data:/var/opt/gitlab'
      - './data/config:/etc/gitlab'
      - './certs:/etc/gitlab/ssl/'
    networks:
      - gitlab-databases
    shm_size: '256m'

networks:
  gitlab-databases:
    external:
      name: gitlab-db_gitlab-databases

For this to work, I changed test-gitlab.mydomain.com to my actual domain and made it point to the local IP of my laptop (192.168.178.29). I then created SSL certificates for the domain and put them in the certs/ folder.

So the final result looks like this:

$ tree
.
├── gitlab
│   ├── certs
│   │   ├── ca.cer
│   │   ├── fullchain.cer
│   │   ├── test-gitlab.mydomain.com.cer
│   │   ├── test-gitlab.mydomain.com.conf
│   │   ├── test-gitlab.mydomain.com.csr
│   │   ├── test-gitlab.mydomain.com.csr.conf
│   │   └── test-gitlab.mydomain.com.key
│   └── docker-compose.yml
└── gitlab-db
    └── docker-compose.yml

4 directories, 9 files

Then start the db, then GitLab, and wait for it to launch. Create a backup and see it fail but still exit with code 0.

cd gitlab-db
docker-compose up -d --wait
cd ../gitlab
docker-compose up -d --wait

# this command shows the db error at the top but continues as if nothing had happened
docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create

Now, to test my patch, you can download the files from my commit and mount them into the container:

cd gitlab
docker-compose down
mkdir patch
cd patch
wget https://gitlab.com/ChillerDragon/gitlab/-/raw/1321aa05e66b6b10f8878401f7b494008eab51c9/lib/backup/manager.rb
wget https://gitlab.com/ChillerDragon/gitlab/-/raw/1321aa05e66b6b10f8878401f7b494008eab51c9/lib/backup/database.rb

Now add this to the volumes section of gitlab/docker-compose.yml and start GitLab again:

- './patch/database.rb:/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb'
- './patch/manager.rb:/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb'

docker-compose up -d --wait

# this will now stop on database dump failure
# and exit with code 1
docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create

Before, it continued and created a backup archive with all repositories:

[user@host gitlab]$ docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create SKIP=registry,artifacts,builds,pages
2023-06-22 15:33:15 +0200 -- Dumping database ... 
Dumping PostgreSQL database gitlabhq_production ... pg_dump: error: server version: 14.7; pg_dump version: 13.11
pg_dump: error: aborting because of server version mismatch
2023-06-22 15:33:15 +0200 -- Dumping database failed: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
 - host: 'gitlab-db14'
 - port: '5432'
 - database: 'gitlabhq_production'
2023-06-22 15:33:15 +0200 -- Dumping repositories ... 
{"command":"create","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"info","msg":"started create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git","storage_name":"default","time":"2023-06-22T13:33:15.535Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"warning","msg":"skipped create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.git","storage_name":"default","time":"2023-06-22T13:33:15.539Z"}
{"command":"create","gl_project_path":"gitlab-instance-513de534/Monitoring.wiki","level":"info","msg":"started create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.635Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"gitlab-instance-513de534/Monitoring.wiki","level":"warning","msg":"skipped create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.637Z"}
{"command":"create","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"info","msg":"started create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.design.git","storage_name":"default","time":"2023-06-22T13:33:15.671Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"gitlab-instance-513de534/Monitoring","level":"warning","msg":"skipped create","relative_path":"@hashed/6b/86/6b86b273ff34fce19d6b804eff5a3f5747ada4eaa22f1d49c01e52ddb7875b4b.design.git","storage_name":"default","time":"2023-06-22T13:33:15.671Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt","level":"info","msg":"started create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git","storage_name":"default","time":"2023-06-22T13:33:15.673Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt.wiki","level":"info","msg":"started create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.676Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"my.user/postgresmomemt.wiki","level":"warning","msg":"skipped create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.wiki.git","storage_name":"default","time":"2023-06-22T13:33:15.677Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt","level":"info","msg":"started create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.design.git","storage_name":"default","time":"2023-06-22T13:33:15.679Z"}
{"command":"create","error":"manager: repository empty: repository skipped","gl_project_path":"my.user/postgresmomemt","level":"warning","msg":"skipped create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.design.git","storage_name":"default","time":"2023-06-22T13:33:15.680Z"}
{"command":"create","gl_project_path":"my.user/postgresmomemt","level":"info","msg":"completed create","relative_path":"@hashed/d4/73/d4735e3a265e16eee03f59718b9b5d03019c07d8b6c51f90da3a666eec13ab35.git","storage_name":"default","time":"2023-06-22T13:33:15.681Z"}
2023-06-22 15:33:15 +0200 -- Dumping repositories ... done
2023-06-22 15:33:15 +0200 -- Dumping uploads ... 
2023-06-22 15:33:15 +0200 -- Dumping uploads ... done
2023-06-22 15:33:15 +0200 -- Dumping builds ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping artifacts ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping pages ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping lfs objects ... 
2023-06-22 15:33:15 +0200 -- Dumping lfs objects ... done
2023-06-22 15:33:15 +0200 -- Dumping terraform states ... 
2023-06-22 15:33:15 +0200 -- Dumping terraform states ... done
2023-06-22 15:33:15 +0200 -- Dumping container registry images ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Dumping packages ... 
2023-06-22 15:33:15 +0200 -- Dumping packages ... done
2023-06-22 15:33:15 +0200 -- Creating backup archive: 1687440795_2023_06_22_15.11.8_gitlab_backup.tar ... 
2023-06-22 15:33:15 +0200 -- Creating backup archive: 1687440795_2023_06_22_15.11.8_gitlab_backup.tar ... done
2023-06-22 15:33:15 +0200 -- Uploading backup archive to remote storage  ... [SKIPPED]
2023-06-22 15:33:15 +0200 -- Deleting old backups ... 
2023-06-22 15:33:15 +0200 -- Deleting old backups ... done. (0 removed)
2023-06-22 15:33:15 +0200 -- Deleting tar staging files ... 
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/backup_information.yml
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/db
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/repositories
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/uploads.tar.gz
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/lfs.tar.gz
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/terraform_state.tar.gz
2023-06-22 15:33:15 +0200 -- Cleaning up /backups/packages.tar.gz
2023-06-22 15:33:15 +0200 -- Deleting tar staging files ... done
2023-06-22 15:33:15 +0200 -- Deleting backups/tmp ... 
2023-06-22 15:33:15 +0200 -- Deleting backups/tmp ... done
2023-06-22 15:33:15 +0200 -- Warning: Your gitlab.rb and gitlab-secrets.json files contain sensitive data 
and are not included in this backup. You will need these files to restore a backup.
Please back them up manually.
2023-06-22 15:33:15 +0200 -- Backup 1687440795_2023_06_22_15.11.8 is done.
2023-06-22 13:33:15 +0000 -- Deleting backup and restore lock file
[user@host gitlab]$ echo $?
0
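
Because this run exits with code 0, a monitoring job would have had to scan the output text to notice the failure. A minimal sketch of that workaround follows; the name `database_failure_lines` is hypothetical, and the two patterns are taken only from the log above, so other failure modes may print different text.

```ruby
# Patterns copied from the log above; this list is not exhaustive.
FAILURE_PATTERNS = [
  /-- Dumping database failed:/,
  /pg_dump: error:/
].freeze

# Returns the lines of the backup output that indicate a failed
# database dump. An empty array means no known failure text was found.
def database_failure_lines(log_text)
  log_text.each_line.map(&:chomp).select do |line|
    FAILURE_PATTERNS.any? { |pattern| pattern.match?(line) }
  end
end

sample = <<~LOG
  2023-06-22 15:33:15 +0200 -- Dumping database ...
  Dumping PostgreSQL database gitlabhq_production ... pg_dump: error: server version: 14.7; pg_dump version: 13.11
  pg_dump: error: aborting because of server version mismatch
  2023-06-22 15:33:15 +0200 -- Dumping database failed: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
  2023-06-22 15:33:15 +0200 -- Dumping repositories ... done
LOG

puts database_failure_lines(sample).length # prints 3
```

Text matching like this is fragile, which is part of why a non-zero exit code is the more dependable signal.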

After, it fails fast and loud:

[user@host gitlab]$ docker exec -i gitlab-web-1 gitlab-rake gitlab:backup:create SKIP=registry,artifacts,builds,pages
2023-06-26 09:47:57 +0200 -- Dumping database ... 
Dumping PostgreSQL database gitlabhq_production ... pg_dump: error: server version: 14.7; pg_dump version: 13.11
pg_dump: error: aborting because of server version mismatch
[FAILED]
rake aborted!
Backup::Error: Dumping database failed: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
 - host: 'gitlab-db14'
 - port: '5432'
 - database: 'gitlabhq_production'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:75:in `rescue in run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:55:in `run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:222:in `block in run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `each_key'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:47:in `create'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:13:in `block in create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:62:in `lock_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:10:in `create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:101:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'

Caused by:
Backup::DatabaseBackupError: Failed to create compressed file '/backups/db/database.sql.gz' when trying to backup the main database:
 - host: 'gitlab-db14'
 - port: '5432'
 - database: 'gitlabhq_production'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:63:in `block in dump'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:277:in `each'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:277:in `each_database_snapshot_id'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/database.rb:30:in `dump'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:71:in `run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:222:in `block in run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `each_key'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:221:in `run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:47:in `create'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:13:in `block in create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:62:in `lock_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:10:in `create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:101:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'
Tasks: TOP => gitlab:backup:create
(See full trace by running task with --trace)
2023-06-26 09:47:57 +0200 -- Deleting tar staging files ... 
2023-06-26 09:47:57 +0200 -- Cleaning up /backups/db
2023-06-26 09:47:57 +0200 -- Deleting tar staging files ... done
2023-06-26 09:47:57 +0200 -- Deleting backups/tmp ... 
2023-06-26 09:47:57 +0200 -- Deleting backups/tmp ... done
2023-06-26 09:47:57 +0200 -- Deleting backup and restore PID file ... done
[user@host gitlab]$ echo $?
1
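
The "Caused by:" section in this trace comes from Ruby's exception chaining: an error raised inside a rescue block records the rescued error as its `cause`. The sketch below shows that mechanism with hypothetical classes and methods in a `SketchBackup` module; it mirrors the names in the trace but is not the code from manager.rb or database.rb.

```ruby
# Hypothetical error classes that mirror the names in the trace above.
module SketchBackup
  class Error < StandardError; end
  class DatabaseBackupError < StandardError; end
end

# Stand-in for the database dump step; it always fails in this sketch.
def dump_database
  raise SketchBackup::DatabaseBackupError, 'Failed to create compressed file'
end

# Stand-in for the task runner: wraps the low-level error in a general one.
# Raising inside the rescue block makes Ruby store the rescued error
# as the new error's cause.
def run_create_task
  dump_database
rescue SketchBackup::DatabaseBackupError => e
  raise SketchBackup::Error, "Dumping database failed: #{e.message}"
end

begin
  run_create_task
rescue SketchBackup::Error => e
  puts "#{e.class}: #{e.message}"
  puts "Caused by: #{e.cause.class}: #{e.cause.message}"
end
```

If the outer error is left unrescued, as in the rake task, the Ruby process terminates with a non-zero status, which matches the exit code 1 shown above.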

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono
