
gitlab-backup is not fully utilising environment variable overrides

Summary

We document the use of environment variable overrides to allow gitlab-backup (and restore) to bypass the configuration provided in GitLab configuration (database.yml).

Whilst !129605 (merged) has allowed overrides to work for the database dump via pg_dump, the dump repositories stage (and possibly other backup stages) does not appear to use the override values.

We specifically state that GitLab should not be backed up or restored via PgBouncer. However, because the environment variable overrides appear to be ignored for at least part of the backup, some connections still go through PgBouncer, which can lead to failures such as idle connection timeouts.

Steps to reproduce

  1. Deploy and configure GitLab to use PgBouncer
  2. Initiate gitlab-backup create using override variables (see example below)
  3. Observe TCP connections and note that the overridden host and port are used only for pg_dump, for example:
GITLAB_BACKUP_PGHOST=172.31.25.227 GITLAB_BACKUP_PGPORT=5432 gitlab-backup create

### Below, `pg_dump` uses the overridden values, whereas the rake task uses the values configured in database.yml:
$ netstat -anop | grep "[5,6]432"
tcp        0      0 172.31.16.131:42992     172.31.34.138:6432      TIME_WAIT   -                    timewait (48.80/0/0)
tcp        0      0 172.31.16.131:43008     172.31.34.138:6432      TIME_WAIT   -                    timewait (48.80/0/0)
tcp        0      0 172.31.16.131:49066     172.31.34.138:6432      TIME_WAIT   -                    timewait (50.96/0/0)
tcp   1013925      0 172.31.16.131:52148     172.31.25.227:5432      ESTABLISHED 468261/pg_dump       keepalive (294.44/0/0)    <<<< postgres node/5432
tcp        0      0 172.31.16.131:42980     172.31.34.138:6432      TIME_WAIT   -                    timewait (47.58/0/0)
tcp        0      0 172.31.16.131:49082     172.31.34.138:6432      ESTABLISHED 468047/rake gitlab:  keepalive (294.38/0/0)     <<<< pgbouncer node/6432
tcp        0      0 172.31.16.131:49072     172.31.34.138:6432      TIME_WAIT   -                    timewait (50.96/0/0)
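
To make the split easier to see, the output above can be summarised per remote endpoint and program. This is only a minimal sketch: the `summarise` helper is hypothetical, it assumes the `netstat -anop` column layout shown above, and the sample lines are copied from this report.

```python
from collections import Counter

# Sample lines copied from the netstat output in this report.
NETSTAT_SAMPLE = """\
tcp        0      0 172.31.16.131:42992     172.31.34.138:6432      TIME_WAIT   -                    timewait (48.80/0/0)
tcp   1013925      0 172.31.16.131:52148     172.31.25.227:5432      ESTABLISHED 468261/pg_dump       keepalive (294.44/0/0)
tcp        0      0 172.31.16.131:49082     172.31.34.138:6432      ESTABLISHED 468047/rake gitlab:  keepalive (294.38/0/0)
"""


def summarise(netstat_text, state="ESTABLISHED"):
    """Count connections in the given state per (remote address, program)."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, recv-q, send-q, local, remote, state, pid/program, ...
        if len(fields) < 7 or fields[0] != "tcp" or fields[5] != state:
            continue
        program = fields[6].split("/", 1)[-1]  # "468261/pg_dump" -> "pg_dump"
        counts[(fields[4], program)] += 1
    return counts


if __name__ == "__main__":
    for (remote, program), count in sorted(summarise(NETSTAT_SAMPLE).items()):
        print(f"{program:10s} -> {remote} ({count} connection(s))")
```

With the overrides fully honoured, no established connection from the backup should point at the PgBouncer port (6432 in this environment).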

What is the current bug behavior?

Whilst pg_dump completes successfully, the dump repositories stage of the backup may fail because gitlab-backup ignores the overrides there and attempts to reuse a stale (timed-out) connection made via PgBouncer.

What is the expected correct behavior?

All aspects of gitlab-backup (and restore) should utilise the environment variable overrides, and the dump/restore should complete successfully.
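
As an illustration of the expected behaviour, the overrides could be layered on top of the configured values once, with every backup stage then using the resulting settings. The sketch below is not GitLab's implementation: `OVERRIDES` and `effective_db_config` are hypothetical names, only the two variables used in this report are mapped, and the host and port values are taken from the netstat output in the steps to reproduce.

```python
import os

# Hypothetical mapping from override variable to database.yml key.
# Only the two variables used in this report are listed.
OVERRIDES = {
    "GITLAB_BACKUP_PGHOST": "host",
    "GITLAB_BACKUP_PGPORT": "port",
}


def effective_db_config(configured, environ=os.environ):
    """Return a copy of `configured` with any set override variables applied."""
    effective = dict(configured)
    for variable, key in OVERRIDES.items():
        value = environ.get(variable)
        if value:  # unset or empty variables leave the configured value alone
            effective[key] = int(value) if key == "port" else value
    return effective


# Configured values point at PgBouncer; the overrides point at the Postgres node.
configured = {"host": "172.31.34.138", "port": 6432, "database": "gitlabhq_production"}
env = {"GITLAB_BACKUP_PGHOST": "172.31.25.227", "GITLAB_BACKUP_PGPORT": "5432"}

if __name__ == "__main__":
    print(effective_db_config(configured, env))
```

The point of the sketch is that the same effective settings would apply to the repositories stage as to pg_dump, rather than each stage resolving its connection differently.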

Relevant logs and/or screenshots

$ GITLAB_BACKUP_PGHOST=<IP/Address of Postgres node> GITLAB_BACKUP_PGPORT=5432 gitlab-backup create
2024-05-28 12:44:04 UTC -- Dumping database ...
Dumping PostgreSQL database gitlabhq_production ... [DONE]
2024-05-28 13:00:06 UTC -- Dumping database ... done
2024-05-28 13:00:06 UTC -- Dumping repositories ...
2024-05-28 13:00:06 UTC -- Deleting tar staging files ...
2024-05-28 13:00:06 UTC -- Cleaning up /var/opt/gitlab/backups/db
2024-05-28 13:00:09 UTC -- Deleting tar staging files ... done
2024-05-28 13:00:09 UTC -- Deleting backups/tmp ...
2024-05-28 13:00:09 UTC -- Deleting backups/tmp ... done
2024-05-28 13:00:09 UTC -- Deleting backup and restore PID file ... done
rake aborted!
ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() SSL SYSCALL error: EOF detected    # <<< In the case of a TLS database connection
ActiveRecord::StatementInvalid: PG::ConnectionBad: PQconsumeInput() FATAL:  client_idle_timeout        # <<< For non-TLS
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/schema_cache_with_renamed_table_legacy.rb:27:in `columns'
/opt/gitlab/embedded/service/gitlab-rails/lib/gitlab/database/schema_cache_with_renamed_table_legacy.rb:31:in `columns_hash'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/backup/targets/repositories.rb:12:in `group_relation'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/backup/targets/repositories.rb:19:in `find_groups_in_batches'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/backup/targets/repositories.rb:37:in `enqueue_consecutive_groups'
/opt/gitlab/embedded/service/gitlab-rails/ee/lib/ee/backup/targets/repositories.rb:31:in `enqueue_consecutive'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/targets/repositories.rb:29:in `dump'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/tasks/task.rb:25:in `backup!'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:53:in `run_create_task'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:142:in `block in run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:142:in `each_value'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:142:in `run_all_create_tasks'
/opt/gitlab/embedded/service/gitlab-rails/lib/backup/manager.rb:28:in `create'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:12:in `block in create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:68:in `lock_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:10:in `create_backup'
/opt/gitlab/embedded/service/gitlab-rails/lib/tasks/gitlab/backup.rake:107:in `block (3 levels) in <top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'
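
The timestamps in the log above give a rough idea of how long the rake task's connection may have sat unused while pg_dump was running. The sketch below only does that arithmetic; it assumes the connection was idle for the whole database dump, and the configured idle timeout is not stated in this report.

```python
from datetime import datetime

# Timestamp format used by the gitlab-backup log lines above.
FORMAT = "%Y-%m-%d %H:%M:%S UTC"


def seconds_between(start, end):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FORMAT) - datetime.strptime(start, FORMAT)
    return delta.total_seconds()


# "Dumping database ..." start and "Dumping repositories ..." start, from the log above.
idle = seconds_between("2024-05-28 12:44:04 UTC", "2024-05-28 13:00:06 UTC")

if __name__ == "__main__":
    print(f"{idle:.0f} s (~{idle / 60:.0f} min)")
```

Any idle timeout shorter than this window would be enough to close the connection before the repositories stage tries to use it.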

Results of GitLab environment info


GitLab information
Version:        16.11.2-ee
Revision:       d210b947e3e
Directory:      /opt/gitlab/embedded/service/gitlab-rails
DB Adapter:     PostgreSQL
DB Version:     14.11

Edited by Chris Stone