20251230:20 - Restore GitLab backup using Barman

Goals

Test restoring a backup made with Barman (see 20251118:05 - Explore Barman for PostgreSQL Bac... (#260))

To accomplish this, I want to plan the restore steps in some detail and update this issue's description as the restoration work progresses.

Having made some backups, I need to finish this work by testing an actual restore from one of them.

Hypotheses

A Barman backup of GitLab's PostgreSQL database can be successfully restored to a specific point-in-time, both in-place and to a new server, with all database changes before the recovery point preserved and all changes after the recovery point reverted.

Setting up build/test/run apparatus

Prerequisites Checklist

  • Barman server operational with streaming archiver
  • First successful backup completed
  • WAL streaming active and verified
  • GitLab backup command configured (excluding database)
  • Test recovery location prepared

1. Review Backup Status

Verify current backup state:

# List all backups
barman list-backup gitlab-gl3
# Show detailed backup information
barman show-backup gitlab-gl3 latest
# Check backup validity
barman check gitlab-gl3
# Verify WAL continuity
barman check-backup gitlab-gl3 latest

Expected outcomes:

  • At least one DONE backup exists
  • WAL streaming is active
  • No gaps in WAL archive
  • Backup size and duration documented
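The first three expected outcomes can be checked mechanically instead of by eye. This is a minimal sketch, assuming `barman check` prints one `name: OK` or `name: FAILED` line per check; `failed_checks` is a hypothetical helper name, not part of Barman.

```shell
# Print every check line that reports FAILED.
# Exit status is 0 only when no such line is found.
# Reads `barman check` output on stdin.
failed_checks() {
  # grep exits 0 when it finds a match, so invert it: a match means failure.
  ! grep -E ':[[:space:]]*FAILED'
}

# Usage (on the Barman server):
#   barman check gitlab-gl3 | failed_checks && echo "all checks passed"
```

Barman also has `barman check --nagios`, which produces monitoring-style output and may be a better fit for scheduled checks.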

Results (2025-12-30):

  • Existing backup from 2025-11-29 present (202.5 MiB + 208 MiB WAL)
  • All barman checks passed (PostgreSQL, streaming, replication slot, directories, etc.)
  • WAL streaming verified: forced pg_switch_wal() on gl3, Last available WAL incremented from 000000010000000000000020 to 000000010000000000000021
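The WAL streaming check above compares the "Last available WAL" value before and after the forced switch. A small sketch of that comparison, assuming both names are on the same timeline; `wal_advanced` is a hypothetical helper name.

```shell
# Succeed if WAL segment name $2 sorts strictly after $1.
# Segment names are fixed-width uppercase hex, so byte-wise ordering
# matches numeric ordering.
wal_advanced() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | tail -n 1)" = "$2" ]
}

# The values recorded on 2025-12-30:
if wal_advanced 000000010000000000000020 000000010000000000000021; then
  echo "WAL streaming advanced"
fi
```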

Procedure

2. Create Fresh Backup

Rationale: Systems have been offline, so create a new baseline backup before testing.

# On Barman server
barman backup gitlab-gl3
# Monitor progress
barman list-backup gitlab-gl3
watch -n 5 'barman show-backup gitlab-gl3 latest'

Verify streaming is working:

# On GitLab server - check replication slot activity
sudo gitlab-psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots WHERE slot_name = 'barman';"
# On Barman server - check WAL streaming status
barman replication-status gitlab-gl3

Document:

  • Backup start/end time
  • Backup size
  • WAL generation rate during backup

Results (2025-12-30):

Metric        Value
Backup ID     20251230T213258
Start time    2025-12-30 21:32:58
End time      2025-12-30 21:33:40
Duration      42 seconds
Backup size   140.9 MiB (172.9 MiB with WALs)
Throughput    3.4 MiB/s
Begin LSN     0/23000028
End LSN       0/24000060
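The duration and throughput figures follow directly from the start time, end time, and backup size. A short sketch of the arithmetic, assuming GNU `date` (for the `-d` option); the function names are illustrative only.

```shell
# Seconds between two timestamps (GNU date syntax; both read as UTC).
duration_seconds() {
  echo $(( $(date -u -d "$2" +%s) - $(date -u -d "$1" +%s) ))
}

# Throughput in MiB/s, rounded to one decimal place.
throughput_mib_s() {
  awk -v size="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", size / secs }'
}

secs=$(duration_seconds "2025-12-30 21:32:58" "2025-12-30 21:33:40")
echo "Duration: ${secs} s"
echo "Throughput: $(throughput_mib_s 140.9 "$secs") MiB/s"
```

With the values recorded for backup 20251230T213258 this gives 42 seconds and 3.4 MiB/s, matching the reported metrics.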

Current backup catalog:

gitlab-gl3 20251230T213258 - Size: 172.9 MiB - WAL Size: 0 B
gitlab-gl3 20251129T065859 - Size: 202.5 MiB - WAL Size: 272.0 MiB

3. Backup Non-Database Data

Run GitLab's backup excluding the database:

# On GitLab server
sudo gitlab-backup create SKIP=db
# Verify backup created
sudo ls -lh /var/opt/gitlab/backups/

What this backs up:

  • Git repositories (if not using Gitaly Cluster)
  • Uploads (avatars, attachments)
  • LFS objects
  • CI artifacts
  • Container registry images
  • Pages content
  • Terraform state
  • Package registry
  • Configuration (gitlab-secrets.json, /etc/gitlab/gitlab.rb)

Store backup location for restore testing.
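The restore step later needs a `BACKUP=` value, which GitLab derives from the archive file name by dropping the `_gitlab_backup.tar` suffix. A minimal sketch of that derivation; `backup_id_from_file` is a hypothetical helper name.

```shell
# Derive the BACKUP= value from a backup archive path by dropping the
# directory and the "_gitlab_backup.tar" suffix.
backup_id_from_file() {
  basename "$1" _gitlab_backup.tar
}

# Usage (on the GitLab server, as root because the directory is restricted):
#   newest=$(ls -t /var/opt/gitlab/backups/*_gitlab_backup.tar | head -n 1)
#   echo "BACKUP=$(backup_id_from_file "$newest")"
```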


4. Create Database Changes (Pre-Recovery Point)

Make changes that will be preserved after PITR:

# Create a new project
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"name": "pitr-test-project-1", "description": "Created before recovery point"}' \
  "https://<gitlab-url>/api/v4/projects"
# Modify existing project
curl --request PUT --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"description": "Updated before recovery point"}' \
  "https://<gitlab-url>/api/v4/projects/<project_id>"
# Create an issue
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"title": "Test issue before recovery", "description": "This should survive PITR"}' \
  "https://<gitlab-url>/api/v4/projects/<project_id>/issues"

Verify changes in GitLab UI and document:

  • Project IDs created
  • Issue IDs created
  • Timestamps of changes
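Keeping the IDs and timestamps in one file makes the later verification easier. One possible sketch, assuming a local log file is acceptable; `record_change` and the file name are hypothetical, and the timestamp comes from the local clock, which may differ slightly from the database server's.

```shell
# Append one tab-separated record: UTC timestamp, description, object ID.
record_change() {
  # $1 = log file, $2 = description, $3 = project or issue ID
  printf '%s\t%s\t%s\n' "$(date -u '+%Y-%m-%d %H:%M:%S+00')" "$2" "$3" >> "$1"
}

# Usage:
#   record_change pitr-changes.tsv "created pitr-test-project-1" <project_id>
#   record_change pitr-changes.tsv "created test issue" <issue_id>
```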

5. Mark Recovery Point in Time

Record the exact recovery point:

# On GitLab server - get current timestamp
sudo gitlab-psql -c "SELECT now();"
# Example output: 2025-12-30 14:30:00.123456+00
# Force WAL switch to ensure changes are archived
sudo gitlab-psql -c "SELECT pg_switch_wal();"
# On Barman server - verify WAL received
barman show-backup gitlab-gl3 latest

Document recovery point:

  • Recovery timestamp: 2025-12-30 14:30:00+00
  • Recovery point description: "After creating pitr-test-project-1 and test issue"
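The documented recovery timestamp is the `SELECT now();` output with the fractional seconds dropped. A small sketch of that conversion; `to_target_time` is a hypothetical helper name.

```shell
# Strip fractional seconds from a PostgreSQL timestamptz string,
# e.g. "2025-12-30 14:30:00.123456+00" -> "2025-12-30 14:30:00+00".
to_target_time() {
  printf '%s\n' "$1" | sed -E 's/\.[0-9]+([+-][0-9:]+)$/\1/'
}

# Usage (on the GitLab server; -At makes psql print only the value):
#   to_target_time "$(sudo gitlab-psql -At -c "SELECT now();")"
```

Dropping the fraction moves the target slightly earlier, so it is probably safest to wait a second or two after the last change before recording the timestamp.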

6. Create Changes to be Reverted

Make changes that should NOT exist after restore:

# Create project that will be reverted
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"name": "pitr-test-project-2", "description": "Created AFTER recovery point - should not exist after restore"}' \
  "https://<gitlab-url>/api/v4/projects"
# Delete or modify existing data
curl --request DELETE --header "PRIVATE-TOKEN: <your_token>" \
  "https://<gitlab-url>/api/v4/projects/<project_id>/issues/<issue_id>"
# Make configuration changes
# (Document what was changed)

Document what should be reverted:

  • Projects created after recovery point
  • Issues/MRs deleted after recovery point
  • Configuration changes made after recovery point

7. In-Place Point-in-Time Restore

⚠️ WARNING: This will overwrite the current database

Preparation:

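# On GitLab server - stop all services, then start only PostgreSQL to check for connections
sudo gitlab-ctl stop
sudo gitlab-ctl start postgresql
# Verify no other connections to the database (the query excludes its own session)
sudo gitlab-psql -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'gitlabhq_production' AND pid <> pg_backend_pid();"
# Stop PostgreSQL before its data directory is overwritten
sudo gitlab-ctl stop postgresql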

Perform PITR:

# On Barman server - restore to recovery point
barman recover \
  --target-time "2025-12-30 14:30:00+00" \
  --remote-ssh-command "ssh gitlab-psql@<GITLAB_SERVER_IP>" \
  gitlab-gl3 latest \
  /var/opt/gitlab/postgresql/data
# Alternative: restore to a local directory on the Barman server first, then rsync
barman recover \
  --target-time "2025-12-30 14:30:00+00" \
  gitlab-gl3 latest \
  /var/lib/barman/recover/gitlab-gl3
# Then, still on the Barman server, copy the result to the GitLab server
rsync -avz --delete \
  /var/lib/barman/recover/gitlab-gl3/ \
  gitlab-psql@<GITLAB_SERVER_IP>:/var/opt/gitlab/postgresql/data/

On GitLab server - restart services:

# Fix permissions
sudo chown -R gitlab-psql:gitlab-psql /var/opt/gitlab/postgresql/data
# Start PostgreSQL
sudo gitlab-ctl start postgresql
# Verify database is accessible
sudo gitlab-psql -c "SELECT version();"
# Start all services
sudo gitlab-ctl start

Verification:

  • pitr-test-project-1 exists
  • Test issue created before recovery point exists
  • pitr-test-project-2 does NOT exist
  • Deleted issue is restored
  • Configuration changes are reverted
  • GitLab UI is accessible and functional
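The existence checks in this list can be scripted against the API by comparing HTTP status codes. This is a sketch only: `check_status`, `api_status`, `GITLAB_URL`, and `GITLAB_TOKEN` are hypothetical names, and it assumes the API returns 200 for an existing project and 404 for a missing one.

```shell
# Compare an expected HTTP status with the one returned and report the result.
check_status() {
  # $1 = label, $2 = expected status, $3 = actual status
  if [ "$2" = "$3" ]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 (expected $2, got $3)"
    return 1
  fi
}

# Fetch only the HTTP status code for an API path.
# Needs GITLAB_URL and GITLAB_TOKEN to be set in the environment.
api_status() {
  curl --silent --output /dev/null --write-out '%{http_code}' \
    --header "PRIVATE-TOKEN: $GITLAB_TOKEN" "$GITLAB_URL/api/v4/$1"
}

# Usage (project IDs are the ones documented in steps 4 and 6):
#   check_status "pitr-test-project-1 exists" 200 "$(api_status "projects/<project_1_id>")"
#   check_status "pitr-test-project-2 is gone" 404 "$(api_status "projects/<project_2_id>")"
```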

Document:

  • Restore duration
  • Any errors encountered
  • Verification results

8. Restore to New Server

Purpose: Validate disaster recovery to completely new infrastructure.

Provision new GitLab server:

# Create new VM with same specs as original
# Install GitLab (same version)
# Do NOT run gitlab-ctl reconfigure yet

Restore database:

# On Barman server
barman recover \
  --remote-ssh-command "ssh gitlab-psql@<NEW_SERVER_IP>" \
  gitlab-gl3 latest \
  /var/opt/gitlab/postgresql/data

Restore non-database data:

# On new GitLab server
# Copy backup file from original server
sudo gitlab-backup restore BACKUP=<timestamp>

Restore configuration:

# Copy from original server
scp <original-server>:/etc/gitlab/gitlab-secrets.json /etc/gitlab/
scp <original-server>:/etc/gitlab/gitlab.rb /etc/gitlab/
# Reconfigure
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart

Verification:

  • All projects accessible
  • All issues/MRs present
  • Users can authenticate
  • CI/CD pipelines can run
  • Git operations work (clone, push, pull)
  • API endpoints respond correctly

Document:

  • Total restore time (database + application data)
  • Any configuration adjustments needed
  • Differences from original server

Success Criteria

  • Fresh backup completed successfully
  • Non-database backup created
  • Database changes streamed to Barman (verified in WAL)
  • Recovery point marked and documented
  • In-place PITR completed successfully
  • Changes after recovery point successfully reverted
  • Changes before recovery point preserved
  • New server restore completed successfully
  • All verification checks passed
  • Performance metrics documented
  • Lessons learned documented

Performance Metrics to Document

Metric                             Value          Notes
Database size                      141.6 MiB
Fresh backup duration              42-43 seconds
Backup size on disk                173.6 MiB      with WALs
WAL generation rate                9.54/hour      during active use
In-place restore duration          9-11 seconds   database only
New server restore duration        ~3 seconds     database rsync
Total downtime (in-place)          ~2-3 minutes   stop, restore, start
Total recovery time (new server)   ~10 minutes    including gitlab-backup restore

Lessons Learned

In-Place Restore (Step 7)

PITR with --target-time Issues:

  • PITR to specific timestamps may fail if WAL files are not fully copied to the barman_wal/ directory during restore
  • Barman's show-backup reported WAL files up to 2D, but only 29, 2A, 2B were copied to barman_wal/
  • Error: cp: cannot stat 'barman_wal/00000001000000000000002C': No such file or directory
  • Workaround: Restore without --target-time to recover to last available WAL - this worked reliably
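The gap described above (segments reported up to 2D, only 29 to 2B present) can be detected before starting PostgreSQL. A rough sketch, assuming all segments share the same timeline and log file number (only the last eight hex digits change) and that the files in `barman_wal/` carry plain segment names; `missing_wals` is a hypothetical helper name.

```shell
# Print the WAL segment names between $2 and $3 (inclusive) that are
# not present in directory $1.
missing_wals() {
  # $1 = directory, $2 = first expected segment, $3 = last expected segment
  prefix=$(printf '%s' "$2" | cut -c1-16)      # timeline + log file number
  first_hex=$(printf '%s' "$2" | cut -c17-24)  # segment number, hex
  last_hex=$(printf '%s' "$3" | cut -c17-24)
  i=$((0x$first_hex))
  end=$((0x$last_hex))
  while [ "$i" -le "$end" ]; do
    name=$(printf '%s%08X' "$prefix" "$i")
    [ -e "$1/$name" ] || printf '%s\n' "$name"
    i=$((i + 1))
  done
}

# Usage (on the target, inside the restored data directory):
#   missing_wals barman_wal 000000010000000000000029 00000001000000000000002D
```

In the situation recorded here, this would list the 2C and 2D segments as missing.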

SSH Key Setup Required:

  • The gitlab-psql user is a system user without a password
  • Must manually set up SSH key authentication from barman user to gitlab-psql on target server
  • Create /var/opt/gitlab/postgresql/.ssh/authorized_keys with barman's public key
  • Set permissions: .ssh (700), authorized_keys (600), owned by gitlab-psql:gitlab-psql
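The key setup above can be sketched as a small helper. `install_authorized_key` and the public key path are hypothetical; the home directory is the one named in the bullets above.

```shell
# Install a public key into <home>/.ssh/authorized_keys with the
# permissions listed above (700 on .ssh, 600 on authorized_keys).
install_authorized_key() {
  # $1 = home directory of the target user, $2 = file containing the public key
  mkdir -p "$1/.ssh"
  chmod 700 "$1/.ssh"
  cat "$2" >> "$1/.ssh/authorized_keys"
  chmod 600 "$1/.ssh/authorized_keys"
}

# Usage (as root on the target server; the key path is a placeholder):
#   install_authorized_key /var/opt/gitlab/postgresql /tmp/barman_id.pub
#   chown -R gitlab-psql:gitlab-psql /var/opt/gitlab/postgresql/.ssh
```

A quick test from the Barman server, as the barman user, is `ssh gitlab-psql@<GITLAB_SERVER_IP> true`, which should return without prompting for a password.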

New Server Restore (Step 8)

Do NOT copy gitlab.rb directly:

  • GET (GitLab Environment Toolkit) creates unique object storage buckets per server
  • Copying gitlab.rb from source to target will point to wrong buckets
  • Instead, manually add only the Barman-specific configuration to the new server's gitlab.rb:
    • TCP listening for PostgreSQL
    • Replication settings
    • Access control for Barman

pg_ident.conf Usermap Mismatch:

  • After rsync of database from Barman, PostgreSQL authentication fails with: no match in usermap "gitlab" for user "gitlab" authenticated as "root"
  • The restored pg_ident.conf from source server doesn't match the new server's OS user mappings
  • Fix: Run gitlab-ctl reconfigure after the database is restored and running - this regenerates correct pg_ident.conf mappings

Recommended Restore Order for New Server:

  1. Install GitLab (same version) on new server
  2. Copy gitlab-secrets.json from source server
  3. Add Barman configuration to gitlab.rb (don't copy entire file)
  4. Run gitlab-ctl reconfigure
  5. Stop puma and sidekiq
  6. Restore non-database backup: gitlab-backup restore BACKUP=<timestamp> (the backup was created with SKIP=db, so the database is left unchanged)
  7. Set up SSH key for barman user to access gitlab-psql on new server
  8. Stop all GitLab services
  9. Restore database using two-stage method:
    • On Barman: barman recover gitlab-gl3 latest /tmp/barman/recover/gitlab-gl4
    • Rsync to new server: rsync -avz --delete /tmp/barman/recover/gitlab-gl4/ gitlab-psql@<NEW_IP>:/var/opt/gitlab/postgresql/data/
  10. Start PostgreSQL, verify it recovers
  11. Run gitlab-ctl reconfigure again (fixes pg_ident.conf)
  12. Start all services
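The last three steps run on the new server once the rsync has finished. A sketch with a dry-run wrapper so the sequence can be reviewed before anything is executed; `run`, `finish_restore`, and `DRY_RUN` are hypothetical names, and `SELECT version();` stands in for "verify it recovers".

```shell
# Run a command, or only print it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Steps 10-12, executed on the new GitLab server.
finish_restore() {
  run sudo gitlab-ctl start postgresql
  run sudo gitlab-psql -c "SELECT version();"
  run sudo gitlab-ctl reconfigure   # regenerates pg_ident.conf
  run sudo gitlab-ctl start
}

# Usage:
#   finish_restore              # prints the commands only
#   DRY_RUN=0 finish_restore    # actually runs them
```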

Two-Stage Restore Method (Recommended):

  • Restore to local directory on Barman server first, then rsync to target
  • More reliable than direct --remote-ssh-command method
  • Allows inspection of restored files before copying to target
  • Rsync is faster for subsequent restores (incremental)
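Inspecting the local restore before the rsync can start with a basic sanity check. A minimal sketch, assuming it is enough to confirm two files that every PostgreSQL data directory contains; `looks_like_pgdata` is a hypothetical helper name.

```shell
# Succeed if the directory contains PG_VERSION and global/pg_control,
# which are present in every PostgreSQL data directory.
looks_like_pgdata() {
  [ -f "$1/PG_VERSION" ] && [ -f "$1/global/pg_control" ]
}

# Usage (on the Barman server, before the rsync):
#   looks_like_pgdata /var/lib/barman/recover/gitlab-gl3 || echo "restore looks incomplete"
```

This only guards against an empty or partial directory; it does not prove that recovery will succeed.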

General Recommendations

  • Set archive_mode = 'off' when Barman receives WAL only through streaming replication (no archive_command-based WAL shipping)
  • Use barman cron to automatically manage receive-wal processes
  • Test restores regularly - the process has multiple steps that can fail
  • Document the exact recovery timestamp when marking recovery points
  • Force WAL switch (pg_switch_wal()) after making changes to ensure they're archived

Edited by Mike Lockhart | GitLab