20251230:20 - Restore GitLab backup using Barman
Goals
Test restoring a backup made with Barman (see 20251118:05 - Explore Barman for PostgreSQL Bac... (#260))
Having made some backups, I need to finish this work by testing an actual restore from them.
To accomplish this, I want to plan the restore steps in some detail and update this issue's description with the restoration work.
Hypotheses
A Barman backup of GitLab's PostgreSQL database can be successfully restored to a specific point-in-time, both in-place and to a new server, with all database changes before the recovery point preserved and all changes after the recovery point reverted.
Setting up build/test/run apparatus
Prerequisites Checklist
- Barman server operational with streaming archiver
- First successful backup completed
- WAL streaming active and verified
- GitLab backup command configured (excluding database)
- Test recovery location prepared
1. Review Backup Status
Verify current backup state:
# List all backups
barman list-backup gitlab-gl3
# Show detailed backup information
barman show-backup gitlab-gl3 latest
# Check backup validity
barman check gitlab-gl3
# Verify WAL continuity
barman check-backup gitlab-gl3 latest
Expected outcomes:
- At least one DONE backup exists
- WAL streaming is active
- No gaps in WAL archive
- Backup size and duration documented
Results (2025-12-30):
- ✅ Existing backup from 2025-11-29 present (202.5 MiB + 208 MiB WAL)
- ✅ All barman checks passed (PostgreSQL, streaming, replication slot, directories, etc.)
- ✅ WAL streaming verified: forced pg_switch_wal() on gl3, and "Last available" WAL incremented from 000000010000000000000020 to 000000010000000000000021
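barman check-backup already checks WAL continuity for a backup. For an independent spot check, the bash sketch below reports missing segments in a sorted list of WAL file names, for example the paths printed by barman list-files --target wal gitlab-gl3 latest (assuming one path per line). It assumes the default 16 MiB segment size, which gives 256 segments per log number, and a single timeline. The name wal_gaps is hypothetical.

```shell
# Requires bash. wal_gaps is a hypothetical helper, not part of Barman.
# Reads WAL segment names (optionally with a directory prefix) on stdin,
# in ascending order, and prints every segment missing between them.
# Assumes 16 MiB segments (256 per log number) and a single timeline.
wal_gaps() {
  local name tli log seg pos prev="" p
  while read -r name; do
    name=${name##*/}                          # drop any directory prefix
    [[ $name =~ ^[0-9A-F]{24}$ ]] || continue # skip .history, .partial, etc.
    tli=${name:0:8}
    log=$((16#${name:8:8}))
    seg=$((16#${name:16:8}))
    pos=$((log * 256 + seg))
    if [[ -n $prev ]]; then
      for ((p = prev + 1; p < pos; p++)); do
        printf '%s%08X%08X\n' "$tli" $((p / 256)) $((p % 256))
      done
    fi
    prev=$pos
  done
}

# Example (empty output means no gaps):
#   barman list-files --target wal gitlab-gl3 latest | sort | wal_gaps
```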
Procedure
2. Create Fresh Backup
Rationale: Systems have been offline, so create a new baseline backup before testing.
# On Barman server
barman backup gitlab-gl3
# Monitor progress (from a second terminal; barman backup runs in the foreground until it completes)
barman list-backup gitlab-gl3
watch -n 5 'barman show-backup gitlab-gl3 latest'
Verify streaming is working:
# On GitLab server - check replication slot activity
sudo gitlab-psql -c "SELECT slot_name, active, restart_lsn FROM pg_replication_slots WHERE slot_name = 'barman';"
# On Barman server - check the streaming client and the last archived WAL
barman replication-status gitlab-gl3
barman status gitlab-gl3
Document:
- Backup start/end time
- Backup size
- WAL generation rate during backup
Results (2025-12-30):
| Metric | Value |
|---|---|
| Backup ID | 20251230T213258 |
| Start time | 2025-12-30 21:32:58 |
| End time | 2025-12-30 21:33:40 |
| Duration | 42 seconds |
| Backup size | 140.9 MiB (172.9 MiB with WALs) |
| Throughput | 3.4 MiB/s |
| Begin LSN | 0/23000028 |
| End LSN | 0/24000060 |
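The duration and throughput rows follow from the start/end times and the backup size excluding WAL: 140.9 MiB / 42 s ≈ 3.4 MiB/s. Below is a minimal bash sketch of that arithmetic. It assumes GNU date (for -d) and awk, and backup_throughput is a hypothetical name.

```shell
# Requires bash, GNU date and awk. backup_throughput is a hypothetical helper.
# Usage: backup_throughput START END SIZE_MIB
# START and END are timestamps GNU date -d can parse; both are read as UTC
# here, which is fine for a difference as long as they share a time zone.
backup_throughput() {
  local start end secs
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  secs=$((end - start))
  if ((secs <= 0)); then
    echo "end time must be after start time" >&2
    return 1
  fi
  awk -v s="$secs" -v m="$3" 'BEGIN { printf "%d s, %.1f MiB/s\n", s, m / s }'
}

# Values from the 20251230T213258 backup above:
backup_throughput "2025-12-30 21:32:58" "2025-12-30 21:33:40" 140.9
# → 42 s, 3.4 MiB/s
```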
Current backup catalog:
gitlab-gl3 20251230T213258 - Size: 172.9 MiB - WAL Size: 0 B
gitlab-gl3 20251129T065859 - Size: 202.5 MiB - WAL Size: 272.0 MiB
3. Backup Non-Database Data
Run GitLab's backup excluding the database:
# On GitLab server
sudo gitlab-backup create SKIP=db
# Verify backup created
sudo ls -lh /var/opt/gitlab/backups/
What this backs up:
- Git repositories (if not using Gitaly Cluster)
- Uploads (avatars, attachments)
- LFS objects
- CI artifacts
- Container registry images
- Pages content
- Terraform state
- Package registry
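Not included: configuration files (/etc/gitlab/gitlab-secrets.json, /etc/gitlab/gitlab.rb). GitLab's backup does not contain them, so back them up separately (e.g. sudo gitlab-ctl backup-etc, or copy the files).
Record the backup file name for restore testing. The part before _gitlab_backup.tar is the <timestamp> passed as BACKUP= in step 8.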
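gitlab-backup names each archive <timestamp>_gitlab_backup.tar, and BACKUP= takes the part before _gitlab_backup.tar. The bash sketch below prints that value for the most recently modified archive in a directory. It defaults to /var/opt/gitlab/backups, which is normally not readable by ordinary users, so run it from a root shell. latest_backup_id is a hypothetical name.

```shell
# Requires bash. latest_backup_id is a hypothetical helper, not a GitLab command.
# Prints the BACKUP=<timestamp> value of the most recently modified
# *_gitlab_backup.tar in DIR (default /var/opt/gitlab/backups).
latest_backup_id() {
  local dir=${1:-/var/opt/gitlab/backups} newest
  newest=$(ls -1t -- "$dir"/*_gitlab_backup.tar 2>/dev/null | head -n 1)
  if [[ -z $newest ]]; then
    echo "no *_gitlab_backup.tar found in $dir" >&2
    return 1
  fi
  newest=${newest##*/}                   # basename
  printf '%s\n' "${newest%_gitlab_backup.tar}"
}

# Example (as root on the GitLab server):
#   id=$(latest_backup_id) && echo "gitlab-backup restore BACKUP=$id"
```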
4. Create Database Changes (Pre-Recovery Point)
Make changes that will be preserved after PITR:
# Create a new project
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"name": "pitr-test-project-1", "description": "Created before recovery point"}' \
  "https://<gitlab-url>/api/v4/projects"
# Modify existing project
curl --request PUT --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"description": "Updated before recovery point"}' \
  "https://<gitlab-url>/api/v4/projects/<project_id>"
# Create an issue
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"title": "Test issue before recovery", "description": "This should survive PITR"}' \
  "https://<gitlab-url>/api/v4/projects/<project_id>/issues"
Verify changes in GitLab UI and document:
- Project IDs created
- Issue IDs created
- Timestamps of changes
5. Mark Recovery Point in Time
Record the exact recovery point:
# On GitLab server - get current timestamp
sudo gitlab-psql -c "SELECT now();"
# Example output: 2025-12-30 14:30:00.123456+00
# Force WAL switch to ensure changes are archived
sudo gitlab-psql -c "SELECT pg_switch_wal();"
# On Barman server - verify WAL received
barman show-backup gitlab-gl3 latest
Document recovery point:
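- Recovery timestamp: 2025-12-30 14:30:00+00
- Recovery point description: "After creating pitr-test-project-1 and test issue"
- The recovery timestamp must fall after the end time of the base backup used for the restore, because PITR replays WAL forward from that backup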
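Before running barman recover, it is worth confirming that the target is later than the backup's end time. For 20251230T213258 that is 2025-12-30 21:33:40; check which time zone barman show-backup reports it in. Below is a bash sketch of that comparison. It assumes GNU date, and target_after_backup_end is a hypothetical name. Both timestamps should carry an explicit time zone.

```shell
# Requires bash and GNU date. target_after_backup_end is a hypothetical helper.
# Usage: target_after_backup_end TARGET_TIME BACKUP_END_TIME
# Exit status 0 if the target is strictly later than the backup end,
# 1 if it is not, 2 if either timestamp cannot be parsed.
target_after_backup_end() {
  local target end
  target=$(date -d "$1" +%s) || return 2
  end=$(date -d "$2" +%s) || return 2
  ((target > end))
}

# Example:
#   target_after_backup_end "2025-12-30 21:40:00 UTC" "2025-12-30 21:33:40 UTC" \
#     && echo "target is usable with this backup"
```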
6. Create Changes to be Reverted
Make changes that should NOT exist after restore:
# Create project that will be reverted
curl --request POST --header "PRIVATE-TOKEN: <your_token>" \
  --header "Content-Type: application/json" \
  --data '{"name": "pitr-test-project-2", "description": "Created AFTER recovery point - should not exist after restore"}' \
  "https://<gitlab-url>/api/v4/projects"
# Delete or modify existing data
# (<issue_id> is the project-scoped issue IID; deleting requires Owner or admin access)
curl --request DELETE --header "PRIVATE-TOKEN: <your_token>" \
  "https://<gitlab-url>/api/v4/projects/<project_id>/issues/<issue_id>"
# Make configuration changes stored in the database (e.g. Admin Area settings)
# (Document what was changed)
Document what should be reverted:
- Projects created after recovery point
- Issues/MRs deleted after recovery point
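- Database-stored configuration changes (e.g. Admin Area settings) made after recovery point; changes to gitlab.rb are not affected by a database restore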
7. In-Place Point-in-Time Restore
Preparation:
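# On GitLab server - stop all services except PostgreSQL
sudo gitlab-ctl stop
sudo gitlab-ctl start postgresql
# Verify no other connections to the database (expect 0; excludes this psql session)
sudo gitlab-psql -c "SELECT count(*) FROM pg_stat_activity WHERE datname = 'gitlabhq_production' AND pid <> pg_backend_pid();"
# Stop PostgreSQL before its data directory is overwritten by the restore
sudo gitlab-ctl stop postgresql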
Perform PITR:
# On Barman server - restore to recovery point
# (--target-action promote ends recovery at the target; PostgreSQL's default is to pause there)
barman recover \
  --target-time "2025-12-30 14:30:00+00" \
  --target-action promote \
  --remote-ssh-command "ssh gitlab-psql@<GITLAB_SERVER_IP>" \
  gitlab-gl3 latest \
  /var/opt/gitlab/postgresql/data
# Alternative: Restore to local directory first, then rsync
barman recover \
  --target-time "2025-12-30 14:30:00+00" \
  --target-action promote \
  gitlab-gl3 latest \
  /var/lib/barman/recover/gitlab-gl3
# Then on Barman server
rsync -avz --delete \
  /var/lib/barman/recover/gitlab-gl3/ \
  gitlab-psql@<GITLAB_SERVER_IP>:/var/opt/gitlab/postgresql/data/
On GitLab server - restart services:
# Fix permissions
sudo chown -R gitlab-psql:gitlab-psql /var/opt/gitlab/postgresql/data
# Start PostgreSQL
sudo gitlab-ctl start postgresql
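# Verify database is accessible and recovery has finished (pg_is_in_recovery should return f)
sudo gitlab-psql -c "SELECT version();"
sudo gitlab-psql -c "SELECT pg_is_in_recovery();"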
# Start all services
sudo gitlab-ctl start
Verification:
- pitr-test-project-1 exists
- Test issue created before recovery point exists
- pitr-test-project-2 does NOT exist
- Deleted issue is restored
- Database-stored configuration changes (e.g. Admin Area settings) are reverted
- GitLab UI is accessible and functional
Document:
- Restore duration
- Any errors encountered
- Verification results
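The first three verification items can also be checked directly in the database instead of the UI. The bash sketch below assumes GitLab's projects.name and issues.title columns and the names used in steps 4 and 6. check_count, verify_pitr and the PSQL override are hypothetical. PSQL defaults to sudo gitlab-psql in tuples-only, unaligned mode.

```shell
# Requires bash. check_count and verify_pitr are hypothetical helpers.
# PSQL is the command used to run one SQL statement and print the bare result;
# by default it is gitlab-psql in tuples-only (-t), unaligned (-A) mode.
check_count() { # usage: check_count DESCRIPTION SQL EXPECTED
  local psql=${PSQL:-sudo gitlab-psql -tA -c} got
  got=$($psql "$2") || return 2
  if [[ $got == "$3" ]]; then
    echo "PASS: $1"
  else
    echo "FAIL: $1 (got '$got', expected '$3')"
    return 1
  fi
}

verify_pitr() {
  local rc=0
  check_count "pitr-test-project-1 exists" \
    "SELECT count(*) FROM projects WHERE name = 'pitr-test-project-1'" 1 || rc=1
  check_count "issue created before recovery point exists" \
    "SELECT count(*) FROM issues WHERE title = 'Test issue before recovery'" 1 || rc=1
  check_count "pitr-test-project-2 does NOT exist" \
    "SELECT count(*) FROM projects WHERE name = 'pitr-test-project-2'" 0 || rc=1
  return "$rc"
}
```

Each check prints PASS or FAIL, and verify_pitr returns non-zero if any check fails. This makes it usable as a gate before running gitlab-ctl start.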
8. Restore to New Server
Purpose: Validate disaster recovery to completely new infrastructure.
Provision new GitLab server:
# Create new VM with same specs as original
# Install GitLab (same version)
# Do NOT run gitlab-ctl reconfigure yet
Restore database:
# On Barman server
barman recover \
  --remote-ssh-command "ssh gitlab-psql@<NEW_SERVER_IP>" \
  gitlab-gl3 latest \
  /var/opt/gitlab/postgresql/data
Restore non-database data:
# On new GitLab server
# Copy the backup archive from the original server into /var/opt/gitlab/backups/ (owned by git)
sudo gitlab-backup restore BACKUP=<timestamp>
Restore configuration:
# Copy from original server
scp <original-server>:/etc/gitlab/gitlab-secrets.json /etc/gitlab/
scp <original-server>:/etc/gitlab/gitlab.rb /etc/gitlab/
# Reconfigure
sudo gitlab-ctl reconfigure
sudo gitlab-ctl restart
Verification:
- All projects accessible
- All issues/MRs present
- Users can authenticate
- CI/CD pipelines can run
- Git operations work (clone, push, pull)
- API endpoints respond correctly
Document:
- Total restore time (database + application data)
- Any configuration adjustments needed
- Differences from original server
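For the "API endpoints respond correctly" and authentication checks, a quick bash smoke test can print the HTTP status of a few endpoints. This is a sketch with a few assumptions. GITLAB_URL and GITLAB_TOKEN are placeholders you set, and /api/v4/version requires a valid token. /-/readiness only answers from IPs in GitLab's monitoring allowlist (localhost by default), so run the test on the new server or expect a non-200 code for that path. smoke_test is a hypothetical name.

```shell
# Requires bash and curl. smoke_test is a hypothetical helper.
# Prints "<HTTP status> <path>" per endpoint; returns 1 if any status is not 200.
smoke_test() {
  local base=${GITLAB_URL:-} path code rc=0
  if [[ -z $base ]]; then
    echo "set GITLAB_URL, e.g. https://gitlab.example.com" >&2
    return 2
  fi
  for path in /-/readiness /api/v4/version /api/v4/projects; do
    code=$(curl --silent --output /dev/null --write-out '%{http_code}' \
      --header "PRIVATE-TOKEN: ${GITLAB_TOKEN:-}" "$base$path")
    echo "$code $path"
    [[ $code == 200 ]] || rc=1
  done
  return "$rc"
}
```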
Success Criteria
- Fresh backup completed successfully
- Non-database backup created
- Database changes streamed to Barman (verified in WAL)
- Recovery point marked and documented
- In-place PITR completed successfully
- Changes after recovery point successfully reverted
- Changes before recovery point preserved
- New server restore completed successfully
- All verification checks passed
- Performance metrics documented
- Lessons learned documented
Performance Metrics to Document
| Metric | Value | Notes |
|---|---|---|
| Database size | 141.6 MiB | |
| Fresh backup duration | 42-43 seconds | |
| Backup size on disk | 173.6 MiB | with WALs |
| WAL generation rate | 9.54 WAL segments/hour | during active use |
| In-place restore duration | 9-11 seconds | database only |
| New server restore duration | ~3 seconds | database rsync |
| Total downtime (in-place) | ~2-3 minutes | stop, restore, start |
| Total recovery time (new server) | ~10 minutes | including gitlab-backup restore |
Lessons Learned Section
In-Place Restore (Step 7)
PITR with --target-time Issues:
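- PITR to specific timestamps may fail if WAL files are not fully copied to the barman_wal/ directory during restore
- Barman's show-backup reported WAL files up to 2D, but only 29, 2A and 2B were copied to barman_wal/
- Error: cp: cannot stat 'barman_wal/00000001000000000000002C': No such file or directory
- A cp: cannot stat message on its own is normal at the end of archive recovery, because PostgreSQL keeps asking restore_command for the next WAL file until one is missing. Here the failure most likely came from WAL needed to reach the target time (2C onward) being absent from barman_wal/
- Workaround: restore without --target-time to recover to the last available WAL; this worked reliably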
SSH Key Setup Required:
- The gitlab-psql user is a system user without a password
- SSH key authentication from the barman user to gitlab-psql on the target server must be set up manually
- Create /var/opt/gitlab/postgresql/.ssh/authorized_keys containing barman's public key
- Set permissions: .ssh (700), authorized_keys (600), owned by gitlab-psql:gitlab-psql
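The SSH key steps above can be scripted on the target server. Below is a minimal bash sketch, to be run as root. It assumes barman's public key has already been copied to the server (the file path is your choice) and that gitlab-psql's home is /var/opt/gitlab/postgresql. install_barman_key is a hypothetical name. Passing an empty third argument skips the chown, which is useful for a dry test.

```shell
# Requires bash and coreutils. install_barman_key is a hypothetical helper; run as root.
# Usage: install_barman_key PUBKEY_FILE [HOME_DIR] [OWNER]
install_barman_key() {
  local pubkey=$1 home=${2:-/var/opt/gitlab/postgresql}
  local owner=${3-gitlab-psql:gitlab-psql} keys
  keys=$home/.ssh/authorized_keys
  mkdir -p "$home/.ssh"
  chmod 700 "$home/.ssh"
  touch "$keys"
  # Append the key only if that exact line is not already present
  grep -qxF -f "$pubkey" "$keys" || cat "$pubkey" >> "$keys"
  chmod 600 "$keys"
  if [[ -n $owner ]]; then
    chown "$owner" "$home/.ssh" "$keys"
  fi
}

# Then, from the Barman server as the barman user, confirm key-based login works:
#   ssh gitlab-psql@<GITLAB_SERVER_IP> true
```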
New Server Restore (Step 8)
Do NOT copy gitlab.rb directly:
- GET (GitLab Environment Toolkit) creates unique object storage buckets per server
- Copying gitlab.rb from source to target will point the new server at the wrong buckets
- Instead, manually add only the Barman-specific configuration to the new server's gitlab.rb:
  - TCP listening for PostgreSQL
  - Replication settings
  - Access control for Barman
pg_ident.conf Usermap Mismatch:
- After rsync of the database from Barman, PostgreSQL authentication fails with: no match in usermap "gitlab" for user "gitlab" authenticated as "root"
- The restored pg_ident.conf (which lives in the data directory) comes from the source server and doesn't match the new server's OS user mappings
- Fix: run gitlab-ctl reconfigure after the database is restored and running; this regenerates the correct pg_ident.conf mappings
Recommended Restore Order for New Server:
1. Install GitLab (same version) on the new server
2. Copy gitlab-secrets.json from the source server
3. Add the Barman configuration to gitlab.rb (don't copy the entire file)
4. Run gitlab-ctl reconfigure
5. Stop puma and sidekiq
6. Restore the non-database backup: gitlab-backup restore BACKUP=<timestamp> (the backup was made with SKIP=db, so the database is left unchanged)
7. Set up the SSH key for the barman user to access gitlab-psql on the new server
8. Stop all GitLab services
9. Restore the database using the two-stage method:
   - On Barman: barman recover gitlab-gl3 latest /tmp/barman/recover/gitlab-gl4
   - Rsync to the new server: rsync -avz --delete /tmp/barman/recover/gitlab-gl4/ gitlab-psql@<NEW_IP>:/var/opt/gitlab/postgresql/data/
10. On the new server:
    - Start PostgreSQL and verify it recovers
    - Run gitlab-ctl reconfigure again (fixes pg_ident.conf)
    - Start all services
Two-Stage Restore Method (Recommended):
- Restore to a local directory on the Barman server first, then rsync to the target
- More reliable than the direct --remote-ssh-command method
- Allows inspection of restored files before copying to the target
- Rsync is faster for subsequent restores (incremental)
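The two commands of the two-stage method can be wrapped in a small bash function, run as the barman user on the Barman server. This is a sketch. two_stage_restore is a hypothetical name, and it restores the given backup without a target time (the workaround noted above). Setting RUN=echo prints the commands instead of running them.

```shell
# Requires bash. two_stage_restore is a hypothetical helper; run as the barman user.
# Usage: two_stage_restore SERVER BACKUP_ID STAGING_DIR TARGET_HOST
# Stop all GitLab services on TARGET_HOST before running it.
two_stage_restore() {
  local server=$1 backup_id=$2 staging=$3 target=$4
  local run=${RUN:-}   # RUN=echo turns this into a dry run
  # Stage 1: restore into a local directory on the Barman server
  $run barman recover "$server" "$backup_id" "$staging" || return
  # Stage 2: copy to the target; the trailing slash copies the directory's
  # contents, and --delete removes files not present in the restored copy
  $run rsync -avz --delete "$staging/" \
    "gitlab-psql@$target:/var/opt/gitlab/postgresql/data/"
}

# Dry run with the values from the recommended restore order:
RUN=echo two_stage_restore gitlab-gl3 latest /tmp/barman/recover/gitlab-gl4 '<NEW_IP>'
```

The dry run makes it easy to review the exact rsync destination before --delete touches a live data directory.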
General Recommendations
- Use archive_mode = 'off' when relying only on streaming replication (streaming_archiver), not WAL shipping via archive_command
- Use barman cron to automatically manage receive-wal processes
- Test restores regularly; the process has multiple steps that can fail
- Document the exact recovery timestamp when marking recovery points
- Force a WAL switch (pg_switch_wal()) after making changes to ensure they're archived
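The last two recommendations can be combined into one step. Read the timestamp and force the WAL switch right after the changes, then append both to a log. Below is a bash sketch. mark_recovery_point, the PSQL override and the recovery-points.tsv log file are hypothetical. PSQL defaults to sudo gitlab-psql in tuples-only, unaligned mode on the GitLab server.

```shell
# Requires bash. mark_recovery_point is a hypothetical helper.
# Usage: mark_recovery_point "DESCRIPTION"
# Prints and appends "<timestamp><TAB><description>" to $LOG.
mark_recovery_point() {
  local psql=${PSQL:-sudo gitlab-psql -tA -c}
  local log=${LOG:-recovery-points.tsv} ts
  # Read the timestamp only after the changes to preserve have been committed
  ts=$($psql "SELECT now();") || return
  # Close the current WAL segment so Barman receives a complete file
  $psql "SELECT pg_switch_wal();" >/dev/null || return
  printf '%s\t%s\n' "$ts" "$1" | tee -a "$log"
}

# Example:
#   mark_recovery_point "After creating pitr-test-project-1 and test issue"
```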