Create PostgreSQL Snapshots before backing up database(s) (!106041) · Merge requests · GitLab.org / GitLab

Rutger Wessels requested to merge 366727-consistent-rake-backup-multiple-files into master Dec 05, 2022

What does this MR do and why?

Gitlab is splitting up the database. Instead of having 1 big monolith database, we want to run Gitlab on two distinct PostgreSQL databases. As a result of having two databases, it becomes more difficult to achieve consistent database backups from our rake tasks. This MR will create database snapshots before the database backup is created.

It works like this:

For each configured database, we run pg_export_snapshot() function and remember the snapshot_id created by PostgreSQL
For each database, pg_dump is called with --snapshot argument
After running pg_dump, we remove the snapshot by rolling back the transaction

This will not guarantee consistent backups but we are almost consistent: data modified during the period between the two pg_export_snapshot() calls could lead to inconsistencies. In many installations, the number of updates and inserts is low enough for this to be not an issue. The snapshots are taken one after the other and it is fast: a simple timing shows that we need 0.01 seconds for taking both snapshots.

Related to #366727 (closed)

Alternatives

pg_dumpall can't be used because it does not guarantee consistency across databases
pg_basebackup results in consistent (cluster-level) backups but has no option to filter out certain databases

Notable changes

`Backup::Manager`

We used to have distinct backup tasks for main and ci database. These are now merged into a resurrected db task. The reason for this change: the Backup::Manager should not be responsible for setting the PostgreSQL snapshots.

`Backup::Database`

The dump method was not under test. So this MR will add tests for this method
The dump method will now first set the PostgreSQL snapshots and then, for each database, run a pg_dump
The restore method had to be changed since the main and ci tasks has been removed. The calls to gitlab:db:drop_tables rake task have been replaced by gitlab:db:drop_tables:main and gitlab:db:drop_tables:ci, leading to a more complex rspec setup, because these rake tasks are dynamically generated.
The code complexity of dump method was too high. So the actual dumping and gzipping logic has been moved to a separate class: Backup::Dump::Postgres

How to set up and validate locally

Test the snapshotting of the database

Modify Backup::Database.dump method: add a sleep(10) after the snapshots have been set
Run SKIP=tar bundle exec rake gitlab:backup:create
During the sleep(10), insert some data in a random table: gdk psql
- INSERT INTO shards (name) VALUES('4fded1464736e77865df232cbcb4cd19');
When the backup is done: gunzip -c tmp/backups/db/database.sql.gz | grep 4fded1464736e77865df232cbcb4cd19
The text does not show up, so the snapshot works correctly: data inserted during backup is not included in the database dump

Test the backup and restore procedure

This procedure was tested with

Multi database setup
Single database setup

Test procedure (use 'main' database for single database setup):

Insert some data in ci db. This database is empty on development, so add some data to it:
- gdk psql -d gitlabhq_development_ci
- INSERT INTO shards (name) VALUES('This should be restored');
Check number of projects in main db:
- gdk psql
- SELECT COUNT(*) FROM projects; (for example 8)
Create backup SKIP=tar bundle exec rake gitlab:backup:create
insert some data in both databases, these rows should be gone after restore:
- gdk psql
- INSERT INTO shards (name) VALUES(current_database());
- \connect gitlabhq_development_ci
- INSERT INTO shards (name) VALUES(current_database());
Run restore bundle exec rake gitlab:backup:db:restore
Now verify:
- gdk psql
- Main database has just one record (name=default) in this table: SELECT * FROM shards;
- Main database still has correct number of projects: select count(*) from projects;
- \connect gitlabhq_development_ci
- CI DB still has the experiment record that was added before creating the backup: select * from shards yields 1 record

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Dec 13, 2022 by Rutger Wessels

Create PostgreSQL Snapshots before backing up database(s)