Skip to content

Create PostgreSQL Snapshots before backing up database(s)

What does this MR do and why?

Gitlab is splitting up the database. Instead of having 1 big monolith database, we want to run Gitlab on two distinct PostgreSQL databases. As a result of having two databases, it becomes more difficult to achieve consistent database backups from our rake tasks. This MR will create database snapshots before the database backup is created.

It works like this:

  • For each configured database, we run pg_export_snapshot() function and remember the snapshot_id created by PostgreSQL
  • For each database, pg_dump is called with --snapshot argument
  • After running pg_dump, we remove the snapshot by rolling back the transaction

This will not guarantee consistent backups but we are almost consistent: data modified during the period between the two pg_export_snapshot() calls could lead to inconsistencies. In many installations, the number of updates and inserts is low enough for this to be not an issue. The snapshots are taken one after the other and it is fast: a simple timing shows that we need 0.01 seconds for taking both snapshots.

Related to #366727 (closed)

Alternatives

  • pg_dumpall can't be used because it does not guarantee consistency across databases
  • pg_basebackup results in consistent (cluster-level) backups but has no option to filter out certain databases

Notable changes

Backup::Manager

We used to have distinct backup tasks for main and ci database. These are now merged into a resurrected db task. The reason for this change: the Backup::Manager should not be responsible for setting the PostgreSQL snapshots.

Backup::Database

  • The dump method was not under test. So this MR will add tests for this method
  • The dump method will now first set the PostgreSQL snapshots and then, for each database, run a pg_dump
  • The restore method had to be changed since the main and ci tasks has been removed. The calls to gitlab:db:drop_tables rake task have been replaced by gitlab:db:drop_tables:main and gitlab:db:drop_tables:ci, leading to a more complex rspec setup, because these rake tasks are dynamically generated.
  • The code complexity of dump method was too high. So the actual dumping and gzipping logic has been moved to a separate class: Backup::Dump::Postgres

How to set up and validate locally

Test the snapshotting of the database

  • Modify Backup::Database.dump method: add a sleep(10) after the snapshots have been set
  • Run SKIP=tar bundle exec rake gitlab:backup:create
  • During the sleep(10), insert some data in a random table: gdk psql
    • INSERT INTO shards (name) VALUES('4fded1464736e77865df232cbcb4cd19');
  • When the backup is done: gunzip -c tmp/backups/db/database.sql.gz | grep 4fded1464736e77865df232cbcb4cd19
  • The text does not show up, so the snapshot works correctly: data inserted during backup is not included in the database dump

Test the backup and restore procedure

This procedure was tested with

  • Multi database setup
  • Single database setup

Test procedure (use 'main' database for single database setup):

  • Insert some data in ci db. This database is empty on development, so add some data to it:
    • gdk psql -d gitlabhq_development_ci
    • INSERT INTO shards (name) VALUES('This should be restored');
  • Check number of projects in main db:
    • gdk psql
    • SELECT COUNT(*) FROM projects; (for example 8)
  • Create backup SKIP=tar bundle exec rake gitlab:backup:create
  • insert some data in both databases, these rows should be gone after restore:
    • gdk psql
    • INSERT INTO shards (name) VALUES(current_database());
    • \connect gitlabhq_development_ci
    • INSERT INTO shards (name) VALUES(current_database());
  • Run restore bundle exec rake gitlab:backup:db:restore
  • Now verify:
    • gdk psql
    • Main database has just one record (name=default) in this table: SELECT * FROM shards;
    • Main database still has correct number of projects: select count(*) from projects;
    • \connect gitlabhq_development_ci
    • CI DB still has the experiment record that was added before creating the backup: select * from shards yields 1 record

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rutger Wessels

Merge request reports