Investigate pgbarman for creating PostgreSQL backups
During the outage it was suggested by multiple people to use http://www.pgbarman.org/ for backup and recovery. I'm not familiar with it, but we should definitely look into it.
Barman (pgbarman) is a proven solution built by PostgreSQL experts. It provides a useful set of features:
incremental backups
archiving of your WAL stream via PostgreSQL's archive_command. This matters if you want a standardized way of creating new replica databases
(maybe not useful in your case) management of as many different PostgreSQL clusters as needed
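To make the archive_command point concrete, here is a minimal sketch of a script PostgreSQL could call to archive each finished WAL segment. It is not how Barman itself receives WAL; the function name archive_wal, the script path, and the archive directory are hypothetical and only illustrate the contract PostgreSQL expects: exit 0 only once the segment is safely stored, and never overwrite a segment that is already archived.

```python
import os
import shutil
import sys


def archive_wal(wal_path: str, archive_dir: str) -> int:
    """Copy one WAL segment into archive_dir.

    Returns 0 on success and 1 on failure, mirroring the exit status
    PostgreSQL expects from archive_command (non-zero means "retry later").
    """
    target = os.path.join(archive_dir, os.path.basename(wal_path))
    if os.path.exists(target):
        # Refuse to overwrite a segment that has already been archived.
        return 1
    tmp = target + ".tmp"
    try:
        with open(wal_path, "rb") as src, open(tmp, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # make sure the bytes reach the disk
        # Publish the segment under its final name only once it is complete.
        os.replace(tmp, target)
    except OSError:
        return 1
    return 0


# Hypothetical wiring in postgresql.conf (paths are assumptions):
#   archive_mode = on
#   archive_command = 'python3 /usr/local/bin/archive_wal.py %p /mnt/wal_archive'
# PostgreSQL substitutes %p with the path of the segment to archive.
if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```

In practice the archive directory should live on another host or in remote storage, which is the part tools like Barman handle for you.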
I'm not experienced with LVM snapshots for database backup management, and maybe they are a good solution. However, if you have a cluster of 1-n primaries and secondaries, your disaster plan should be a failover to a secondary, and that is your recovery solution. Backups should really be a last resort, for when your replication cluster is not working well.
Again, that's a view from my personal experience managing a few PostgreSQL clusters with relatively big production databases (~150 GB).
LVM snapshots sound like a good solution for quickly rolling back changes in a test environment, but in a production environment I don't think you should rely on a local disk snapshot being the only backup. Backups should be stored on another host at minimum, and preferably on another site.
Barman is a great tool, but if you're still using Azure you may also want to take a look at PGHoard (https://github.com/ohmu/pghoard/), which supports storing basebackups and WAL in Azure Cloud Storage, among other things (disclaimer: I'm one of the PGHoard authors).
I can relate - "rm -r" has been a close friend a number of times in my career...
Imagine your storage allowed you to go back to exactly a second before the rm -r was executed. Without additional tools or software. Without preparation. Without setting up anything.
Simply enter the exact time you want down to the second, hit a button, and have immediate access to all of your data the way it was a second before the incident.
If anyone is interested, ping me at eli@reduxio.com