Build Streaming Database Backup
This issue covers two things: building a solid backup system, and continuously restoring those backups to be sure that they actually work. It follows investigations into the different tools available.
TL;DR
On February 1st we had an outage that was caused by an error, and it showed us that our backups are not working at all, simply because we never took the time to restore any.
Given that restoring a backup is complex at our current scale and needs to be resourced, we need to get some value out of performing this restoration.
Given this constraint, I propose to use Wal-E to stream backups to S3 (or Azure object storage), and then to use this same streamed backup to restore staging databases as they are needed. Creating disposable staging environments this way also removes the need to build tooling for executing and rolling back migrations in staging.
This gives us both a near-zero-loss backup and a tested restoration, because the backup is reused as yet another development tool.
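As a rough sketch of the streaming side, the snippet below builds the PostgreSQL archiving settings and the base backup command for Wal-E. It assumes the credentials and `WALE_S3_PREFIX` live in an envdir at `/etc/wal-e.d/env` and that the data directory is `/var/lib/postgresql/data`; both paths are placeholders, not values taken from our hosts, and the `wal_level` value depends on the PostgreSQL version.

```python
import shlex

# Assumed locations (placeholders); adjust to the actual host layout.
ENVDIR = "/etc/wal-e.d/env"          # holds WALE_S3_PREFIX and the AWS credentials
PGDATA = "/var/lib/postgresql/data"  # hypothetical data directory


def wal_e(*args, envdir=ENVDIR):
    """Return a wal-e invocation wrapped in envdir, as an argument list."""
    return ["envdir", envdir, "wal-e", *args]


def archive_settings(envdir=ENVDIR):
    """postgresql.conf settings that ship each completed WAL segment."""
    return {
        # "archive" on older PostgreSQL; newer versions call this level "replica".
        "wal_level": "archive",
        "archive_mode": "on",
        # PostgreSQL substitutes %p with the path of the segment to archive.
        "archive_command": " ".join(wal_e("wal-push", "%p", envdir=envdir)),
        # Force a segment switch at least every 60 seconds to bound data loss.
        "archive_timeout": "60",
    }


def base_backup_command(pgdata=PGDATA, envdir=ENVDIR):
    """Command for a periodic full base backup, for example run from cron."""
    return wal_e("backup-push", pgdata, envdir=envdir)


if __name__ == "__main__":
    for key, value in archive_settings().items():
        print(f"{key} = '{value}'")
    print(shlex.join(base_backup_command()))
```

The settings are returned as data rather than written to disk so that a cookbook template could render them, which keeps the sketch independent of any particular configuration management tool.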
How can we do it?
- Start using Wal-E to perform backups.
- Write a cookbook that we can apply to hosts that we want to back up.
- Document how to use it in the runbooks.
I'm open to changing this plan, but the general idea is clear: we have a streaming backup, and we use it to restore the database on a daily basis. This removes the pain of performing a restoration, because restoring becomes a standard procedure.
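The daily restoration could look roughly like the sketch below, which fetches the latest base backup into a staging data directory and prepares the `restore_command` that lets PostgreSQL replay the archived WAL. The envdir path, the staging directory, and the use of a `recovery.conf` file (which applies to older PostgreSQL versions) are assumptions for illustration, and the function names are hypothetical.

```python
import os
import subprocess

# Assumed locations (placeholders), not taken from our hosts.
ENVDIR = "/etc/wal-e.d/env"
STAGING_PGDATA = "/var/lib/postgresql/staging"


def restore_steps(pgdata=STAGING_PGDATA, envdir=ENVDIR):
    """Return the base backup fetch command and the recovery.conf content."""
    # LATEST asks wal-e for the most recent base backup in the bucket.
    fetch = ["envdir", envdir, "wal-e", "backup-fetch", pgdata, "LATEST"]
    # PostgreSQL substitutes %f (segment name) and %p (destination path).
    restore_command = f'envdir {envdir} wal-e wal-fetch "%f" "%p"'
    recovery_conf = f"restore_command = '{restore_command}'\n"
    return fetch, recovery_conf


def run_restore(pgdata=STAGING_PGDATA, envdir=ENVDIR, dry_run=True):
    """Run the restore, or only return the plan when dry_run is set."""
    fetch, recovery_conf = restore_steps(pgdata, envdir)
    if not dry_run:
        # Fail loudly: a broken restore is exactly what we want to detect.
        subprocess.run(fetch, check=True)
        with open(os.path.join(pgdata, "recovery.conf"), "w") as handle:
            handle.write(recovery_conf)
    return fetch, recovery_conf


if __name__ == "__main__":
    command, conf = run_restore(dry_run=True)
    print(" ".join(command))
    print(conf, end="")
```

Running this from a scheduled job with `dry_run=False` and alerting on a non-zero exit would turn every staging refresh into a backup test; starting PostgreSQL on the restored directory is left out of the sketch.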