Develop backup/restore strategy for Comet metadata and application databases
Before we can declare Comet open for business, we need at least a minimal disaster recovery ability. That means:
Backup
- We need regular backups of at least the Comet production metadata database, since once content is in Comet, that will be the only copy of record for the object metadata, and the only information telling us what all the objects we store in S3 actually are or how they relate to one another.
- As long as we're doing that, we might as well also back up the Comet application database, which, while not strictly critical, does store useful history and state information (including state information we could use to determine what was in progress during an outage).
- As long as we're doing that -- and as long as we need to test the process anyway -- we should probably do the same for at least one other environment (sandbox? QA?)
Questions:
- how do we make the backups?
- what format should they be in?
- (personally I'd prefer plain SQL; pg_dump native format is more compact, but it's annoying if you have to restore to a different version of Postgres than the one used to create the backup, and there might be some value to having a marginally human-readable file)
- how often do we make them?
- how far back do we keep them?
- where do we put them?
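To make the questions above a bit more concrete, here is a rough sketch of what a plain-SQL backup job might look like. Everything in it is an assumption for discussion, not a decision: the database name `comet_metadata`, the file naming scheme, and the "keep the newest N files" retention rule are all hypothetical. The only real tool involved is `pg_dump`, called with its `--format=plain`, `--file`, and `--dbname` options.

```python
from datetime import datetime, timezone
from pathlib import Path
import subprocess


def backup_filename(db_name: str, when: datetime) -> str:
    """Build a timestamped file name (hypothetical naming scheme)."""
    return f"{db_name}-{when.strftime('%Y%m%dT%H%M%SZ')}.sql"


def pg_dump_command(db_name: str, out_file: Path) -> list[str]:
    """Build a pg_dump invocation that writes a plain SQL script."""
    return [
        "pg_dump",
        "--format=plain",         # plain SQL rather than pg_dump's native format
        f"--file={out_file}",
        f"--dbname={db_name}",
    ]


def backups_to_prune(names: list[str], keep: int) -> list[str]:
    """Return the backups to delete, keeping the newest `keep` files.

    Assumes all names are for one database and follow backup_filename(),
    so that sorting by name is the same as sorting by time.
    """
    ordered = sorted(names)
    return ordered[:-keep] if keep > 0 else ordered


def run_backup(db_name: str, backup_dir: Path) -> Path:
    """Run pg_dump for one database; raises if pg_dump exits non-zero."""
    out_file = backup_dir / backup_filename(db_name, datetime.now(timezone.utc))
    subprocess.run(pg_dump_command(db_name, out_file), check=True)
    return out_file
```

Something like `run_backup("comet_metadata", Path("/some/backup/dir"))` on a schedule would cover "how" and "how often"; where the files end up and how many we keep are still open questions.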
Restore
- Once we have backups, we need to be able to restore them. At minimum, this means we should be able to start up a new empty production database, load the backup into it, and point Comet at it.
- surfliner#1913 contemplates more sophisticated scenarios, e.g.
- restore only final, "at rest" data
- restore in-progress state of things like importers, exporters, etc., but only the parts that won't break with temp storage gone
- restore in-progress state of things, along with (🪄 🎩 🐰) snapshot of associated temp storage, maybe?
- other scenarios I haven't had time/space to think through
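For the minimal restore case (new empty database, load the backup, point Comet at it), a sketch might look like the following. It assumes the backup is a plain SQL file and that the standard Postgres client tools `createdb` and `psql` are on the path; the function names and the database name are hypothetical. Repointing Comet at the restored database is a configuration step and isn't covered here.

```python
import subprocess


def psql_restore_command(db_name: str, backup_file: str) -> list[str]:
    """Build a psql invocation that loads a plain SQL backup."""
    return [
        "psql",
        "--set=ON_ERROR_STOP=1",   # stop at the first SQL error instead of continuing
        "--single-transaction",    # load everything or nothing
        f"--dbname={db_name}",
        f"--file={backup_file}",
    ]


def restore(db_name: str, backup_file: str) -> None:
    """Create a new empty database and load the backup into it."""
    subprocess.run(["createdb", db_name], check=True)
    subprocess.run(psql_restore_command(db_name, backup_file), check=True)
```

Running this regularly against a non-production environment would double as the test of the backup process itself.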