In backup and recovery, there are two SLOs:

| SLO | Current level | Definition |
| ------------- |:-------------:| ------------- |
| `DB-DR-TTR` | 8 hours | Maximum time to recover from a full database backup in case of disaster. |
| `DB-DR-RETENTION` | 14 days | The number of days we keep backups for recovery purposes in the [Multi-regional](https://cloud.google.com/storage/docs/storage-classes#standard) storage class in GCS. |


The primary backup strategy is to take hourly incremental, block-level disk snapshots of all our database clusters (these are
[multi-regional standard persistent disk snapshots](https://docs.cloud.google.com/compute/docs/disks/snapshots)).
We also run a secondary, database-level backup strategy: weekly full backups of the database files plus daily incremental
backups, stored in separate multi-region Google Cloud Storage buckets. Additionally, we continuously archive all
write-ahead (transaction) log data in GCS, which enables point-in-time recovery (PITR) from either type of backup
(block-level or database-level). [Read more on Disaster Recovery](/handbook/engineering/gitlab-com/policies/disaster-recovery/)

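The Python sketch below shows how a recovery base might be chosen for a target time when both backup strategies are available. It is not our actual tooling. The `Backup` type, the `kind` labels, and the `restore_chain` and `wal_to_replay` helpers are hypothetical. The sketch assumes the goal is to minimize the amount of archived WAL that must be replayed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # hypothetical labels: "snapshot", "full", or "incremental"


def restore_chain(backups: List[Backup], target: datetime) -> Optional[List[Backup]]:
    """Return the backups to restore, in order, before replaying WAL up to `target`."""
    eligible = sorted((b for b in backups if b.taken_at <= target), key=lambda b: b.taken_at)
    options = []

    # Block-level option: the newest snapshot alone is a complete restore point.
    snapshots = [b for b in eligible if b.kind == "snapshot"]
    if snapshots:
        options.append([snapshots[-1]])

    # Database-level option: the newest full backup plus every later incremental.
    fulls = [b for b in eligible if b.kind == "full"]
    if fulls:
        base = fulls[-1]
        incrementals = [b for b in eligible
                        if b.kind == "incremental" and b.taken_at > base.taken_at]
        options.append([base] + incrementals)

    if not options:
        return None
    # Prefer the chain that ends closest to `target` (least WAL to replay).
    # On a tie, max() keeps the first option, so the snapshot (primary strategy) wins.
    return max(options, key=lambda chain: chain[-1].taken_at)


def wal_to_replay(chain: List[Backup], target: datetime) -> timedelta:
    """Span of archived WAL that must be replayed after restoring `chain`."""
    return target - chain[-1].taken_at


# Usage: a weekly full, two daily incrementals, and one hourly snapshot.
t0 = datetime(2024, 1, 1)
backups = [
    Backup(t0, "full"),
    Backup(t0 + timedelta(days=1), "incremental"),
    Backup(t0 + timedelta(days=2), "incremental"),
    Backup(t0 + timedelta(days=2, hours=5), "snapshot"),
]
target = t0 + timedelta(days=2, hours=5, minutes=30)
chain = restore_chain(backups, target)
print([b.kind for b in chain], wal_to_replay(chain, target))  # ['snapshot'] 0:30:00
```

Suppose the snapshot were unavailable. The same call would return the full backup plus both incrementals, leaving 5.5 hours of WAL to replay instead of 30 minutes.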
For `DB-DR-TTR`, we need to consider the worst-case scenario, in which the
latest backup is 24 hours old. Recovery time therefore includes the time to restore the base backup
plus the time to replay the archived transaction log (PITR) up to a certain point in
time (right before the disaster).

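The sketch below makes the worst case concrete. It estimates recovery time as restore time plus WAL replay time and compares the result against `DB-DR-TTR`. The function names, the constant replay-throughput model, and all numeric inputs are hypothetical placeholders, not measured values.

```python
from datetime import timedelta

DB_DR_TTR = timedelta(hours=8)  # `DB-DR-TTR` from the SLO table


def estimated_ttr(restore: timedelta, backup_age: timedelta,
                  wal_gb_per_hour: float, replay_gb_per_hour: float) -> timedelta:
    """Restore time plus time to replay the WAL written since the backup was taken.

    Assumes (hypothetically) a constant WAL write rate and constant replay throughput.
    """
    wal_gb = wal_gb_per_hour * (backup_age / timedelta(hours=1))
    replay = timedelta(hours=wal_gb / replay_gb_per_hour)
    return restore + replay


def meets_ttr(estimate: timedelta, slo: timedelta = DB_DR_TTR) -> bool:
    return estimate <= slo


# Hypothetical inputs for the worst case of a 24-hour-old backup.
estimate = estimated_ttr(restore=timedelta(hours=3), backup_age=timedelta(hours=24),
                         wal_gb_per_hour=50, replay_gb_per_hour=300)
print(estimate, meets_ttr(estimate))  # 7:00:00 True
```

With these made-up numbers, 1,200 GB of WAL replays in 4 hours, so the 7-hour total stays inside the 8-hour target. Dropping replay throughput to 200 GB/h would push the estimate to 9 hours and miss it.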

We can recover to any point in time within the last `DB-DR-RETENTION` days.
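Below is a minimal sketch of the retention check. It assumes the recoverable window is simply the last `DB-DR-RETENTION` days measured back from now, and that WAL archiving has been continuous over that whole period. `is_recoverable` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

DB_DR_RETENTION = timedelta(days=14)  # `DB-DR-RETENTION` from the SLO table


def is_recoverable(target: datetime, now: datetime,
                   retention: timedelta = DB_DR_RETENTION) -> bool:
    """True if `target` falls inside the retention window ending at `now`."""
    return now - retention <= target <= now


now = datetime(2024, 3, 15, 12, 0, tzinfo=timezone.utc)
print(is_recoverable(now - timedelta(days=13), now))  # True
print(is_recoverable(now - timedelta(days=15), now))  # False
```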