[meta] Provide a method to back up and restore GitLab on k8s
GitLab on Kubernetes Backup
Now that we will have an official GA installation method for GitLab on Kubernetes through our Helm chart, we need to address other areas that are critical to operating GitLab this way at enterprise scale. One of the most important is giving customers who deploy this way a well-documented, reliable backup solution.
There are a few areas to think about when backing up GitLab:
- Database
- Uploads (attachments)
- Repositories
- LFS
- Builds
- Artifacts
- Container Images (Registry)
- Pages
We also offer a few deployment options that may change how a customer wants the backup to behave:
- The customer brings their own database or Redis
- Object storage is being used
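The options above change what a backup job has to do for each component. As a hedged sketch, the decision could be modeled as a small planning function. The method labels, the hypothetical `DeploymentOptions` type, and the set of components assumed to be able to live in object storage are all illustrative assumptions, not how the chart works today. Redis is not in the component list above, so it is not modeled.

```python
from dataclasses import dataclass
from typing import Dict

# The components listed above, in the same order.
COMPONENTS = ("database", "uploads", "repositories", "lfs", "builds",
              "artifacts", "registry", "pages")

# Assumption: components whose data may live in object storage instead of a volume.
OBJECT_STORAGE_CAPABLE = frozenset({"uploads", "lfs", "artifacts", "registry", "pages"})


@dataclass(frozen=True)
class DeploymentOptions:
    external_database: bool = False  # customer brings their own PostgreSQL
    object_storage: bool = False     # object storage is enabled for file data


def plan_backup(opts: DeploymentOptions) -> Dict[str, str]:
    """Map each component to a (hypothetical) backup method for this deployment."""
    plan = {}
    for component in COMPONENTS:
        if component == "database":
            # Still dumped when external; the label flags that the customer may
            # also manage their own database backups.
            plan[component] = "dump external database" if opts.external_database else "dump database"
        elif component == "repositories":
            plan[component] = "archive repositories from Gitaly"
        elif opts.object_storage and component in OBJECT_STORAGE_CAPABLE:
            plan[component] = "copy from object storage"
        else:
            plan[component] = "archive from volume"
    return plan
```

For example, `plan_backup(DeploymentOptions(object_storage=True))["lfs"]` returns `"copy from object storage"`, while `builds` still falls back to `"archive from volume"` under these assumptions.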
Backup Solutions
We have a few options for backup:
- A k8s CronJob could be created to automate the backup process
- A k8s operator/agent could be created to manage the backup process with additional intelligence (node selection, handling different components, etc.)
- In either case, we need to decide where to put the data: another mounted volume, object storage, etc.
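A minimal sketch of the CronJob option, generated from Python to keep one language across these examples. The image, the `run-gitlab-backup` command, and the `gitlab-backups` PersistentVolumeClaim are hypothetical placeholders, not part of the GitLab chart. The destination here is a mounted volume; an upload to object storage could replace it.

```python
import json


def backup_cronjob(name: str, schedule: str, image: str, pvc: str) -> dict:
    """Build a Kubernetes CronJob manifest (as a dict) that runs a backup on a schedule.

    The image, command, and PVC name are placeholders supplied for illustration.
    """
    return {
        "apiVersion": "batch/v1",  # use batch/v1beta1 on clusters older than Kubernetes 1.21
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # standard cron syntax, e.g. "0 2 * * *"
            "concurrencyPolicy": "Forbid",  # never start a backup while one is still running
            "successfulJobsHistoryLimit": 3,
            "failedJobsHistoryLimit": 3,
            "jobTemplate": {
                "spec": {
                    "backoffLimit": 2,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{
                                "name": "backup",
                                "image": image,
                                # Hypothetical command; replace with the real backup entry point.
                                "command": ["/bin/sh", "-c", "run-gitlab-backup --output /backups"],
                                "volumeMounts": [{"name": "backups", "mountPath": "/backups"}],
                            }],
                            "volumes": [{
                                "name": "backups",
                                "persistentVolumeClaim": {"claimName": pvc},
                            }],
                        }
                    },
                }
            },
        },
    }


if __name__ == "__main__":
    manifest = backup_cronjob("gitlab-backup", "0 2 * * *",
                              "example/gitlab-backup:latest", "gitlab-backups")
    print(json.dumps(manifest, indent=2))
```

Kubernetes accepts JSON manifests, so the printed output can be piped to `kubectl apply -f -`. `concurrencyPolicy: Forbid` keeps a slow backup from overlapping with the next scheduled run.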
Restore
Similar to backing up, we will need to document and test the restore procedure. This will likely need to be a manual process, because the operator has to know which backup to restore and whether it was taken on exactly the same GitLab version. We can iterate on improvements once we have a good backup option in place.
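The exact-version requirement is the part of a restore that is easiest to automate as a guard. A minimal sketch, assuming the backup's GitLab version can be read from metadata stored with the backup and the running version from the deployment; `check_restore_version` is a hypothetical helper.

```python
def check_restore_version(backup_version: str, running_version: str) -> None:
    """Raise if the backup was not taken on exactly the running GitLab version.

    Both arguments are plain version strings (e.g. "11.0.0-ee"); how they are
    obtained is left to the caller.
    """
    backup, running = backup_version.strip(), running_version.strip()
    if backup != running:
        raise ValueError(
            f"backup was taken on GitLab {backup} but {running} is running; "
            f"deploy {backup} before restoring"
        )
```

An exact string comparison also catches an edition mismatch when the version string carries an edition suffix such as `-ee`.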
Note that we have found problems with restoring using the existing Helm chart, related to permissions on the mounts: https://gitlab.com/charts/charts.gitlab.io/issues/96
Issue to track support for restore: https://gitlab.com/charts/helm.gitlab.io/issues/340
Ordered list of restore items to tackle:
1. Database
2. Repositories
3. Container Images (Registry)
4. Uploads (attachments)
5. Artifacts
6. Traces
7. LFS
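The ordered list above could be enforced by a small driver that runs one restore step per component in that order and stops at the first failure, so later components are not restored on top of a partial restore. The per-component step functions are hypothetical; only the order comes from this issue.

```python
from typing import Callable, Dict, List, Sequence

# Restore order taken from the list above; names are shorthand for each item.
RESTORE_ORDER = ("database", "repositories", "registry", "uploads",
                 "artifacts", "traces", "lfs")


def run_restore(steps: Dict[str, Callable[[], None]],
                order: Sequence[str] = RESTORE_ORDER) -> List[str]:
    """Run each component's restore step in order and stop at the first failure.

    Returns the components that were restored. Raises KeyError before doing
    anything if a component in `order` has no step registered.
    """
    missing = [c for c in order if c not in steps]
    if missing:
        raise KeyError(f"no restore step for: {', '.join(missing)}")

    done: List[str] = []
    for component in order:
        try:
            steps[component]()
        except Exception as exc:
            # Report what already succeeded so the operator knows where to resume.
            raise RuntimeError(
                f"restore of {component} failed; completed: {done or 'nothing'}"
            ) from exc
        done.append(component)
    return done
```

Checking for missing steps up front means a misconfigured restore fails before touching the database rather than partway through.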
Ordered list of the backup items:
https://gitlab.com/charts/helm.gitlab.io/issues/368
Validation of backups: how can we automatically check a backup to ensure it is usable?
- How can we alert if a backup fails or is corrupted?
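As a starting point for automated validation, here is a hedged sketch that checks that a backup archive exists, matches a recorded checksum, can be read end to end, and contains expected entries. It also includes a staleness check that can catch a backup job that silently stopped running. The expected entry names (`backup_information.yml`, `db/database.sql.gz`) are assumptions about the archive layout; adjust them to the real one.

```python
import hashlib
import tarfile
import time
import zlib
from pathlib import Path
from typing import Iterable, List, Optional

# Assumption: entries expected in every backup archive; adjust to the real layout.
DEFAULT_EXPECTED = ("backup_information.yml", "db/database.sql.gz")


def sha256_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large backups are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_backup(path: Path, expected_sha256: Optional[str] = None,
                    expected_entries: Iterable[str] = DEFAULT_EXPECTED) -> List[str]:
    """Return a list of problems with the archive; an empty list means all checks passed."""
    if not path.is_file():
        return [f"{path} does not exist"]

    problems: List[str] = []
    if expected_sha256 is not None and sha256_of(path) != expected_sha256:
        problems.append("checksum mismatch")

    try:
        with tarfile.open(path) as tar:
            names = set(tar.getnames())
            # Read every file's data so a truncated archive or corrupt compressed data raises here.
            for member in tar.getmembers():
                if member.isfile():
                    with tar.extractfile(member) as data:
                        while data.read(1 << 20):
                            pass
    except (tarfile.TarError, OSError, EOFError, zlib.error) as exc:
        return problems + [f"unreadable archive: {exc}"]

    problems += [f"missing entry: {name}" for name in expected_entries if name not in names]
    return problems


def is_stale(path: Path, max_age_seconds: float, now: Optional[float] = None) -> bool:
    """True if the backup is older than allowed, e.g. because the job stopped running."""
    current = time.time() if now is None else now
    return current - path.stat().st_mtime > max_age_seconds
```

A scheduled job could run these checks and raise an alert when `validate_backup` returns any problems or `is_stale` returns `True`. These checks confirm that the archive is intact and readable, not that it restores cleanly; a periodic test restore into a scratch namespace would cover that.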