Geo: disaster recovery (single-secondary)
Customers want a disaster recovery solution to prevent their organizations from being severely impacted by a data center outage or other major failure. We also want to be able to use such a solution for GitLab.com.
A key component of disaster recovery is making sure that data is replicated and current in another location that is accessible. GitLab Geo provides this foundation.
To offer a comprehensive disaster recovery solution, everything needs to be replicated and accessible:
- [x] git
- [x] git LFS
  - [x] object storage gitlab-org/gitlab-ee#3944 (replicated externally)
  - [x] local (disk, NFS etc)
- [x] wiki
- [x] database (issues, merge requests, snippets etc)
- [x] attachments (images on issues and merge requests)
  - [x] object storage gitlab-org/gitlab-ee#3944 (replicated externally) - BLOCKED BY gitlab-org/gitlab-ee#4163
  - [x] local (disk, NFS etc)
- [x] CI logs and artifacts
  - [x] object storage gitlab-org/gitlab-ee#3944 (replicated externally)
  - [x] local (disk, NFS etc) gitlab-org/gitlab-ee#2388
- [ ] container registry
  - [x] object storage https://docs.gitlab.com/ce/administration/container_registry.html#container-registry-storage-driver
  - [ ] local
The disaster recovery process should be simple and well documented, so that in the event of a disaster, the recovery process is quick and does not have the potential to make the situation worse.
- [x] Omnibus (one primary instance, one secondary instance): fail over to secondary, fail back to original state