[Design Document] Disaster Recovery
Now that we've switched over to GCP, let's talk about the future of Geo at GitLab.
In a call with @sytses (https://www.youtube.com/watch?v=JQ3fUGs151I), we discussed why we think we should keep running a Geo secondary of GitLab.com in another region:
- Running Geo has helped us identify and fix many bugs within GitLab itself (e.g. missing uploads, failures during renames, database inconsistencies)
- Running Geo at GitLab.com scale has revealed many bugs and performance issues in Geo itself
- We were able to recover some lost data on GitLab.com
To save costs, we can probably scale down the Geo secondary fleet significantly and use slower disks. We may also want to add a delay to database replication so that Geo can serve as a disaster recovery solution if someone accidentally drops the database or makes a similar destructive change.
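Delayed replication protects against this because the secondary deliberately lags the primary by a fixed window: a destructive change such as `DROP TABLE` is streamed to the secondary but not applied until the window elapses, which gives operators time to halt replay before the damage reaches the copy. PostgreSQL offers this on a standby through the `recovery_min_apply_delay` setting. The Python below is only a toy model of the idea, not how Geo or PostgreSQL implements it; `WalRecord`, `DelayedReplica`, the one-hour delay, and the `projects` table are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class WalRecord:
    """Hypothetical, simplified stand-in for one replicated database change."""
    committed_at: datetime
    statement: str


@dataclass
class DelayedReplica:
    """Toy replica that applies changes only after a fixed delay (illustrative only)."""
    apply_delay: timedelta
    pending: list[WalRecord] = field(default_factory=list)
    applied: list[str] = field(default_factory=list)
    paused: bool = False

    def receive(self, record: WalRecord) -> None:
        # Changes arrive immediately but are held until they are old enough.
        self.pending.append(record)

    def replay(self, now: datetime) -> None:
        # Apply, in commit order, every record whose delay has fully elapsed.
        if self.paused:
            return
        while self.pending and now - self.pending[0].committed_at >= self.apply_delay:
            self.applied.append(self.pending.pop(0).statement)

    def pause(self) -> None:
        # An operator halts replay once a destructive change is noticed.
        self.paused = True


t0 = datetime(2018, 10, 1, 12, 0)
replica = DelayedReplica(apply_delay=timedelta(hours=1))
replica.receive(WalRecord(t0, "INSERT INTO projects ..."))
replica.receive(WalRecord(t0 + timedelta(minutes=30), "DROP TABLE projects"))

replica.replay(now=t0 + timedelta(minutes=61))  # only the INSERT is old enough
replica.pause()                                 # DROP noticed in time; stop replay
replica.replay(now=t0 + timedelta(hours=5))     # paused, so nothing else is applied
print(replica.applied)  # ['INSERT INTO projects ...']
```

The trade-off is that a longer delay widens the window for catching mistakes but also makes the secondary staler, so it is slower to promote in an ordinary regional failover.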
gitlab-com/www-gitlab-com!15001 (merged)
The output of this issue should be a design doc/plan for how to set up Geo for DR in another region. This will be part of our 2018 Q4 OKRs.
Edited by Devin Sylva