Production Failover Dry-run with Maintenance Window
For discussion.
At the cost of scheduled downtime, we could mitigate risk on failover day by performing a half-failover.
We could do this by:
- Preventing all updates on GitLab.com
- Proceeding through the failover checklist as far as the Postgres failover, but no further
It would help us prepare better for the day and recognise blind spots in our current process that our staging environment is not highlighting.
It would also give us an idea of timing, for example how long the Sidekiq queues will take to run down.
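To make the timing point concrete, here is a rough sketch of how we might time a queue run-down during the dry run. It is only an illustration: `time_until_drained` and `get_depth` are hypothetical names, and `get_depth` stands in for however we end up reading the combined Sidekiq queue size, which is not shown here.

```python
import time
from typing import Callable, Optional


def time_until_drained(
    get_depth: Callable[[], int],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll get_depth() until it reports 0.

    Returns the elapsed seconds at the first poll that saw an empty
    queue, or None if the timeout passed first. get_depth is a
    hypothetical callable supplied by the caller.
    """
    start = clock()
    while True:
        depth = get_depth()
        elapsed = clock() - start
        if depth == 0:
            return elapsed
        if elapsed >= timeout:
            return None
        sleep(poll_interval)
```

The clock and sleep functions are injectable so the helper can be tried out without waiting in real time. The result is only accurate to within one `poll_interval`, which should be good enough for a rough estimate.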
The major downside is that we would need to prevent all updates on GitLab.com, possibly for an hour. If we were to choose to do this, it would make sense to also do it on a Saturday.
WDYT?
Edited by Andrew Newdigate