Document active/passive high availability use-case for Geo/Disaster Recovery

This topic came up at the summit in one of the UGCs. We have great high availability (and scaling) options, and they will get even better as we add more of them to Omnibus. There's one use case that our active/active HA approach doesn't cover, though: active/passive.

Why do we need another HA option?

Our active/active HA approach requires lots of servers to achieve true high availability across the stack, and it adds management complexity. In return, it gives us easy scaling alongside HA. However, many customers have modest scaling needs (if any) and still want high availability. We know from experience that DRBD can be hard to manage, and we don't really want to support it.

Use Geo/Disaster Recovery technology!

We are building disaster recovery into GitLab via Geo. All data is replicated, and in the future we'll provide an easier mechanism to promote a secondary to primary and move traffic over. While Geo is built for WAN replication, why not use it for local replication, too? I imagine it's mostly a matter of documentation once we have the GA release.

This would not only complete our HA options but also fit well with our placement of Omnibus HA in EE Premium.

cc/ @stanhu @jacobvosmaer-gitlab @nick.thomas I don't recall exactly who I was talking with, but you're all probably interested in this.

Edited Nov 07, 2017 by Drew Blessing