Lazy failover (!1450) · Merge requests · GitLab.org / Gitaly

This is a WIP, opened to discuss our failover strategy. The approach in this MR is to:

1. Detect when we cannot reach a server
2. Lock the coordinator so no new requests can go through
3. Update the datastore: for every repository whose primary is the failed connection/storage, promote one of its replicas to primary
4. Retry getting a new backend connection
