Discussion: how to determine if a gitaly node is down
Based on a discussion with Professional Services, the MVP for praefect failover cannot be a manual process in which an SRE has to edit a config file and restart praefect. The first decision we need to make, then, is: what does it mean for a gitaly node to be down?
Proposal:
A gitaly node is considered down if it fails to respond to a health check a configurable number of times.
The reason for this approach is that it matches what some customers already do with a DNS load balancer in front of a master and a slave gitaly node. When the master fails to respond to a health check a configured number of times, the load balancer automatically switches traffic over to the slave.
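To make the proposed rule concrete, here is a minimal Go sketch of the threshold check combined with the DNS-load-balancer-style switchover. It assumes that "a configurable number of times" means consecutive failures, so any successful check resets the count. The types `healthTracker` and `failoverPair` and the node names `gitaly-1`/`gitaly-2` are hypothetical and are not praefect APIs. The health check itself is reduced to a boolean result.

```go
package main

import "fmt"

// healthTracker is a hypothetical helper (not part of praefect). It marks a
// node as down after `threshold` consecutive failed health checks.
// Assumption: a successful check resets the failure count.
type healthTracker struct {
	threshold int // configurable number of failed checks before "down" (should be >= 1)
	failures  int // consecutive failures observed so far
}

// record registers one health check result and reports whether the node
// should now be considered down.
func (h *healthTracker) record(ok bool) bool {
	if ok {
		h.failures = 0
		return false
	}
	h.failures++
	return h.failures >= h.threshold
}

// failoverPair mirrors the DNS load balancer setup. Traffic goes to the
// primary until it is considered down, then switches to the secondary.
type failoverPair struct {
	primary, secondary string
	current            string
	tracker            healthTracker
}

func newFailoverPair(primary, secondary string, threshold int) *failoverPair {
	return &failoverPair{
		primary:   primary,
		secondary: secondary,
		current:   primary,
		tracker:   healthTracker{threshold: threshold},
	}
}

// observe feeds one health check result for the primary and returns the
// node that should receive traffic afterwards. Once traffic has switched to
// the secondary, further primary results are ignored (no failback).
func (f *failoverPair) observe(primaryOK bool) string {
	if f.current == f.primary && f.tracker.record(primaryOK) {
		f.current = f.secondary
	}
	return f.current
}

func main() {
	pair := newFailoverPair("gitaly-1", "gitaly-2", 3)
	// One failure, a recovery, then three failures in a row.
	for _, ok := range []bool{false, true, false, false, false} {
		fmt.Println(pair.observe(ok))
	}
}
```

With a threshold of 3, the single early failure is forgiven by the following successful check. Only the final three consecutive failures trigger the switch, so the program prints `gitaly-1` four times and then `gitaly-2`. The sketch deliberately leaves out failback once the primary recovers, because that behavior is not covered by this proposal.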
Edited by John Cai