Gitaly Redundancy
Gitaly provides a number of great benefits today, like eliminating the need for NFS and allowing the use of local storage.
One major gap remains, though: the ability to be redundant and continue operating without interruption if a Gitaly shard or its storage is lost.
Today, manual intervention is required to restart a failed shard (although a scheduler like Kubernetes can automate this), and if its storage is lost, the data must be restored from a backup.
This is important for our cloud native efforts. We are also finding that a number of customers do not have an HA NFS solution, and for them, standing up and maintaining a highly available NFS solution is non-trivial. AWS does provide EFS; however, we have had challenges using it, and it is only available to a subset of our customer base. We need a more comprehensive solution.
Continued from this thread: https://gitlab.slack.com/archives/C0NFPSFA8/p1513718679000189