Gitaly Cluster read distribution (General Availability)
Problem to solve
Repositories with many contributors, combined with highly parallelized CI pipelines and highly variable project access patterns, can cause resource exhaustion that impacts availability. Mitigating these performance problems by sharding generally results in significant over-provisioning of CPU and memory to handle peak load.
Gitaly Cluster allows Git repositories to be replicated on multiple warm Gitaly nodes, and can be used to scale performance by distributing read operations between up-to-date replicas. From 13.3, Gitaly Cluster distributes read operations between up-to-date replicas automatically. In comparison to three shards with one Gitaly node each, a single Gitaly Cluster with three Gitaly nodes can share the available CPU and memory resources across all repositories.
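As a rough illustration of the idea, the following Go sketch picks a node for a read request from the replicas that are up to date. It is not Praefect's actual implementation: the Node type, the UpToDate field, PickReadNode, and ErrNoUpToDateNode are hypothetical names introduced only for this example, and uniform random selection is an assumed strategy.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Node is a hypothetical view of one Gitaly node holding a replica.
type Node struct {
	Name     string
	UpToDate bool // assumed flag: replica has applied all replication events
}

// ErrNoUpToDateNode is returned when no replica can safely serve a read.
var ErrNoUpToDateNode = errors.New("no up-to-date node available")

// PickReadNode filters out stale replicas and picks one of the remaining
// nodes using intn, which must return a value in [0, n).
func PickReadNode(nodes []Node, intn func(n int) int) (Node, error) {
	var candidates []Node
	for _, n := range nodes {
		if n.UpToDate {
			candidates = append(candidates, n)
		}
	}
	if len(candidates) == 0 {
		return Node{}, ErrNoUpToDateNode
	}
	return candidates[intn(len(candidates))], nil
}

func main() {
	nodes := []Node{
		{Name: "gitaly-1", UpToDate: true},
		{Name: "gitaly-2", UpToDate: false}, // stale replica, never chosen
		{Name: "gitaly-3", UpToDate: true},
	}
	node, err := PickReadNode(nodes, rand.Intn)
	if err != nil {
		fmt.Println("cannot route read:", err)
		return
	}
	fmt.Println("routing read to", node.Name)
}
```

If replication events are not processed, every secondary stays stale and all reads land on whichever node is still up to date, which removes the benefit of distribution.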
Further details
There is a potential performance issue, gitlab-org/quality/performance#231 (closed), that needs to be handled before the feature can be marked as highly available.
Testing on gitlab.com is also blocked by !2340 (merged): almost all replication events can't be properly processed, so there are no up-to-date secondaries that could be used for read distribution. More info in #2903 (closed).
Read distribution also needs a visual presentation in a Grafana dashboard so that it can be tracked.
Proposal
- Improve query performance and observe in production.
- Enable the read distribution feature flag gitaly_distributed_reads by default.