Frame-wise proxying of write operations to Gitaly replicas
Problem to solve
When an HA Gitaly cluster returns a success signal for a write operation, whether via the Git protocol or an RPC, that signal should mean the write has succeeded in multiple locations, so that a single node failure does not result in data loss.
Currently, write operations are routed only to the primary node. The written data must also reach the other nodes in the cluster.
Further details
Replaying the write operation from the primary to the secondary nodes is not ideal from a performance or reliability perspective.
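Instead, it should be possible to proxy the write to all Gitaly nodes at once. The simplest approach is frame-wise proxying, in which each frame of the write is forwarded to every node before the next frame is sent. Its drawback is that a single slow node blocks all the others. Stream-wise proxying could remove this limitation in the future.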
Proposal
Implement frame-wise proxying of write operations to all Gitaly nodes as part of the strong consistency work.
Testing
Functional end-to-end coverage is provided by an existing replication test, and broader coverage by running the entire end-to-end test suite against environments that use Praefect for storage (including Staging).
Performance end-to-end testing will be implemented as part of gitlab-org/quality/performance#231 (see https://gitlab.com/gitlab-org/quality/team-tasks/-/issues/451 for a more detailed plan of performance tests under failure conditions).