Add gRPC pushback header when Gitaly exceeds concurrency limit
For #5023 (closed)
In Allow Gitaly to push back on traffic surges (&7891 - closed), I'm working on letting Gitaly push back on traffic when a certain threshold is reached. When that happens, Gitaly returns a LimitError with a ResourceExhausted status. When internal clients receive such errors, they immediately return 429 errors to users. While this protection is good for Gitaly, it may hurt user satisfaction. Furthermore, if users retry immediately, they add more load to the Gitaly server, and Gitaly is likely to return the same error again.
Git pull operations typically take minutes to finish, so a few extra seconds of waiting is small in comparison. It therefore makes sense to add automatic retries with some delay to internal clients. As a result, users are put on hold until either Gitaly has some room to breathe or the retry attempts are exhausted. This mechanism has two main purposes:
- Reduce user-facing error rate.
- Create backpressure. This backpressure pushes clients away, backs them off for a while, and prevents user-initiated manual retries.
gRPC has a built-in mechanism for pushback: https://github.com/grpc/proposal/blob/master/A6-client-retries.md#pushback. With this mechanism, the server sets a special header, `grpc-retry-pushback-ms`, on its response. It's a strong hint that the client must back off for that many milliseconds before retrying. Typically, clients built on the official gRPC libraries honor this header, provided retries are enabled for the method. One good thing about this mechanism is that the gRPC server has full control over this value. Clients send the `grpc-previous-rpc-attempts` header on retry attempts by default, so the server can push back exponentially, or reject requests permanently by returning a negative pushback value.
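A minimal sketch of the server-side policy, assuming an exponential schedule: the pushback doubles with each previous attempt, and after a fixed number of attempts the server answers with a negative value, which A6 defines as a request not to retry at all. The constants and the `pushbackValue` name are hypothetical, not taken from Gitaly. In grpc-go, the incoming header would be read with `metadata.FromIncomingContext`, and the result attached as trailer metadata with `grpc.SetTrailer(ctx, metadata.Pairs("grpc-retry-pushback-ms", v))` before returning `status.Error(codes.ResourceExhausted, ...)`.

```go
package main

import (
	"strconv"
	"time"
)

// Hypothetical tuning knobs; real values would come from the limiter config.
const (
	basePushback   = time.Second      // pushback for the first rejection
	maxPushback    = 10 * time.Second // upper bound on any single pushback
	maxRetryRounds = 3                // previous attempts after which retries stop
)

// pushbackValue returns the grpc-retry-pushback-ms value to attach to a
// RESOURCE_EXHAUSTED response, given the client's grpc-previous-rpc-attempts
// header ("" on the original attempt, since clients only send it on retries).
func pushbackValue(previousAttempts string) string {
	// Missing or malformed values are treated as the original attempt.
	attempts := 0
	if n, err := strconv.Atoi(previousAttempts); err == nil && n > 0 {
		attempts = n
	}
	if attempts >= maxRetryRounds {
		// A6: a negative pushback asks the client not to retry at all.
		return "-1"
	}
	delay := basePushback << attempts // 1s, 2s, 4s, ...
	if delay > maxPushback {
		delay = maxPushback
	}
	return strconv.FormatInt(delay.Milliseconds(), 10)
}
```

Because the decision depends only on the attempt count the client reports, the server can change the schedule or the cut-off without any client changes.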
This is a typical flow when setting this header:
```mermaid
sequenceDiagram
    Users ->> InternalClients: Call
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1000]
    Note over InternalClients: Sleep
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1500]
    Note over InternalClients: Sleep
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: Data
    InternalClients ->> Users: Data
```
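For an official gRPC client to sleep and retry as in this flow, retries have to be enabled for the method through a retry policy in the service config. The Go sketch below holds such a config as a string; the service name `gitaly.SmartHTTPService`, the attempt count, and the backoff values are assumptions, not values from Gitaly. In grpc-go, the string would be passed with `grpc.WithDefaultServiceConfig` when dialing; the small parser (hypothetical helper `parseRetryPolicy`) only exists to sanity-check it.

```go
package main

import (
	"encoding/json"
	"errors"
)

// retryServiceConfig is a hypothetical service config enabling A6 retries for
// every method of the service when the server answers RESOURCE_EXHAUSTED.
const retryServiceConfig = `{
  "methodConfig": [{
    "name": [{"service": "gitaly.SmartHTTPService"}],
    "retryPolicy": {
      "maxAttempts": 4,
      "initialBackoff": "0.5s",
      "maxBackoff": "5s",
      "backoffMultiplier": 2,
      "retryableStatusCodes": ["RESOURCE_EXHAUSTED"]
    }
  }]
}`

// retryPolicy mirrors the retryPolicy fields of the gRPC service config.
type retryPolicy struct {
	MaxAttempts          int      `json:"maxAttempts"`
	InitialBackoff       string   `json:"initialBackoff"`
	MaxBackoff           string   `json:"maxBackoff"`
	BackoffMultiplier    float64  `json:"backoffMultiplier"`
	RetryableStatusCodes []string `json:"retryableStatusCodes"`
}

type serviceConfig struct {
	MethodConfig []struct {
		RetryPolicy retryPolicy `json:"retryPolicy"`
	} `json:"methodConfig"`
}

// parseRetryPolicy decodes the first method config's retry policy so the
// string can be checked in tests before it is handed to gRPC.
func parseRetryPolicy(raw string) (retryPolicy, error) {
	var cfg serviceConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		return retryPolicy{}, err
	}
	if len(cfg.MethodConfig) == 0 {
		return retryPolicy{}, errors.New("no methodConfig entries")
	}
	return cfg.MethodConfig[0].RetryPolicy, nil
}
```

`maxAttempts` counts the original call, so 4 allows up to three retries. The backoff fields only apply when the server sends no pushback value; when it does, the client waits for the pushback instead.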
If the server rejects the request permanently:
```mermaid
sequenceDiagram
    Users ->> InternalClients: Call
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1000]
    Note over InternalClients: Sleep
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1700]
    Note over InternalClients: Sleep
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=3500]
    Note over InternalClients: Sleep
    InternalClients ->> GitalyServer : PostUploadPack
    GitalyServer ->> InternalClients: ResourceExhausted
    InternalClients ->> Users: 429 status
```