
Add gRPC pushback header when Gitaly exceeds concurrency limit

For #5023 (closed)

In Allow Gitaly to push back on traffic surges (&7891 - closed), I'm working on letting Gitaly push back on traffic once a certain threshold is reached. When that happens, Gitaly starts returning a LimitError with a ResourceExhausted status. When internal clients receive such errors, they immediately return 429 errors to users. While this protection is good for Gitaly, user satisfaction may drop. Furthermore, if users retry immediately, they add even more load to the Gitaly server, which will likely return the same error again.

Git pull operations typically take minutes to finish, so it makes sense to add automatic retries with a delay to internal clients. As a result, users are put on hold until either Gitaly has some room to breathe or they exhaust the retry attempts. This mechanism has two main purposes:

  • Reduce user-facing error rate.
  • Create backpressure. This backpressure pushes clients away, makes them back off for a while, and prevents user-initiated manual retries.

gRPC has a built-in mechanism for pushback: https://github.com/grpc/proposal/blob/master/A6-client-retries.md#pushback. The server sets a special header, grpc-retry-pushback-ms, which is a strong hint that the client must back off for a while before retrying. Typically, clients built on the official gRPC libraries honor this header. One advantage of this mechanism is that the gRPC server has full control over this value. Clients supply the grpc-previous-rpc-attempts header by default, so the server can push back exponentially or reject requests permanently.

This is a typical flow when setting this header:

sequenceDiagram
  Users ->> InternalClients: Call
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1000]
  Note over InternalClients: Sleep
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1500]
  Note over InternalClients: Sleep
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: Data
  InternalClients ->> Users: Data

If the server rejects the request permanently:

sequenceDiagram
  Users ->> InternalClients: Call
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1000]
  Note over InternalClients: Sleep
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=1700]
  Note over InternalClients: Sleep
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: ResourceExhausted [grpc-retry-pushback-ms=3500]
  Note over InternalClients: Sleep
  InternalClients ->> GitalyServer : PostUploadPack
  GitalyServer ->> InternalClients: ResourceExhausted
  InternalClients ->> Users: 429 status
