change rate limit err to ResourceExhausted

Steve Xuereb requested to merge fix/concurrency-rate-limiting-grpc-code into master

What

Change the returned gRPC code from Unavailable to ResourceExhausted when the user reaches concurrency limits.

Why

In gitlab-com/gl-infra/production#8056 (closed) and gitlab-com/gl-infra/production#8071 (closed) a single user caused the on-call to be paged because of a high error rate. When we looked into the error rate, it came from that user reaching concurrency limits. Rate limiting a user is not an error for us but normal behavior, just like a 429 HTTP status code.

In https://gitlab.com/gitlab-com/gl-infra/reliability/-/issues/16844#note_1175622086 we looked into the best gRPC code to return, and ResourceExhausted was the best one because it lets us differentiate between a real server error and a user error. This goes against the gRPC mapping, but it matches what the Envoy proxy does.
