Corrective action: Don't mark rate-limited requests as server errors
Summary
In production#8056 (closed) we saw our ServiceGitaly error ratio being affected when users reached concurrency limits on the repository. This is because we return the grpc_code Unavailable for these rate-limited requests, and Unavailable is counted as a server error.
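To make the effect on the error ratio concrete, here is a minimal sketch in Python. The set of codes counted as server errors, the request counts, and the `error_ratio` helper are all assumptions for illustration; they are not the actual runbooks definition, which is expressed as metrics queries.

```python
# Hypothetical set of gRPC codes counted as server errors in this sketch.
SERVER_ERROR_CODES = {"Unavailable", "Internal", "Unknown", "DeadlineExceeded"}


def error_ratio(counts):
    """Return the share of requests whose gRPC code counts as a server error."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    errors = sum(n for code, n in counts.items() if code in SERVER_ERROR_CODES)
    return errors / total


# Made-up traffic: 90 requests hit the concurrency limit.
# Before: the limiter answers with Unavailable, so they look like server errors.
before = {"OK": 900, "Unavailable": 90, "Internal": 10}
# After: the limiter answers with ResourceExhausted, which is not a server error.
after = {"OK": 900, "ResourceExhausted": 90, "Internal": 10}

print(error_ratio(before))
print(error_ratio(after))
```

With these made-up numbers, the 90 rate-limited requests account for most of the reported errors in the first case and none of them in the second. The sketch keeps rate-limited requests in the denominator; whether they should also be dropped from the request rate is a separate choice.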
Related Incident(s)
Originating issue(s):
Desired Outcome/Acceptance Criteria
- Update rate metric to exclude ResourceExhausted. 👉 gitlab-com/runbooks!5170 (merged)
- Update ConcurrencyLimiter to return ResourceExhausted instead of Unavailable 👉 gitlab-org/gitaly!5084 (merged)
- Ignore ResourceExhausted status code 👉 gitlab-com/runbooks!5185 (merged)
- Revert gitlab-com/runbooks!5170 (merged) OR completely remove ignoring Unavailable 👉 gitlab-com/runbooks!5202 (merged)
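The limiter change in the list above can be illustrated with a simplified sketch. This is not Gitaly's implementation (Gitaly is written in Go, and its limiter is more involved); the `ConcurrencyLimiter` and `RateLimited` classes below are hypothetical stand-ins that only show the idea of rejecting excess concurrent work with ResourceExhausted rather than Unavailable.

```python
import threading


class RateLimited(Exception):
    """Hypothetical error carrying the name of a gRPC status code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


class ConcurrencyLimiter:
    """Simplified limiter: rejects immediately instead of queueing."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def run(self, handler):
        # Reject at once when every slot is taken. ResourceExhausted tells the
        # caller it was rate limited, not that the server is failing.
        if not self._slots.acquire(blocking=False):
            raise RateLimited("ResourceExhausted", "concurrency limit reached")
        try:
            return handler()
        finally:
            self._slots.release()


limiter = ConcurrencyLimiter(max_in_flight=1)


def nested_call():
    # While this handler holds the only slot, a second request is rejected.
    try:
        limiter.run(lambda: "inner")
    except RateLimited as err:
        return err.code
    return "not limited"


print(limiter.run(nested_call))
```

Because the rejection carries a distinct code, the error ratio can ignore it without also hiding genuine Unavailable responses.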
Associated Services
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'Reliability::P4')
Edited by Steve Xuereb