2023-06-22: Increase in failing gRPC requests due to "ResourceExhausted" and "spawn token timeout"
Customer Impact
At least one customer has notified us of this issue occurring on their projects. Our logs and dashboards show that the error also occurs intermittently for other customers in a variety of situations.
Current Status
This error appears to be caused by defensive mechanisms on the Gitaly server that keep it from being overloaded with requests during periods of bursty traffic. These periods are usually very short (a few minutes at most) and the server recovers afterward, so retrying the failed request should generally succeed.
It's worth noting that the customer's project is not particularly noisy but is still getting throttled, resulting in failed pipelines, which is a poor user experience.
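Because the throttling is short-lived, a client-side retry with backoff is one way to ride out a burst. The following is a minimal sketch of that idea, not what Gitaly or its clients actually implement. `ResourceExhaustedError`, `call_with_retry`, and all parameter values are hypothetical names and numbers chosen for illustration; a real client would catch its gRPC library's ResourceExhausted status instead.

```python
import random
import time


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for a gRPC ResourceExhausted failure."""


def call_with_retry(request, max_attempts=5, base_delay=0.5, max_delay=30.0,
                    sleep=time.sleep):
    """Call `request()` and retry on ResourceExhaustedError.

    Waits between attempts using exponential backoff with full jitter, so
    that many throttled clients do not all retry at the same instant.
    All defaults are illustrative assumptions, not tuned values.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ResourceExhaustedError:
            if attempt == max_attempts:
                # Out of attempts: surface the original error to the caller.
                raise
            # The backoff ceiling doubles each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random duration in [0, ceiling].
            sleep(random.uniform(0, ceiling))
```

With these defaults the total wait is at most 7.5 seconds, which is shorter than the "few minutes at most" burst window described above, so `max_attempts` and `max_delay` would need to be raised for a retry loop to outlast a full burst. Retrying also only masks the symptom; it does not explain why a quiet project is being throttled in the first place.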
📚 References and helpful links
Recent Events (available internally only):
- Feature Flag Log - Chatops to toggle Feature Flags Documentation
- Infrastructure Configurations
- GCP Events (e.g. host failure)
Deployment Guidance
- Deployments Log | GitLab.com Latest Updates
- Reach out to Release Managers for S1/S2 incidents to discuss Rollbacks, Hot Patching or speeding up deployments. | Rollback Runbook | Hot Patch Runbook
Use the following links to create issues related to this incident if additional work needs to be completed after it is resolved:
- Corrective action ❙ Infradev
- Incident Review ❙ Infra investigation followup
- Confidential Support contact ❙ QA investigation
Note: In some cases we need to redact information from public view. We only do this in a limited number of documented cases. This might include the summary, timeline, or any other bits of information, as laid out in our handbook page. Any such confidential data will be in a linked issue, visible only internally. By default, all information we can share will be public, in accordance with our transparency value.