When a backend service in GitLab is throttling, users are unaware
Problem to solve
When GitLab's underlying services throttle users, that is sometimes not bubbled up to the client. All end users see is a timeout or a horribly slow clone/fetch.
Further details
- When Gitaly was configured on a per-repo basis to hold off on retrying requests to the backend file service, end users had no idea they were being held up. Requests only failed client side after a significant amount of time had passed.
- This also unnecessarily lowers our apdex (this is debatable)
- When our backend service tells a user to wait, we are technically responding
- This should not impact our apdex: we aren't failing the request, the client is requesting data too quickly
- https://dashboards.gitlab.net/d/MB0XQrSiz/gitaly-per-method-apdex-score?orgId=1&var-method=SSHUploadPack&var-satisfied=1&var-tolerated=1&var-period=1h&from=1541461993810&to=1541685292441
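To make the apdex point above concrete, here is a minimal sketch using the standard Apdex formula, (satisfied + tolerating/2) / total. The request counts are hypothetical, and how the dashboard actually classifies throttled requests is an assumption. The sketch only shows how the score moves if throttled requests are counted as frustrated versus left out.

```go
package main

import "fmt"

// apdex computes the standard Apdex score:
// (satisfied + tolerating/2) / total samples.
func apdex(satisfied, tolerating, frustrated int) float64 {
	total := satisfied + tolerating + frustrated
	if total == 0 {
		return 1 // no samples: treat as fully satisfied
	}
	return (float64(satisfied) + float64(tolerating)/2) / float64(total)
}

func main() {
	// Hypothetical counts for one period (not taken from the dashboard).
	satisfied, tolerating, failed, throttled := 900, 50, 10, 40

	// Throttled requests lumped in with real failures.
	fmt.Printf("throttled counted as frustrated: %.3f\n",
		apdex(satisfied, tolerating, failed+throttled))

	// Throttled requests excluded, since we did respond to the client.
	fmt.Printf("throttled excluded:              %.3f\n",
		apdex(satisfied, tolerating, failed))
}
```

Whether excluding throttled requests is the right call is the debatable part. The sketch only quantifies the difference.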
Proposal
- Bubble up to the client that they are being throttled
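One way the proposal could look, as a rough sketch only: instead of silently delaying a throttled request, the service returns an explicit error that says who is throttling and when to retry, and that text is passed on to the client. Everything here is hypothetical (the `ThrottledError` type, the `limiter`, the `file-service` name, and the fixed-window limiting strategy). It is not how Gitaly implements rate limiting.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ThrottledError is a hypothetical error type that tells the caller it was
// throttled and how long to wait, instead of leaving it to time out.
type ThrottledError struct {
	Service    string
	RetryAfter time.Duration
}

func (e *ThrottledError) Error() string {
	return fmt.Sprintf("%s is throttling this request; retry in %s", e.Service, e.RetryAfter)
}

// limiter is a deliberately simple fixed-window limiter (an assumption made
// for illustration).
type limiter struct {
	max         int
	window      time.Duration
	windowStart time.Time
	count       int
}

// allow returns nil if the request may proceed, or a *ThrottledError
// carrying the time remaining until the current window ends.
func (l *limiter) allow(now time.Time) error {
	if now.Sub(l.windowStart) >= l.window {
		l.windowStart, l.count = now, 0 // start a new window
	}
	if l.count >= l.max {
		return &ThrottledError{
			Service:    "file-service",
			RetryAfter: l.windowStart.Add(l.window).Sub(now),
		}
	}
	l.count++
	return nil
}

// clientMessage turns an error into text that could be shown to the user.
func clientMessage(err error) string {
	var te *ThrottledError
	if errors.As(err, &te) {
		return te.Error()
	}
	if err != nil {
		return "internal error"
	}
	return ""
}

// demo sends three requests, one second apart, through a limiter that
// allows two requests per ten-second window.
func demo() []string {
	l := &limiter{max: 2, window: 10 * time.Second}
	start := time.Unix(0, 0)
	var msgs []string
	for i := 0; i < 3; i++ {
		err := l.allow(start.Add(time.Duration(i) * time.Second))
		msgs = append(msgs, clientMessage(err))
	}
	return msgs
}

func main() {
	for i, m := range demo() {
		fmt.Printf("request %d: %q\n", i+1, m)
	}
}
```

The point of the typed error is that each layer between the backend and the Git client can recognize throttling with `errors.As` and forward the reason, rather than collapsing it into a generic timeout.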
What does success look like, and how can we measure that?
Clients know why their Git operations are slow