Saturation on a few Gitaly nodes causes a slowdown across the entire web fleet
Summary
More information will be added as we investigate the issue.
Sidekiq pull mirror jobs are saturating 3-5 Gitaly nodes.
As a result, Rails requests that involve those nodes take longer than usual.
This in turn causes the Unicorn queues to grow,
and the responsiveness of the entire web fleet is affected.
src: https://dashboards.gitlab.net/d/web-main/web-overview?orgId=1&%3ForgId=1&from=now-1h&to=now
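To illustrate why slowness on only a few Gitaly nodes can affect the whole web fleet, here is a minimal back-of-the-envelope sketch using Little's law (average in-flight requests = arrival rate × mean service time). Each Unicorn worker handles one request at a time, so once the average number of in-flight requests exceeds the number of workers, requests start to queue. All numbers below (worker count, request rate, latencies, share of requests touching the saturated nodes) are hypothetical and were not measured during this incident.

```python
def mean_busy_workers(arrival_rate, frac_slow, fast_s, slow_s):
    """Little's law: average in-flight requests = arrival rate * mean service time.

    arrival_rate: requests per second hitting the fleet (hypothetical)
    frac_slow:    fraction of requests that touch a saturated Gitaly node
    fast_s:       service time in seconds for unaffected requests
    slow_s:       service time in seconds for affected requests
    """
    mean_service_s = frac_slow * slow_s + (1.0 - frac_slow) * fast_s
    return arrival_rate * mean_service_s


# Hypothetical figures for illustration only.
WORKERS = 100

# Healthy state: every request takes about 0.2s.
healthy = mean_busy_workers(arrival_rate=400, frac_slow=0.1, fast_s=0.2, slow_s=0.2)

# Degraded state: the 10% of requests hitting saturated nodes take 2s instead.
degraded = mean_busy_workers(arrival_rate=400, frac_slow=0.1, fast_s=0.2, slow_s=2.0)

for label, busy in (("healthy", healthy), ("degraded", degraded)):
    state = "queueing" if busy > WORKERS else "ok"
    print(f"{label}: ~{busy:.0f} busy workers needed of {WORKERS} ({state})")
```

Under these assumed numbers, slowing down only a tenth of the requests is enough to push demand past the worker pool, at which point requests that never touch the affected Gitaly nodes also wait in the queue.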
Timeline
All times UTC.
YYYY-MM-DD
We only have metrics from the last 24h; the issue has been happening for at least that long.
2019-12-17
- 10:24 - EOC is paged
~S3
Edited by Michal Wasilewski