2020-03-11: Gitaly error rate is high on file-45

Summary

More information will be added as we investigate the issue.

Timeline

All times UTC.

2020-03-11

  • 17:37 - Gitaly error rate abruptly rises. On one Gitaly node (file-45), CPU and memory usage rise rapidly. PagerDuty alert: https://gitlab.pagerduty.com/incidents/PRHLS53
  • 17:55 - @cindy and @nnelson identify the specific project receiving the excess traffic.
  • 18:04 - The extra workload ends abruptly. Resource usage returns to normal on file-45.

(Screenshot: Screen_Shot_2020-03-11_at_12.17.46_PM)

Resources

@cindy's Kibana graph showing the Gitaly gRPC calls correlated with the workload spike:

https://log.gprd.gitlab.net/goto/283f590166882b80982dc661aadcb560

(Not including a screenshot to protect the identity of the project.)

Edited Aug 03, 2020 by 🤖 GitLab Bot 🤖