Longer-than-usual response times from Rails
Summary
A number of things happened starting at 08:20 UTC which generated a higher-than-usual load on the Sidekiq and Gitaly fleets. This led to an increase in queue sizes and timings, which resulted in Rails responding more slowly than usual.
Service(s) affected : Rails
Team attribution :
Minutes downtime or degradation : 1h45m (08:30 UTC - 10:15 UTC)
Timeline
2019-06-05
- 08:20 UTC - increase in Sidekiq method call count and CPU time; Git timings on Sidekiq nodes many times higher than usual; increased number of SQL queries, with max SQL timings on the order of minutes; slight increase in pull_mirror queues; increase in memory usage on Sidekiq asap nodes
- 08:32 UTC - increase in response times from Rails
- 08:40 UTC - spike in the number of GitHub import jobs
- 08:54 UTC - on-call gets paged about Rails latency; significant increase in CPU load on besteffort Sidekiq nodes
- 09:05 UTC - significant increase in pull_mirror queues
- 09:20 UTC - decrease in method call count and CPU time; decrease in memory usage on Sidekiq asap nodes; spike in the number of GitHub import jobs (1.2k at peak)
- 09:49 UTC - on-call gets paged about increase in pull_mirror queues
- 10:00 UTC - pull_mirror alert clears
- 10:07 UTC - Rails response times alert clears
- 10:15 UTC - all queues are back to normal
Edited by Michal Wasilewski