Job queue pipeline_processing:pipeline_process is growing
Please note: if the incident relates to sensitive data or is security related, consider labeling this issue with security and marking it confidential.
Summary
Job queue is growing for pipeline_processing:pipeline_process.
Update
The root causes were two pipelines generating a massive number of Sidekiq jobs, Sidekiq pipeline nodes maxing out their CPU, pipeline_processing jobs issuing many SQL calls, and the pgbouncer pool becoming saturated.
Service(s) affected : ~"Service:Sidekiq"
Team attribution :
Minutes downtime or degradation : 240
Timeline
2019-07-31
- 13:51 UTC - Sidekiq single_node_cpu alert
- 14:26 UTC - Support reports customer issues with slow pipelines in the production channel
- 14:41 UTC - incident started
- 14:45 UTC - status.io post
- 15:04 UTC - pgbouncer connection_pool saturation alert
- 15:15 UTC - DBRE (OnGres) paged for support
- 15:47 UTC - status.io update (Queue decreasing)
- 16:13 UTC - status.io update (Queue back to normal)
- 16:40 UTC - status.io resolved
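
The intervals between timeline entries can be worked out with a short script. This is a minimal sketch: the dictionary keys and the `minutes_between` helper are hypothetical names introduced for illustration, and the timestamps are copied from the timeline above. The report does not state which window the "Minutes downtime or degradation" field covers, so the pairs measured here are only example choices.

```python
from datetime import datetime

# Timestamps copied from the timeline above (all UTC, 2019-07-31).
# The key names are illustrative labels, not identifiers from the report.
EVENTS = {
    "first_cpu_alert": "13:51",
    "incident_started": "14:41",
    "pgbouncer_alert": "15:04",
    "queue_back_to_normal": "16:13",
    "resolved": "16:40",
}


def minutes_between(start: str, end: str) -> int:
    """Return whole minutes from start to end, both 'HH:MM' on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


if __name__ == "__main__":
    pairs = [
        ("first_cpu_alert", "resolved"),
        ("incident_started", "resolved"),
        ("first_cpu_alert", "queue_back_to_normal"),
    ]
    for start, end in pairs:
        elapsed = minutes_between(EVENTS[start], EVENTS[end])
        print(f"{start} -> {end}: {elapsed} min")
```

The helper assumes both timestamps fall on the same calendar day, which holds for every entry in this timeline.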