new_note sidekiq queue is growing

Summary

A user created a large number of notes on a single commit, which slowed down processing of the new_note sidekiq queue and delayed outgoing notifications.

new_note sidekiq jobs dropped: https://log.gitlab.net/goto/a768cc64e673b474eb59bce0dac36a57

Service(s) affected : ~"Service:Sidekiq"
Team attribution :
Minutes downtime or degradation : 220
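
The 220-minute figure appears to match the span from the moment the queue started to grow (11:16 UTC) to the moment all pending jobs had been processed (14:56 UTC), both taken from the timeline. A minimal sketch of that arithmetic, where the helper name `minutes_between` is hypothetical and the choice of start and end points is an assumption rather than something the report states:

```ruby
require "time"

# Hypothetical helper: whole minutes between two ISO 8601 timestamps.
def minutes_between(start_iso, end_iso)
  ((Time.iso8601(end_iso) - Time.iso8601(start_iso)) / 60).round
end

# Assumed boundaries: queue starts to grow, then all pending jobs processed.
degradation = minutes_between("2019-08-05T11:16:00Z", "2019-08-05T14:56:00Z")
puts degradation
```

The same helper can be pointed at other pairs of timeline entries, for example 11:16 and 12:24 UTC to see how long the queue grew before the alert fired.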

Timeline

2019-08-05

  • 11:16 UTC - 1410 note requests added to a single commit (continuing until 11:21 UTC)
  • 11:16 UTC - new_note queue starts to grow
  • 12:24 UTC - PagerDuty alert "Large amount of new_note sidekiq queued jobs: 6541"
  • 12:37 UTC - Incident started by EOC
  • 12:42 UTC - IMOC pinged by EOC
  • 12:49 UTC - EOC asking for help in backend Slack channel
  • 12:52 UTC - IMOC responded
  • 12:55 UTC - IMOC pinging Andrew
  • 13:00 UTC - Andrew joining Incident Room
  • 13:07 UTC - response from backend engineer in Slack
  • 13:17 UTC - status.io incident and tweet
  • 13:46 UTC - commit causing the issue identified
  • 13:51 UTC - status.io update (tweet)
  • 14:15 UTC - rebooting the console, which had become unresponsive due to profiling
  • 14:18 UTC - manually purging the queue, removing the notes caused by the identified commit
  • 14:35 UTC - status.io update
  • 14:56 UTC - all pending jobs have been processed
  • 15:03 UTC - status.io incident resolved
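
The report does not say how the manual purge at 14:18 UTC was carried out. As a rough illustration of the idea (select the queued jobs tied to one commit and remove only those), here is a self-contained sketch in plain Ruby. The `Job` struct, the `[note_id, commit_sha]` argument layout, the worker name, and the commit SHAs are all hypothetical stand-ins and do not reflect the real job payload:

```ruby
# Hypothetical stand-in for a queued job; not the real Sidekiq payload.
# Assumed args layout for illustration: [note_id, commit_sha].
Job = Struct.new(:jid, :klass, :args)

# Split a queue into the jobs to purge (those tied to the given commit)
# and the jobs to keep. The input queue is not modified.
def partition_for_purge(queue, commit_sha)
  purge, keep = queue.partition do |job|
    job.klass == "NewNoteWorker" && job.args[1] == commit_sha
  end
  { purge: purge, keep: keep }
end

queue = [
  Job.new("a1", "NewNoteWorker", [1, "deadbeef"]),
  Job.new("a2", "NewNoteWorker", [2, "cafef00d"]),
  Job.new("a3", "NewNoteWorker", [3, "deadbeef"])
]

result = partition_for_purge(queue, "deadbeef")
puts "purging #{result[:purge].size} job(s), keeping #{result[:keep].size}"
```

Separating the selection step from the deletion step makes it possible to review the list of matching jobs before anything is removed, which matters when the purge runs against a production queue.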
Edited Aug 03, 2020 by 🤖 GitLab Bot 🤖