Rapid Action: Aug 8 incident
@timzallmann is DRI for this issue
From @marin
From the items that need to be done together with Development teams:
- An issue was created to find a better way of handling tags. This problem contributed to the incident we had today
- The race condition that caused the second incident needs to be resolved in https://gitlab.com/gitlab-org/gitlab-ce/issues/65803
- Finally, we also need https://gitlab.com/gitlab-org/gitlab-ce/issues/51096 ASAP. If incidents like this happen again, we currently have no tools to handle situations where the queues grow out of control
Additional Rapid Action Items to Track:
- Only run ProjectCacheWorker once per push: https://gitlab.com/gitlab-org/gitlab-ce/issues/52046
- De-duplicate pipeline processing jobs: https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/31370
- Limit PostReceive: https://gitlab.com/gitlab-org/gitlab-ce/issues/65804
- Load Balancing Review: https://gitlab.com/gitlab-org/gitlab-ce/issues/66074
Link to Slack Channel - https://gitlab.slack.com/messages/CM8M8LP6J
Edited by Tim Zallmann