2019-08-29: Partial degradation on file-06 due to possible abuse
Please note: if the incident relates to sensitive data or is security related, consider labeling this issue with security and marking it confidential.
Summary
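Gitaly on file-06 was partially degraded for about 70 minutes. A single project of roughly 10 GB, possibly being abused, generated heavy clone and pipeline activity that consumed CPU on the node. Load began to fall after the project's pipelines were cancelled, the triggering user was temporarily blocked, and the project was archived.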
- Service(s) affected : ~"Service:Gitaly"
- Team attribution : Infra
- Minutes downtime or degradation : 70
Timeline
2019-08-29
- 12:04 UTC - We are paged about an increase in Gitaly errors on file-06
- 12:18 UTC - We pinpoint the project that's consuming a lot of CPU resources and observe that it's ~10 GB in size
- 12:29 UTC - We manually cancel all cloning operations for that project
- 12:35 UTC - We temporarily block the user whose account is triggering a lot of pipelines
- 12:42 UTC - We cancel all running pipelines for the project
- 12:45 UTC - We are still observing new pipelines being created; we cancel them as they appear
- 13:10 UTC - We archive the project to stop new content from being pushed
- 13:15 UTC - No new tags are being pushed to the project
- 13:20 UTC - CPU load is going down on file-06 as we're cancelling all pipelines created for the project
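The pipeline cancellation steps in the timeline (12:42 and 12:45 UTC) could be scripted against the GitLab REST API. The following is a minimal sketch, not the procedure the responders actually used: the instance URL, the token, and the helper names `call` and `cancel_active_pipelines` are hypothetical, and it assumes a token with enough permissions on the project. It relies on the documented endpoints for listing a project's pipelines by status and for cancelling a single pipeline.

```python
import json
import urllib.parse
import urllib.request

API = "https://gitlab.example.com/api/v4"  # hypothetical instance URL


def call(method, path, token, opener=urllib.request.urlopen):
    """Send one authenticated request to the GitLab REST API and decode the JSON reply."""
    req = urllib.request.Request(
        f"{API}{path}", method=method, headers={"PRIVATE-TOKEN": token}
    )
    with opener(req) as resp:
        return json.loads(resp.read() or b"null")


def cancel_active_pipelines(project_id, token, request=call):
    """Cancel pending and running pipelines of a project; return the cancelled IDs.

    Only the first page (up to 100 pipelines) per status is handled, so the
    function may need to be called repeatedly while new pipelines keep appearing.
    """
    pid = urllib.parse.quote(str(project_id), safe="")
    cancelled = []
    for status in ("pending", "running"):
        pipelines = request(
            "GET", f"/projects/{pid}/pipelines?status={status}&per_page=100", token
        )
        for pipeline in pipelines:
            request("POST", f"/projects/{pid}/pipelines/{pipeline['id']}/cancel", token)
            cancelled.append(pipeline["id"])
    return cancelled
```

Because new pipelines kept being created during the incident, a script like this would have to be rerun until the source of the pipelines is stopped, which in the timeline happened only once the project was archived. The `request` parameter exists so the function can be exercised with a stub instead of a live instance.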
Edited by Ahmad Sherif