2018-06-14: Errors on GitLab resulting in slow page loads or 502s

At approximately 14:10 UTC on 2018-06-14, GitLab was severely degraded for about 10 minutes.

At this time, the root cause appears to be a large number of users sending requests to a project on nfs-13. This caused a load spike on the server, which caused multiple NFS timeouts, which in turn made the entire site unresponsive.

2018-06-14 15:30 UTC - A change was made to block the problematic requests in our load balancers.

incident doc (internal only): https://docs.google.com/document/d/1B7pyJTv6HKPs5bBWWAjIYUKCb1i_UU3MQwh4D4e71vk/edit#heading=h.26c8go6r6hs

Edited Jun 14, 2018 by Dave Smith