2018-07-11 Errors on gitlab.com due to increased load on nfs-file-06

Summary

On July 11th at 04:00 UTC we saw an increase in errors on the API fleet that affected approximately 20% of normal request traffic for 30 minutes. The root cause appears to be in Gitaly: a large number of git pack-objects processes on the file server (nfs-file-06) were consuming most of its resources. The increased disk load caused IOWait to spike, which slowed down everything else on the machine. See graphs below:

Screen_Shot_2018-07-11_at_6.18.38_AM

Screen_Shot_2018-07-11_at_6.18.46_AM

Screen_Shot_2018-07-11_at_6.19.15_AM
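
For readers who want to check the IOWait figure on a Linux host directly, here is a minimal sketch of how it can be derived from two samples of the aggregate cpu line in /proc/stat. It is not the tooling that produced the graphs above; the function names (parse_cpu_line, iowait_percent, sample_iowait) are hypothetical and chosen only for illustration.

```python
import time
from typing import List


def parse_cpu_line(line: str) -> List[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into tick counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Return the share of CPU time spent in iowait between two samples."""
    first = parse_cpu_line(before)
    second = parse_cpu_line(after)
    # Use only the first eight counters (user, nice, system, idle, iowait,
    # irq, softirq, steal); guest time is already included in user/nice.
    deltas = [b - a for a, b in zip(first[:8], second[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # iowait is the fifth counter on the cpu line (index 4).
    return 100.0 * deltas[4] / total


def sample_iowait(interval_seconds: float = 1.0) -> float:
    """Sample /proc/stat twice on a live Linux host (hypothetical helper)."""
    with open("/proc/stat") as stat_file:
        before = stat_file.readline()
    time.sleep(interval_seconds)
    with open("/proc/stat") as stat_file:
        after = stat_file.readline()
    return iowait_percent(before, after)
```

On a file server under heavy disk load, calling sample_iowait() repeatedly would be expected to return a persistently high percentage, which is the pattern the summary describes.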

Edited Jul 11, 2018 by John Jarvis