customers.gitlab.com - out of disk space. We need monitoring/alerting for this.

Summary


Timeline of events

  • 2019-05-15 13:14 UTC: A customer reported receiving 500 errors after a purchase; @LisavandeKooij raised this in Slack
  • Shortly after, while attempting to extend a group trial on the Rails console, I ran into very clear disk space errors
  • Noted that the /boot drive held approx. 14 GB of kernel images
  • Ran uname -a to check which kernel version we were running (4.4.0-112-generic)
  • Manually removed all *4.4.0-7* images with rm -i to free up disk space
  • Removed the rest of the images with apt-get autoremove
  • 15:27 UTC: Noted in Slack that I had done this
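
The manual cleanup in the timeline relied on comparing the files in /boot against the running kernel. A minimal dry-run sketch of that comparison is below. It only lists candidates and deletes nothing. The function name non_running_kernel_files and the prefix list are assumptions made for illustration, not something taken from the host. Note that a newer kernel that is installed but not yet booted would also show up in the list, so the output needs a human check, and apt-get autoremove remains the safer way to remove packages.

```python
import os
import platform

# Filename prefixes Ubuntu typically uses for per-kernel files in /boot
# (assumed list; adjust for the actual host).
KERNEL_PREFIXES = ("vmlinuz-", "initrd.img-", "System.map-", "config-", "abi-")


def non_running_kernel_files(filenames, running_release):
    """Return /boot filenames that belong to a kernel other than the running one."""
    matches = []
    for name in filenames:
        for prefix in KERNEL_PREFIXES:
            # Keep files whose version suffix differs from the running release.
            if name.startswith(prefix) and name[len(prefix):] != running_release:
                matches.append(name)
                break
    return sorted(matches)


if __name__ == "__main__":
    boot_dir = "/boot"
    if os.path.isdir(boot_dir):
        # platform.release() reports the same release string as `uname -r`.
        for name in non_running_kernel_files(os.listdir(boot_dir), platform.release()):
            print(name)
```

Files that do not start with one of the listed prefixes (for example the grub directory) are ignored.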

Monitoring
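
No disk space monitoring was in place for this host. As a starting point, a check along the following lines could run from cron or feed an existing alerting system. This is a minimal sketch, not the agreed solution: the function name mounts_over_threshold, the 85% threshold, and the watched paths are all assumptions chosen for illustration.

```python
import os
import shutil


def mounts_over_threshold(paths, threshold_percent=85.0, disk_usage=shutil.disk_usage):
    """Return (path, percent_used) for every path at or above the threshold.

    disk_usage is injectable so the check can be exercised without touching
    a real filesystem; it must return a (total, used, free) tuple in bytes.
    """
    alerts = []
    for path in paths:
        total, used, _free = disk_usage(path)
        if total == 0:
            # Skip pseudo-filesystems that report no capacity.
            continue
        percent_used = round(100.0 * used / total, 1)
        if percent_used >= threshold_percent:
            alerts.append((path, percent_used))
    return alerts


if __name__ == "__main__":
    # Assumed paths; /boot is included because it was the one that filled up.
    watched = [p for p in ("/", "/boot") if os.path.exists(p)]
    for path, percent_used in mounts_over_threshold(watched):
        print(f"WARNING: {path} is {percent_used}% full")
```

Checking /boot separately matters here because it can fill up with kernel images while the root filesystem still looks healthy.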

Logs

Edited by Alberto Ramos