[DEVOPS INCIDENT] Low disk space

Meta

Date of appearance: <2018-02-20 Wed>

Date of resolution: <2018-02-20 Wed>

Total downtime caused: ~18 minutes

Participating members: Lasse, Yuki, Fabian

Symptoms

  • Server was running low on disk space
  • Server memory usage was high
  • No low disk space alert fired
  • Pruning old Docker images failed, cause unknown

Possible Causes

  • Many stale Docker images had accumulated without being cleaned up

Attempts

  • @fneu removed a container to free up a minimal amount of space so we could keep working (worked, freed 800MB)
  • @yuki_is_bored tried pruning all images, which had no effect. The Docker daemon apparently wasn't responding.
  • At some point free storage dropped rapidly (nobody knows why :/) and we had to do an emergency reboot. After the reboot, pruning worked fine and disk usage is down to 50%.
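
The hang during pruning suggests that any scripted prune should run under a timeout, so an unresponsive daemon shows up as an explicit error instead of a silent stall. Below is a minimal sketch of that idea, not what was run during the incident. The function name `prune_images`, the 300 second timeout, and the choice of `docker image prune --all --force` (which removes every image not used by a container, not only dangling ones) are assumptions for illustration.

```python
import subprocess
from typing import Callable, Optional

# Assumed command: removes ALL images not referenced by a container.
PRUNE_COMMAND = ["docker", "image", "prune", "--all", "--force"]


def prune_images(
    run: Callable = subprocess.run,
    timeout_seconds: int = 300,  # hypothetical limit, tune for the server
) -> Optional[str]:
    """Run the prune command; return None on success, else an error text."""
    try:
        result = run(
            PRUNE_COMMAND,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # This is the case seen in the incident: the command never returns.
        return (
            f"docker image prune did not finish within {timeout_seconds}s "
            "(daemon unresponsive?)"
        )
    except FileNotFoundError:
        return "docker executable not found"
    if result.returncode != 0:
        return (
            f"docker image prune exited with {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return None
```

A cron wrapper could call `prune_images()` and forward any returned error text to whatever alert channel is in use. The `run` parameter exists only so the function can be exercised without a Docker daemon.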

Stuff that needs to be done

  • Fix low disk space alerts
  • Add an auto-pruning cronjob with error alerts
  • Possibly scheduled reboots? CC @yuki_is_bored
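
As a starting point for the low disk space alert item, here is a rough sketch of a threshold check that a cronjob could run. It is not the existing alerting setup, which this report does not describe. The function name `low_space_alert`, the default path `/`, and the 80% threshold are assumptions for illustration.

```python
import shutil
from typing import Callable, Optional


def low_space_alert(
    path: str = "/",            # assumed mount point to watch
    threshold: float = 0.8,     # hypothetical: alert at 80% used
    disk_usage: Callable = shutil.disk_usage,
) -> Optional[str]:
    """Return an alert message if usage is at or above threshold, else None."""
    usage = disk_usage(path)  # named tuple with total, used, free (bytes)
    if usage.total == 0:
        return f"Could not determine disk size for {path}"
    fraction = usage.used / usage.total
    if fraction >= threshold:
        return (
            f"Low disk space on {path}: {fraction:.0%} used "
            f"(threshold {threshold:.0%})"
        )
    return None
```

A scheduled job could call `low_space_alert()` every few minutes and send any returned message to the team. The `disk_usage` parameter is injectable so the check can be tested without filling a real disk.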
Edited by Lasse Schuirmann