Prune unused docker images on Ops ci-runner hosts

Task

Set up automatic pruning of old, unused container images on the Ops environment's runner VM that handles chatops.

Host: runner-chatops-01-inf-ops.c.gitlab-ops.internal
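
One possible shape for the pruning job, as a minimal sketch rather than an agreed implementation. It assumes that removing dangling images (the ones listed with a `<none>` tag) older than a retention window is enough. `RETENTION_HOURS` and `DRY_RUN` are hypothetical settings that do not come from this issue, and the 168-hour default is only a placeholder. Note that the `until` filter matches on image creation time, not on when the image was last used.

```shell
#!/bin/sh
# Sketch: prune dangling Docker images older than a retention window.
# RETENTION_HOURS and DRY_RUN are hypothetical knobs, not part of the original task.
RETENTION_HOURS="${RETENTION_HOURS:-168}"
DRY_RUN="${DRY_RUN:-1}"

# Print the prune command for a given retention window (in hours).
# Without --all, "docker image prune" only removes dangling images.
prune_command() {
    printf 'docker image prune --force --filter until=%sh\n' "$1"
}

# Log the command in dry-run mode; otherwise execute it.
run_prune() {
    cmd="$(prune_command "$RETENTION_HOURS")"
    if [ "$DRY_RUN" = "1" ]; then
        echo "DRY RUN: $cmd"
    else
        # Word splitting is intentional: the command has no quoted arguments.
        $cmd
    fi
}

run_prune
```

Run from cron or a systemd timer with `DRY_RUN=0` once the dry-run output looks right. Adding `--all` would also remove tagged images not used by any container, which frees more space but forces the runner to pull them again.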

Background

A PagerDuty alert reported that the root filesystem on host runner-chatops-01-inf-ops was more than 90% full.

This turned out to be due to slow growth (less than 1% per day).
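
As a back-of-the-envelope check on urgency, the remaining headroom divided by the daily growth rate gives the time left before the disk fills. The sketch below uses 90% and 1% per day, the bounds quoted above, so the result is a lower bound. The helper name `days_until_full` is made up for illustration.

```shell
#!/bin/sh
# Sketch: estimate whole days until a filesystem is full.
# $1 = current usage in percent, $2 = growth in percentage points per day.
days_until_full() {
    awk -v used="$1" -v rate="$2" 'BEGIN {
        if (rate <= 0) { print "growth rate must be positive" > "/dev/stderr"; exit 1 }
        # Remaining headroom divided by daily growth, rounded down.
        printf "%d\n", (100 - used) / rate
    }'
}

days_until_full 90 1
```

With growth below 1% per day, the host had at least this many days before running out of space.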

Where and how quickly is disk space being used?

The vast majority of disk space was used by container image layers stored under /var/lib/docker/aufs/diff:

msmiley@runner-chatops-01-inf-ops.c.gitlab-ops.internal:~$ sudo du -hxc / | sort -hr > /tmp/du-hxc.sorted.out

msmiley@runner-chatops-01-inf-ops.c.gitlab-ops.internal:~$ head /tmp/du-hxc.sorted.out
86G	total
86G	/
81G	/var
80G	/var/lib/docker/aufs/diff
80G	/var/lib/docker/aufs
80G	/var/lib/docker
...

The trend in free disk space shows a slow steady drop:

PromQL:

node_filesystem_free_bytes{fqdn="runner-chatops-01-inf-ops.c.gitlab-ops.internal", device="/dev/sda1"}

Screenshot_from_2021-01-25_18-10-23

What container images are taking up the disk space?

Before manually cleaning up the disk space, I captured the list of images.

Most of the disk space seems to be used by images from two repos:

  • Images from repo registry.ops.gitlab.net/gitlab-com/chatops were typically 550 MB, and a fresh image seems to be pulled daily. 79 such images used a total of 43 GB.
  • Images from repo registry.ops.gitlab.net/gitlab-com/gl-infra/tamland were typically 3.8 GB. 7 such images used a total of 27 GB.

Counts:

msmiley@runner-chatops-01-inf-ops.c.gitlab-ops.internal:~$ sudo docker image ls | grep 'registry.ops.gitlab.net/gitlab-com/gl-infra/tamland' | wc -l
7

msmiley@runner-chatops-01-inf-ops.c.gitlab-ops.internal:~$ sudo docker image ls | grep 'registry.ops.gitlab.net/gitlab-com/chatops' | wc -l
79
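
The per-repo counts and totals can also be gathered in one pass. This is a sketch, not the method used above: the function name `summarize_images` is hypothetical, and it expects `repository size` pairs on stdin, as produced by `docker image ls --format '{{.Repository}} {{.Size}}'`. Summing per-image sizes overstates real disk usage when images share layers, so treat the totals as an upper bound.

```shell
#!/bin/sh
# Sketch: count images and sum their reported sizes per repository.
# Input: lines of "<repository> <size>", with sizes such as 541MB or 3.88GB.
summarize_images() {
    awk '
    {
        size = $2
        # Convert the human-readable size to megabytes (decimal units).
        if (size ~ /GB$/)      mb = substr(size, 1, length(size) - 2) * 1000
        else if (size ~ /MB$/) mb = substr(size, 1, length(size) - 2) + 0
        else if (size ~ /kB$/) mb = substr(size, 1, length(size) - 2) / 1000
        else                   mb = 0   # plain bytes or unknown unit: negligible
        count[$1]++
        total[$1] += mb
    }
    END {
        # One line per repository: name, image count, total size in GB.
        for (repo in count)
            printf "%s %d %.1fGB\n", repo, count[repo], total[repo] / 1000
    }' | sort
}
```

Usage on the host would be `sudo docker image ls --format '{{.Repository}} {{.Size}}' | summarize_images`.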

Examples:

msmiley@runner-chatops-01-inf-ops.c.gitlab-ops.internal:~$ sudo docker image ls | grep 'registry.ops.gitlab.net/gitlab-com/gl-infra/tamland' | head -n 5
registry.ops.gitlab.net/gitlab-com/gl-infra/tamland                  latest                     61434c234ffb        2 weeks ago         3.88GB
registry.ops.gitlab.net/gitlab-com/gl-infra/tamland                  <none>                     1af0c85e7ea0        6 weeks ago         3.53GB
registry.ops.gitlab.net/gitlab-com/gl-infra/tamland                  <none>                     9ac364f52a0b        7 weeks ago         3.52GB
registry.ops.gitlab.net/gitlab-com/gl-infra/tamland                  <none>                     649d1423884e        2 months ago        3.84GB
registry.ops.gitlab.net/gitlab-com/gl-infra/tamland                  <none>                     3e12a7ad2679        2 months ago        3.84GB

msmiley@runner-chatops-01-inf-ops.c.gitlab-ops.internal:~$ sudo docker image ls | grep 'registry.ops.gitlab.net/gitlab-com/chatops' | head -n 5
registry.ops.gitlab.net/gitlab-com/chatops                           latest                     6d6aa6095e68        About an hour ago   541MB
registry.ops.gitlab.net/gitlab-com/chatops                           <none>                     968e0d60ec89        25 hours ago        541MB
registry.ops.gitlab.net/gitlab-com/chatops                           <none>                     e3fcfc3e0a28        2 days ago          541MB
registry.ops.gitlab.net/gitlab-com/chatops                           <none>                     1b10f13ee288        3 days ago          541MB
registry.ops.gitlab.net/gitlab-com/chatops                           <none>                     92661a8baf9d        3 days ago          541MB