WIP: Add docker health probes

Arran Walker requested to merge ajwalker/docker-health-probes into main

What does this MR do?

Adds health probes, similar to Kubernetes' startup probes:

  • TCP Probe
  • HTTP Get Probe
  • Container Exec Probe

Each probe can be configured with the following settings:

  • Retries
  • Initial Delay
  • Period (duration between retries)
  • Timeout

These probes are only enabled with the NetworkPerBuild networking mode.

The existing health check implementation only supports checking whether a network port is accessible. The port number it uses is a guess (the first port the Dockerfile hints to use).

This implementation mimics that behaviour, because supporting additional probes, or allowing the user to explicitly specify the port, requires CI server changes. However, this MR introduces all of the probes so that this functionality can more easily be introduced later.


A createHelperContainer(func()) function is introduced, intended for running things with ContainerExec rather than creating a new container each time. To do this, the "helper container" runs a command called pause, which is effectively just a loop. This keeps the container alive whilst exec is used multiple times.


Once docker links networking is removed, all older healthcheck code can be removed (both the command and the logic in docker.go). The new command is called health-probe, and most things related to the new path have the keyword probe in them.

Why was this MR needed?

Health checks at the moment are fairly limited. This MR adds multiple probes to flesh out the interface and better determine the direction we need to take to fully utilize them. The health probes are very similar to those provided by Kubernetes to tackle the same problem.

In the future, .gitlab-ci.yml will allow service probes to be defined. This is covered in Use health probes for docker service startup. The network format (how the probes are defined in .gitlab-ci.yml and how they're sent to Runner) has not yet been agreed upon, and this MR purposely doesn't implement that logic yet.

What's the best way to test this MR?

Given the scope is limited to existing behaviour (TCP check only), there's not a whole lot to test (other than ensuring the tests for each probe are sound).

What are the relevant issue numbers?

#3984 (closed)

Edited by Arran Walker