
Use health probes for docker service startup

Description

For services, we currently:

  • Start the service.
  • Guess the port the service might be using (the first port from the image's metadata).
  • Start a new container that runs the helper binary's healthcheck command, which eventually either succeeds or fails. If the process never succeeds, the context deadline eventually cancels this container.

The issues with this approach are:

  • Guessing a port is sometimes wrong. A Dockerfile can have multiple ports defined, but the service doesn't have to use the first one, the last one, or any of them in practice.
  • A new container performing the health check is created for each service.
  • We only support a TCP connection attempt to determine whether a service is healthy. A service that is listening is not necessarily ready.

Proposal

Perform health checks similar to Kubernetes' startup probes.

Support:

  • TCP Probe
  • HTTP Get Probe
  • Exec Probe
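One possible shape for these probe definitions, sketched as Go structs that mirror the YAML keys used in the example later in this issue. The struct and field names are hypothetical rather than GitLab Runner's actual configuration types, the timing fields are assumed to be in seconds, and `Validate` assumes, as with Kubernetes probes, that each probe defines exactly one of `tcp`, `http_get` or `exec`.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical structs mirroring the proposed tcp, http_get and exec keys.
type TCPAction struct {
	Port int
}

type HTTPGetAction struct {
	Path    string
	Port    int
	Headers []string // "Name: value" entries
}

type ExecAction struct {
	Command []string
}

// Probe holds one probe definition plus its optional timing settings.
// Zero timing values mean "use the default"; units assumed to be seconds.
type Probe struct {
	TCP     *TCPAction
	HTTPGet *HTTPGetAction
	Exec    *ExecAction

	Retries      int
	InitialDelay int
	Period       int
	Timeout      int
}

func validPort(port int) bool { return port >= 1 && port <= 65535 }

// Validate assumes exactly one handler per probe (a Kubernetes convention).
func (p Probe) Validate() error {
	count := 0
	if p.TCP != nil {
		count++
		if !validPort(p.TCP.Port) {
			return fmt.Errorf("tcp: invalid port %d", p.TCP.Port)
		}
	}
	if p.HTTPGet != nil {
		count++
		if !validPort(p.HTTPGet.Port) {
			return fmt.Errorf("http_get: invalid port %d", p.HTTPGet.Port)
		}
	}
	if p.Exec != nil {
		count++
		if len(p.Exec.Command) == 0 {
			return errors.New("exec: command must not be empty")
		}
	}
	if count != 1 {
		return fmt.Errorf("probe must define exactly one of tcp, http_get or exec, got %d", count)
	}
	if p.Retries < 0 || p.InitialDelay < 0 || p.Period < 0 || p.Timeout < 0 {
		return errors.New("retries, initial_delay, period and timeout must not be negative")
	}
	return nil
}

func main() {
	ok := Probe{TCP: &TCPAction{Port: 8080}, Retries: 10}
	both := Probe{TCP: &TCPAction{Port: 8080}, Exec: &ExecAction{Command: []string{"/bin/check"}}}
	fmt.Println(ok.Validate())   // <nil>
	fmt.Println(both.Validate()) // probe must define exactly one of tcp, http_get or exec, got 2
}
```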

The Docker executor will, technically, always use the Exec Probe:

  • For a config-defined exec probe, the Exec Probe is configured from the supplied settings.
  • For a config-defined tcp or http_get probe, the Exec Probe runs the helper binary inside an already-started helper container, which is kept alive by a looping process.

Below is an example covering the settings available to each probe. A service can define multiple probes; they are executed in order, and if any of them fails after its configured retries/timeouts, the service is considered not to have started correctly.

job:
  services:
  - name: service1
    probes:
    - tcp:
        port: 8080
      retries: 10 # optional
      initial_delay: 5 # optional
      period: 10 # optional
      timeout: 10 # optional
    - exec:
        command: ["/bin/check"]
    - http_get:
        path: /health
        port: 8080
        headers:
          - 'X-Custom-Header: custom'
    - exec:
        command: ["/bin/check/another/thing"]

  - name: service2
    probes:
    - http_get:
        path: /health
        port: 8080
        headers:
          - 'X-Custom-Header: custom'
      initial_delay: 5 # optional
      period: 10 # optional
      timeout: 10 # optional

  - name: service3
    probes:
    - exec:
        command: ["/bin/check"]
      retries: 10 # optional
      initial_delay: 5 # optional
      period: 10 # optional
      timeout: 10 # optional

This task can be broken into the following steps:

  • !2238 (closed): Implement all probes alongside the existing behaviour. Only support the NetworkPerBuild networking mode, and keep the supported probes to what we already have (TCP Probe only).
  • Support probes other than the TCP Probe:
    • Update the .gitlab-ci.yml syntax to support all probes.
    • Update Runner to make use of the already implemented probes.
  • Disable Docker's built-in HEALTHCHECK support.

Links to related issues and merge requests / references

This came from a discussion in !1195 (comment 140928174)

This can also solve the following bugs:

Edited by Arran Walker