Use health probes for docker service startup
For services, we currently:
- Start the service.
- Guess the port the service might be using (the first port from the image's metadata).
- Start a new container running the helper binary's `healthcheck` command, which either eventually fails or succeeds. If the process never succeeds, the context deadline eventually cancels this container.
The issues with this approach are:
- The guessed port is sometimes wrong. A Dockerfile can define multiple ports with `EXPOSE`, but in practice the service isn't required to listen on the first one, the last one, or any of them.
- A new container is created for each service just to perform the health check.
- We only support a TCP connection attempt to determine whether a service is healthy. A service that is listening is not necessarily ready.
Instead, perform health checks similar to Kubernetes' startup probes, supporting:
- TCP Probe
- HTTP Get Probe
- Exec Probe
Technically, the Docker executor will always use the Exec Probe:
- For a config-defined `exec` probe, the Exec Probe is configured from the supplied settings.
- For a config-defined `http_get` probe, the Exec Probe runs the helper binary process inside a container that has already been started (and is kept alive by running a loop).
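One way to picture "always use the Exec Probe" is as a translation step from each configured probe into an argv that the executor runs inside an already-started container. In this sketch `helperBinary`, its `--tcp-port`, `--http-get-port`, `--http-get-path` and `--header` flags, and the `ProbeConfig` types are all hypothetical placeholders, not the helper binary's real interface. It also assumes `tcp` probes are translated the same way as `http_get`.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// ProbeConfig (hypothetical) mirrors one entry under `probes:`; exactly one
// of the fields is expected to be set.
type ProbeConfig struct {
	TCP     *TCPConfig
	HTTPGet *HTTPGetConfig
	Exec    *ExecConfig
}

type TCPConfig struct{ Port int }

type HTTPGetConfig struct {
	Path    string
	Port    int
	Headers map[string]string
}

type ExecConfig struct{ Cmd []string }

// helperBinary is a placeholder path; the real helper binary and flags may differ.
const helperBinary = "/usr/bin/probe-helper"

// execCommand returns the argv the executor would exec for a probe.
func execCommand(p ProbeConfig) ([]string, error) {
	switch {
	case p.Exec != nil:
		// A user-defined exec probe is run as given.
		if len(p.Exec.Cmd) == 0 {
			return nil, fmt.Errorf("exec probe: empty cmd")
		}
		return p.Exec.Cmd, nil
	case p.TCP != nil:
		return []string{helperBinary, "--tcp-port", strconv.Itoa(p.TCP.Port)}, nil
	case p.HTTPGet != nil:
		args := []string{helperBinary,
			"--http-get-port", strconv.Itoa(p.HTTPGet.Port),
			"--http-get-path", p.HTTPGet.Path}
		keys := make([]string, 0, len(p.HTTPGet.Headers))
		for k := range p.HTTPGet.Headers {
			keys = append(keys, k)
		}
		sort.Strings(keys) // deterministic argument order
		for _, k := range keys {
			args = append(args, "--header", k+": "+p.HTTPGet.Headers[k])
		}
		return args, nil
	default:
		return nil, fmt.Errorf("probe has no tcp, http_get or exec section")
	}
}

func main() {
	argv, err := execCommand(ProbeConfig{HTTPGet: &HTTPGetConfig{
		Path: "/health", Port: 8080,
		Headers: map[string]string{"X-Custom-Header": "custom"},
	}})
	fmt.Println(argv, err)
}
```

How the argv is executed (for example through the Docker Engine's exec API) and which container receives it are left out of the sketch.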
The example below covers the settings available to each probe. A service can define multiple probes; they are executed in order, and if any probe still fails after its configured retries and timeouts, the service is considered not to have started correctly.
```yaml
job:
  services:
    - name: service1
      probes:
        - tcp:
            port: 8080
            retries: 10 # optional
            initial_delay: 5s # optional
            period: 10s # optional
            timeout: 10s # optional
        - http_get:
            path: /health
            port: 8080
            headers:
              X-Custom-Header: custom
        - exec:
            cmd: ["/bin/check"]
        - exec:
            cmd: ["/bin/check/another/thing"]
    - name: service2
      probes:
        - http_get:
            path: /health
            port: 8080
            headers:
              X-Custom-Header: custom
            initial_delay: 5s # optional
            period: 10s # optional
            timeout: 10s # optional
    - name: service3
      probes:
        - exec:
            cmd: ["/bin/check"]
            retries: 10 # optional
            initial_delay: 5s # optional
            period: 10s # optional
            timeout: 10s # optional
```
This task can be broken into the following steps:
- !2238: Implement all probes alongside the existing behaviour. Only support the `NetworkPerBuild` networking mode, and only what we already support (TCP Probe only).
- Support more than the TCP Probe: extend `gitlab-ci.yml` to support all probes.
- Update Runner to use the already-implemented probes.
- Disable Docker's built-in `HEALTHCHECK` support.
Links to related issues and merge requests / references
This came from a discussion in !1195 (comment 140928174).
This can also solve the following bugs: