Create readiness probes for services in the Kubernetes executor
Overview
Users can specify services for their jobs: databases, web servers, or anything else they need. The job itself, for example a test suite, depends on these services. A service can take a while to boot and become ready, which creates a flaky scenario where the job fails because the service hasn't started yet. We need to provide a mechanism so that the user script doesn't start until all the services are ready.
Proposal - Update 2023-05-25
For the Docker executor, in !4079 (merged), we introduced a service variable called HEALTHCHECK_TCP_PORT that allows checking of an explicit service port.
The Docker executor has supported service health checks for some time, with the port determined by inspecting the image's exposed ports. The service variable was introduced because the automatically detected port wasn't always correct.
For Kubernetes, we're unable to determine ports automatically (there's no API for inspecting an image's EXPOSE metadata). We can, however, support an explicit port check with HEALTHCHECK_TCP_PORT in the same way.
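As a rough illustration of what an explicit TCP port check amounts to, the sketch below polls a port until a connection succeeds or a deadline passes. It is not the runner's implementation: the function name wait_for_tcp, its arguments, and the use of bash's /dev/tcp pseudo-device are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of GitLab Runner: poll HOST:PORT until a TCP
# connection succeeds or TIMEOUT seconds have elapsed.
# Assumes bash, because /dev/tcp/HOST/PORT is a bash feature, not POSIX sh.
wait_for_tcp() {
  local host="$1" port="$2" timeout="${3:-30}"
  local deadline=$(( $(date +%s) + timeout ))

  while :; do
    # The subshell opens the socket on fd 3; the fd is closed when it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "service ${host}:${port} not reachable after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
}
```

A call such as wait_for_tcp 127.0.0.1 5432 60 (address and port chosen only for illustration) would block until the port accepts connections or 60 seconds pass. A single connection attempt to a filtered port can itself block for longer than one second, so the deadline here is approximate.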
Proposed implementation - Update 2023-12-12
In the case of the Docker executor, the service health check is done by a dedicated container per service, with the relevant environment variables set. The command executed in this container is:
gitlab-runner-helper health-check
The build container is created after all the health check containers have completed successfully.
However, this approach does not apply to the Kubernetes executor, because the services are created as containers within the Pod, and once the Pod is running, all the containers start simultaneously.
We cannot leverage a Kubernetes init container either, as the service containers won't start until the init container is done.
The solution could be:
- To update the existing scripts so that the wait is done before the step_script itself is executed.
- To run an exec command remotely on the helper container before the step_script. The executed command could look like the following:
sh -c 'gitlab-runner-helper health-check --address address1 --port port1 && gitlab-runner-helper health-check --address address2 --port port2 ...'
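The chained command above could be assembled from a list of services along the following lines. This is only a sketch: the function name build_health_check_command and the address:port argument format are hypothetical, and the --address and --port flags are copied from the proposal, not verified against the helper binary.

```shell
# Hypothetical sketch: join one health-check invocation per service with "&&"
# so the chain stops at the first service that fails its check.
# Each argument is an "address:port" pair (a format assumed for this example).
build_health_check_command() {
  cmd=""
  for svc in "$@"; do
    address="${svc%:*}"   # text before the last ':'
    port="${svc##*:}"     # text after the last ':'
    check="gitlab-runner-helper health-check --address ${address} --port ${port}"
    if [ -z "$cmd" ]; then
      cmd="$check"
    else
      cmd="$cmd && $check"
    fi
  done
  printf '%s\n' "$cmd"
}

# Example with two made-up services:
build_health_check_command postgres:5432 redis:6379
```

The resulting string would be passed as the single argument to sh -c when running the exec in the helper container. Because the checks are joined with &&, the step_script would only proceed once every service has passed its check.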
Current workaround
Comment here
Same issue here with docker:20.10.16-dind as a service in a Kubernetes-based runner using the Kubernetes executor.
The following loop fixes it, but it was not necessary when using docker:19.03.0-dind:
until docker info; do sleep 1; done
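The workaround loop above retries indefinitely if the service never comes up. A bounded variant might look like the sketch below; the function name wait_until and its arguments are hypothetical and not something the runner provides.

```shell
# Hypothetical helper: retry a command once per second until it succeeds,
# giving up after MAX_ATTEMPTS tries.
wait_until() {
  max_attempts="$1"
  shift
  attempt=1
  until "$@" >/dev/null 2>&1; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "gave up after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
```

In a job, wait_until 60 docker info would then replace the open-ended loop and fail the job explicitly if the Docker daemon is still unreachable after 60 attempts.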