Create readiness probes for services in the Kubernetes executor

Overview

Users can define services for their jobs, such as databases, web servers, or anything else the job needs. A job often depends on these services, for example to run tests against a database. A service can take a while to start and become ready, which leads to flaky jobs that fail because the service is not ready yet. We need a mechanism that prevents the user script from starting until all the services are ready.

Proposal - Update 2023-05-25

For the Docker executor, in !4079 (merged), we introduced a service variable, HEALTHCHECK_TCP_PORT, that allows an explicit service port to be checked.

The Docker executor has supported service health checks for some time, but it determined the port by inspecting the image's exposed ports. The service variable was introduced because the automatically detected port wasn't always correct.

For Kubernetes, we can't determine ports automatically (there's no API for inspecting an image's EXPOSE metadata). However, we can support an explicit port check with HEALTHCHECK_TCP_PORT in the same way.
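To make the idea of an explicit port check concrete, here is a minimal sketch of what such a check amounts to, assuming bash (for its /dev/tcp redirection). The function name wait_for_tcp_port and its retry policy are hypothetical; this is not how GitLab Runner implements the check. Because all containers in a Pod share one network namespace, a service container's port can be probed on 127.0.0.1 from any other container in the same Pod.

```shell
#!/usr/bin/env bash
# Minimal sketch of an explicit TCP port check; requires bash for /dev/tcp.
# wait_for_tcp_port is a hypothetical helper, not part of GitLab Runner.
# Usage: wait_for_tcp_port HOST PORT MAX_ATTEMPTS
wait_for_tcp_port() {
  local host="$1" port="$2" max_attempts="$3" attempt=1
  while :; do
    # Redirecting to /dev/tcp/HOST/PORT makes bash open a TCP connection;
    # it succeeds only if something accepts connections on that port.
    # The subshell closes the descriptor again when it exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    # Give up once the attempt budget is used, without a final sleep.
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
}
```

For example, wait_for_tcp_port 127.0.0.1 5432 30 would wait roughly 30 seconds for a service listening on port 5432, the value a PostgreSQL service might set in HEALTHCHECK_TCP_PORT (an illustrative value). A connection attempt to an unreachable address can block for the operating system's connect timeout, so a real implementation would also want a per-attempt timeout.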

Proposed implementation - Update 2023-12-12

For the Docker executor, the service health check runs in a separate container per service, which receives the relevant environment variables. That container runs the following command:

gitlab-runner-helper health-check

The build container is created after all the health check containers have completed successfully.

However, this approach does not apply to the Kubernetes executor, because the services are created as containers within the Pod, and once the Pod is running, all its containers start at the same time.

We cannot use a Kubernetes init container either, because the service containers won't start until the init container has finished. An init container that waits for the services would therefore never complete.

Possible solutions:

- Update the existing scripts so that the wait happens before the step_script itself is executed.
- Use exec to run a command remotely in the helper container before the step_script. The executed command could look like this:

sh -c 'gitlab-runner-helper health-check --address address1 --port port1 && gitlab-runner-helper health-check --address address2 --port port2 ...'
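The chained command above could be generated from the list of service ports. This is a sketch only: build_health_check_cmd and its host:port input format are hypothetical, and the --address/--port flags are copied from the proposal rather than confirmed against the released gitlab-runner-helper CLI.

```shell
#!/bin/sh
# Hypothetical helper: build the chained health-check command from
# "host:port" pairs (IPv6 addresses are not handled).
# The --address/--port flags follow the proposal and are not confirmed.
build_health_check_cmd() {
  cmd=""
  for target in "$@"; do
    address=${target%:*}   # everything before the last ':'
    port=${target##*:}     # everything after the last ':'
    check="gitlab-runner-helper health-check --address $address --port $port"
    # Join with && so the chain stops at the first check that exits non-zero.
    if [ -z "$cmd" ]; then
      cmd=$check
    else
      cmd="$cmd && $check"
    fi
  done
  printf '%s\n' "$cmd"
}
```

The result has to reach sh -c as a single argument, for example sh -c "$(build_health_check_cmd 127.0.0.1:5432 127.0.0.1:6379)" (illustrative addresses and ports). Without quoting, sh -c would execute only the first word as its command string.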

Current workaround

From a comment on this issue:

Same issue here with docker:20.10.16-dind as a service on a Kubernetes-based runner using the Kubernetes executor. Adding the loop until docker info; do sleep 1; done fixes it, but it was not necessary when using docker:19.03.0-dind...
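The loop in this workaround never gives up, so a dind service that fails to start keeps the job running until the job timeout. A minimal sketch of a bounded variant, assuming a POSIX sh; retry_until is a hypothetical helper, not something the runner provides:

```shell
#!/bin/sh
# Hypothetical helper, not part of GitLab Runner: a bounded version of
# `until CMD; do sleep 1; done` that gives up after MAX_ATTEMPTS tries.
# Usage: retry_until MAX_ATTEMPTS CMD [ARGS...]
# POSIX sh has no `local`, so the variables are prefixed to avoid clashes.
retry_until() {
  _retry_max=$1
  shift
  _retry_attempt=1
  while ! "$@"; do
    if [ "$_retry_attempt" -ge "$_retry_max" ]; then
      echo "retry_until: giving up after $_retry_attempt attempts: $*" >&2
      return 1
    fi
    _retry_attempt=$((_retry_attempt + 1))
    sleep 1
  done
  return 0
}
```

In a job's before_script this could be retry_until 60 docker info, which gives the dind service roughly a minute to become ready before the job fails with a clear message.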

Edited Dec 13, 2023 by Romuald Atchadé