Allow pipeline services configuration for marking services as critical
Problem to solve
In certain cases, a failure to start a service should result in a job failure. Tying a service container failure to a build container failure is not currently possible.
The current healthcheck configuration for services raises a warning when a service fails to start. There can be several cases where a warning is raised without an actual issue, as described in the docs. In other cases, however, it would be preferable for a service container failure to result in the build container's failure.
Target audience
- Devon, DevOps Engineer, https://design.gitlab.com/research/personas#persona-devon
Further details
One example of this problem is when our sast job attempts to spin up a docker service for docker-in-docker execution. When the service fails to start, a warning is generated but the job continues to execute. The job eventually fails because it depends on the service, but it is not clear from that failure that the service never started.
Running with gitlab-runner 11.6.0 (f100a208)
on ci-runner-2 xwRbLsB8
Using Docker executor with image docker:stable ...
Starting service docker:stable-dind ...
Pulling docker image docker:stable-dind ...
Using docker image sha256:5b626cc3459ad077146e8aac1fbe25f7099d71c6765efd6552b9209ca7ea4dc1 for docker:stable-dind ...
Waiting for services to be up and running...
*** WARNING: Service runner-xwRbLsB8-project-26-concurrent-0-docker-0 probably didn't start properly.
Health check error:
ContainerStart: Error response from daemon: Cannot link to a non running container: /runner-xwRbLsB8-project-26-concurrent-0-docker-0 AS /runner-xwRbLsB8-project-26-concurrent-0-docker-0-wait-for-service/service (executor_docker.go:1321:0s)
Service container logs:
2019-02-21T15:57:51.502610505Z mount: permission denied (are you root?)
Proposal
Add a configuration option to the services settings map to recognize a service as critical. Possible options: allow_failure: false (default: true) or critical: true (default: false).
services:
  - name: docker:stable-dind
    allow_failure: false
I would argue that this should be the default behavior for all services. However, this would be a breaking change, so I'm not sure we would want to proceed with that modification.
What does success look like, and how can we measure that?
When defining a service in .gitlab-ci.yml, if that service fails to start properly, the job should report as a failure.