Job won’t complete: containers with unready status
Summary
Since upgrading the runner to v17.4.0, some jobs with extra service(s) will not complete if one of the service containers is not running.

In our specific case, a `docker` service is defined both in the runner's Helm configuration and in the job's definition. One of the two containers stops because of the duplicated ports. In a different scenario, a job has a `docker` service configured but is attempting to run on a non-privileged runner; the service container is started but stops immediately.
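For reference, a minimal sketch of a job definition that triggers the first scenario; the job image, service alias, and script lines are illustrative placeholders, and the runner already provides a `docker:dind` service through its config.toml (see the Environment description below):

```yaml
# .gitlab-ci.yml (sketch): the runner's config.toml already declares a
# docker:dind service, so declaring it again in the job results in two
# dind containers in the same pod; one of them exits because the ports
# are already taken, and the job then never completes on v17.4.0.
build:
  image: docker:27.1.1            # placeholder job image
  services:
    - name: docker:dind           # duplicates the runner-level service
      alias: docker
  script:
    - docker info                 # placeholder script; the build itself still succeeds
```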
Steps to reproduce
- Start a job with a service
- Ensure that the service is stopped before the end of the job
- Let the job's script run to completion successfully
Actual behavior
The job waits for the stopped container to return to a "running" state before ending, so it never completes
Expected behavior
The job ends with the "success" status even though the service container has stopped
Relevant logs and/or screenshots
Environment description
We are using our own runners, deployed using Helm on an AKS cluster.
config.toml contents
[[runners]]
  pre_build_script = REDACTED
  environment = [
    "DOCKER_HOST=REDACTED",
    "DOCKER_TLS_CERTDIR=REDACTED",
  ]
  [runners.kubernetes]
    namespace = "{{.Release.Namespace}}"
    pull_policy = ["if-not-present"]
    image_pull_secrets = ["dockerhub"]
    image = "ubuntu:20.10.17"
    privileged = true
    poll_timeout = 500
    # The affinity definition below defines a scheduling preference for jobs so that they avoid running on system nodes.
    # This block must be copied into every configuration, as it is not currently possible to extract it for easier reuse.
    [runners.kubernetes.affinity]
      [runners.kubernetes.affinity.node_affinity]
        [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution]]
          weight = 100
          [runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.label_selector]
            [[runners.kubernetes.affinity.node_affinity.preferred_during_scheduling_ignored_during_execution.label_selector.match_expressions]]
              key = "kubernetes.azure.com/mode"
              operator = "In"
              values = ["user"]
    [[runners.kubernetes.services]]
      name = "docker:dind"
      command = ["--insecure-registry=REDACTED", "--registry-mirror=REDACTED"]
    [runners.kubernetes.volumes]
      [[runners.kubernetes.volumes.config_map]]
        name = "runner-scripts"
        mount_path = "REDACTED"
    [runners.kubernetes.pod_labels]
      axceta_job_id = "$CI_JOB_ID"
      axceta_job_name = "$CI_JOB_NAME"
      axceta_job_stage = "$CI_JOB_STAGE"
      axceta_project_name = "$CI_PROJECT_NAME"
      axceta_project_id = "$CI_PROJECT_ID"
      axceta_pipeline_id = "$CI_PIPELINE_ID"
    [[runners.kubernetes.volumes.empty_dir]]
      name = "docker-certs"
      REDACTED
    [[runners.kubernetes.volumes.secret]]
      name = "dockerhub"
      REDACTED
  [runners.cache]
    Type = "azure"
    Shared = true
    [runners.cache.azure]
      REDACTED
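For context, since the runner is deployed with Helm, the runner-level `docker:dind` service above is passed to the chart as extra runner configuration. In recent versions of the gitlab-runner chart this is usually a TOML string under the `runners.config` value; the sketch below is abbreviated and the exact values layout depends on the chart version in use:

```yaml
# values.yaml (sketch, abbreviated): recent gitlab-runner charts accept the
# runner config.toml as a multi-line string under runners.config.
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        privileged = true
        [[runners.kubernetes.services]]
          name = "docker:dind"
```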
Used GitLab Runner version
v17.4.0 (the issue appeared after upgrading from v17.3.1)
Current workaround
Downgrade back to v17.3.1
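One way to pin the previous version is through the Helm values, roughly as sketched below; the image keys shown are those used by recent gitlab-runner chart versions and may differ for older charts, so treat them as an assumption rather than our exact values:

```yaml
# values.yaml (sketch): pin the runner image back to v17.3.1 until the
# service-status handling is fixed; key names depend on the chart version.
image:
  registry: registry.gitlab.com
  image: gitlab-org/gitlab-runner
  tag: alpine-v17.3.1
```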