Kubernetes runner/executor plus Docker never recognizes that the job is finished
Zendesk: https://gitlab.zendesk.com/agent/tickets/98176
This is a really strange/complicated setup so I will try to explain what I've seen with the customer.
The customer deployed a runner into Kubernetes. With the Kubernetes executor, jobs are executed in another namespace. In this particular case the customer is trying to run a Code Climate job via Docker. The .gitlab-ci.yml is as follows:
codequality:
  stage: test
  variables:
    DOCKER_DRIVER: overlay2
  services:
    - docker:stable-dind
  script:
    - export SP_VERSION=$(echo "$CI_SERVER_VERSION" | sed 's/^\([0-9]*\)\.\([0-9]*\).*/\1-\2-stable/')
    - export DOCKER_HOST=tcp://localhost:2375
    - docker run
        --env SOURCE_CODE="$PWD"
        --volume "$PWD":/code
        --volume /var/run/docker.sock:/var/run/docker.sock
        "registry.gitlab.com/gitlab-org/security-products/codequality:$SP_VERSION" /code
  artifacts:
    paths: [codeclimate.json]
  tags:
    - dind
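For reference, the SP_VERSION line in the script derives the codequality image tag from the GitLab server version. A minimal sketch of that transformation, assuming the usual `\(...\)` capture groups in the sed expression; the `sp_version` helper name and the sample version string are hypothetical and only used for illustration:

```shell
# Hypothetical helper wrapping the sed expression from the job script.
# It keeps the major and minor version and appends "-stable".
sp_version() {
  echo "$1" | sed 's/^\([0-9]*\)\.\([0-9]*\).*/\1-\2-stable/'
}

# Sample input only; in the job the value comes from $CI_SERVER_VERSION.
sp_version "11.3.4-ee"   # prints 11-3-stable
```

So the tag resolution itself should not be related to the hang; it only selects which codequality image is pulled.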
We can see that the executor spins up the new Kubernetes pod and Docker launches the codequality container (which itself then spawns a Code Climate container). The Code Climate container finishes and exits, as does the codequality container. All expected output from the run is sent back to GitLab successfully. However, GitLab never recognizes that the job is done.
It seems like there's some signal that is lost on the Kubernetes container/pod and it doesn't realize that the Docker containers have exited. I've asked the customer to check if the docker run process still shows up on the Kubernetes container/pod in case something is hung there.
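One way the customer could run that check is sketched below. This is only a rough outline under some assumptions: the namespace and pod name are placeholders passed as arguments, `build` is assumed to be the name of the job container in the executor's pod, and `ps` is assumed to be available in the job image. The helper names `pod_ps` and `has_docker_run` are made up for this example.

```shell
# List processes in the job pod's build container.
# $1 = namespace, $2 = pod name (both placeholders supplied by the caller);
# "build" is an assumed container name, adjust if the pod differs.
pod_ps() {
  kubectl exec -n "$1" "$2" -c build -- ps aux
}

# Succeed if any process line on stdin contains a `docker run` invocation.
has_docker_run() {
  grep -q 'docker run'
}
```

Usage would look like `pod_ps "$NAMESPACE" "$POD" | has_docker_run && echo "docker run is still present"`. If the process is still listed after the containers have exited, that points at the client hanging rather than at the runner missing a signal.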
There is a full debug log from the runner in the ticket. However, I don't see anything useful there. We can see it start the Kubernetes command and append the trace, but in the end it simply keeps polling for more output, and there is none because the job is done.
Any other ideas? @tmaczukin @nolith