Essentially, what happens is that the service container fails to start properly:
```
Running with gitlab-ci-multi-runner 9.5.0 (413da38)
  on stanhu2 Docker (a96c6255)
Using Docker executor with image ubuntu:16.04 ...
Starting service nginx:latest ...
Pulling docker image nginx:latest ...
Using docker image nginx:latest ID=sha256:b8efb18f159bd948486f18bd8940b56fd2298b438229f5bd2bcf4cedcf037448 for nginx service...
Waiting for services to be up and running...
*** WARNING: Service runner-a96c6255-project-146-concurrent-0-nginx-0 probably didn't start properly.
exit code 1
*********
```
It looks to me like the containers are being started in the right network (which you can see via `docker network inspect <your network>`), but for some reason the container responsible for checking that the service is up thinks it has not been started.
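For reference, this is the kind of check described above; the network and container names here are just examples, not the Runner's actual generated names:

```shell
# Create a user-defined network and start a service container in it
docker network create ci-net
docker run -d --network ci-net --name nginx-svc nginx:latest

# Inspect the network; the "Containers" section should list nginx-svc
docker network inspect ci-net
```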
Attached are the Docker daemon debug logs when a build is attempted.
retry.txt
@stanhu AFAIR, the Docker link feature that we're using to connect containers works only in the default network. I'm not 100% sure, but I think this is the reason; I need to investigate it more deeply. If I'm right, then I think the best solution would be to replace link usage with Docker Network to connect containers.
I did some tests, and I verified that to make DNS resolution work with user-defined networks, we have to use the `--name` parameter in Docker. For example:
```shell
docker run --network test --name=nginx nginx &
docker run -it --network test ubuntu:16.04 bash
```
Does that mean we just need to use the `Name` parameter when we create a container, instead of relying on the link mechanism?
@stanhu Name needs to be unique on the host. That's why we generate unique container names that include the project ID, project name, service index in the services map, concurrent worker index, etc. This isn't something that would be useful for the end user. However, there is an `--alias` option that does the same thing and could be used. Just note that until we properly support Docker Network, the alias also needs to be unique within one network, so this makes Runner usable only while a single job runs at a time.
@nolith An alias doesn't need to be unique on a network. If multiple containers are started with the same alias in one network, then all of their IPs will simply be resolvable under that name.
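To illustrate the shared-alias behaviour (network and alias names here are illustrative, not from the Runner code):

```shell
# Two containers sharing one alias on a user-defined network
docker network create test
docker run -d --network test --network-alias web nginx
docker run -d --network test --network-alias web nginx

# Resolving "web" from another container on the network should
# return the IPs of both service containers
docker run --rm --network test busybox nslookup web
```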
> Just note that until we properly support Docker Network, the alias also needs to be unique within one network, so this makes Runner usable only while a single job runs at a time.
@ayufan Yes, it does if we use the `--network-alias` option. But this is the simplest fix we can add to make Runner work with Docker and a specified network (in a limited way, but still usable).
We could try to use `--add-host` as proposed by @nolith, but I see two problems with this solution:
we would start manually managing the `/etc/hosts` IP:name mapping while Docker has a built-in solution,
the IP is only available a while after the service is started; this means we would need to start each service, wait until we can get its IP, and only after collecting all IP:alias pairs could we start creating the build container (while Docker can do this "magically" in an asynchronous way).
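The manual sequence described in the second point would look roughly like this (container and alias names are hypothetical):

```shell
# 1. Start the service container first
docker run -d --name svc-nginx nginx

# 2. Only once it is running can we extract its IP
SVC_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' svc-nginx)

# 3. Only after collecting every IP:alias pair can the build container start
docker run --rm --add-host "nginx:${SVC_IP}" ubuntu:16.04 cat /etc/hosts
```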
We could try this, but I think the better solution is to switch from linking to Docker networking, with the possibility of adding custom configuration for the auto-created networks. But such a solution is not simple and would not be released sooner than next month.
@stanhu @dblessing I've created !682 (closed) with a PoC of a fix using `--network-alias`. It has the limitations pointed out above by @ayufan and @nolith, but it allows users to use the Docker executor with services and a custom network. The `WARNING: Service XYZ probably didn't start properly` message is still shown in the job's trace (because of how the services health check is designed), but first let's check whether this change helps the customer in any way.
Once the pipeline for !682 (closed) finishes, the compiled binary will be available from the S3 bucket.
I've created !803 (closed), which solves this issue by adding the links to the network config when attaching to a user-defined network (just like `docker run` does). The `WARNING: Service XYZ probably didn't start properly` message is avoided by passing explicit environment variables to the wait-for-service container instead of relying on Docker link environment variables (which are deprecated).
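As background, on user-defined networks `docker run` treats a link as a network-scoped alias rather than a legacy link; the behaviour the MR reproduces can be seen on the command line (names here are illustrative):

```shell
docker network create ci-net
docker run -d --network ci-net --name runner-svc-0 nginx

# On a user-defined network, the --link target:alias pair is attached
# as an alias in the network config, so "nginx" resolves via Docker DNS
docker run --rm --network ci-net --link runner-svc-0:nginx ubuntu:16.04 getent hosts nginx
```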
Is there any news on this issue? Without its resolution, is it even possible to use services (such as docker:dind) with custom networks? In my case, it's a custom overlay network dedicated to GitLab services on a Swarm cluster.
For those interested, I have published the two Docker images below that use the features from the new MR !1569 (merged). Both are Alpine-based images. It works very well on my side. If you start using them, don't forget to move to the stable Runner version once MR !1569 (merged) is merged.
If you want to run the `docker:dind` service, be aware that since Docker 19.03 TLS is enabled by default. GitLab has documented what you should do to get it to work. My advice would be to also update the Docker version running the Runner, if applicable, to 19.03.
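For orientation, a `.gitlab-ci.yml` sketch along the lines of what the GitLab documentation describes for `docker:dind` with TLS; the variable values below are the documented defaults, not something confirmed in this thread:

```yaml
variables:
  DOCKER_HOST: tcp://docker:2376
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_TLS_VERIFY: 1
  DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"

build:
  image: docker:19.03
  services:
    - docker:19.03-dind
  script:
    - docker info
```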
The service health check can still complain, but it should not lead to failure. There is a separate issue, #3984 (closed), to improve the service health check.
Finally, I want to thank those who created the MR and contributed, specifically @steve.exley, @steveazz, and @krotscheck. Hopefully MR !1569 (merged) will be merged soon so everyone can enjoy all your hard work.