Essentially, what happens is that the service container fails to start properly:
```
Running with gitlab-ci-multi-runner 9.5.0 (413da38)
  on stanhu2 Docker (a96c6255)
Using Docker executor with image ubuntu:16.04 ...
Starting service nginx:latest ...
Pulling docker image nginx:latest ...
Using docker image nginx:latest ID=sha256:b8efb18f159bd948486f18bd8940b56fd2298b438229f5bd2bcf4cedcf037448 for nginx service...
Waiting for services to be up and running...
*** WARNING: Service runner-a96c6255-project-146-concurrent-0-nginx-0 probably didn't start properly.
exit code 1
*********
```
It looks to me like the containers are being started in the right network (which you can see via `docker network inspect <your network>`), but for some reason the container responsible for checking that the service is up thinks it has not been started.
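For reference, this is the kind of check described above; the network and container names here are just examples, not the Runner's actual generated names:

```shell
# Create a user-defined network and start a service container in it
docker network create ci-net
docker run -d --network ci-net --name nginx-svc nginx:latest

# Inspect the network; the "Containers" section should list nginx-svc
docker network inspect ci-net
```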
Attached are the Docker daemon debug logs when a build is attempted.
retry.txt
@stanhu AFAIR, the Docker link feature that we're using to connect containers works only in the default network. I'm not 100% sure, but I think this is the reason; I need to investigate it more deeply. If I'm right, then I think the best solution would be to replace link usage with Docker Network to connect containers.
I did some tests, and I verified that to make DNS resolution work with user-defined networks, we have to use the `--name` parameter in Docker. For example:
```shell
docker run --network test --name=nginx nginx &
docker run -it --network test ubuntu:16.04 bash
```
Does that mean we just need to use the `Name` parameter when we create a container, instead of relying on the link mechanism?
@stanhu Name needs to be unique on the host. That's why we generate unique container names that include the project ID, project name, service index in the services map, concurrent worker index, etc. This isn't something that would be useful for the end user. However, there is an `--alias` option that does the same thing and could be used. Just note that until we properly support Docker Network, the alias also needs to be unique within one network, so this makes Runner usable only while a single job runs at a time.
@nolith An alias doesn't need to be unique on a network. If multiple containers are started with the same alias in one network, then all of their IPs will simply be resolvable under that name.
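To illustrate the shared-alias behaviour (network and alias names here are illustrative, not from the Runner code):

```shell
# Two containers sharing one alias on a user-defined network
docker network create test
docker run -d --network test --network-alias web nginx
docker run -d --network test --network-alias web nginx

# Resolving "web" from another container on the network should
# return the IPs of both service containers
docker run --rm --network test busybox nslookup web
```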
> Just note that until we properly support Docker Network, the alias also needs to be unique within one network, so this makes Runner usable only while a single job runs at a time.
@ayufan Yes, it does if we use the `--network-alias` option. But this is the simplest fix we can add to make Runner work with Docker and a specified network (in a limited way, but still usable).
We could try to use `--add-host` as proposed by @nolith, but I see two problems with this solution:
we would start manually managing the `/etc/hosts` IP:name mapping while Docker has a built-in solution,
the IP is only available a while after the service is started; this means we would need to start each service, wait until we can get its IP, and only after collecting all IP:alias pairs could we start creating the build container (while Docker can do this "magically" in an asynchronous way).
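The manual sequence described in the second point would look roughly like this (container and alias names are hypothetical):

```shell
# 1. Start the service container first
docker run -d --name svc-nginx nginx

# 2. Only once it is running can we extract its IP
SVC_IP=$(docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' svc-nginx)

# 3. Only after collecting every IP:alias pair can the build container start
docker run --rm --add-host "nginx:${SVC_IP}" ubuntu:16.04 cat /etc/hosts
```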
We could try this, but I think the better solution is to switch from linking to Docker networking, with the possibility of adding custom configuration for the auto-created networks. But such a solution is not simple and would not be released sooner than next month.
@stanhu @dblessing I've created !682 (closed) with a PoC of a fix using `--network-alias`. It has the limitations pointed out above by @ayufan and @nolith, but it allows users to use the Docker executor with services and a custom network. The `WARNING: Service XYZ probably didn't start properly` message is still shown in the job's trace (because of how the services health check is designed), but first let's check whether this change helps the customer in any way.
Once the pipeline for !682 (closed) finishes, the compiled binary will be available from the S3 bucket.
I've created !803 (closed), which solves this issue by adding the links to the network config when attaching to a user-defined network (just like `docker run` does). The `WARNING: Service XYZ probably didn't start properly` message is avoided by passing explicit environment variables to the wait-for-service container instead of relying on Docker link environment variables (which are deprecated).
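As background, on user-defined networks `docker run` treats a link as a network-scoped alias rather than a legacy link; the behaviour the MR reproduces can be seen on the command line (names here are illustrative):

```shell
docker network create ci-net
docker run -d --network ci-net --name runner-svc-0 nginx

# On a user-defined network, the --link target:alias pair is attached
# as an alias in the network config, so "nginx" resolves via Docker DNS
docker run --rm --network ci-net --link runner-svc-0:nginx ubuntu:16.04 getent hosts nginx
```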
Is there any news on this issue? Without its resolution, is it even possible to use services (such as docker:dind) with custom networks? In my case, it's a custom overlay network dedicated to GitLab services on a Swarm cluster.
For those interested, I have published the two Docker images below that use the features from the new MR !1569 (merged). Both are Alpine-based images. It works very well on my side. If you start using them, don't forget to move to the stable Runner version once MR !1569 (merged) is merged.
If you want to run the `docker:dind` service, be aware that since Docker 19.03 TLS is enabled by default. GitLab has documented what you should do to get it to work. My advice would be to also update the Docker version running the Runner, if applicable, to 19.03.
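For orientation, a `.gitlab-ci.yml` sketch along the lines of what the GitLab documentation describes for `docker:dind` with TLS; the variable values below are the documented defaults, not something confirmed in this thread:

```yaml
variables:
  DOCKER_HOST: tcp://docker:2376
  DOCKER_TLS_CERTDIR: "/certs"
  DOCKER_TLS_VERIFY: 1
  DOCKER_CERT_PATH: "$DOCKER_TLS_CERTDIR/client"

build:
  image: docker:19.03
  services:
    - docker:19.03-dind
  script:
    - docker info
```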
The service health check can still complain, but it should not lead to failure. There is a separate issue, #3984 (closed), to improve the service health check.
Finally, I want to thank those who created the MR and contributed, specifically @steve.exley, @steveazz, and @krotscheck. Hopefully MR !1569 (merged) will be merged soon so everyone can enjoy all your hard work.