per-build networking breaks DNS configuration for DinD
## Summary

The [`per-build` networking mode](https://docs.gitlab.com/runner/executors/docker.html#networking) causes the DNS configuration of the host system to not be picked up by containers running inside of Docker-in-Docker (DinD). Docker falls back to hard-coded DNS servers `8.8.8.8` and `8.8.4.4`. This is a problem particularly in corporate/institutional networks where outgoing DNS traffic may be blocked, i.e. public DNS resolvers cannot be reached. As a result any `docker build` of a container image which requires network access (inside a `RUN` command) fails.

## Steps to reproduce

_Pure explanation of the issue in Docker at the end of this section._

Repository consisting of the following `Dockerfile` and `.gitlab-ci.yml`.

```Dockerfile
FROM busybox
RUN cat /etc/resolv.conf
RUN nslookup google.com || true
RUN wget -O /google.com.html https://google.com/
```

<details>
<summary> .gitlab-ci.yml </summary>

```yml
stages:
  - build

build-image:
  stage: build
  image: docker:20.10
  tags:
    - docker-privileged
  services:
    - docker:20.10-dind
  script:
    - docker build -t myimage .
```

</details>

<details>
<summary>The underlying problem, docker only</summary>

The following example demonstrates the underlying issue by emulating some of the steps performed by GitLab Runner, specifically creating a custom docker network and connecting DinD to it:

```shell
# Host resolv.conf, using a company-internal DNS resolver
$ cat /etc/resolv.conf
search corp.com
nameserver 192.168.53.53

# Create per-build network
$ docker network create test

# Start the docker:dind container connected to the network
$ docker run -d --name dind --net test --privileged docker:dind

# DinD resolv.conf, using a forwarding resolver specific to the custom docker network `test`
$ docker exec dind cat /etc/resolv.conf
search corp.com
nameserver 127.0.0.11
options ndots:0

# Name resolution/ping works
$ docker exec dind ping -c 1 google.com
PING google.com (142.250.185.174): 56 data bytes
64 bytes from 142.250.185.174: seq=0 ttl=111 time=10.567 ms
--- google.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 10.567/10.567/10.567 ms

# Container inside of DinD resolv.conf, falling back to hard-coded docker defaults
$ docker exec dind docker run busybox cat /etc/resolv.conf
Unable to find image 'busybox:latest' locally
latest: Pulling from library/busybox
aa2a8d90b84c: Pulling fs layer
aa2a8d90b84c: Download complete
aa2a8d90b84c: Pull complete
Digest: sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb
Status: Downloaded newer image for busybox:latest
search corp.com
options ndots:0
nameserver 8.8.8.8
nameserver 8.8.4.4

# Name resolution does not work as outgoing DNS traffic is blocked by the corporation's firewall
$ docker exec -it dind docker run busybox ping -c 1 google.com
ping: bad address 'google.com'

# Tearing down...
$ docker stop dind
dind
$ docker rm dind
dind
$ docker network remove test
test
```

This effect is the result of the following:

* Each custom network has a docker-embedded DNS resolver for resolving service names. Connected containers are configured with this resolver in resolv.conf.
* A custom network is created because the `per-build` networking mode is enabled via the `FF_NETWORK_PER_BUILD` feature flag.
* The resolver is available on `127.0.0.11` for each container connected to the custom network. This cannot be overridden, i.e. `--dns 192.168.53.53` does not have any effect.
* For child containers started by dind, the default behaviour of Docker applies for populating resolv.conf, as these are not connected to a custom network.
* Docker uses the resolv.conf of the "host" (the dind container), stripping away any localhost nameservers (like `127.0.0.11`).
* If no nameservers remain, Docker adds a hard-coded set of default nameservers (`8.8.8.8`, `8.8.4.4`).
* The resulting list of nameservers is written to the resolv.conf of the child container.

This issue is known but somewhat stalled: [moby/moby#20037 (comment)](https://github.com/moby/moby/issues/20037#issuecomment-181659049).

The workaround is to specify the DNS servers to use explicitly for child containers running inside of DinD. Either of the following solutions fixes the issue:

* Configure the DNS on each container which is started within DinD.

  ```shell
  $ docker run -d --name dind --net test --privileged docker:dind
  $ docker exec dind docker run --dns 141.52.3.3 --dns 129.13.64.5 busybox ping -c 1 google.com
  ```

* Configure the default DNS when starting `docker:dind` dockerd.
  ```shell
  $ docker run -d --name dind --net test --privileged docker:dind --dns 141.52.3.3 --dns 129.13.64.5
  $ docker exec dind docker run busybox ping -c 1 google.com
  ```

</details>

## Actual behavior

Building of the Docker image fails because the build containers running inside of DinD cannot fetch the required files from the internet as DNS names cannot be resolved. (Could also be any package fetching/installation).

## Expected behavior

Image is built as the DNS configuration of the host is used inside the build containers.

## Relevant logs and/or screenshots

<details>
<summary> job log </summary>

I omitted (`OMITTED`) log output related to #27686.

```txt
Running with gitlab-runner 13.11.0 (7f7a4bb0)
  on pauls-test-runner-docker-privileged 3gq-aACs
  feature flags: FF_NETWORK_PER_BUILD:true
Preparing the "docker" executor
Using Docker executor with image docker:20.10 ...
Starting service docker:20.10-dind ...
Pulling docker image docker:20.10-dind ...
Using docker image sha256:dc8c389414c80f3c6510d3690cd03c29fc99d66f58955f138248499a34186bfa for docker:20.10-dind with digest docker@sha256:87ed8e3a7b251eef42c2e4251f95ae3c5f8c4c0a64900f19cc532d0a42aa7107 ...
Waiting for services to be up and running...
*** WARNING: Service runner-3gq-aacs-project-25822-concurrent-0-070528995f596ee8-docker-0 probably didn't start properly.
Health check error:
service "runner-3gq-aacs-project-25822-concurrent-0-070528995f596ee8-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
OMITTED
*********
Pulling docker image docker:20.10 ...
Using docker image sha256:d2979b152a7d43f040c7aef88c4c83de4e545227622b1045adf6fe409293f803 for docker:20.10 with digest docker@sha256:062edd9c11cbdf94e7620d932857a356fa179eaa26a3cc352759e75f04729c49 ...
Preparing environment
Running on runner-3gq-aacs-project-25822-concurrent-0 via build-ci...
Getting source from Git repository
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/cy8791/dind-dns-test/.git/
Created fresh repository.
Checking out 71bf4985 as main...
Skipping Git submodules setup
Executing "step_script" stage of the job script
Using docker image sha256:d2979b152a7d43f040c7aef88c4c83de4e545227622b1045adf6fe409293f803 for docker:20.10 with digest docker@sha256:062edd9c11cbdf94e7620d932857a356fa179eaa26a3cc352759e75f04729c49 ...
$ docker build -t myimage .
Step 1/4 : FROM busybox
latest: Pulling from library/busybox
aa2a8d90b84c: Pulling fs layer
aa2a8d90b84c: Verifying Checksum
aa2a8d90b84c: Download complete
aa2a8d90b84c: Pull complete
Digest: sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb
Status: Downloaded newer image for busybox:latest
---> c55b0f125dc6
Step 2/4 : RUN cat /etc/resolv.conf
---> Running in 7d3e7642c93f
search corp.com
options ndots:0
nameserver 8.8.8.8
nameserver 8.8.4.4
Removing intermediate container 7d3e7642c93f
---> eae2b70b7bcf
Step 3/4 : RUN nslookup google.com || true
---> Running in 2303181946a4
;; connection timed out; no servers could be reached
Removing intermediate container 2303181946a4
---> 242abe799a60
Step 4/4 : RUN wget -O /google.com.html https://google.com/
---> Running in 40ebcfcd7de1
wget: bad address 'google.com'
The command '/bin/sh -c wget -O /google.com.html https://google.com/' returned a non-zero code: 1
Cleaning up file based variables
ERROR: Job failed: exit code 1
```

</details>

## Environment description

The custom-installed runner is executed on a host inside a network where outgoing DNS traffic is blocked. That means the DNS servers configured in the host's resolv.conf **must** be used for performing any DNS query. The runner uses the Docker executor in privileged mode so that Docker images can be built. Recent versions of GitLab Runner and Docker are installed.

<details>
<summary> config.toml contents </summary>

```toml
concurrent = 2
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "REDACTED-docker-privileged"
  url = "https://REDACTED/"
  token = "REDACTED"
  executor = "docker"
  environment = ["DOCKER_DRIVER=overlay2", "DOCKER_TLS_CERTDIR=/certs"]
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.feature_flags]
    FF_NETWORK_PER_BUILD = true
  [runners.docker]
    tls_verify = false
    image = "docker:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/certs/client", "/cache"]
    pull_policy = ["always"]
    shm_size = 0
```

</details>

<details>
<summary>`docker info` output</summary>

```
Client:
 Context: default
 Debug Mode: false
 Plugins:
  app: Docker App (Docker Inc., v0.9.1-beta3)
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  scan: Docker Scan (Docker Inc.)

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 20.10.6
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05f951a3781f4f2c1911b05e61c160e9c30eaa8e
 runc version: 12644e614e25b05da6fd08a38ffa0cfe1903fdec
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 3.10.0-1160.25.1.el7.x86_64
 Operating System: Red Hat Enterprise Linux
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 3.696GiB
 Name: build-ci
 ID: YQM6:ZWJI:UQ73:N5GM:K4JL:7PK5:M7CX:GWA4:RYGP:RHUF:O5YX:VPUI
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
```

</details>

### Used GitLab Runner version

```
Version: 13.11.0
Git revision: 7f7a4bb0
Git branch: 13-11-stable
GO version: go1.13.8
Built: 2021-04-20T17:02:28+0000
OS/Arch: linux/amd64
```

## Possible fixes

To me no fix is apparent.

1. Ideally, the proper DNS servers would somehow be picked up automatically. This fix would have to occur in Docker/Moby. [moby/moby#20037 (comment)](https://github.com/moby/moby/issues/20037#issuecomment-181659049)
2. Alternatively, GitLab Runner could provide a mechanism to specify the DNS servers in `config.toml` which get picked up by `docker:dind` containers (and their child containers) running as services within CI jobs.

Fixing this bug is not critical, as workarounds are available, and per-build networking is not the default (yet?).

### Workarounds

1. Require each .gitlab-ci.yml to specify DNS explicitly for the `docker:dind` service, i.e. specify a command `dockerd ... --dns 192.168.53.53`. This requires developers to know details of the network environment of the GitLab runners.
2. Provide a DinD service in the GitLab Runner via `config.toml` which is properly configured, i.e. has a command `dockerd ... --dns 192.168.53.53`. As the image is fixed in `config.toml`, there is no way for developers to specify a different version of the image in .gitlab-ci.yml.
3. Disable per-build networking for the GitLab Runner, i.e. remove the feature flag from `config.toml`.
   Then, passing on the host's nameservers through Docker's resolv.conf mechanism works: host → dind → child container.

   **IMHO this is the preferable workaround as it is simple and preserves the separation of runner administration and developers.**

## References

* Underlying issue in Moby [moby/moby#20037 (comment)](https://github.com/moby/moby/issues/20037#issuecomment-181659049)
* More general issue concerning DNS nameserver detection by the Docker daemon [moby/moby#23910](https://github.com/moby/moby/issues/23910)
* Docker documentation on DNS configuration https://docs.docker.com/config/containers/container-networking/#dns-services
* Does not use per-build networking, but also concerns internal corporate networks #2201
* Does not use per-build networking, but also concerns DNS configuration #3054 and !892
* Same issue surfacing in Drone CI https://discourse.drone.io/t/dind-container-not-receiving-host-resolv-conf-settings/811

## To do

- [ ] Let's test the solution outlined in the comment threads to validate there is a viable solution for this problem.
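
When testing a candidate fix, it may help to predict which nameservers a child container should end up with. The following is a rough shell model of the resolv.conf handling described under "The underlying problem": drop localhost nameservers, then fall back to the hard-coded defaults if none remain. It is a sketch for reasoning only, not Docker's actual implementation. The function name `child_resolv_conf` is made up for illustration, and it assumes that only IPv4 loopback addresses (`127.x.x.x`) count as localhost.

```shell
#!/bin/sh
# Hypothetical helper (not part of Docker or GitLab Runner): approximates how a
# child container's resolv.conf is derived from the parent's, read from stdin.
# Assumption: only IPv4 loopback nameservers (127.x.x.x) are filtered out.
child_resolv_conf() {
  awk '
    # Drop nameservers pointing at localhost, e.g. the embedded resolver 127.0.0.11
    $1 == "nameserver" && $2 ~ /^127\./ { next }
    # Remember whether any usable nameserver is left
    $1 == "nameserver" { kept = 1 }
    # Pass every other line through unchanged
    { print }
    END {
      # No nameserver left: fall back to the hard-coded defaults named in this issue
      if (!kept) {
        print "nameserver 8.8.8.8"
        print "nameserver 8.8.4.4"
      }
    }
  '
}

# resolv.conf of the dind container on a per-build network (taken from the example above)
printf 'search corp.com\nnameserver 127.0.0.11\noptions ndots:0\n' | child_resolv_conf
```

For the dind resolv.conf from the example this prints `search corp.com`, `options ndots:0` and the two default nameservers, which matches what the job log shows. Feeding it the host's resolv.conf (`nameserver 192.168.53.53`) leaves the file unchanged, which is the behaviour a fix should restore for child containers.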