Job fails with "error during connect: server misbehaving" using Docker-in-Docker (dind) executor and own CA.
We're running GitLab 11.5.4 (315df49) on Ubuntu 16.04.5 LTS and have set up the GitLab Container Registry with an SSL certificate signed by our own (internal) Certificate Authority (CA). The server running GitLab trusts this CA, and there are no SSL issues on the host or on any of the client machines connecting to this GitLab instance.
Next, we set up a CI/CD pipeline using the official gitlab-runner:alpine image with 'docker' as the executor, following the official documentation. Additionally, following the advanced runner documentation, we passed the tls-ca-file, tls-cert-file and tls-key-file of our custom CA and certificate as parameters when registering the runner, like so:
docker run --rm -t -i -v /srv/gitlab-runner/config:/etc/gitlab-runner --name runner gitlab-runner:alpine register \
--non-interactive \
--executor "docker" \
--docker-image "docker:stable" \
--url "https://gitlab.infra.local/" \
--registration-token "xxxxx" \
--description "runner" \
--tag-list "" \
--run-untagged \
--locked="false" \
--docker-privileged \
--tls-ca-file "/etc/gitlab-runner/certs/runner/ca.pem" \
--tls-cert-file "/etc/gitlab-runner/certs/runner/cert.pem" \
--tls-key-file "/etc/gitlab-runner/certs/runner/key.pem"
Note: ca.pem contains both the intermediate and the root certificate of our own CA, i.e. the entire chain, as described in the docs.
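A quick way to sanity-check that ca.pem really holds the whole chain is to count the PEM blocks in it. This is only a minimal sketch; the helper name count_certs is made up for illustration and is not part of any tool.

```shell
#!/bin/sh
# count_certs FILE: print the number of PEM certificate blocks in FILE.
# (count_certs is a hypothetical helper name, not part of any tool.)
count_certs() {
    # grep -c counts matching lines; each certificate has exactly one BEGIN line.
    # The "--" stops grep from treating the leading dashes as options.
    grep -c -- '-----BEGIN CERTIFICATE-----' "$1"
}
```

Run on the host against /srv/gitlab-runner/config/certs/runner/ca.pem (the host side of the mounted config directory), this should print 2 if both the intermediate and the root certificate are present.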
The runner registered fine, but jobs failed with the following error message when trying to push to the GitLab Container Registry:
x509: certificate signed by unknown authority
The .gitlab-ci.yml used in this setup looks as follows:
image: docker:stable
variables:
  CONTAINER_IMAGE: company:testimage
  DOCKER_HOST: tcp://docker:2375
  DOCKER_DRIVER: overlay2
services:
  - docker:dind
before_script:
  - docker info
  - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN gitlab.infra.local:4567
build:
  stage: build
  script:
    - docker build -t $CONTAINER_IMAGE .
    - docker push $CONTAINER_IMAGE
(Source: https://docs.gitlab.com/ee/ci/docker/using_docker_build.html#use-docker-in-docker-executor )
We suspected this error was caused by either the docker:stable image or the docker:dind service, neither of which (obviously) trusts our internal CA out of the box. We therefore tried copying our ca.crt to /etc/gitlab-runner/certs/ca.crt, as described here: https://docs.gitlab.com/runner/install/docker.html#installing-trusted-ssl-server-certificates
However, that did not seem to have any effect; even after restarting the Docker service, the issue remained the same.
We therefore came up with the idea of building our own 'custom' docker:stable and docker:dind images, based on the original ones, with the minor change that they contain and trust our own internal CA. The Dockerfile used to build the docker:dind image looks like this:
#Download dind base image
FROM docker:dind
#Inject CA-certificates file from host
COPY Company_Intermediate.crt /usr/local/share/ca-certificates/Company_Intermediate.crt
COPY Company_Root.crt /usr/local/share/ca-certificates/Company_Root.crt
#Update CA-certificates
RUN /usr/sbin/update-ca-certificates --fresh
#Overwrite the topsecret undocumented cert.pem file used by libtls and libcrypto
RUN cp /etc/ssl/certs/ca-certificates.crt /etc/ssl/cert.pem
The Dockerfile used to build our docker:stable image looks exactly the same, apart from the FROM line.
We verified that the custom Docker images work and trust our CA by running a container from each image and then using curl against the URL of our GitLab Container Registry. Previously this failed with 'certificate signed by unknown authority', but now the index.html is successfully downloaded inside the container.
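A check of this kind can be sketched roughly as follows. The helper name tls_trusted is hypothetical, and the sketch assumes curl is installed in the image being tested.

```shell
#!/bin/sh
# tls_trusted URL: succeed if curl can complete a verified TLS connection to URL.
# (tls_trusted is a hypothetical helper; it assumes curl is installed in the image.)
tls_trusted() {
    # -sS: stay quiet but still print errors; -o /dev/null: discard the body.
    # No -f here, so an HTTP 401/404 still counts as success: only connection
    # and certificate failures (e.g. curl exit code 60) make the function fail.
    curl -sS --max-time 10 -o /dev/null "$1"
}
```

Inside a container started from one of the custom images, `tls_trusted https://gitlab.infra.local:4567/ && echo trusted` would then distinguish a certificate problem from an ordinary HTTP error.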
We then referenced these images in the .gitlab-ci.yml instead of the official docker:stable and docker:dind images, and now the jobs fail with a different error:
$ docker info
error during connect: Get http://docker:2375/v1.38/info: dial tcp: lookup docker on 10.5.102.12:53: server misbehaving
ERROR: Job failed: exit code 1
And now we're stuck. We have no idea what could be causing this error, as the images used in the .gitlab-ci.yml have been tested manually and verified to work. Additionally, all the information we could find about this error points towards a DNS problem, yet GitLab, the Runner and the Container Registry all run on the same (internal) machine. Also, DNS resolution works when running a manual 'curl' against the GitLab server from inside the container.
Any help is much appreciated. Other approaches to getting GitLab Runner with Docker-in-Docker to work with our own internal CA, without resorting to custom images, are also welcome.
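To narrow down whether the lookup fails only for the docker service host or for every name, a check along these lines could be run in before_script ahead of docker info. It is a sketch under assumptions: the helper name resolves is made up, and getent is assumed to be available in the job image.

```shell
#!/bin/sh
# resolves NAME: report whether NAME resolves from inside the current container.
# (resolves is a hypothetical helper; it assumes getent is available in the image.)
resolves() {
    # getent consults /etc/hosts as well as DNS, so it also sees names
    # that Docker injects into the container's hosts file.
    if getent hosts "$1" >/dev/null 2>&1; then
        echo "$1: resolves"
    else
        echo "$1: does NOT resolve"
        return 1
    fi
}
```

Calling `resolves docker` and `resolves gitlab.infra.local` in the same job would show whether only the service name from DOCKER_HOST is affected.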
Thank you for your time!