I've followed the advanced example in the container registry announcement to set up a CI build of a docker image.
Unfortunately the image takes a very long time to build, and I cannot enable caching, since /var/lib/docker is in the linked docker:dind service image, not in the build image.
The release-image job is also missing a docker pull step. That job gets a fresh docker:dind service, so it doesn't have the newly built image. The same is probably also true of the two test jobs.
I'm not quite sure I understand. Are you rebuilding the image in multiple steps? I see the container registry as acting as your cache in this instance, so you'd build once and push to the registry so that you don't have to build it again in a future step.
Oh wait, maybe you mean it takes a long time to build on each commit. That's worth looking into further.
Thanks for mentioning the missing step in the example. I've pushed an update to the blog post. The test jobs should actually work as-is because docker run will download the image from the registry if it doesn't exist locally.
It takes a long time (40-50 minutes) to build a single commit. The Dockerfile starts with FROM php:5.6-apache, which it has to download every time (484 megabytes), then it compiles a few php extensions, installs node and npm, git, composer, phantomjs and selenium.
The image is meant for testing php apps, so it's pretty heavy. For example, it needs selenium because the acceptance tests need to access the website that's running in the build image, and I don't think that's possible to do with the current service linking support in gitlab-ci-multi-runner (the link is one-way).
You're right about the test jobs. I didn't try those.
New theory: Missing the intermediate layers on your local docker image means you're rebuilding everything from scratch every time. If you were to pull a previously built image first, then rebuild, you might be able to skip rebuilding the unchanged intermediate layers.
Docker Compose may eventually help you by making it possible to break selenium out into a separate container.
Starting to feel like we need a (cached) docker daemon per project or possibly per group or something.
I tried pulling a previous image, but docker still rebuilt everything from scratch. The only difference was that it didn't have to download the php image from Docker Hub.
I just pulled the image on a computer where I never built that image before. The first build was from scratch. Subsequent builds are cached, and finish instantly.
So it looks like the caching feature depends on some files that are local to the machine you're building on.
I have the same problem with the cache. Every time I execute the build, the runner downloads everything and it's not cached on my local machine. This means that when I execute docker images on my machine (where the runner is running), I cannot see the image that is created for the build.
A solution may be to not build the image using dind (which means each local docker daemon is new and has no previous data), but to build it with the host's docker.sock bind-mounted instead.
@orobardet Yes, that might work, and we now document that option, but there are some serious drawbacks which stop us from using it for our shared runners on GitLab.com as discussed on #17769 (closed). So we still need to look at caching for docker-in-docker.
Docker layer caching is not working for me when using the socket bind-mounting solution.
I run gitlab-ci-multi-runner alongside gitlab-ee, and bind-mount the docker socket as described in this issue.
Images are correctly accessed (i.e. FROM is instantaneous, and push/pull only transfers the missing layers),
but when building an image, every layer gets rebuilt every time.
I am looking for the same thing: how to cache the dind builds. I have tested the suggestion above about bind-mounting the Docker socket, without success.
Using Jenkins with dind and a bind-mounted socket, the build takes 60 seconds. For the same changes, using the GitLab runner with dind, it takes 30 minutes.
We offer two git strategies: git fetch and git clone. git clone always checks out the project from scratch, removing any existing git data.
This may collide with caching of docker layers if you ADD/COPY files into the docker image.
Let's consider this example:
FROM ubuntu
RUN apt-get install -y git-core
ADD / /app
RUN /app/setup
If you use git clone, the layers from ADD onward will never be cached, even if the files did not change.
If you use git fetch and the project has existing git data, it is possible that the ADD layers, and everything executed after them, will be cached.
It's important to note that git fetch caching only works locally (the git data is stored locally); it doesn't work on automatically managed infrastructure.
Using shell executor
For now, this is the best executor to use when you want to cache docker layers.
It basically doesn't require any changes, other than adding the gitlab-runner user to the docker group.
Combined with git fetch, it gives the best chance of caching docker layers.
Using docker executor with bind mounted socket
The widely evaluated proposal is to expose the host docker engine to child containers: volumes = ["/var/run/docker.sock:/var/run/docker.sock"].
This also allows fairly good caching. The docker executor supports the git fetch strategy as well, which makes it possible to cache layers between builds.
However, a bind-mounted socket can lead to concurrency problems when running multiple builds on a single machine at the same time,
because they all share the same docker engine. For example, container names can clash.
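For reference, a minimal sketch of a runner configuration for this approach (the runner name, URL, token and default image are placeholders):

```toml
concurrent = 1

[[runners]]
  name = "docker-socket-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER-TOKEN"
  executor = "docker"
  [runners.docker]
    image = "docker:latest"
    privileged = false
    # expose the host docker engine to the build containers
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
```

Keeping concurrent low (here 1) on such a runner helps avoid the name-clashing problem mentioned above.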
Using docker executor with dind
services:
  - docker:dind
This approach doesn't have caching enabled by default. It's important to note that docker:dind by default uses the vfs storage driver, which offers the best compatibility but is also the slowest one available.
To make docker:dind reasonably fast you have to set DOCKER_DRIVER:
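For example, something like this in .gitlab-ci.yml (the exact driver value depends on your kernel, as discussed further down):

```yaml
variables:
  DOCKER_DRIVER: overlay
```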
This will make Docker (the dind instance) use overlayfs, which is one of the fastest docker storage drivers.
It has a big impact on performance when building docker images.
docker:dind is the best solution in terms of concurrency, because every build has its own docker engine and does not affect other running builds.
This comes at the cost of possibly longer build times.
Some layer caching can be achieved by pulling a previously uploaded image to reuse the layers that do not depend on content from the git repository:
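A sketch of that pull-then-build pattern (the registry path is a placeholder, and whether layers are actually reused depends on the Docker version, as discussed earlier in this issue):

```yaml
build:
  stage: build
  script:
    - docker login -u gitlab-ci-token -p "$CI_BUILD_TOKEN" registry.gitlab.com
    # pull the previously pushed image; ignore the failure on the very first build
    - docker pull registry.gitlab.com/group/project:latest || true
    - docker build -t registry.gitlab.com/group/project:latest .
    - docker push registry.gitlab.com/group/project:latest
```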
In the near future we will extend dind with a local, per-build cache to speed up image building.
Thanks for offering all the tips on optimizing builds!
On the topic of the storage driver, the Docker documentation states: "Many people consider OverlayFS as the future of the Docker storage driver. However, it is less mature, and potentially less stable than some of the more mature drivers such as aufs and devicemapper. For this reason, you should use the OverlayFS driver with caution and expect to encounter more bugs and nuances than if you were using a more mature driver."
Because of this, I'm a bit hesitant to choose overlay. Have you encountered any of the mentioned stability issues?
We have been using overlayfs with dind for quite a long time and it works much better than aufs. I would not suggest it for production servers (running applications), but for a CI/CD setup we didn't encounter any problems.
I was using aufs in docker:dind, and just tried overlayfs as you suggested @ayufan: it halves the build time compared to aufs! Despite there being no caching for docker build yet -- looking forward to your solution for that (by the way, any clues on how you'll do it, just so we can try it ourselves?).
Thanks!
Thanks also for the DOCKER_DRIVER variable: I didn't know about it (I can't find it in any docker documentation). I no longer need to extend the docker:dind image to set the storage driver :)
For those who want to try docker:dind gitlab-ci build with overlayfs, don't forget to install and load overlay kernel module on the host of the runner.
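On a typical systemd-based host, that would be something along these lines (a sketch; requires root):

```sh
# load the overlay module now...
modprobe overlay
# ...and make sure it is loaded again after a reboot
echo overlay > /etc/modules-load.d/overlay.conf
```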
gitlab-ci-multi-runner 1.3.2 (0323456)
Using Docker executor with image maven:3.3.3 ...
Pulling docker image docker:dind ...
Starting service docker:dind ...
Waiting for services to be up and running...
*** WARNING: Service runner-f088f62c-project-1358215-concurrent-0-docker probably didn't start properly.
service runner-f088f62c-project-1358215-concurrent-0-docker did timeout
2016-07-12T10:53:18.317354705Z time="2016-07-12T10:53:18.316825632Z" level=warning msg="/!\\ DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING /!\\"
2016-07-12T10:53:18.337413341Z time="2016-07-12T10:53:18.335061756Z" level=info msg="New containerd process, pid: 17\n"
2016-07-12T10:53:19.345445073Z time="2016-07-12T10:53:19.344852419Z" level=error msg="'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded."
2016-07-12T10:53:19.345515108Z time="2016-07-12T10:53:19.345088407Z" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
*********
Pulling docker image maven:3.3.3 ...
Running on runner-f088f62c-project-1358215-concurrent-0 via ec04b2394049...
Fetching changes...
HEAD is now at e806ae7 Update .gitlab-ci.yml
Checking out e806ae72 as master...
$ docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN registry.gitlab.com
/bin/bash: line 23: docker: command not found
ERROR: Build failed: exit code 1
I guess that you should either use docker:dind with DOCKER_DRIVER or use /var/run/docker.sock:/var/run/docker.sock. These are two mutually exclusive methods; there's no point in using both of them :)
Sorry if I'm late to the party, but if I understand correctly, when using the default GitLab.com shared runners I essentially have to use dind, and that also prevents me from caching intermediate layers, which may come in a future release?
Using the shared runners on GitLab.com, you have to use dind, and that prevents you from using Docker's built-in caching mechanisms. We have not yet found a way to work around that.
Actually, you can use an image with the docker daemon in it instead of having it as a service, and then you can cache /var/lib/docker. Just create a new image FROM docker:dind and in your entrypoint, start dockerd-entrypoint.sh in the background and a shell in the foreground for the runner.
The runner won't cache /var/lib/docker because it's outside the build directory. It's also a volume in the docker:dind image, so I duplicated docker:dind to get rid of the volume, but in the end I didn't need that (I wasn't able to symlink it as a volume). The entrypoint of my custom image is this: https://gitlab.com/nkovacs/docker-builder/blob/master/builder-entrypoint.sh
This looks awesome! I would definitely recommend putting this somewhere in the docs; it's really helpful, and I imagine it's a showstopper for a lot of other folks who are evaluating moving their CI to GitLab.
I do notice that the output omits creating the .cache dir:
$ mkdir -pv `pwd`/.cache/docker
$ mkdir -pv /var/lib/docker
created directory: '/var/lib/docker'
$ cp -a `pwd`/.cache/docker/* /var/lib/docker || :
cp: can't stat '/path/to/pwd/.cache/docker/*': No such file or directory
Should I try -f or some other flag?
Edit:
I forgot to mention that, when the after_script runs, I get this:
Running after script...
$ cp -a /var/lib/docker/* `pwd`/.cache/docker/ || :
$ docker images
cp: can't stat '/var/lib/docker/*': No such file or directory
For context, here's what my after_script looks like for the job:
after_script: # reset global after_script
  - "cp -a /var/lib/docker/* `pwd`/.cache/docker/ || :"
  - "docker images"
From the previous log, we know that /var/lib/docker already exists.
I couldn't find anything about 'stat' before because my searches were too specific... I found a source that dealt with a slightly similar problem, and their solution was to wait or sleep for a bit. Since I'm not that much of a UNIX expert (and don't have much time to allot to the problem at hand), I used sleep 3 to wait for the directories to finish being created.
I still have the same error message, though:
...
$ mkdir -pv `pwd`/.cache/docker
created directory: '/path/from/pwd/.cache/'
created directory: '/path/from/pwd/.cache/docker'
$ sleep 3
$ mkdir -pv /var/lib/docker
created directory: '/var/lib/docker'
$ cp -a `pwd`/.cache/docker/* /var/lib/docker || :
cp: can't stat '/path/from/pwd/.cache/docker/*': No such file or directory
$ docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $GL
Login Succeeded
...
Also, I tried different flags, like -a, -r, ... though I've yet to try to force with -rf.
A simple way to handle it, at least for runners close to or inside the gitlab-ce node is to enable Docker registry's proxy cache! This would indeed only work for users using a private registry, but would still be helpful to many. See https://blog.docker.com/2015/10/registry-proxy-cache-docker-open-source/
The problem though is that there is no way (documented at least) to configure the registry coming with Omnibus... I posted an issue about it at omnibus-gitlab#1655 (closed)
@gajus We will be able to do docker pull foo && docker build --cache-from foo bar. It will be easier to handle the cache since it will not require an intermediate cache store or dealing with docker save/load. That's what I hope... I haven't tried docker 1.13 yet.
@grzesiek --cache-from would arguably be "smarter" if it could do the pull itself, i.e. decide which images to pull. But with the current implementation there is no difference: you have to pull each --cache-from image yourself so that docker build can use them.
However, it's a cleaner solution: just add 1+N pull commands and 1+N --cache-from arguments, no need for extra scripts.
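Roughly, in a build script, that looks like this (a sketch using GitLab's CI_REGISTRY_IMAGE and CI_BUILD_REF_NAME variables; requires Docker 1.13+ on both client and daemon):

```sh
# pull the image(s) you want to reuse as cache (tolerate failure on the first run),
# then pass each of them to docker build via --cache-from
docker pull "$CI_REGISTRY_IMAGE:latest" || true
docker build \
  --cache-from "$CI_REGISTRY_IMAGE:latest" \
  -t "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" .
docker push "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME"
```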
@grzesiek Yes, and it was a nasty workaround for this problem. When 1.13 is out we need to test --cache-from and switch to the native solution :)
Hi, I'm having a big issue with this workflow that I don't know how to work around.
I needed to set up a CI pipeline with 3 jobs (compile code, build image, deploy image); after reading this issue I chose to go with the docker.sock mount option (to temporarily solve the caching issue).
This is my configuration:
The thing is, I noticed that my artifacts aren't copied inside the docker image. I think it's because the docker build is executed on the host machine and doesn't have access to the artifacts folder. Am I right? Can you think of a way to work around this?
To speed things up a bit more, I've set cache.untracked: true in .gitlab-ci.yml:
cache:
  untracked: true
... in combination with something like this in my build script:
# let's save the "vendor" directory to the GitLab CI cache
docker create --name cachecontainer my-image
docker cp cachecontainer:/var/www/vendor $CI_PROJECT_DIR/vendor
docker rm cachecontainer
I had good success with the following .gitlab-ci.yml cutting down build time of a simple image from ~3.5 mins to 1 min:
- Creates a dir that will be cached
- Logs in to the GitLab registry
- Tries to load the image from the local cache folder; if it can't find the file, it pulls it from the registry (assuming downloading the image will be faster than building it fully ;) )
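The .gitlab-ci.yml itself isn't reproduced here; a rough sketch of the approach described in the list above, with the cache path and image names as placeholders, might look like this:

```yaml
build:
  stage: build
  cache:
    paths:
      - .docker-cache/
  script:
    - mkdir -p .docker-cache
    - docker login -u gitlab-ci-token -p "$CI_BUILD_TOKEN" "$CI_REGISTRY"
    # try the locally cached tarball first, fall back to pulling from the registry
    - docker load -i .docker-cache/image.tar || docker pull "$CI_REGISTRY_IMAGE:latest" || true
    - docker build -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"
    # refresh the local cache for the next build
    - docker save -o .docker-cache/image.tar "$CI_REGISTRY_IMAGE:latest"
```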
@gajus you have a typo in that blog post: -tag is used twice, but you intended to have two --cache-from arguments. Also, for me the default driver is aufs2, not vfs, so it really depends on which kernel is available.
and you probably should link back to this issue :) https://gitlab.com/gitlab-org/gitlab-ce/issues/17861#note_19140733
I think --cache-from solves this issue. We're not using it though because our builds are now sufficiently fast with devicemapper and not using a cache allows us to rerun the build to get updates without having to change anything in the dockerfile.
@nkovacs Beware of devicemapper with heavy docker use, like with runners: you may run out of inodes. It depends on your kernel version, but if you have at least 4.x, it would be better to use overlay2 on overlay2. This results in slightly better performance, and no inode issues in our case for months.
Besides the right choice of storage driver, using --cache-from can reduce build time. As an example, for a job that takes ~15 min to build with vfs:
vfs on aufs/devicemapper -> overlay on overlay/aufs ==> reduced to 5-6 min
using --cache-from ==> reduced to ~2 min, and even a few seconds depending on your Dockerfile and the layers that changed
Yes, it's disabled on overlay. overlay or overlay2 on overlay does not work. But overlay2 on overlay2 works :) It may depend on the exact kernel version, though. We are using 4.4 on our runner servers.
In the worst case, with a kernel 3.x (>= 3.18), aufs on overlay is a better choice than overlay on devicemapper. Same performance, but much more inode-friendly for your filesystem.
I tried it on kernel 4.8, and overlay on overlay worked, as well as overlay2 on overlay.
Right now we're using aufs on devicemapper (direct-lvm), because we didn't see any performance improvements with overlay/overlay2 on devicemapper. Our kernel is 4.9.
What is this inode problem? I know overlay has an inode exhaustion problem (which is fixed in overlay2), but I can't find anything about devicemapper. Or is it just when using overlay on devicemapper?
This assumes each build is the same. That's almost never true in my case. What I could do is specify --cache-from=ubuntu. That would help, but not a lot (percentage-wise).
@SharpEdgeMarshall This is strange. Is your previous image $CI_REGISTRY_IMAGE:latest using the same FROM (node:6.10-alpine) as the one you are building?
If so, docker should not have to re-pull every layer. In the FROM step, docker should try to pull the FROM image from the hub, but quickly print "already exists" for each layer.
@orobardet Yes, the FROM image has been the same for many builds... And I can confirm it is the same, because the cache of the successive steps is not invalidated.
Hey everyone, I've read all your creative solutions to getting docker going with a cache in GitLab's hosted CI. Interesting workarounds.
I'm wondering if anyone has successfully managed to figure out caching with docker-compose? For my server-side apps, I use docker-compose to orchestrate the database, cache, etc.
I also use docker-compose just to abstract away the crazy amount of command args vanilla docker can sometimes require (for volume mapping, port mapping, etc., etc. -- much easier to just do docker-compose up).
So, here is a prime example of what I'd like to avoid:
ANY docker image used for a project should be cached, imo (at least per-branch)
Or, we should be able to specify the caching of docker images per job... codeclimate isn't going to change very often, so those 15+ minutes of building the image could be reduced to MUCH less time with a cache.
I'm trying to get 1.13's cache-from option to work, but I don't think I fully understand how to get 1.13 to build my images. If I use image: docker:1.13-dind instead of image: gitlab/dind then I get the following error:
$ docker info
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
So it seems like the docker-in-docker isn't properly set up? But gitlab/dind is only 1.12, so I can't take advantage of --cache-from.
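One thing that might be worth trying (a sketch, not something confirmed in this thread): keep dind as a service, but pin both the client image and the service to 1.13 and point the client at the service:

```yaml
image: docker:1.13

services:
  - docker:1.13-dind

variables:
  DOCKER_HOST: tcp://docker:2375

build:
  script:
    - docker info   # should now reach the 1.13 daemon running in the service
    - docker build --cache-from "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:latest" .
```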
@nambrot, do you have full caching per-branch, per-job? care to share your .gitlab-ci.yml?
Any hope of this happening with docker-compose setups? (where you need a database, redis, etc?)
It looks nice, concerns cleanly separated. But performance compared to a good old ./build.sh is pretty bad, unfortunately. The docker pull at the beginning of every stage costs a giant amount of time.
Is there really no way to cache a once-pulled docker image for the duration of a pipeline? One docker pull would be fine, but having to do a full re-pull in every stage totally kills performance for me and forced me to go back to an old-school ./build.sh that does everything in one script.
Cache is managed by the docker daemon running on the host, meaning that it works very well for images and layers built with docker build.
The difficulty then comes when one wants to save artifacts or cache binaries between builds, since the docker commands are started from a docker container but run on the host.
How can we share volumes? In my use case, docker run is used to run the coverage. Unfortunately, there is no simple setting to share a volume between the container issuing the docker run command and the container created by it. The easiest way I've found so far is to run the coverage with: docker run --rm $(docker ps -a --filter "label=com.gitlab.gitlab-runner.job.sha=$CI_COMMIT_SHA" --filter "label=com.gitlab.gitlab-runner.type=cache" -q | sed "s/^/--volumes-from /"). Basically, this takes advantage of the --volumes-from option to mount the same volumes as the gitlab-runner container. Is there already a simpler way to achieve this? If not, could something be planned?
FYI, in my organisation we completely dropped socket binding, as it raises too many issues (garbage on the host, build and layer conflicts/strange sharing since containers run on the same host, mounting volumes from local code...).
We are massively using DinD, which works great with recent versions of Docker and solves all the socket binding issues. The main problems with DinD were performance and cache.
The performance issue was crushed by using a recent enough kernel (4.4+) and the overlay2 storage driver (both the docker daemon on the runner host and the dind daemon inside the GitLab runner container use overlay2).
For the cache between builds, it was nicely solved using the recent --cache-from option of docker build. It needs to know which images to use as cache and to pull them beforehand, but that's not a big deal, as it's often the same image name we build, with maybe some master or latest tags.
> For the cache between builds, it was nicely solved using the recent --cache-from option of docker build.
Where is this image pushed to? If you have a build step that pushes to the container registry and you use a tag like latest or a version number and a subsequent test step fails then you've just released a broken build, right?
@tvaughan It doesn't happen, as the CI does not set a latest tag "automatically": we have a CI job that creates that tag, but it's configured to be run manually, once we are sure the release is valid.
Also, we are using a 2-registry strategy: the internal GitLab registry is used for development and pre-build purposes only, so it's fine to have something broken in that registry. Our "production" (let's say stable) registry is an external one, and we only push releases to it once all tests are OK. In the case of automatic testing, this "push to prod registry" job is in a later stage, launched manually. For projects needing external deployment and testing, this job is a "manual" job.
Note that I think this can also be done with a single registry: you just have to define 2 different paths within the registry to isolate "dev/build" images from "prod" images.
Here is a typical CI workflow (which can be simplified for small projects):
Development branches (i.e. non-master and non-tag pushes) have a set of jobs that build the image with the branch name as a tag, push it to the GitLab registry, and run some other jobs for tests and so on. If an image is bad (detected by the pipeline or manually once deployed to the test environment), it does not matter: it's a development build, in the internal "development" GitLab registry with a specific tag.
Once the branch is merged into master, we have more or less the same pipeline (plus some more specific, time-consuming jobs). In the same way, if the image build is bad, it's only in the internal registry with a "master" tag. At worst, the testing team has a broken staging environment. That's why staging exists.
Once the master is eligible for a new version:
We create a tag. Specific jobs with only: tags trigger, build an image tagged with the version number, and push it to the GitLab internal registry too.
Once the whole pipeline succeeds, a last job pushes the version image to the "production" registry. This job can, in some important/complex projects, be set as manual.
At this step, there is a new release build in the production registry, but not tagged as latest. It may be broken, but the probability is low by this time, and when it happens, it's not critical as it's not a latest image.
Once the version is validated, we just manually launch a final job that tags this release image as latest in our production registry.
All of this makes a lot of circuit-breaker steps.
Below are examples of the runner config and a project CI, as also requested by @NullVoxPopuli.
The .gitlab-ci.yml is a simplified version of the above workflow (but it still has the 2-registry setup, a job triggered by tags that pushes to the prod registry, and a manual job to tag it as latest).
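The original files aren't reproduced in this extract; a condensed sketch of such a .gitlab-ci.yml, with registry hosts, image names and the test command as placeholders, might look like this:

```yaml
stages:
  - build
  - test
  - release

build:
  stage: build
  script:
    - docker login -u gitlab-ci-token -p "$CI_BUILD_TOKEN" "$CI_REGISTRY"
    - docker pull "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" || true
    - docker build --cache-from "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" -t "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME"

test:
  stage: test
  script:
    - docker run --rm "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" run-tests

release:
  stage: release
  only:
    - tags
  script:
    # login to the production registry omitted
    - docker pull "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME"
    - docker tag "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" "registry.prod.example.com/project:$CI_BUILD_REF_NAME"
    - docker push "registry.prod.example.com/project:$CI_BUILD_REF_NAME"

tag-latest:
  stage: release
  only:
    - tags
  when: manual
  script:
    - docker pull "registry.prod.example.com/project:$CI_BUILD_REF_NAME"
    - docker tag "registry.prod.example.com/project:$CI_BUILD_REF_NAME" "registry.prod.example.com/project:latest"
    - docker push "registry.prod.example.com/project:latest"
```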
Thanks @orobardet. I understand how you're using --cache-from now. By the looks of things it seems like --cache-from helps you deploy images after they've completed a thorough series of tests. I'm still in search of a way to cache between steps before an image is pushed. https://gitlab.com/gitlab-org/gitlab-ce/issues/17861#note_37212816
Thanks for pointing this out: Using --cache-from requires an extra copy of the image (compared to the socket binding), but it makes things much faster than before (without --cache-from).
@orobardet: With a configuration similar to yours, I get errors like 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?'. I understand that it is because the docker:dind image is started with arguments, preventing the docker daemon from starting. A configuration similar to yours with docker:dind configured as a service works fine, but it then makes things difficult when it comes to configuring volumes to manage artifacts.
I've attempted a similar approach to @nkovacs': docker-builder is used as the base image. With this image, based on docker:dind, the docker daemon is started in the background. Unfortunately, I get the same kind of errors and am struggling to explain why.
@lionelperrin I notice your docker daemon was started using the vfs storage driver, despite DOCKER_DRIVER=overlay2 in the runner config. It may be because overlay2 is not possible in your dind setup, or something funny is happening and overriding the DOCKER_DRIVER variable.
In the second case, the DOCKER_HOST variable may be incorrectly set by something. At the beginning of your job, try echoing these 2 variables (DOCKER_DRIVER and DOCKER_HOST), and even force the client to query the local daemon with DOCKER_HOST=unix:///var/run/docker.sock docker info.
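For example, at the top of the job script (a quick debugging sketch):

```sh
echo "DOCKER_DRIVER=$DOCKER_DRIVER"
echo "DOCKER_HOST=$DOCKER_HOST"
# force the client to talk to the local daemon over the socket
DOCKER_HOST=unix:///var/run/docker.sock docker info
```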
Thanks. I now have a working configuration using dind. My problems were due to my docker-builder image being based on a too-old version of docker:dind. I've reduced the scripts provided by @nkovacs to the minimum (many thanks for your idea) and it now works like a charm, even with volumes such as in this example:
I share here how I build the docker-builder image:
Dockerfile
FROM docker:dind
COPY builder-entrypoint.sh /usr/local/bin/
ENTRYPOINT ["builder-entrypoint.sh"]
CMD []
builder-entrypoint.sh
#!/bin/sh
# start daemon
dockerd-entrypoint.sh > /dev/null 2>&1 &
# wait for daemon to start
while ! docker info > /dev/null 2>&1; do sleep 1; done
# execute command
exec "$@"
By the way, it looks to me that as long as concurrent=1 in config.toml, it should be safe to persist docker storage with something like the example below. This saves time on the docker pull commands. Am I missing something?
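The referenced example isn't included in this extract. Assuming dockerd runs inside the build container itself (as with the docker-builder image above), a runner configuration along these lines might be what was meant; the host path and image name are placeholders, and the safety really does depend on concurrent = 1:

```toml
concurrent = 1

[[runners]]
  executor = "docker"
  [runners.docker]
    # the build image runs its own dockerd, so its /var/lib/docker
    # can be persisted on the host between builds
    image = "registry.example.com/group/docker-builder:latest"
    privileged = true
    volumes = ["/srv/gitlab-runner/docker-storage:/var/lib/docker"]
```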
Any ideas on how to extrapolate this to docker-compose?
I use docker-compose for tests, so I don't need to configure database/cache stuff independently of the dev environment. (I like having dev + CI as close as possible.)