I've followed the advanced example in the container registry announcement to set up a CI build of a docker image.
Unfortunately the image takes a very long time to build, and I cannot enable caching, since /var/lib/docker is in the linked docker:dind service image, not in the build image.
The release-image job is also missing a docker pull step. That job gets a fresh docker:dind service, so it doesn't have the newly built image. The same is probably also true of the two test jobs.
I'm not quite sure I understand. Are you rebuilding the image in multiple steps? I see the container registry as acting as your cache in this instance, so you'd build once and push to the registry so that you don't have to build it again in a future step.
Oh wait, maybe you mean it takes a long time to build on each commit. That's worth looking into further.
Thanks for mentioning the missing step in the example. I've pushed an update to the blog post. The test jobs should actually work as-is because docker run will download the image from the registry if it doesn't exist locally.
It takes a long time (40-50 minutes) to build a single commit. The Dockerfile starts with FROM php:5.6-apache, which it has to download every time (484 megabytes), then it compiles a few php extensions, installs node and npm, git, composer, phantomjs and selenium.
The image is meant for testing php apps, so it's pretty heavy. For example, it needs selenium because the acceptance tests need to access the website that's running in the build image, and I don't think that's possible to do with the current service linking support in gitlab-ci-multi-runner (the link is one-way).
You're right about the test jobs. I didn't try those.
New theory: Missing the intermediate layers on your local docker image means you're rebuilding everything from scratch every time. If you were to pull a previously built image first, then rebuild, you might be able to skip rebuilding the unchanged intermediate layers.
Docker Compose may eventually help you by making it possible to break selenium out into a separate container.
Starting to feel like we need a (cached) docker daemon per project or possibly per group or something.
I tried pulling a previous image, but docker still rebuilt everything from scratch. The only difference was that it didn't have to download the php image from Docker Hub.
I just pulled the image on a computer where I never built that image before. The first build was from scratch. Subsequent builds are cached, and finish instantly.
So it looks like the caching feature depends on some files that are local to the machine you're building on.
I have the same problem with the cache. Every time I execute the build, the runner downloads everything and it's not cached on my local machine. This means that when I execute docker images on my machine (where the runner is running), I cannot see the image that is created for the build.
A solution may be to not build the image using dind (which means each local docker daemon is new and has no previous data), but to build it with the host's docker.sock bind-mounted instead.
@orobardet Yes, that might work, and we now document that option, but there are some serious drawbacks which stop us from using it for our shared runners on GitLab.com as discussed on #17769 (closed). So we still need to look at caching for docker-in-docker.
Docker layer caching is not working for me when using the socket bind-mounting solution.
I run gitlab-ci-multi-runner alongside gitlab-ee, and bind-mount the docker socket as described in this issue.
Images are correctly accessed (i.e. FROM is instantaneous, and push/pull only transfers the missing layers),
but when building an image, every layer gets rebuilt every time.
I am looking for the same thing: how to cache the dind builds. I have tested the suggestion above about bind-mounting the Docker socket, without success.
Using Jenkins with dind and a bind-mounted socket, the build takes 60 seconds. For the same changes, using the GitLab runner with dind, it takes 30 minutes.
We offer two git strategies: git fetch and git clone. git clone always checks out the project from scratch, removing any existing git data.
This may collide with caching of docker layers if you ADD/COPY files into the docker image.
Let's consider this example:
FROM ubuntu
RUN apt-get install -y git-core
ADD / /app
RUN /app/setup
If you use git clone, the layers from ADD onward will never be cached, even if the files did not change.
If you use git fetch and the project has existing git data, it is possible that the ADD layers, and everything executed after them, will be cached.
It's important to note that git fetch caching only works locally (the git data is stored locally); it doesn't work on automatically managed infrastructure.
Using shell executor
For now, this is the best executor to use when you want to cache docker layers.
It basically doesn't require any changes, other than adding the gitlab-runner user to the docker group.
Combined with git fetch, it gives the best chance of caching docker layers.
Using docker executor with bind mounted socket
The widely evaluated proposal is to expose the host docker engine to child containers: volumes = ["/var/run/docker.sock:/var/run/docker.sock"].
This also allows fairly good caching. The docker executor supports the git fetch strategy as well, which makes it possible to cache layers between builds.
However, a bind-mounted socket can lead to concurrency problems when running multiple builds on a single machine at the same time,
because they all share the same docker engine. For example, container names can clash.
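For reference, a minimal sketch of a runner configuration for this approach (the runner name, URL, token and default image are placeholders):

```toml
concurrent = 1

[[runners]]
  name = "docker-socket-runner"
  url = "https://gitlab.example.com/"
  token = "RUNNER-TOKEN"
  executor = "docker"
  [runners.docker]
    image = "docker:latest"
    privileged = false
    # expose the host docker engine to the build containers
    volumes = ["/var/run/docker.sock:/var/run/docker.sock"]
```

Keeping concurrent low (here 1) on such a runner helps avoid the name-clashing problem mentioned above.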
Using docker executor with dind
services:
  - docker:dind
This approach doesn't have caching enabled by default. It's important to note that docker:dind by default uses the vfs storage driver, which offers the best compatibility but is also the slowest one available.
To make docker:dind reasonably fast you have to set DOCKER_DRIVER:
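For example, something like this in .gitlab-ci.yml (the exact driver value depends on your kernel, as discussed further down):

```yaml
variables:
  DOCKER_DRIVER: overlay
```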
This will make Docker (the dind instance) use overlayfs, which is one of the fastest docker storage drivers.
It has a big impact on performance when building docker images.
docker:dind is the best solution in terms of concurrency, because every build has its own docker engine and does not affect other running builds.
This comes at the cost of possibly longer build times.
Some layer caching can be achieved by pulling a previously uploaded image to reuse the layers that do not depend on content from the git repository:
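A sketch of that pull-then-build pattern (the registry path is a placeholder, and whether layers are actually reused depends on the Docker version, as discussed earlier in this issue):

```yaml
build:
  stage: build
  script:
    - docker login -u gitlab-ci-token -p "$CI_BUILD_TOKEN" registry.gitlab.com
    # pull the previously pushed image; ignore the failure on the very first build
    - docker pull registry.gitlab.com/group/project:latest || true
    - docker build -t registry.gitlab.com/group/project:latest .
    - docker push registry.gitlab.com/group/project:latest
```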
In the near future we will extend dind with a local, per-build cache to speed up image building.
Thanks for offering all the tips on optimizing builds!
On the topic of the storage driver, the Docker documentation states: "Many people consider OverlayFS as the future of the Docker storage driver. However, it is less mature, and potentially less stable than some of the more mature drivers such as aufs and devicemapper. For this reason, you should use the OverlayFS driver with caution and expect to encounter more bugs and nuances than if you were using a more mature driver."
Because of this, I'm a bit hesitant to choose overlay. Have you encountered any of the mentioned stability issues?
We have been using overlayfs with dind for quite a long time and it works much better than aufs. I would not suggest it for production servers (running applications), but for a CI/CD setup we didn't encounter any problems.
I was using aufs in docker:dind, and just tried overlayfs as you suggested @ayufan: it halves the build time compared to aufs! Despite there being no caching for docker build yet -- looking forward to your solution for that (by the way, any clues on how you'll do it, just so we can try it ourselves?).
Thanks!
Thanks also for the DOCKER_DRIVER variable: I didn't know about it (I can't find it in any docker documentation). I no longer need to extend the docker:dind image to set the storage driver :)
For those who want to try docker:dind gitlab-ci build with overlayfs, don't forget to install and load overlay kernel module on the host of the runner.
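On a typical systemd-based host, that would be something along these lines (a sketch; requires root):

```sh
# load the overlay module now...
modprobe overlay
# ...and make sure it is loaded again after a reboot
echo overlay > /etc/modules-load.d/overlay.conf
```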
gitlab-ci-multi-runner 1.3.2 (0323456)
Using Docker executor with image maven:3.3.3 ...
Pulling docker image docker:dind ...
Starting service docker:dind ...
Waiting for services to be up and running...
*** WARNING: Service runner-f088f62c-project-1358215-concurrent-0-docker probably didn't start properly.
service runner-f088f62c-project-1358215-concurrent-0-docker did timeout
2016-07-12T10:53:18.317354705Z time="2016-07-12T10:53:18.316825632Z" level=warning msg="/!\\ DON'T BIND ON ANY IP ADDRESS WITHOUT setting -tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING /!\\"
2016-07-12T10:53:18.337413341Z time="2016-07-12T10:53:18.335061756Z" level=info msg="New containerd process, pid: 17\n"
2016-07-12T10:53:19.345445073Z time="2016-07-12T10:53:19.344852419Z" level=error msg="'overlay' not found as a supported filesystem on this host. Please ensure kernel is new enough and has overlay support loaded."
2016-07-12T10:53:19.345515108Z time="2016-07-12T10:53:19.345088407Z" level=fatal msg="Error starting daemon: error initializing graphdriver: driver not supported"
*********
Pulling docker image maven:3.3.3 ...
Running on runner-f088f62c-project-1358215-concurrent-0 via ec04b2394049...
Fetching changes...
HEAD is now at e806ae7 Update .gitlab-ci.yml
Checking out e806ae72 as master...
$ docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN registry.gitlab.com
/bin/bash: line 23: docker: command not found
ERROR: Build failed: exit code 1
I guess that you should either use docker:dind with DOCKER_DRIVER or use /var/run/docker.sock:/var/run/docker.sock. These are two mutually exclusive methods; there's no point in using both of them :)
Sorry if I'm late to the party, but if I understand correctly, when using the default GitLab.com shared runners I essentially have to use dind, and that also prevents me from caching intermediate layers, which may come in a future release?
Using the shared runners on GitLab.com, you have to use dind, and that prevents you from using Docker's built-in caching mechanisms. We have not yet found a way to work around that.
Actually, you can use an image with the docker daemon in it instead of having it as a service, and then you can cache /var/lib/docker. Just create a new image FROM docker:dind and in your entrypoint, start dockerd-entrypoint.sh in the background and a shell in the foreground for the runner.
The runner won't cache /var/lib/docker because it's outside the build directory. It's also a volume in the docker:dind image, so I duplicated docker:dind to get rid of the volume, but in the end I didn't need that (I wasn't able to symlink it as a volume). The entrypoint of my custom image is this: https://gitlab.com/nkovacs/docker-builder/blob/master/builder-entrypoint.sh
This looks awesome! I would definitely recommend putting this somewhere in the docs; it's really helpful, and I imagine it's a showstopper for a lot of other folks who are evaluating moving their CI to GitLab.
I do notice that the output omits creating the .cache dir:
$ mkdir -pv `pwd`/.cache/docker
$ mkdir -pv /var/lib/docker
created directory: '/var/lib/docker'
$ cp -a `pwd`/.cache/docker/* /var/lib/docker || :
cp: can't stat '/path/to/pwd/.cache/docker/*': No such file or directory
Should I try -f or some other flag?
Edit:
I forgot to mention that, when the after_script runs, I get this:
Running after script...
$ cp -a /var/lib/docker/* `pwd`/.cache/docker/ || :
$ docker images
cp: can't stat '/var/lib/docker/*': No such file or directory
For context, here's what my after_script looks like for the job:
after_script: # reset global after_script
  - "cp -a /var/lib/docker/* `pwd`/.cache/docker/ || :"
  - "docker images"
From the previous log, we know that /var/lib/docker already exists.
I couldn't find anything about 'stat' before because my searches were too specific... I found a source that dealt with a slightly similar problem, and their solution was to wait or sleep for a bit. Since I'm not that much of a UNIX expert (and don't have much time to allot to the problem at hand), I used sleep 3 to wait for the directories to finish being created.
I still have the same error message, though:
...
$ mkdir -pv `pwd`/.cache/docker
created directory: '/path/from/pwd/.cache/'
created directory: '/path/from/pwd/.cache/docker'
$ sleep 3
$ mkdir -pv /var/lib/docker
created directory: '/var/lib/docker'
$ cp -a `pwd`/.cache/docker/* /var/lib/docker || :
cp: can't stat '/path/from/pwd/.cache/docker/*': No such file or directory
$ docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $GL
Login Succeeded
...
Also, I tried different flags, like -a, -r, ... though I've yet to try to force with -rf.
A simple way to handle it, at least for runners close to or inside the gitlab-ce node is to enable Docker registry's proxy cache! This would indeed only work for users using a private registry, but would still be helpful to many. See https://blog.docker.com/2015/10/registry-proxy-cache-docker-open-source/
The problem though is that there is no way (documented at least) to configure the registry coming with Omnibus... I posted an issue about it at omnibus-gitlab#1655 (closed)
@gajus We will be able to do docker pull foo && docker build --cache-from foo bar. It will be easier to handle the cache since it will not require an intermediate cache store or dealing with docker save/load. That's what I hope... I haven't tried docker 1.13 yet.
@grzesiek --cache-from would arguably be "smarter" if it could do the pull itself, i.e. decide which images to pull. But with the current implementation there is no difference: you have to pull each --cache-from image yourself so that docker build can use them.
However, it's a cleaner solution: just add 1+N pull commands and 1+N --cache-from arguments, no need for extra scripts.
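Roughly, in a build script, that looks like this (a sketch using GitLab's CI_REGISTRY_IMAGE and CI_BUILD_REF_NAME variables; requires Docker 1.13+ on both client and daemon):

```sh
# pull the image(s) you want to reuse as cache (tolerate failure on the first run),
# then pass each of them to docker build via --cache-from
docker pull "$CI_REGISTRY_IMAGE:latest" || true
docker build \
  --cache-from "$CI_REGISTRY_IMAGE:latest" \
  -t "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" .
docker push "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME"
```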
@grzesiek Yes, and it was a nasty workaround for this problem. When 1.13 is out we need to test --cache-from and switch to the native solution :)
Hi, I'm having a big issue with this workflow that I don't know how to work around.
I needed to set up a CI pipeline with 3 jobs (compile code, build image, deploy image); after reading this issue I chose to go with the docker.sock mount option (to temporarily solve the caching issue).
This is my configuration:
The thing is, I noticed that my artifacts aren't copied inside the docker image. I think it's because the docker build is executed on the host machine and doesn't have access to the artifacts folder. Am I right? Can you think of a way to work around this?
To speed things up a bit more, I've set cache.untracked: true in .gitlab-ci.yml:
cache:
  untracked: true
... in combination with something like this in my build script:
# let's save the "vendor" directory to the GitLab CI cache
docker create --name cachecontainer my-image
docker cp cachecontainer:/var/www/vendor $CI_PROJECT_DIR/vendor
docker rm cachecontainer
I had good success with the following .gitlab-ci.yml cutting down build time of a simple image from ~3.5 mins to 1 min:
- Creates a dir that will be cached
- Logs in to the GitLab registry
- Tries to load the image from the local cache folder; if it can't find the file, it pulls it from the registry (assuming downloading the image will be faster than building it fully ;) )
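The .gitlab-ci.yml itself isn't reproduced here; a rough sketch of the approach described in the list above, with the cache path and image names as placeholders, might look like this:

```yaml
build:
  stage: build
  cache:
    paths:
      - .docker-cache/
  script:
    - mkdir -p .docker-cache
    - docker login -u gitlab-ci-token -p "$CI_BUILD_TOKEN" "$CI_REGISTRY"
    # try the locally cached tarball first, fall back to pulling from the registry
    - docker load -i .docker-cache/image.tar || docker pull "$CI_REGISTRY_IMAGE:latest" || true
    - docker build -t "$CI_REGISTRY_IMAGE:latest" .
    - docker push "$CI_REGISTRY_IMAGE:latest"
    # refresh the local cache for the next build
    - docker save -o .docker-cache/image.tar "$CI_REGISTRY_IMAGE:latest"
```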
@gajus you have a typo in that blog post: -tag is used twice, but you intended to have two --cache-from arguments. Also, for me the default driver is aufs2, not vfs, so it really depends on which kernel is available.
and you probably should link back to this issue :) https://gitlab.com/gitlab-org/gitlab-ce/issues/17861#note_19140733
I think --cache-from solves this issue. We're not using it though because our builds are now sufficiently fast with devicemapper and not using a cache allows us to rerun the build to get updates without having to change anything in the dockerfile.
@nkovacs Beware of devicemapper with heavy docker use, like with runners: you may run out of inodes. It depends on your kernel version, but if you have at least 4.x, it would be better to use overlay2 on overlay2. This results in slightly better performance, and no inode issues in our case for months.
Besides the right choice of storage driver, using --cache-from can reduce build time. As an example, for a job that takes ~15 min to build with vfs:
vfs on aufs/devicemapper -> overlay on overlay/aufs ==> reduced to 5-6 min
using --cache-from ==> reduced to ~2 min, and even a few seconds depending on your Dockerfile and the layers that changed
Yes, it's disabled on overlay. overlay or overlay2 on overlay does not work. But overlay2 on overlay2 works :) It may depend on the exact kernel version, though. We are using 4.4 on our runner servers.
In the worst case, with a kernel 3.x (>= 3.18), aufs on overlay is a better choice than overlay on devicemapper. Same performance, but much more inode-friendly for your filesystem.
I tried it on kernel 4.8, and overlay on overlay worked, as well as overlay2 on overlay.
Right now we're using aufs on devicemapper (direct-lvm), because we didn't see any performance improvements with overlay/overlay2 on devicemapper. Our kernel is 4.9.
What is this inode problem? I know overlay has an inode exhaustion problem (which is fixed in overlay2), but I can't find anything about devicemapper. Or is it just when using overlay on devicemapper?
This assumes each build is the same. That's almost never true in my case. What I could do is specify --cache-from=ubuntu. That would help, but not a lot (percentage-wise).
@SharpEdgeMarshall This is strange. Is your previous image $CI_REGISTRY_IMAGE:latest using the same FROM (node:6.10-alpine) as the one you are building?
If so, docker should not have to re-pull every layer. In the FROM step, docker should try to pull the FROM image from the hub, but quickly print "already exists" for each layer.
@orobardet Yes, the FROM image has been the same for many builds... And I can confirm it is the same, because the cache of the successive steps is not invalidated.
Hey everyone, I've read all your creative solutions to getting docker going with a cache in GitLab's hosted CI. Interesting workarounds.
I'm wondering if anyone has successfully managed to figure out caching with docker-compose? For my server-side apps, I use docker-compose to orchestrate the database, cache, etc.
I also use docker-compose just to abstract away the crazy amount of command args vanilla docker can sometimes require (for volume mapping, port mapping, etc., etc. -- much easier to just do docker-compose up).
So, here is a prime example of what I'd like to avoid:
ANY docker image used for a project should be cached, imo (at least per-branch)
Or, we should be able to specify the caching of docker images per job... codeclimate isn't going to change very often, so those 15+ minutes of building the image could be reduced to MUCH less time with a cache.
I'm trying to get 1.13's cache-from option to work, but I don't think I fully understand how to get 1.13 to build my images. If I use image: docker:1.13-dind instead of image: gitlab/dind then I get the following error:
$ docker info
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
So it seems like the docker-in-docker isn't properly set up? But gitlab/dind is only 1.12, so I can't take advantage of --cache-from.
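One thing that might be worth trying (a sketch, not something confirmed in this thread): keep dind as a service, but pin both the client image and the service to 1.13 and point the client at the service:

```yaml
image: docker:1.13

services:
  - docker:1.13-dind

variables:
  DOCKER_HOST: tcp://docker:2375

build:
  script:
    - docker info   # should now reach the 1.13 daemon running in the service
    - docker build --cache-from "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:latest" .
```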
@nambrot, do you have full caching per-branch, per-job? care to share your .gitlab-ci.yml?
Any hope of this happening with docker-compose setups? (where you need a database, redis, etc?)
It looks nice, concerns cleanly separated. But performance compared to a good old ./build.sh is pretty bad, unfortunately. The docker pull at the beginning of every stage costs a giant amount of time.
Is there really no way to cache a once-pulled docker image for the duration of a pipeline? One docker pull would be fine, but having to do a full re-pull in every stage totally kills performance for me and forced me to go back to an old-school ./build.sh that does everything in one script.
Cache is managed by the docker daemon running on the host, meaning that it works very well for images and layers built with docker build.
The difficulty then comes when one wants to save artifacts or cache binaries between builds, since the docker commands are started from a docker container but run on the host.
How can we share volumes? In my use case, docker run is used to run the coverage. Unfortunately, there is no simple setting to share a volume between the container issuing the docker run command and the container created by it. The easiest way I've found so far is to run the coverage with: docker run --rm $(docker ps -a --filter "label=com.gitlab.gitlab-runner.job.sha=$CI_COMMIT_SHA" --filter "label=com.gitlab.gitlab-runner.type=cache" -q | sed "s/^/--volumes-from /"). Basically, this takes advantage of the --volumes-from option to mount the same volumes as the gitlab-runner container. Is there already a simpler way to achieve this? If not, could something be planned?
FYI, in my organisation we completely dropped socket binding, as it raises too many issues (garbage on the host, build and layer conflicts/strange sharing since containers run on the same host, mounting volumes from local code...).
We are massively using DinD, which works great with recent versions of Docker and solves all the socket binding issues. The main problems with DinD were performance and cache.
The performance issue was crushed by using a recent enough kernel (4.4+) and the overlay2 storage driver (both the docker daemon on the runner host and the dind daemon inside the GitLab runner container use overlay2).
For the cache between builds, it was nicely solved using the recent --cache-from option of docker build. It needs to know which images to use as cache and to pull them beforehand, but that's not a big deal, as it's often the same image name we build, with maybe some master or latest tags.
> For the cache between builds, it was nicely solved using the recent --cache-from option of docker build.
Where is this image pushed to? If you have a build step that pushes to the container registry and you use a tag like latest or a version number and a subsequent test step fails then you've just released a broken build, right?
@tvaughan It doesn't happen, as the CI does not set a latest tag "automatically": we have a CI job that creates that tag, but it's configured to be run manually, once we are sure the release is valid.
Also, we are using a 2-registry strategy: the internal GitLab registry is used for development and pre-build purposes only, so it's fine to have something broken in that registry. Our "production" (let's say stable) registry is an external one, and we only push releases to it once all tests are OK. In the case of automatic testing, this "push to prod registry" job is in a later stage, launched manually. For projects needing external deployment and testing, this job is a "manual" job.
Note that I think this can also be done with a single registry: you just have to define 2 different paths within the registry to isolate "dev/build" images from "prod" images.
Here is a typical CI workflow (which can be simplified for small projects):
Development branches (i.e. non-master and non-tag pushes) have a set of jobs that build the image with the branch name as a tag, push it to the GitLab registry, and run some other jobs for tests and so on. If an image is bad (detected by the pipeline or manually once deployed to the test environment), it does not matter: it's a development build, in the internal "development" GitLab registry with a specific tag.
Once the branch is merged into master, we have more or less the same pipeline (plus some more specific, time-consuming jobs). In the same way, if the image build is bad, it's only in the internal registry with a "master" tag. At worst, the testing team has a broken staging environment. That's why staging exists.
Once the master is eligible for a new version:
We create a tag. Specific jobs with only: tags trigger, build an image tagged with the version number, and push it to the GitLab internal registry too.
Once the whole pipeline succeeds, a last job pushes the version image to the "production" registry. This job can, in some important/complex projects, be set as manual.
At this step, there is a new release build in the production registry, but not tagged as latest. It may be broken, but the probability is low by this time, and when it happens, it's not critical as it's not a latest image.
Once the version is validated, we just manually launch a final job that tags this release image as latest in our production registry.
All of this makes a lot of circuit-breaker steps.
Below are examples of the runner config and a project CI, as also requested by @NullVoxPopuli.
The .gitlab-ci.yml is a simplified version of the above workflow (but it still has the 2-registry setup, a job triggered by tags that pushes to the prod registry, and a manual job to tag it as latest).
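The original files aren't reproduced in this extract; a condensed sketch of such a .gitlab-ci.yml, with registry hosts, image names and the test command as placeholders, might look like this:

```yaml
stages:
  - build
  - test
  - release

build:
  stage: build
  script:
    - docker login -u gitlab-ci-token -p "$CI_BUILD_TOKEN" "$CI_REGISTRY"
    - docker pull "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" || true
    - docker build --cache-from "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" -t "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME"

test:
  stage: test
  script:
    - docker run --rm "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" run-tests

release:
  stage: release
  only:
    - tags
  script:
    # login to the production registry omitted
    - docker pull "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME"
    - docker tag "$CI_REGISTRY_IMAGE:$CI_BUILD_REF_NAME" "registry.prod.example.com/project:$CI_BUILD_REF_NAME"
    - docker push "registry.prod.example.com/project:$CI_BUILD_REF_NAME"

tag-latest:
  stage: release
  only:
    - tags
  when: manual
  script:
    - docker pull "registry.prod.example.com/project:$CI_BUILD_REF_NAME"
    - docker tag "registry.prod.example.com/project:$CI_BUILD_REF_NAME" "registry.prod.example.com/project:latest"
    - docker push "registry.prod.example.com/project:latest"
```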
Thanks @orobardet. I understand how you're using --cache-from now. By the looks of things it seems like --cache-from helps you deploy images after they've completed a thorough series of tests. I'm still in search of a way to cache between steps before an image is pushed. https://gitlab.com/gitlab-org/gitlab-ce/issues/17861#note_37212816
Thanks for pointing this out: Using --cache-from requires an extra copy of the image (compared to the socket binding), but it makes things much faster than before (without --cache-from).
@orobardet: With a configuration similar to yours, I get errors like 'Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?'. I understand that it is because the docker:dind image is started with arguments, preventing the docker daemon from starting. A configuration similar to yours with docker:dind configured as a service works fine, but it then makes things difficult when it comes to configuring volumes to manage artifacts.
I've attempted a similar approach to @nkovacs': docker-builder is used as the base image. With this image, based on docker:dind, the docker daemon is started in the background. Unfortunately, I get the same kind of errors and am struggling to explain why.
@lionelperrin I notice your docker daemon was started using the vfs storage driver, despite DOCKER_DRIVER=overlay2 in the runner config. It may be because overlay2 is not possible in your dind setup, or something funny is happening and overriding the DOCKER_DRIVER variable.
In the second case, the DOCKER_HOST variable may be incorrectly set by something. At the beginning of your job, try echoing these 2 variables (DOCKER_DRIVER and DOCKER_HOST), and even force the client to query the local daemon with DOCKER_HOST=unix:///var/run/docker.sock docker info.
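For example, at the top of the job script (a quick debugging sketch):

```sh
echo "DOCKER_DRIVER=$DOCKER_DRIVER"
echo "DOCKER_HOST=$DOCKER_HOST"
# force the client to talk to the local daemon over the socket
DOCKER_HOST=unix:///var/run/docker.sock docker info
```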
Thanks. I now have a working configuration using dind. My problems were due to my docker-builder image being based on a too-old version of docker:dind. I've reduced the scripts provided by @nkovacs to the minimum (many thanks for your idea) and it now works like a charm, even with volumes such as in this example:
I share here how I build the docker-builder image:
Dockerfile
FROM docker:dind
COPY builder-entrypoint.sh /usr/local/bin/
ENTRYPOINT ["builder-entrypoint.sh"]
CMD []
builder-entrypoint.sh
#!/bin/sh
# start daemon
dockerd-entrypoint.sh > /dev/null 2>&1 &
# wait for daemon to start
while ! docker info > /dev/null 2>&1; do sleep 1; done
# execute command
exec "$@"
By the way, it looks to me that as long as concurrent=1 in config.toml, it should be safe to persist docker storage with something like the example below. This saves time on the docker pull commands. Am I missing something?
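The referenced example isn't included in this extract. Assuming dockerd runs inside the build container itself (as with the docker-builder image above), a runner configuration along these lines might be what was meant; the host path and image name are placeholders, and the safety really does depend on concurrent = 1:

```toml
concurrent = 1

[[runners]]
  executor = "docker"
  [runners.docker]
    # the build image runs its own dockerd, so its /var/lib/docker
    # can be persisted on the host between builds
    image = "registry.example.com/group/docker-builder:latest"
    privileged = true
    volumes = ["/srv/gitlab-runner/docker-storage:/var/lib/docker"]
```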
Any ideas on how to extrapolate this to docker-compose?
I use docker-compose for tests, so I don't need to configure database/cache stuff independently of the dev environment. (I like having dev + CI as close as possible.)