Improve performance of Auto DevOps jobs that use Docker-in-Docker
Problem to solve
Many Auto DevOps jobs use Docker-in-Docker: their scripts pull images and run containers to implement the job's functionality. This was a conscious decision, made to work around an underlying problem in GitLab CI with running some of these images directly as job images.
Here is an excerpt from the license_management job as an example.
```shell
docker run --volume "$PWD:/code" \
  "registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION" analyze /code
```
Implementing CI jobs this way is costly in both network bandwidth and job run time. For us as a self-hosted customer, pulling Docker images on every single run of the Auto DevOps jobs makes them unusable.
Further details
GitLab CI jobs can specify a custom Docker image via the image parameter, so the docker run snippet above could potentially be rewritten like this:
```yaml
license_management:
  image:
    name: registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION
    entrypoint: [""]
  script:
    - analyze $PWD
```
There are many advantages to writing jobs this way.
- Better use of GitLab CI features, and jobs that are easier to customize.
- No need to run Docker-in-Docker. The CI runner directly runs a container for registry.gitlab.com/gitlab-org/security-products/license-management.
- Docker images can be cached by CI runners, or pulled from registry mirrors. With Docker-in-Docker, images must be pulled anew with every run of the job, wasting network bandwidth and time.
- Jobs run faster due to the potential for image caching, and by eliminating a layer of indirection.
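The caching point can be illustrated with a small shell sketch. `ensure_image` is a hypothetical helper, not part of Auto DevOps: it pulls an image only when the Docker daemon does not already have it, roughly analogous to the runner's `if-not-present` pull policy for a job's `image:`. On a runner host whose daemon keeps its images, a second call is a cache hit; inside Docker-in-Docker, each job typically talks to a fresh daemon, so the check misses and the image is pulled again.

```shell
#!/bin/sh
# Hypothetical helper: pull an image only if the daemon does not have it yet.
# With a persistent daemon, repeated calls hit the local cache.
# With a fresh Docker-in-Docker daemon per job, every call pulls.
ensure_image() {
  image="$1"
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "cached: $image"
  else
    echo "pulling: $image"
    docker pull "$image"
  fi
}
```

For example, `ensure_image "registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION"` prints `cached:` on a warm runner and `pulling:` followed by the pull output otherwise.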
However, not all CI jobs can be run this way. For example, sitespeedio/sitespeed.io fails because set -o pipefail is not supported under dash. See gitlab-ce#41809.
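The sitespeed.io failure comes down to a single shell option. Below is a minimal sketch, assuming only a POSIX `sh`, of how a script can probe for `pipefail` before enabling it, so the same script runs under both bash and dash. The function name is hypothetical, and this is not how GitLab Runner actually generates its job scripts.

```shell
#!/bin/sh
# Hypothetical helper: enable pipefail only when the current shell supports it.
# The probe runs in a subshell because a non-interactive POSIX shell may exit
# when a special built-in such as `set` rejects an option (dash does this).
enable_pipefail_if_supported() {
  if (set -o pipefail) 2>/dev/null; then
    set -o pipefail
    echo "pipefail: on"
  else
    echo "pipefail: off"
  fi
}
```

Under bash the function prints `pipefail: on`; under a dash that lacks the option it prints `pipefail: off` and the script keeps running instead of aborting.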
Proposal
I propose a multi-step solution.
Step 1: Allow a custom registry to be used for GitLab Security Products images
In my GitLab installation, I've created a project that mirrors the GitLab Security Products images. I've also modified the jobs that use those images to read a variable, GITLAB_SECURITY_PRODUCTS_REGISTRY, which can be set to a custom registry. For license_management, it looks like this:
```yaml
variables:
  # GITLAB_SECURITY_PRODUCTS_REGISTRY: registry.gitlab.com/gitlab-org/security-products
  GITLAB_SECURITY_PRODUCTS_REGISTRY: registry-dev.transzap.com/devops/images/security-products
```

and the job script references it:

```shell
docker run --volume "$PWD:/code" \
  "$GITLAB_SECURITY_PRODUCTS_REGISTRY/license-management:$LICENSE_MANAGEMENT_VERSION" analyze /code
```
That way, the Auto DevOps jobs for the security products, at least, can pull images from a local registry instead of pulling them over the internet from registry.gitlab.com every single time.
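A minimal sketch of both halves of Step 1, under stated assumptions: `security_image` falls back to the upstream registry when GITLAB_SECURITY_PRODUCTS_REGISTRY is unset or empty, so the same job works with or without a mirror, and `mirror_image` copies one upstream image into a mirror with plain `docker pull`/`tag`/`push`. Both function names are hypothetical, and the mirror project described above may well be populated differently.

```shell
#!/bin/sh
# Upstream location of the GitLab Security Products images.
DEFAULT_SECURITY_PRODUCTS_REGISTRY="registry.gitlab.com/gitlab-org/security-products"

# Hypothetical helper. Usage: security_image NAME TAG
# Prints the image reference, using the custom registry when it is set
# and non-empty, otherwise the upstream registry.
security_image() {
  echo "${GITLAB_SECURITY_PRODUCTS_REGISTRY:-$DEFAULT_SECURITY_PRODUCTS_REGISTRY}/$1:$2"
}

# Hypothetical helper. Usage: mirror_image NAME TAG MIRROR_REGISTRY
# Copies one upstream image into the mirror registry; stops at the first failure.
mirror_image() {
  src="$DEFAULT_SECURITY_PRODUCTS_REGISTRY/$1:$2"
  dst="$3/$1:$2"
  docker pull "$src" &&
    docker tag "$src" "$dst" &&
    docker push "$dst"
}
```

A job script could then run `docker run --volume "$PWD:/code" "$(security_image license-management "$LICENSE_MANAGEMENT_VERSION")" analyze /code`, and a scheduled pipeline in the mirror project could call `mirror_image` for each image and tag it needs.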
Step 2: Add support for ash, dash, and other shells in GitLab CI
I'm not sure about the current state of this problem in GitLab CI. However, for images like sitespeedio/sitespeed.io to work, GitLab CI needs to work with shells that don't support bashisms.
There was an attempt at this in gitlab-runner!309 (closed).
Step 3: Restructure Auto DevOps jobs to not use Docker-in-Docker
Assuming the issues in GitLab CI are resolved, rewrite the Auto DevOps jobs to run their images directly as job images instead of through Docker-in-Docker.
What does success look like, and how can we measure that?
Auto DevOps jobs should complete quickly. In my modified pipeline, implementing step 1 alone has cut the time needed to run the security jobs down to about a minute each. Without it, jobs like license_management spend 4 or 5 minutes just pulling the image. The code_quality job is particularly bad.
I'll try to gather some actual numbers later.
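One simple way to gather such numbers is to time the pull of the same image from the upstream registry and from the mirror. The sketch below is a hypothetical helper, assuming a `date` that supports `+%s` (GNU coreutils and BusyBox both do, though it is not required by POSIX); it avoids the `time` keyword, which bash has but dash does not, and reports whole seconds plus the command's exit status.

```shell
#!/bin/sh
# Hypothetical helper. Usage: time_it LABEL COMMAND [ARGS...]
# Runs COMMAND, then prints "LABEL: <seconds>s (exit <status>)".
# Returns the command's own exit status.
time_it() {
  label="$1"
  shift
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "$label: $((end - start))s (exit $status)"
  return "$status"
}
```

For example, `time_it upstream docker pull "registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION"` followed by `time_it mirror docker pull "$GITLAB_SECURITY_PRODUCTS_REGISTRY/license-management:$LICENSE_MANAGEMENT_VERSION"` gives a rough side-by-side comparison. Second-level resolution is coarse but adequate when the difference is minutes.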
Links / references
- Relates to gitlab-ce#49395
- Remove Use Of docker run from Auto DevOps template to speed it up with caching
- Reduce license_management image weight
- Auto DevOps performance docker image not compatible with gitlab-runner
- (Alternate) GitLab CI Performance Job
- WIP: Add a POSIX shell implementation, use it for the "sh" shell executor
- Replace shell generators with commands in the gitlab-runner[-helper]