Improve performance of Auto DevOps jobs that use Docker-in-Docker

Problem to solve

Many Auto DevOps jobs use Docker-in-Docker: the job scripts pull images and run containers to do their work. This was a conscious decision, made to work around an underlying limitation in GitLab CI.

Here is an excerpt from the license_management job as an example.

docker run --volume "$PWD:/code" \
  "registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION" analyze /code

Implementing CI jobs this way costs network bandwidth and makes jobs slower. For a self-hosted customer, pulling Docker images on every single run of the Auto DevOps jobs makes them unusable.

Further details

GitLab CI jobs can specify a custom Docker image to run in via the image parameter. So the docker run … snippet above could potentially be written like this:

license_management:
  image:
    name: registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION
    entrypoint: [""]
  script:
    - analyze $PWD

There are many advantages to writing jobs this way.

  • Better use of GitLab CI features, and jobs are easier to customize.
  • No need to run Docker-in-Docker. The CI runner directly runs a container for registry.gitlab.com/gitlab-org/security-products/license-management.
  • Docker images can be cached by CI runners, or pulled from registry mirrors. With Docker-in-Docker, images must be pulled anew with every run of the job, wasting network bandwidth and time.
  • Jobs run faster due to the potential for image caching, and by eliminating a layer of indirection.
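Once a job runs its image directly, newer GitLab releases also let the job ask the runner to reuse an image it has already cached, rather than pulling it again. The sketch below is a variant of the license_management example above. It assumes a GitLab version that supports the image:pull_policy keyword and a runner whose allowed_pull_policies setting permits if-not-present.

```yaml
license_management:
  image:
    name: registry.gitlab.com/gitlab-org/security-products/license-management:$LICENSE_MANAGEMENT_VERSION
    entrypoint: [""]
    # Reuse the image already present on the runner host instead of pulling it on every run.
    # Assumption: the runner's allowed_pull_policies includes if-not-present.
    pull_policy: if-not-present
  script:
    - analyze $PWD
```

This creates a trade-off. With if-not-present, a mutable tag such as latest can go stale on the runner host, so pinned version tags fit this policy best.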

However, not all CI jobs can be run this way. For example, a job that runs the sitespeedio/sitespeed.io image directly fails because set -o pipefail is not supported under dash. See gitlab-ce#41809.

Proposal

I propose a multi-step solution.

Step 1: Allow a custom registry to be used for GitLab Security Products images.

In my GitLab installation, I've created a project that mirrors the GitLab Security Products images. I've also modified the jobs that use those images to read a variable called GITLAB_SECURITY_PRODUCTS_REGISTRY, which can be set to a custom registry. For license_management, it looks like this:

variables:
  # GITLAB_SECURITY_PRODUCTS_REGISTRY: registry.gitlab.com/gitlab-org/security-products
  GITLAB_SECURITY_PRODUCTS_REGISTRY: registry-dev.transzap.com/devops/images/security-products

license_management:
  script:
    - |
      docker run --volume "$PWD:/code" \
        "$GITLAB_SECURITY_PRODUCTS_REGISTRY/license-management:$LICENSE_MANAGEMENT_VERSION" analyze /code

That way, at least the Auto DevOps jobs for the security products can pull images from a local registry instead of pulling them from registry.gitlab.com over the internet on every run.
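The mirror project itself can be kept up to date with a scheduled pipeline. The job below is a hypothetical sketch of that. The job name, the IMAGES list, and the security-products/ path prefix are illustrative choices that match the registry path used above. It also assumes a runner set up for Docker-in-Docker with TLS certificates shared under /certs, as in GitLab's Docker-in-Docker documentation. CI_REGISTRY, CI_REGISTRY_IMAGE, CI_REGISTRY_USER, and CI_REGISTRY_PASSWORD are GitLab's predefined variables.

```yaml
# Hypothetical .gitlab-ci.yml for the mirror project.
# The IMAGES list is an example; list the image:tag pairs your Auto DevOps template actually uses.
mirror_security_products:
  image: docker:latest
  services:
    - docker:dind
  variables:
    # Assumes the runner shares /certs/client with the job, per GitLab's Docker-in-Docker docs.
    DOCKER_TLS_CERTDIR: "/certs"
    SOURCE_REGISTRY: registry.gitlab.com/gitlab-org/security-products
    IMAGES: "license-management:latest"
  rules:
    # Run only from a pipeline schedule (for example, nightly).
    - if: '$CI_PIPELINE_SOURCE == "schedule"'
  script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login -u "$CI_REGISTRY_USER" --password-stdin "$CI_REGISTRY"
    - |
      # Pull each image from registry.gitlab.com, retag it under this project, and push it.
      for image in $IMAGES; do
        docker pull "$SOURCE_REGISTRY/$image"
        docker tag "$SOURCE_REGISTRY/$image" "$CI_REGISTRY_IMAGE/security-products/$image"
        docker push "$CI_REGISTRY_IMAGE/security-products/$image"
      done
```

This job still uses Docker-in-Docker. Because it runs on a schedule rather than in every Auto DevOps pipeline, its pull cost is paid once per mirror refresh.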

Step 2: Add support for ash, dash, and other shells in GitLab CI

I'm not sure of the current state of this problem in GitLab CI. But for images like sitespeedio/sitespeed.io to work, GitLab CI needs to support shells that don't accept bashisms.

There was an attempt at this in gitlab-runner!309 (closed).
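One portable way for a script to use pipefail without requiring it is to probe for support in a subshell first, then enable the option only where the shell supports it. The job below is a hypothetical diagnostic that shows the idiom. It assumes the debian:stable-slim image, where /bin/sh is dash. It does not change the script that gitlab-runner itself generates, which is what Step 2 is about.

```yaml
# Hypothetical diagnostic job: reports whether /bin/sh in this image accepts set -o pipefail.
check_pipefail:
  image: debian:stable-slim
  script:
    # The probe runs in a subshell, so an "illegal option" error there cannot abort the outer sh.
    - sh -c 'if (set -o pipefail) 2>/dev/null; then set -o pipefail; echo "sh supports pipefail"; else echo "sh lacks pipefail"; fi'
```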

Step 3: Restructure Auto DevOps jobs to not use Docker-in-Docker

Once the underlying issues in GitLab CI are resolved, rewrite the Auto DevOps jobs to run their images directly instead of through Docker-in-Docker.
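Steps 1 and 3 combine naturally. The registry variable can go straight into image:name, so the runner pulls the image, possibly from a mirror, without a Docker-in-Docker service. This is a sketch under two assumptions: LICENSE_MANAGEMENT_VERSION is defined elsewhere, as in the current template, and the image provides an analyze command, as in the example under Further details.

```yaml
variables:
  # Default upstream location; self-hosted installations can override this.
  GITLAB_SECURITY_PRODUCTS_REGISTRY: registry.gitlab.com/gitlab-org/security-products

license_management:
  image:
    # Variables are expanded in image:name, so the mirror can be swapped in without editing the job.
    name: "$GITLAB_SECURITY_PRODUCTS_REGISTRY/license-management:$LICENSE_MANAGEMENT_VERSION"
    entrypoint: [""]
  # No docker:dind service and no docker run: the runner starts this container itself.
  script:
    - analyze "$PWD"
```

A self-hosted installation could then point GITLAB_SECURITY_PRODUCTS_REGISTRY at its mirror through a project- or group-level CI/CD variable. Such variables take precedence over the value defined in the YAML.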

What does success look like, and how can we measure that?

Auto DevOps jobs should complete quickly. In my modified pipeline, implementing Step 1 alone has cut the time needed to run each security job to about a minute. Without it, jobs like license_management spend 4 or 5 minutes just pulling the image. The code_quality job is particularly bad.

I'll try to gather some actual numbers later.
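One low-effort way to gather those numbers is to time the image pull separately from the analysis inside the existing Docker-in-Docker job. The sketch below shows only the script section; the rest of the job (the Docker-in-Docker service and so on) is assumed unchanged from the template. The IMAGE and start variable names are illustrative. Because the runner executes all script lines in one shell session, start is still set when the echo line runs.

```yaml
license_management:
  script:
    - IMAGE="$GITLAB_SECURITY_PRODUCTS_REGISTRY/license-management:$LICENSE_MANAGEMENT_VERSION"
    - start=$(date +%s)
    # Pull explicitly so the pull time is measured on its own, separate from the analysis.
    - docker pull "$IMAGE"
    - echo "Image pull took $(( $(date +%s) - start )) seconds"
    - docker run --volume "$PWD:/code" "$IMAGE" analyze /code
```

Running this once with the registry.gitlab.com default and once with the mirror would put a number on the improvement from Step 1.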
