CNG: Increase cache involvement in pipelines by revisiting the structure of the Dockerfiles and SHA calculation method

Currently CNG pipelines barely hit the cache. This is due both to the fact that we do not cache the intermediate images of build stages and to the way we calculate the cache keys, i.e. the image SHA. With some adjustments we can increase cache usage in the pipelines and speed them up.

Problem

We cannot derive the image SHA directly from the source. Instead we calculate a SHA based on source code changes, try to pull the image that has this SHA as its tag in order to populate the cache, and finally retag the image at the end of the pipeline to publish it.

We can build on this, simplify it further, and improve the cache usage.

Assumptions

We need to make a few assumptions here:

We are using BuildKit as the build backend (the frontend is not as important but we use buildx).
There is a Docker registry that we use for caching. Ideally this is not the same as the registry that we use to publish the images. To distinguish them, we refer to them as the Cache Registry and the Publish Registry. Ideally we collocate the Cache Registry in the cluster that runs the build fleet.
We mostly follow a predefined development workflow in which we fork a feature branch from the master branch, make the changes in one or more iterations to the branch, and finally merge these changes into the master branch.

Revisiting the build process

Populating the build cache

The first issue that we need to address is the build cache population. We need to be able to map the changes of the source to specific cache entries. Cache entries, which are Docker image layers, can be addressed by their SHA. So to populate the cache we need to map the source code changes to a Docker manifest SHA, which identifies a manifest that references specific image layers.

Calculate a tag from source

The tag must be unique to each change-set and idempotent, i.e. the same tag is produced for every iteration of that change-set.

Currently we calculate this tag based on multiple factors, including the change revision number, dependencies, etc. Arguably we do not need to calculate a tag per image per change, as we currently do; a tag per change would suffice. This may be less efficient than the current method, because we may try to populate the cache with images that are not needed, but it is simpler to maintain.

For example, the branch name is a good candidate. Each feature is developed in its own branch, and two branches with the same name can not co-exist. At the same time further changes are applied to the same branch which leads to incremental cache population.
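A branch name is not always a valid Docker tag (for example, it may contain a slash), so some normalization is needed before it can be used. The following is a minimal sketch of that step, not an existing CNG helper: the function name `branch_to_tag` is hypothetical, and the rules it applies are the Docker tag constraints (only letters, digits, `_`, `.` and `-`; no leading `.` or `-`; at most 128 characters).

```shell
# Hypothetical helper: turn a Git branch name into a valid Docker tag.
# Usage: branch_to_tag BRANCH_NAME
branch_to_tag() {
  # 1. Replace every character that is not allowed in a tag with '-'.
  # 2. Strip leading '.' and '-' characters, which a tag must not start with.
  # 3. Truncate to the 128-character tag length limit.
  printf '%s\n' "$1" \
    | sed -e 's/[^A-Za-z0-9_.-]/-/g' -e 's/^[.-]*//' \
    | cut -c1-128
}
```

For instance, `branch_to_tag 'feature/add-cache'` prints `feature-add-cache`. Note that the mapping is lossy: `a/b` and `a-b` normalize to the same tag, so in that case two branches would share cache entries.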

Map the tag to a manifest SHA

Check the Cache Registry. If the tag does not exist, use the master branch alias instead, which always points to the latest manifests built from the master branch.

Use the calculated tag (or the master alias) to find the manifest SHA. This can be done with the docker manifest inspect command. It retrieves the latest manifest of the specified tag of the image from the registry. In this case this is the Cache Registry.

This is the SHA that we import the cache from. However, we still want to store the cache under the calculated tag for future iterations on the same branch, so we use the tag for exporting the cache.
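The lookup with fallback described in this section can be sketched as follows. This is an illustration under assumptions, not the pipeline's actual code: the function names `lookup_digest` and `resolve_cache_from` are hypothetical, the alias name `master` is a placeholder default, and the lookup uses `skopeo inspect` (already present in the pipeline) rather than `docker manifest inspect`. Whether a given client can read the manifests stored in the Cache Registry should be verified before relying on it.

```shell
# Hypothetical lookup: print the manifest digest (sha256:...) of an image
# reference, or return non-zero if the reference cannot be inspected.
# Assumption: skopeo is installed and can reach the Cache Registry.
lookup_digest() {
  skopeo inspect --format '{{.Digest}}' "docker://$1" 2>/dev/null
}

# Print the digest to import cache from: the digest of the branch tag if it
# exists in the Cache Registry, otherwise the digest of the master alias.
# Usage: resolve_cache_from CACHE_REPO TAG [MASTER_ALIAS]
resolve_cache_from() {
  lookup_digest "$1:$2" || lookup_digest "$1:${3:-master}"
}
```

The function fails (returns non-zero) when neither the branch tag nor the alias can be resolved, in which case the build would simply run without an imported cache.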

Configure build cache for BuildKit

With BuildKit we do not need to explicitly pull the images to be used as cache. Instead, we specify where the build cache comes from and where it needs to be stored. The following buildx options do this:

--cache-from=type=registry,ref=CACHE_REGISTRY/IMAGE@CacheFromSHA
--cache-to=type=registry,ref=CACHE_REGISTRY/IMAGE:Tag

Since we are using multi-stage builds we should also cache the intermediate layers. This can be done by setting the cache mode to max, which will “export all the layers of all intermediate steps”:

--cache-to=type=registry,ref=CACHE_REGISTRY/IMAGE:Tag,mode=max

Note that the cache tag is not the same as the publish tag. To publish the image we can use a different tag:

--tag PUBLISH_REGISTRY/IMAGE:VERSION_OR_BRANCH

This tag is pushed separately. In the current pipeline we use skopeo for syncing different repositories, and the same method can be used for pushing images to the Publish Registry.
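The options from this section can be assembled in one place. The helper below is only a sketch: the name `buildx_cache_args` and its argument order are hypothetical, and it only prints the flags discussed above rather than running a build.

```shell
# Hypothetical helper: print the buildx options for one image, one per line.
# Usage: buildx_cache_args CACHE_REPO CACHE_FROM_SHA CACHE_TAG PUBLISH_REF
#   CACHE_REPO      e.g. CACHE_REGISTRY/IMAGE
#   CACHE_FROM_SHA  manifest SHA to import cache from (sha256:...)
#   CACHE_TAG       tag calculated from the source, used to export cache
#   PUBLISH_REF     e.g. PUBLISH_REGISTRY/IMAGE:VERSION_OR_BRANCH
buildx_cache_args() {
  printf '%s\n' \
    "--cache-from=type=registry,ref=$1@$2" \
    "--cache-to=type=registry,ref=$1:$3,mode=max" \
    "--tag=$4"
}
```

Assuming none of the values contain whitespace, the output can be passed straight to the build, for example `docker buildx build $(buildx_cache_args "$CACHE_REPO" "$CACHE_FROM_SHA" "$CACHE_TAG" "$PUBLISH_REF") .`, where the four variables are placeholders set earlier in the job.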