Re-work approach to caching (!12) · Merge requests · zygoon / NetOTA

Zygmunt Krynicki requested to merge tweak/fix-ci-cache into master Jan 28, 2022

After two days of experiments and over 50 different pipeline runs, I've reached the conclusion that GitLab cache is inefficient and should be avoided. This matches my prior experience with building Yocto images through GitLab CI.

GitLab cache is perhaps useful for some workloads where the characteristics of the hardware (fast and plentiful) together with the software (slow, with plenty of network dependencies) make the trade-off sensible. This may match commonly used web programming technologies such as Node or Python.

The primary flaw of the cache system for Go is that it is in every way worse than what the compiler is doing automatically, at no extra cost. Anything that gitlab-runner does on top already costs more. This is caused by the combination of a pristine build container and the need to restore and store the cache around every job.

There are two cache-related entities in a typical Go program.

First, based on the contents of the go.mod and go.sum files, one can cache GOMODCACHE, the source archives and basic meta-data of all the modules needed by Go to build and test the project.

Second, one can naively keep intermediate build and test results as they are placed in GOCACHE.

On a typical development system with persistent storage, Go maintains both of those locations automatically, storing modules and intermediate build files, making incremental work, including testing, extremely performant.

To translate both to the GitLab cache system one can create two separate cache entries, one keyed by the contents of go.mod and go.sum and the other based on some arbitrary key, either fixed or related to git meta-data such as the branch name.

Both of those cache keys are already less efficient than what Go does automatically when it has persistent storage. Whenever go.mod is modified, the mod cache will be cold. Depending on the size of the dependencies, re-constructing it may incur a non-trivial amount of network traffic. Whenever the build and test cache is missing, a potentially substantial number of files, including those from the standard library, may have to be re-compiled.

Apart from occasionally hitting a cold cache node, the system must download (or copy) and de-compress the cache before each use and compress and upload (or copy again) the cache after each use.

The cost of those operations is proportional to the size of the cache. At the same time it is beneficial to keep the cache large (up to the point where Go decides to remove unused elements) to maximize the chance of reuse, so the two goals pull against each other.

This effect is dramatically compounded by the use of low-end systems with slow storage (spinning disks or off-the-shelf consumer SD cards) and slow CPUs (low-end Raspberry Pi boards).

The best way to use the gitlab-runner cache is not to use it. Instead let's use a persistent volume local to each runner. There's a convention to provide a volume for the cache at /cache. If it is present, re-configure the Go environment to set GOPATH to /cache/go and GOCACHE to /cache/go-build.

Empirical testing shows an order-of-magnitude improvement across various grades of hardware, including 3rd gen Intel Core systems, pre-Ryzen AMD APU systems and a range of Raspberry Pi boards. While none of those systems match the current definition of fast, the achieved speedup makes this nearly irrelevant.

Signed-off-by: Zygmunt Krynicki <zygmunt.krynicki@huawei.com>
