Accelerate CI cache packing/unpacking using a faster archive format such as tar+Zstandard
Description
The files we currently cache are hundreds of MB in size. Creating the zip archive therefore takes a significant amount of time with a single core pegged at 100%, and the same is true when the archive is extracted for use. This greatly diminishes the value of the caching feature, whose whole purpose is to speed up the build.
Proposal
Use a faster, more sophisticated archive/compression format such as tar+Zstandard. Zstandard offers the following benefits:
- Well-parallelized compression algorithm
- Many configuration knobs for adjusting the tradeoff between compression ratio and compression speed, which could be exposed via GitLab config parameters. This would allow users to optimize the settings based on the CPU capabilities of their runners and the link speed between their GitLab server and their runners.
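As a rough illustration of how a tar stream could be combined with a pluggable compressor, here is a minimal Go sketch. It is not GitLab Runner code: the names `Compressor`, `Decompressor`, `PackCache` and `UnpackCache` are hypothetical, the files are held in memory for brevity, and gzip from the standard library is used only as a stand-in so the example runs without third-party dependencies. A Zstandard writer and reader from a wrapper such as the one linked under "Links / references" would be plugged in at the same two points.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sort"
)

// Compressor wraps a destination writer with a compression layer.
// Hypothetical type for this sketch; a zstd writer constructor would fit here.
type Compressor func(io.Writer) (io.WriteCloser, error)

// Decompressor wraps a source reader with the matching decompression layer.
type Decompressor func(io.Reader) (io.Reader, error)

// PackCache writes the files into a tar stream that flows through the compressor.
func PackCache(files map[string][]byte, compress Compressor) ([]byte, error) {
	var buf bytes.Buffer
	cw, err := compress(&buf)
	if err != nil {
		return nil, err
	}
	tw := tar.NewWriter(cw)

	// Sort the names so the archive layout is deterministic.
	names := make([]string, 0, len(files))
	for name := range files {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(files[name]))}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(files[name]); err != nil {
			return nil, err
		}
	}
	// Close the tar layer first, then the compression layer, so both are flushed.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	if err := cw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// UnpackCache reverses PackCache and returns the file contents by name.
func UnpackCache(archive []byte, decompress Decompressor) (map[string][]byte, error) {
	dr, err := decompress(bytes.NewReader(archive))
	if err != nil {
		return nil, err
	}
	tr := tar.NewReader(dr)
	files := make(map[string][]byte)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return files, nil
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		files[hdr.Name] = data
	}
}

// gzipCompress is a stdlib stand-in for a Zstandard writer (assumption for this sketch).
func gzipCompress(w io.Writer) (io.WriteCloser, error) {
	return gzip.NewWriterLevel(w, gzip.BestSpeed)
}

// gzipDecompress is the matching stdlib stand-in for a Zstandard reader.
func gzipDecompress(r io.Reader) (io.Reader, error) {
	return gzip.NewReader(r)
}

func main() {
	files := map[string][]byte{
		"vendor/a.txt": []byte("hello"),
		"vendor/b.txt": []byte("world"),
	}
	archive, err := PackCache(files, gzipCompress)
	if err != nil {
		panic(err)
	}
	restored, err := UnpackCache(archive, gzipDecompress)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(restored), string(restored["vendor/a.txt"]))
}
```

Keeping the compression layer behind a function value means the compression level or thread count could be chosen from configuration when the compressor is constructed, without touching the tar logic.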
Since this would require Zstandard support in both GitLab Runner and the GitLab server, it would make sense for this to be an opt-in feature enabled via a gitlab.rb config parameter. If that parameter is enabled, GitLab could then show a "Caching not supported by this runner" warning for each connected runner that does not support tar+Zstandard.
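The opt-in behaviour described above could look roughly like the following Go sketch. Everything here is hypothetical and for illustration only: the `Runner` type, its `ArchiveFormats` field, the `"tar+zstd"` format identifier and the `CacheWarnings` function are assumptions, not existing GitLab APIs, and how a runner would actually advertise its supported formats is left open.

```go
package main

import "fmt"

// Runner is a hypothetical view of a connected runner and the
// archive formats it reports as supported.
type Runner struct {
	Name           string
	ArchiveFormats []string
}

// supports reports whether the runner advertises the given format.
func supports(r Runner, format string) bool {
	for _, f := range r.ArchiveFormats {
		if f == format {
			return true
		}
	}
	return false
}

// CacheWarnings returns one warning per runner that lacks tar+Zstandard
// support. When the opt-in setting is disabled, no warnings are produced.
func CacheWarnings(zstdEnabled bool, runners []Runner) []string {
	if !zstdEnabled {
		return nil
	}
	var warnings []string
	for _, r := range runners {
		if !supports(r, "tar+zstd") {
			warnings = append(warnings,
				fmt.Sprintf("%s: Caching not supported by this runner", r.Name))
		}
	}
	return warnings
}

func main() {
	runners := []Runner{
		{Name: "runner-old", ArchiveFormats: []string{"zip"}},
		{Name: "runner-new", ArchiveFormats: []string{"zip", "tar+zstd"}},
	}
	for _, w := range CacheWarnings(true, runners) {
		fmt.Println(w)
	}
}
```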
Links / references
- zstd homepage: http://facebook.github.io/zstd/
- Go wrapper for zstd: https://github.com/DataDog/zstd
- Go package for tar: https://golang.org/pkg/archive/tar/