Push CI Job Cache only if needed
Problem to solve
Reduce the time needed for CI jobs by only pushing caches if needed.
In bigger projects, pushing the cache at the end of a CI job can take longer than the job's scripts themselves. This is particularly annoying if the cache contents have not changed.
User experience goal
Developers want fast feedback from their CI pipelines and want to merge once a pipeline succeeds, as soon as possible.
Proposal
Add a new cache policy where the cache is only pushed if no cache with the given cache key exists.
Doing so would save the time needed for collecting/packing the cache contents and also the time needed for uploading it to the distributed cache storage.
Further details
The new policy could be named pull-immutable, since the cache should be pulled but is not expected to change, and is therefore only pushed if needed.
Whether the cache needs to be pushed could be evaluated in two ways:
- At the end of the job, directly before pushing, by querying the API of the distributed cache storage.
- At the beginning of the job, by reusing the information about whether a cache was pulled in the first place.
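A minimal sketch of the push decision for the proposed policy, covering both evaluation strategies. It is written in Python purely for illustration (GitLab Runner itself is written in Go); the enum, the function should_push_cache and its parameters are hypothetical names, and pull-immutable is the policy proposed here, not an existing one.

```python
from enum import Enum
from typing import Callable, Optional


class CachePolicy(Enum):
    PULL = "pull"
    PUSH = "push"
    PULL_PUSH = "pull-push"
    PULL_IMMUTABLE = "pull-immutable"  # hypothetical: the policy proposed in this issue


def should_push_cache(
    policy: CachePolicy,
    pulled_at_start: Optional[bool] = None,
    exists_remotely: Optional[Callable[[], bool]] = None,
) -> bool:
    """Decide whether the cache should be archived and uploaded after the job."""
    if policy is CachePolicy.PULL:
        return False
    if policy in (CachePolicy.PUSH, CachePolicy.PULL_PUSH):
        return True
    # pull-immutable: push only if no cache for this exact key exists yet.
    if pulled_at_start is not None:
        # Strategy 2: reuse the result of the download at the start of the job.
        return not pulled_at_start
    if exists_remotely is not None:
        # Strategy 1: query the cache storage directly before pushing.
        return not exists_remotely()
    # No information available: fall back to pushing, like pull-push.
    return True
```

For example, should_push_cache(CachePolicy.PULL_IMMUTABLE, pulled_at_start=True) returns False, so archiving and uploading are skipped entirely. Strategy 2 costs no extra request, but pulled_at_start would have to mean "restored for the exact key", not via a fallback key; strategy 1 costs one extra request but also notices a cache pushed by a concurrent job in the meantime.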
Example usage:
build:
  cache:
    key:
      files:
        - .nvmrc
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-immutable
  before_script:
    - >
      [[ -d node_modules ]] || npm ci
  script:
    - npm run build
In this example job for a Node.js project, package-lock.json describes the exact tree of dependencies inside the cached node_modules
directory, while .nvmrc specifies the Node.js version in use.
By using these two files as the cache key, the content of the cache can be considered immutable, as the same files always result in exactly the same cache content.
Pushing the cache when a cache with the same key already exists is therefore redundant and should be avoided.
Additionally, this example uses a conditional expression ([[ -d node_modules ]] || npm ci) to only assemble/build the cache if it doesn't already exist.
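The immutability argument relies on the key being a pure function of the two listed files. The sketch below illustrates that property by hashing the files' contents. It is a simplified stand-in, not GitLab's algorithm: GitLab derives a cache:key:files key from the most recent commits that changed the listed files rather than from their contents. The function content_cache_key is a hypothetical name.

```python
import hashlib
from pathlib import Path
from typing import Iterable


def content_cache_key(root: Path, files: Iterable[str]) -> str:
    """Hypothetical content-derived cache key: identical files give an identical key."""
    digest = hashlib.sha256()
    for name in files:  # the listed order is part of the key
        digest.update(name.encode() + b"\0")
        # Hash each file separately so that names and contents cannot run together.
        digest.update(hashlib.sha256((root / name).read_bytes()).digest())
    return digest.hexdigest()
```

Calling content_cache_key(Path("."), [".nvmrc", "package-lock.json"]) twice on an unchanged checkout yields the same key, and any edit to either file yields a new one, which is exactly the situation in which a push would be needed.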
Alternative considerations:
1. Detect unmodified caches without the need for a new cache policy.
From a user's perspective this would be even better, as users would get faster jobs without any extra work. The biggest disadvantage is the technical implementation, as checking whether the contents of a directory have changed isn't a simple problem. Caches can get large and contain more than half a million files. The check would need to be as fast as possible, and it wouldn't make much sense if it took longer than pushing the cache in the first place.
2. Ability to dynamically disable pushing the cache by exporting an environment variable as part of the job's scripts.
This would allow more scenarios by moving the decision into the job's scripts, where the user could write code that dynamically evaluates whether the cache should be pushed. The biggest disadvantage is that the burden of implementation shifts to the user, resulting in a much higher barrier to entry.
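The change detection that alternative 1 would need, and that a user script would have to implement itself under alternative 2, can be sketched as a metadata fingerprint taken right after the cache is restored and again before it would be pushed. This is one possible approach, not something GitLab Runner is known to do; tree_fingerprint and cache_changed are hypothetical names, and comparing sizes and modification times instead of contents is an assumption that trades accuracy (a rewrite preserving both size and mtime goes unnoticed) for speed.

```python
import hashlib
import os


def tree_fingerprint(root: str) -> str:
    """Hash the relative path, size and mtime of every file below root.

    Only metadata is read (one lstat per file), no file contents are hashed.
    Empty directories do not contribute to the fingerprint.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: do not follow symlinks
            rel = os.path.relpath(path, root)
            digest.update(f"{rel}\0{st.st_size}\0{st.st_mtime_ns}\n".encode())
    return digest.hexdigest()


def cache_changed(before: str, root: str) -> bool:
    """Compare a fingerprint taken after the cache was restored with the current tree."""
    return tree_fingerprint(root) != before
```

Even this cheap check still touches every file, so for a cache with half a million files its cost scales with the file count; it avoids compression and upload, but it is not free.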
Documentation
Links / references
- May also solve #224650 Feature Request: short-circuit CI pipelines based on if cache exists