Backend: short-circuit CI pipelines based on whether a cache exists
Problem to solve
"As a GitLab CI/CD pipeline developer, I want to skip jobs (that generate a cache) if this cache already exists (and is up-to-date), so I can get faster pipeline feedback."
Reduce pipeline build time by skipping build jobs that (a) fetch dependencies based on a rarely changing dependency definition file and (b) cache the fetched dependencies for subsequent builds. For example, with Node.js/npm dependencies I have measured a build-time reduction of up to ~25%, or almost 2 min, when the build duration is ~5+ min for a small project. This is in comparison to https://docs.gitlab.com/ee/ci/caching/index.html#caching-nodejs-dependencies:
stages:
  - setup
  - test
cache: &global_cache
  key:
    files:
      - package-lock.json
    prefix: "${CI_PROJECT_PATH_SLUG}-${CI_COMMIT_REF_SLUG}-"
  paths:
    - node_modules/
    - .npm/
prepare:
  stage: setup
  cache:
    <<: *global_cache # inherit all global cache settings
  rules:
    - changes:
        - package-lock.json
    # TODO: What I want is "OR if there is no cache (with the given key) available"; not sure what the syntax should look like
    - !cache_exists
  script:
    - npm ci --cache .npm --prefer-offline --no-audit --no-optional
build_test:
  stage: test
  cache:
    <<: *global_cache # inherit all global cache settings
    policy: pull
  script:
    - npm run lint
    - npm run test:ci
    - npm run build:ci
    - npm run e2e:ci
That is, the "prepare" job (which fetches the Node.js/npm dependencies and caches them) must be executed when:
- either the definition of dependencies has changed (due to changes in "package-lock.json"),
- or there is no cache with the dependencies yet (or no longer, e.g. because the caches were cleared or deleted manually).
As mentioned in http://disq.us/p/22vac4a, the second condition cannot currently be expressed:
Just a note. The only:changes condition will only work if this is the first push to a new branch, or if you have an existing cache for the branch. Just because your package-lock.json hasn't changed, doesn't mean node_modules has been cached previously.
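The two conditions above amount to a simple OR. A minimal Python sketch of the intended decision follows; `should_run_prepare`, `changes_only_rule`, and their parameters are hypothetical names used for illustration only and are not part of GitLab:

```python
def should_run_prepare(lockfile_changed: bool, cache_available: bool) -> bool:
    """Return True when the "prepare" job has to run.

    Hypothetical helper: it models the rule requested in this issue,
    not existing GitLab behaviour.
    """
    # Run when the dependency definition changed OR nothing is cached for the key.
    return lockfile_changed or not cache_available


def changes_only_rule(lockfile_changed: bool) -> bool:
    """What a plain `changes: [package-lock.json]` rule can express today."""
    return lockfile_changed
```

The two functions differ only for `lockfile_changed=False, cache_available=False` (for instance after the caches were cleared manually): the requested rule runs "prepare", whereas a `changes`-only rule skips it and "build_test" finds no cache to pull.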
Intended users
The person adding this rule will be the GitLab CI/CD pipeline developer, but all of the following will benefit in some way from faster pipelines:
- Delaney (Development Team Lead)
- Sasha (Software Developer)
- Devon (DevOps Engineer)
- Sidney (Systems Administrator)
- Rachel (Release Manager)
- Simone (Software Engineer in Test)
User experience goal
(See above)
Proposal
In addition to if, changes, and exists, also add something like cache_exists. Negation (cf. #34859) is even more important here! From the complete example above:
  cache:
    <<: *global_cache # inherit all global cache settings
  rules:
    - changes:
        - package-lock.json
    # TODO: What I want is "OR if there is no cache (with the given key) available"; not sure what the syntax should look like
    - !cache_exists
I am not sure whether the cache key should be (optionally) re-specifiable in the rule, in case it may differ from the job's own cache key (although I am not sure there is any use case for that, or whether it makes sense at all).
Alternative syntax for negation:
  rules:
    ...
    - cache_exists: false
... because that might be more consistent with the current syntax (although not fully, because exists actually takes a set of filenames) and would allow the following:
  rules:
    ...
    - cache_exists
I think extending the existing exists to apply to the cache does not really work: how would exists for files AND for caches be combined then?
  rules:
    ...
    - exists: CACHE
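To make the intended semantics of the boolean form (`cache_exists: false`) concrete, here is a small Python sketch of how such rules could be evaluated. It is an illustration under assumptions, not GitLab's implementation: `rule_matches`, `job_runs`, and `PREPARE_RULES` are hypothetical names, `changes` is reduced to exact file-name matching (no glob patterns), and every rule is assumed to use the default `when`, so a matching rule means the job is included.

```python
def rule_matches(rule: dict, changed_files: set, cache_available: bool) -> bool:
    """Check one rule; every clause present in the rule must hold (AND)."""
    if "changes" in rule:
        # Simplification: exact file names only, no glob patterns.
        if not changed_files.intersection(rule["changes"]):
            return False
    if "cache_exists" in rule:
        # Hypothetical keyword from this proposal: compare the wanted
        # state (True/False) against whether a cache is actually available.
        if rule["cache_exists"] != cache_available:
            return False
    return True


def job_runs(rules: list, changed_files: set, cache_available: bool) -> bool:
    """The job is included as soon as any rule in the list matches (OR)."""
    return any(rule_matches(r, changed_files, cache_available) for r in rules)


# The rules of the "prepare" job, written with the boolean form.
PREPARE_RULES = [
    {"changes": ["package-lock.json"]},
    {"cache_exists": False},
]
```

With a separate keyword, the open question about combining file checks and cache checks has a straightforward answer in this model: clauses inside one rule are ANDed and separate rules are ORed, so `cache_exists` composes with `changes` or `exists` in the same way those keywords already compose with each other.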
Links / references
https://docs.gitlab.com/ee/ci/caching/index.html#caching-nodejs-dependencies