Backend: short-circuit CI pipelines based on whether a cache exists

Problem to solve

"As a GitLab CI/CD pipeline developer, I want to skip jobs (that generate a cache) if this cache already exists (and is up-to-date), so I can get faster pipeline feedback."

Reduce pipeline build time by skipping build jobs that (a) fetch dependencies based on a rarely changing dependency definition file and (b) cache the fetched dependencies for subsequent builds. For example, with Node.js/npm dependencies I have found that this reduces build time by up to ~25% (almost 2 min) for a small project whose build takes ~5+ min, compared with the approach in https://docs.gitlab.com/ee/ci/caching/index.html#caching-nodejs-dependencies. Complete example:

stages:
  - setup
  - test

cache: &global_cache
  key:
    files:
      - package-lock.json
    prefix: "${CI_PROJECT_PATH_SLUG}-${CI_COMMIT_REF_SLUG}-"  
  paths:
    - node_modules/
    - .npm/


prepare:
  stage: setup
  cache:
    <<: *global_cache # inherit all global cache settings
  rules:
    - changes:
      - package-lock.json
    # TODO: What I want is "OR if there is no cache (with the given key) available"; not sure what the syntax should look like
    - !cache_exists
  script:
    - npm ci --cache .npm --prefer-offline --no-audit --no-optional


build_test:
  stage: test
  cache:
    <<: *global_cache # inherit all global cache settings
    policy: pull
  script:
    - npm run lint
    - npm run test:ci
    - npm run build:ci
    - npm run e2e:ci

That is, the "prepare" job (which fetches the Node.js/npm dependencies and caches them) must run when:

  • either the dependency definition has changed (i.e. "package-lock.json" was modified)
  • or there is no cache with the dependencies yet (or no longer, e.g. because the caches were cleared or deleted manually)

As noted in http://disq.us/p/22vac4a, the second condition cannot currently be expressed:

Just a note. The only:changes condition will only work if this is the first push to a new branch, or if you have an existing cache for the branch. Just because your package-lock.json hasn't changed, doesn't mean node_modules has been cached previously.
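Until a rule like this exists, the second condition can only be approximated at job run time: rules are evaluated when the pipeline is created, while the cache is restored later by the runner. The following is a minimal workaround sketch using only existing keywords, not the proposed feature. It drops the rules from "prepare" so the job always runs, and skips npm ci when the restored cache already provides node_modules/. Because the cache key is derived from package-lock.json, a changed lock file yields a new key with no cache yet, so the install also runs in that case. It assumes no fallback cache key is configured (otherwise a stale cache could be restored), and it does not avoid the overhead of starting the job itself.

```yaml
# Workaround sketch with existing keywords only (not the proposed cache_exists rule):
# the job always runs and skips the install when the cache was restored.
stages:
  - setup

prepare:
  stage: setup
  cache:
    key:
      files:
        - package-lock.json   # changed lock file => new key => no cache for it yet
      prefix: "${CI_PROJECT_PATH_SLUG}-${CI_COMMIT_REF_SLUG}-"
    paths:
      - node_modules/
      - .npm/
  # No rules here: rules are evaluated at pipeline creation and cannot see the cache.
  script:
    # Install only if the restored cache did not provide node_modules/
    - if [ ! -d node_modules ]; then npm ci --cache .npm --prefer-offline --no-audit --no-optional; fi
```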

Intended users

The user who adds this flag will be the GitLab CI/CD pipeline developer, but all of the following will benefit in some way from faster pipelines:

User experience goal

(See above)

Proposal

In addition to if, changes and exists, also add something like cache_exists; negation (cf. #34859) is even more important here. From the complete example above:

  cache:
    <<: *global_cache # inherit all global cache settings
  rules:
    - changes:
      - package-lock.json
    # TODO: What I want is "OR if there is no cache (with the given key) available"; not sure what the syntax should look like
    - !cache_exists

I am not sure whether the cache key should (optionally) be re-specifiable in case it differs from the job's own key, although I doubt there is a use case for that, or that it makes sense at all.

Alternative syntax for negation:

  rules:
    ...
    - cache_exists: false

... because that might be more consistent with the current syntax (though not fully, since exists takes a list of file paths) and would also allow the following:

  rules:
    ...
    - cache_exists

I don't think extending the existing exists keyword to cover the cache, as below, really works: how would an exists check on files then be combined with one on caches?

  rules:
    ...
    - exists: CACHE

Links / references

#16905

gitlab-foss#19232 (closed)

&2783

https://docs.gitlab.com/ee/ci/caching/index.html#caching-nodejs-dependencies

https://www.addthis.com/blog/2019/05/06/how-to-speed-up-your-gitlab-ci-pipelines-for-node-apps-by-40/#.XvWt0CgzaUk
