Option to Disable Cache Diffing (Force Full Cache Upload) for Jobs with policy: pull-push


Proposal

GitLab Runner’s cache update mechanism performs a full filesystem walk and comparison when reusing an existing cache key under policy: pull-push. For workloads with a very large number of small files (e.g., npm, pnpm, Maven .m2/repository, Unity assets), this diffing step incurs a disproportionate runtime cost while providing negligible benefit when files rarely change.

This affects large self-managed and SaaS customers with monorepos or dependency-heavy stacks, where cache diffing can account for roughly half of total job time.

Current Behaviour

  • First use of a new cache key: full cache archive upload (fast enough for most cases).
  • Second and subsequent uses: Runner performs a stat/mtime/size comparison across all cached files to determine what to upload.
  • With 100k+ files, this diff calculation can take minutes even if no files changed.
  • No configuration option exists to skip this diff step at the job level.
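
The comparison described above can be sketched as follows. This is a minimal illustration, not GitLab Runner's actual implementation: the function names snapshot and changed_files are hypothetical, and the sketch assumes the comparison relies only on file size and modification time. It shows why the cost scales with the number of files rather than with the number of changes: every file is stat'ed even when nothing changed.

```python
import os


def snapshot(root):
    """Walk root and record (size, mtime_ns) for every regular file.

    Hypothetical helper: one os.stat call per file, so the cost grows
    with the file count even if no file has changed.
    """
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries[os.path.relpath(path, root)] = (st.st_size, st.st_mtime_ns)
    return entries


def changed_files(previous, current):
    """Return sorted paths that were added, removed, or differ in size/mtime."""
    added_or_modified = {p for p, meta in current.items() if previous.get(p) != meta}
    removed = set(previous) - set(current)
    return sorted(added_or_modified | removed)
```

With a cache of 100k+ files, snapshot alone performs 100k+ stat calls on every job, and changed_files returns an empty list when the dependencies are unchanged.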

Impact

  • In a Maven + npm build with ~150 000 files in cache, diff calculation adds 2–3 minutes to jobs that otherwise complete in ~3 minutes, roughly doubling total job time.
  • Similar reports exist for npm (large node_modules) and Unity (large asset folders).
  • In other words, roughly 40–50% of job time (2–3 minutes out of 5–6) is spent diffing rather than running build/test code.
  • For jobs that rarely change dependencies, the diffing is wasted effort.

Workarounds Tried

  • Increasing CPU/memory for the caching container: only small improvements.
  • Weekly cache clearing: prevents bloat but doesn’t reduce the diffing cost.
  • Using unique per-commit cache keys: removes the diff, but every commit starts with an empty cache, which makes caching ineffective.
  • Switching to artifacts: adds complexity and does not reduce the share of job time spent on cache handling.

Proposed Change

Add a job-level option in .gitlab-ci.yml to bypass diffing entirely and always upload the full cache at job end, even if the cache key exists. Example syntax:

cache:
  key: maven-backend-with-frontend-build
  diff: false
  policy: pull-push
  paths:
  - .m2/repository/
  - "**/node_modules"
  - ".pnpm-store"

Expected Behaviour

  • With diff: false, Runner would skip the stat/walk of existing cache files.
  • When pushing, it would always archive the current cache paths and replace the existing archive in the cache backend.
  • This eliminates the CPU/I/O cost of diffing while keeping cache sharing across jobs and commits.
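
The push behaviour proposed above can be sketched as follows. This is an illustration under stated assumptions, not Runner code: upload_full_cache is a hypothetical name, a local file path stands in for the cache backend, and glob patterns such as "**/node_modules" are not expanded. The point is that the archive is always rebuilt from the current paths and swapped in, with no per-file comparison against the previous cache.

```python
import os
import tarfile
import tempfile


def upload_full_cache(base_dir, rel_paths, archive_path):
    """Archive rel_paths (relative to base_dir) and replace archive_path.

    Hypothetical sketch of the proposed `diff: false` behaviour: nothing is
    compared against the existing archive, it is simply overwritten.
    """
    target_dir = os.path.dirname(os.path.abspath(archive_path))
    # Write to a temporary file first so a failed run never leaves a
    # half-written archive in place of the previous one.
    fd, tmp_path = tempfile.mkstemp(suffix=".tar.gz", dir=target_dir)
    os.close(fd)
    try:
        with tarfile.open(tmp_path, "w:gz") as archive:
            for rel in rel_paths:
                full = os.path.join(base_dir, rel)
                if os.path.exists(full):  # skip cache paths the job did not create
                    archive.add(full, arcname=rel)
        os.replace(tmp_path, archive_path)  # replace the existing archive
    except BaseException:
        os.remove(tmp_path)
        raise
    return archive_path
```

Archiving still reads every file once, so the saving in this sketch comes only from dropping the separate comparison pass.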

Benefits

  • Significant reduction in pipeline runtimes for large dependency caches.
  • No loss of cache usefulness (still shared across jobs).
  • No extra CI/CD complexity for users.
  • Helps not only npm/Maven cases, but also Unity and other asset-heavy builds.
