Option to Disable Cache Diffing (Force Full Cache Upload) for Jobs with policy: pull-push
Proposal
GitLab Runner’s cache update mechanism performs a full filesystem walk and comparison when reusing an existing cache key under policy: pull-push. For workloads with a very large number of small files (e.g., npm, pnpm, Maven .m2/repository, Unity assets), this diffing step incurs a disproportionate runtime cost while providing negligible benefit when files rarely change.
This affects large self-managed and SaaS customers with monorepos or dependency-heavy stacks, where cache diffing dominates job time.
Current Behaviour
- First use of a new cache key: full cache archive upload (fast enough for most cases).
- Second and subsequent uses: Runner performs a stat/mtime/size comparison across all cached files to determine what to upload.
- With 100k+ files, this diff calculation can take minutes even if no files changed.
- No configuration exists to skip this diff step at job level.
Impact
- In a Maven + npm build with ~150 000 files in cache, diff calculation adds 2–3 minutes to jobs that otherwise complete in ~3 minutes, nearly doubling total job time.
- Similar reports exist for npm (large node_modules) and Unity (large asset folders).
- This means roughly 40–50% of job time is spent diffing rather than running build/test code.
- For jobs that rarely change dependencies, the diffing is wasted effort.
Workarounds Tried
- Increasing CPU/memory for the caching container: only small improvements.
- Weekly cache clearing: prevents bloat but doesn’t fix diffing cost.
- Using unique per-commit cache keys: removes diff but makes caching ineffective.
- Switching to artifacts: adds complexity and does not reduce the share of job time spent on caching.
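For reference, the per-commit key workaround listed above might look like the sketch below. The job name `build` and the `script` line are placeholders and not taken from the affected pipeline; `$CI_COMMIT_SHA` is GitLab's predefined commit SHA variable. Because every commit gets a fresh key, there is never an existing archive to compare against, but the cache is also never reused across commits, which is why this approach makes caching ineffective.

```yaml
# Hypothetical job illustrating the per-commit cache key workaround.
build:
  script:
    # Placeholder build command; points Maven at a repository inside the project dir
    - mvn -B -Dmaven.repo.local=.m2/repository verify
  cache:
    # A new key on every commit: no diff against an old archive, but no reuse either
    key: "maven-backend-$CI_COMMIT_SHA"
    policy: pull-push
    paths:
      - .m2/repository/
```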
Proposed Change
Add a job-level option in .gitlab-ci.yml to bypass diffing entirely and always upload the full cache at job end, even if the cache key exists. Example syntax:
cache:
  key: maven-backend-with-frontend-build
  diff: false
  policy: pull-push
  paths:
    - .m2/repository/
    - "**/node_modules"
    - ".pnpm-store"
Expected Behaviour
- With diff: false, Runner would skip the stat/walk of existing cache files.
- When pushing, it would always archive the current cache paths and replace the existing archive in the cache backend.
- This eliminates the CPU/I/O cost of diffing while keeping cache sharing across jobs and commits.
Benefits
- Significant reduction in pipeline runtimes for large dependency caches.
- No loss of cache usefulness (still shared across jobs).
- No extra CI/CD complexity for users.
- Helps not only npm/Maven cases, but also Unity and other asset-heavy builds.
Related Issues
- gitlab-org/gitlab-runner#1461 – npm cache slowness due to many small files.