
CI pipeline with parallel jobs sharing the same cache can result in an incomplete cache

Summary

A pipeline that runs parallel jobs sharing the same cache can produce non-deterministic cache contents: some files may be missing in subsequent jobs.

There appears to be a race condition between the concurrent jobs:

  • job 1 starts and prepares the cache (deletes the local files and extracts the pulled cache)
  • job 2 finishes and builds the cache archive at the same time as job 1 deletes the cache, so an incomplete archive is built and cached files are lost

To prevent this, cache extraction and cache creation need to be synchronized: they must not run at the same time (at least on the same runner).
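
The synchronization proposed above could look roughly like the following sketch. It is not the Runner's actual implementation; it only models the hypothesis from this report, assuming both jobs touch the same local cache directory on one host. The names cache_lock, extract_cache and create_cache and the lock file path are hypothetical, and fcntl.flock is only available on Unix-like systems.

```python
import fcntl
import os
import shutil
import zipfile
from contextlib import contextmanager


@contextmanager
def cache_lock(lock_path):
    """Hold an exclusive advisory lock on lock_path while the block runs."""
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process or thread holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def extract_cache(archive, workdir, lock_path):
    """Delete the local cache folder and restore it from the archive."""
    with cache_lock(lock_path):
        shutil.rmtree(os.path.join(workdir, "cache"), ignore_errors=True)
        if os.path.exists(archive):
            with zipfile.ZipFile(archive) as zf:
                zf.extractall(workdir)


def create_cache(archive, workdir, lock_path):
    """Archive the local cache folder while no extraction can run."""
    with cache_lock(lock_path):
        tmp = archive + ".tmp"
        with zipfile.ZipFile(tmp, "w") as zf:
            for dirpath, _dirs, files in os.walk(os.path.join(workdir, "cache")):
                for name in files:
                    full = os.path.join(dirpath, name)
                    zf.write(full, os.path.relpath(full, workdir))
        # Swap the finished archive in, so readers never see a partial file.
        os.replace(tmp, archive)
```

Because both functions take the same lock, the deletion in extract_cache can no longer overlap with the archiving in create_cache. An advisory file lock only covers jobs on the same host, which matches the "at least on the same runner" scope above.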

Steps to reproduce

Since this is a race condition, it is not easy to reproduce reliably. But I have a .gitlab-ci.yml at hand, plus screenshots that show both an erroneous and an expected result.

This example pipeline initializes the cache in the first stage by deleting all files and adding one file. Two jobs then start in parallel and each adds a different subdirectory. The last stage just shows the content of the cache folder.

stages:
  - cache-init
  - cache-add
  - cache-show

cache-init:
  stage: cache-init
  cache:
    key: stuff
    paths:
      - cache
  script:
    - rm -rf cache
    - mkdir -p cache
    - date > cache/index.txt

cache-1:
  stage: cache-add
  cache:
    key: stuff
    paths:
      - cache
  script:
    - mkdir -p cache/1
    - date > cache/1/stuff.txt

cache-2:
  stage: cache-add
  cache:
    key: stuff
    paths:
      - cache
  script:
    - mkdir -p cache/2
    - date > cache/2/stuff.txt

cache-show:
  stage: cache-show
  cache:
    key: stuff
    policy: pull
    paths:
      - cache
  script:
    - ls -lR cache
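
Since the race is hard to hit, it may help to let the last job fail automatically when the cache is incomplete, so the pipeline can be rerun until the problem shows up. The following is a minimal sketch of such a check. The function name missing_cache_files is hypothetical, the expected paths are taken from the jobs above, and it assumes Python is available on the shell runner.

```python
import os
import sys

# Files the three earlier jobs are expected to have written to the cache.
EXPECTED = [
    "index.txt",
    os.path.join("1", "stuff.txt"),
    os.path.join("2", "stuff.txt"),
]


def missing_cache_files(cache_dir, expected=EXPECTED):
    """Return the expected files that are absent from cache_dir."""
    return [
        rel for rel in expected
        if not os.path.isfile(os.path.join(cache_dir, rel))
    ]


if __name__ == "__main__":
    missing = missing_cache_files("cache")
    if missing:
        print("incomplete cache, missing: " + ", ".join(missing))
        sys.exit(1)  # a non-zero exit code fails the job
    print("cache complete")
```

Saved as a script in the repository, this could be called from the cache-show job in addition to ls -lR cache.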

What is the current bug behavior?

This screenshot shows the output of job cache-show. The folder cache/2 created by job cache-2 is missing.

(Screenshot: cache-incomplete)

What is the expected correct behavior?

This screenshot shows the output of job cache-show with a complete cache directory.

(Screenshot: cache-complete)

Results of GitLab environment info

I don't have shell access to our GitLab server, so I can't run the environment commands, sorry.

GitLab CE: 10.3.2.

Runner: gitlab-ci-multi-runner, 9.5.1, shell runner

Edited by 🤖 GitLab Bot 🤖