Fastzip fails to archive cache due to temp path collision

Summary

It appears that the current fastzip implementation, at least on Windows, uses temporary file paths that can collide between concurrently running jobs.

Steps to reproduce

  1. Set up a GitLab Runner on a Windows machine.
  2. Register it to one or more GitLab instances with the shell executor.
  3. Set up a project that has multiple jobs with independent cache keys.
  4. Trigger a pipeline that will cause those jobs to run on that runner concurrently and will cause them to archive their caches at the same time.
  5. Observe the reported log-messages.

I'm sorry, I cannot provide better steps to reproduce or a demo project at this time. If you really need one, please let me know and I'll take the time to derive a reproduction setup that I am allowed to publish.

Actual behavior

One of the jobs logged this while trying to zip the cache:

Creating cache DevBuild-Benchmark-Win64-1...
Runtime platform                                    arch=amd64 os=windows pid=3172 revision=943fc252 version=13.7.0
<redacted>: found 56277 matching files and directories 
FATAL: remove C:\WINDOWS\TEMP\fastzip_00: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_01: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_02: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_03: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_04: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_05: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_06: The process cannot access the file because it is being used by another process.
remove C:\WINDOWS\TEMP\fastzip_07: The process cannot access the file because it is being used by another process.
 
Failed to create cache

The paths do not appear to contain any means for avoiding collisions between concurrently running jobs or even between different applications using a similar implementation.

Expected behavior

Creating cache DevBuild-Benchmark-Win64-1...
Runtime platform                                    arch=amd64 os=windows pid=388 revision=943fc252 version=13.7.0
<redacted>: found 56277 matching files and directories 
No URL provided, cache will be not uploaded to shared cache server. Cache will be stored only locally. 
Created cache

Relevant logs and/or screenshots

Environment description

The issue occurred for us with a GitLab runner with the following properties:

  • shell executor (likely relevant)
  • with concurrent=2, limit=2 (likely relevant)
  • multiple jobs running and pending (likely relevant)
  • FF_USE_FASTZIP is set to 1 for the involved projects. (likely relevant)
  • running on a Windows machine (possibly relevant)
  • registered to multiple different GitLab instances (probably not relevant)

This is a self-hosted runner registered to multiple self-hosted GitLab instances operated by different parties.

config.toml contents
concurrent = 2
check_interval = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = "[redacted]"
  url = "[redacted]"
  token = "[redacted]"
  limit = 1
  executor = "shell"
  builds_dir = "C:/gitlab-1/b"
  cache_dir = "C:/gitlab-1/c"
  shell = "cmd"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]

[[runners]]
  name = "[redacted]"
  url = "[redacted]"
  token = "[redacted]"
  limit = 2
  executor = "shell"
  builds_dir = "C:/gitlab-2/b"
  cache_dir = "C:/gitlab-2/c"
  shell = "cmd"
  output_limit = 40960
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]

The second runner entry (the one with limit = 2) was used to run the jobs in question.

Used GitLab Runner version

/gitlab-runner-windows-amd64.exe --version
Version:      13.7.0
Git revision: 943fc252
Git branch:   13-7-stable
GO version:   go1.13.8
Built:        2020-12-21T13:47:18+0000
OS/Arch:      windows/amd64

Possible fixes

A very brief search led to this line: https://gitlab.com/gitlab-org/gitlab-runner/-/blob/master/vendor/github.com/saracen/fastzip/internal/filepool/filepool.go#L145