Using a Windows volume with existing data in Kubernetes is unbearably slow
Summary
The init-permissions container takes a long time (it times out with default settings) when a persistent volume is used as the build area for a Windows job in the Kubernetes executor.
Steps to reproduce
- Configure a Kubernetes runner to execute jobs on Windows nodes, with a persistent volume as the build area.
- Run a job from a large project (> 100 000 files on disk after a build) on the runner. This is slow, because it takes time to clone the project.
- Run another job later. This should be fast, because the clone is already present, but the init-permissions container times out.
concurrent = 1

[[runners]]
  executor = "kubernetes"
  builds_dir = "/mnt/builds"
  [runners.kubernetes]
    [[runners.kubernetes.volumes.pvc]]
      # The cluster has a Persistent Volume Claim named build-pvc, suitable for a build area.
      name = "build-pvc"
      mount_path = "/mnt/builds"
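To reproduce this without a large real project, the volume could be pre-populated with many small files before the second job runs. The following is a minimal sketch, assuming the slowdown depends only on the number of existing files under the mounted path. The names planFiles and populate, the folder layout, and the file counts are hypothetical and not part of GitLab Runner.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// planFiles returns dirs*filesPerDir relative paths, spread evenly over
// dirs folders. The naming scheme is arbitrary and only for illustration.
func planFiles(dirs, filesPerDir int) []string {
	paths := make([]string, 0, dirs*filesPerDir)
	for d := 0; d < dirs; d++ {
		for f := 0; f < filesPerDir; f++ {
			name := filepath.Join(fmt.Sprintf("dir%04d", d), fmt.Sprintf("file%04d.txt", f))
			paths = append(paths, name)
		}
	}
	return paths
}

// populate creates every planned file under root and returns how many
// files were written.
func populate(root string, dirs, filesPerDir int) (int, error) {
	written := 0
	for _, rel := range planFiles(dirs, filesPerDir) {
		full := filepath.Join(root, rel)
		if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
			return written, err
		}
		if err := os.WriteFile(full, []byte("x"), 0o644); err != nil {
			return written, err
		}
		written++
	}
	return written, nil
}

func main() {
	// Small demo in a temporary folder so the sketch is safe to run anywhere.
	root, err := os.MkdirTemp("", "build-area-")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	defer os.RemoveAll(root)

	n, err := populate(root, 10, 10)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("wrote %d files under %s\n", n, root)
}
```

For an actual reproduction, root would point at the mounted build area and the counts would be raised so that the product exceeds 100 000, for example 1000 folders with 101 files each.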
Actual behavior
The init-permissions container takes so long that the pod times out with the default settings.
Expected behavior
The init-permissions container should finish in a reasonable time, even when the volume already contains many files.
Relevant logs and/or screenshots
job log
Waiting for pod glr-dev/runner-3l3hmew1-project-32-concurrent-0-hdd0yja1 to be running, status is Pending
ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
The same message then repeats 192 times, followed by a timeout.
Environment description
Self-hosted GitLab with a gitlab-runner installed with Helm on an AKS cluster. The runner is configured to run jobs on a Windows node pool.
Used GitLab Runner version
16.3
Possible fixes
The problem is caused by the /t flag passed to icacls.exe in kubernetes.go.
The flag makes the command recursively set the permissions on all existing folders and files.
Please note that the flag is not necessary for making the permissions inheritable. The (OI) and (CI) inheritance flags in the /grant argument already do that.
The /t flag only activates recursion over files that already exist. I believe this is unnecessary. If it is necessary, it should be possible to disable it explicitly, so that persistent volumes can have acceptable performance for large projects.
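The proposed behavior can be sketched as follows. This is not the code from kubernetes.go: the function name icaclsGrantArgs, the recursive switch, the Everyone principal, and the C:\builds path are all assumptions made for illustration. Only /grant, /t, (OI), and (CI) are taken from icacls itself.

```go
package main

import (
	"fmt"
	"strings"
)

// icaclsGrantArgs builds the argument list for an icacls.exe call that grants
// full control on dir. The function name, the "Everyone" principal, and the
// recursive switch are hypothetical; they are not taken from kubernetes.go.
func icaclsGrantArgs(dir string, recursive bool) []string {
	// (OI) = object inherit, (CI) = container inherit: files and folders
	// created later under dir inherit the entry, with or without /t.
	args := []string{dir, "/grant", "Everyone:(OI)(CI)F"}
	if recursive {
		// /t additionally applies the change to every file and folder that
		// already exists below dir, which is the slow part on a populated volume.
		args = append(args, "/t")
	}
	return args
}

func main() {
	// Print both variants so the difference is visible; C:\builds is a placeholder.
	fmt.Println("icacls.exe " + strings.Join(icaclsGrantArgs(`C:\builds`, false), " "))
	fmt.Println("icacls.exe " + strings.Join(icaclsGrantArgs(`C:\builds`, true), " "))
}
```

With recursion off, the command touches only the top-level folder, so its cost no longer grows with the number of files already on the volume. Whether existing files then need their permissions changed at all is the open question raised above.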