Support for dynamic PVC volumes per pipeline for the Kubernetes executor
Status update - 2024-11-11
- We don't have a clear path forward so we are not including this feature in the FY26 roadmap.
- If we were to pick up work on this feature, we would need to design the solution and make concrete implementation decisions.
Overview
Handling large files created or downloaded during pipeline runs can be a challenge, especially if one doesn't want to use artifacts to transfer them from one job to another.
Defining artifacts and up-/downloading them to GitLab works well when a pipeline is executed by a fleet of diverse runners. However, in our case the majority of jobs are executed by a fleet of Kubernetes-based shared runners.
This opens the possibility of significant improvements to the handling of large file assets that need to traverse multiple jobs during the execution of a single pipeline.
Currently the Kubernetes executor only supports the static definition of a single PVC-based Kubernetes volume that can be mounted into the job-executing container for this purpose: https://docs.gitlab.com/runner/executors/kubernetes.html#pvc-volumes
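For reference, the current static configuration from the linked documentation looks roughly like this (the name and mount path are illustrative):

```toml
[runners.kubernetes]
  # Statically defined PVC volume: the PVC must already exist in the cluster
  [[runners.kubernetes.volumes.pvc]]
    name = "pvc-1"
    mount_path = "/path/to/mount/point1"
```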
This functionality should be extendable to a model where the runner config defines options similar to the current PVC implementation but additionally accepts a Kubernetes StorageClass.
This would allow the Kubernetes executor to provision dynamic PVC-based storage volumes of a specified size, using an auto-generated name that is e.g. based on the CI_PIPELINE_ID.
These dynamically created, pipeline-scoped storage volumes can then be used by jobs to pass larger file assets between different jobs and stages of a single pipeline, eliminating the need to upload and download them from the central GitLab instance.
Proposal
- Introduce a new set of dynamic PVC volumes to the Kubernetes executor that are pipeline scoped
- The runner configuration probably needs some context about the volumes to be created:
- StorageClass to use for provisioning
- Size
- mount_path (see https://docs.gitlab.com/runner/executors/kubernetes.html#pvc-volumes)
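A hypothetical extension of the runner configuration could look like the sketch below; the section and option names (`dynamic_pvc`, `storage_class`, `size`) are illustrative placeholders, not an agreed-upon design:

```toml
[runners.kubernetes]
  # Hypothetical: a dynamic, pipeline-scoped PVC provisioned per pipeline
  [[runners.kubernetes.volumes.dynamic_pvc]]
    storage_class = "csi-rwx-storage"  # StorageClass used for provisioning
    size = "50Gi"                      # requested capacity of the volume
    mount_path = "/shared"             # mount point inside the job container
```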
The first job executed in a pipeline would create a new dynamic, pipeline-scoped PVC. This PVC would likely need the ReadWriteMany (RWX) access mode to properly support multiple jobs running in parallel.
Each job from one distinct pipeline would have access to the contents of this shared storage volume.
After the pipeline finishes (successfully or not), the PVC would be deleted, freeing the storage again.
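Conceptually, the first job of a pipeline would cause the runner to create a PVC along these lines (the name, label, and StorageClass are illustrative assumptions):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  # Auto-generated, pipeline-scoped name, e.g. derived from CI_PIPELINE_ID
  name: runner-pipeline-123456789
  labels:
    # Illustrative label so the runner can find and delete the PVC
    # once the pipeline has finished
    pipeline-id: "123456789"
spec:
  accessModes:
    - ReadWriteMany  # RWX so parallel jobs can mount it simultaneously
  storageClassName: csi-rwx-storage
  resources:
    requests:
      storage: 50Gi
```

Whether the deletion is triggered by the runner itself or by a separate cleanup mechanism is one of the concrete implementation decisions still to be made.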
This feature would allow the Kubernetes executor to support the transfer of large file assets between jobs and stages via local storage while still keeping a strong isolation between different pipelines, users and projects.
Links to related issues and merge requests / references
- GitLab Internal Support Ticket: https://support.gitlab.com/hc/en-us/requests/291116
- Similar idea, but PVCs are job scoped there and do not support the exchange of data between jobs and/or stages: #27835
- Also mentions dynamic PVC volumes but doesn't specify their scope in detail: #21308