Parallel CI jobs from file globs
Proposal
The parallel keyword in CI job definitions already accepts the matrix keyword, which simplifies the creation of near-identical jobs where only a few parameters differ.
The matrix keyword currently only allows a fixed list of values for the variables. When these variables refer to files or directories, this means that adding a new file that needs to be included in the CI jobs also requires modifying the pipeline definition.
My proposal is to allow the definition of parallel:matrix-like variables with glob or regex patterns that match files in the repository. One matrix variant would then be created for each file matching the pattern.
Compared to a shell for loop (which is the current way to do this), this would have the following advantages:
- Each file is processed in an independent job, isolated from the side effects of other executions
- The files can be processed by several runners concurrently, while a shell for loop can only be processed by a single runner
- Each command can be written as its own item in the script list, so a failure is recorded as soon as one command fails, instead of waiting for the loop to end
- Each instance would have its own status, so successful executions won't need to be retried if only one file caused a failure
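For comparison, the shell loop workaround mentioned above might look like the sketch below, applied to the Docker use case from the examples section. This is only an illustration of the existing approach, not a recommended configuration: a single job on a single runner walks through every matching file in sequence.

```yaml
# Sketch of the current workaround: one job loops over all matching files.
build-images:
  image: docker:stable
  services:
    - docker:dind
  stage: build
  script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    - |
      # Every Dockerfile.* is built and pushed by the same job, one after another.
      for DOCKERFILE in Dockerfile.*; do
        docker build -f "$DOCKERFILE" -t "$CI_REGISTRY/$CI_PROJECT_PATH:${DOCKERFILE#Dockerfile.}" .
        docker push "$CI_REGISTRY/$CI_PROJECT_PATH:${DOCKERFILE#Dockerfile.}"
      done
```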
The examples below assume the use of the keyword parallel:files, containing a map where the key is the variable name and the value is a glob pattern. This syntax does not need to be the final version.
Examples and use cases
Building and pushing several variants of a Docker image (note the use of the variable $DOCKERFILE):
build-images:
  image: docker:stable
  services:
    - docker:dind
  stage: build
  script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    # ${DOCKERFILE#Dockerfile.} strips the "Dockerfile." prefix, leaving the suffix as the image tag
    - docker build -f $DOCKERFILE -t $CI_REGISTRY/$CI_PROJECT_PATH:${DOCKERFILE#Dockerfile.} .
    - docker push $CI_REGISTRY/$CI_PROJECT_PATH:${DOCKERFILE#Dockerfile.}
  parallel:
    files:
      DOCKERFILE: "Dockerfile.*"
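To make the intended expansion concrete, here is a sketch of what the job above would presumably be equivalent to with today's parallel:matrix syntax. The two file names, Dockerfile.alpine and Dockerfile.debian, are hypothetical and only stand in for whatever the glob would match in a real repository.

```yaml
# Assumed equivalent for a hypothetical repository containing
# Dockerfile.alpine and Dockerfile.debian: one job per matched file.
build-images:
  image: docker:stable
  services:
    - docker:dind
  stage: build
  script:
    - docker login -u gitlab-ci-token -p $CI_JOB_TOKEN $CI_REGISTRY
    - docker build -f $DOCKERFILE -t $CI_REGISTRY/$CI_PROJECT_PATH:${DOCKERFILE#Dockerfile.} .
    - docker push $CI_REGISTRY/$CI_PROJECT_PATH:${DOCKERFILE#Dockerfile.}
  parallel:
    matrix:
      - DOCKERFILE: ["Dockerfile.alpine", "Dockerfile.debian"]
```

The difference is that this static list has to be edited by hand every time a Dockerfile is added or removed, which is the maintenance step the proposal aims to remove.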
Building several .NET projects, combined with additional parallel:matrix variables for other build parameters (note the use of the variable $PROJECT):
build-images:
  image: example.com/ci-images/dotnet-cli
  stage: build
  script:
    - dotnet publish -o $CONFIGURATION/$RUNTIME/$PROJECT -c $CONFIGURATION -r $RUNTIME $PROJECT
  parallel:
    matrix:
      - CONFIGURATION: ["Debug", "Release"]
        RUNTIME: ["win-x86", "win-x64", "linux-x64", "osx-x64", "osx-arm64"]
    files:
      PROJECT: "**/*.csproj"
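Assuming the matched files are combined with the matrix variables as one more dimension, the job above could be read as shorthand for the static matrix sketched below. The two project paths are hypothetical placeholders. With them, the job would expand to 2 configurations x 5 runtimes x 2 projects = 20 jobs.

```yaml
# Assumed expansion for a hypothetical repository with two project files.
build-images:
  image: example.com/ci-images/dotnet-cli
  stage: build
  script:
    - dotnet publish -o $CONFIGURATION/$RUNTIME/$PROJECT -c $CONFIGURATION -r $RUNTIME $PROJECT
  parallel:
    matrix:
      - CONFIGURATION: ["Debug", "Release"]
        RUNTIME: ["win-x86", "win-x64", "linux-x64", "osx-x64", "osx-arm64"]
        # Hypothetical paths standing in for the files matched by "**/*.csproj"
        PROJECT: ["src/App/App.csproj", "src/Lib/Lib.csproj"]
```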
Limitations
- The same limits as with other uses of parallel (especially regarding the maximum number of jobs created this way) will have to be enforced. This is especially true in the second example, in which only 5 csproj files anywhere in the repository would be enough to reach the 50 jobs limit (2 configurations × 5 runtimes × 5 projects = 50 jobs).
- The matrix-based parallelization and file-based parallelization may need to be mutually exclusive
- This feature may need to be restricted to files already in the repository, disallowing the use of files from cache or artifacts (artifacts-based parallelization might be a desired feature, but not in the scope of this issue)