Add support for named GitLab Runner PodSpec(s) in a .gitlab-ci.yml pipeline file

Overview

Enabling support for named GitLab Runner PodSpec(s) to configure the Runner worker will unlock value for users who run scientific workflows on GitLab. For example, data scientists, machine learning researchers, and engineers often need GPU-enabled compute on public cloud Kubernetes infrastructure. Using GPU-enabled compute means creating a different podspec for each class of GPU-enabled compute offered. Today, the podspec can only be set in the runner config.toml file.

This has two consequences. First, the administrator of the self-managed Runner has to configure the podspec for each compute type. Second, every job on the specified Runner will request GPU-enabled resources regardless of whether the job requires a GPU. As a workaround, customers can configure multiple runners, each with its own pod spec, but that approach is neither efficient nor flexible.
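
To illustrate the workaround, here is a minimal sketch of a pipeline that routes jobs to separate runners through tags. The job names and the tag names gpu-a100 and cpu-only are hypothetical; each tag would have to match a runner that an administrator has registered with its own pod spec in config.toml.

```yaml
# Workaround sketch: one runner per compute type, selected through tags.
# Job and tag names are hypothetical. Each tag must match a runner that an
# administrator registered with its own pod_spec in config.toml.
train-model:
  tags:
    - gpu-a100      # picked up only by the runner whose pod spec requests a GPU
  script:
    - ./cmd01

lint:
  tags:
    - cpu-only      # picked up by a runner without GPU resources
  script:
    - ./cmd02
```

Every additional compute type needs another registered runner and another tag, so pipeline developers still depend on an administrator for each change.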

Value proposition (customer verbatims)

  • "Here is one specific example: say we added a new type of node to a Kubernetes cluster, and we wanted to use a node selector to run a pod on that specific node. It would be easier for the CI pipeline developers to make a code change (.gitlab-ci.yml) than to have to contact a system administrator to modify the Runner's configuration (config.toml) to add a new named PodSpec."

  • The workaround of multiple named pod specs is not ideal, as "there are common situations that have a number of permutations. Take, for example, the GPU specification for GKE Autopilot workloads as shown in the Google documentation. If we wanted to cover all GPU type/quantity options, that would be eight unique pod specs. Not an intractable number, but slightly cumbersome."
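
The permutation problem from the second quote can be sketched as follows. This is an illustration only: it uses the pod_specs keyword that this issue proposes, which is not an existing .gitlab-ci.yml feature. The spec names and the choice of the nvidia-tesla-t4 accelerator type are assumptions. Only two of the type/quantity combinations are shown.

```yaml
# Illustration only: pod_specs is the syntax proposed in this issue.
# Spec names (gpu-t4-x1, gpu-t4-x2) and the GPU type are hypothetical choices.
pod_specs:
  - name: gpu-t4-x1
    patch_type: "strategic"
    patch: |
      nodeSelector:
        cloud.google.com/gke-accelerator: "nvidia-tesla-t4"
      containers:
      - name: "build"
        resources:
          limits:
            nvidia.com/gpu: 1
  - name: gpu-t4-x2
    patch_type: "strategic"
    patch: |
      nodeSelector:
        cloud.google.com/gke-accelerator: "nvidia-tesla-t4"
      containers:
      - name: "build"
        resources:
          limits:
            nvidia.com/gpu: 2
```

The two specs differ only in the GPU count, and each further type or quantity adds another near-identical entry. Defining them in the pipeline file at least lets pipeline developers add or adjust a combination without a change to config.toml.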

Proposal

Allow users to configure a podspec in the .gitlab-ci.yml pipeline.

Example

config.toml

...

[[runners]]
  [runners.kubernetes]
    [[runners.kubernetes.pod_spec]]
      name = "spec-a100-3"
      patch = '''
        nodeSelector:
          cloud.google.com/gke-accelerator: "nvidia-tesla-a100"
        containers:
        - name: "build"
          resources:
            limits:
              nvidia.com/gpu: 2
      '''
      patch_type = "strategic"
...

.gitlab-ci.yml

pod_specs:
  - name: spec-a100-1
    patch_type: "strategic"
    patch: |
      hostname: "my_host"
  - name: spec-a100-2
    patch_type: "json"
    patch: |
      [{"op": "replace", "path": "/terminationGracePeriodSeconds", "value": 60}]

job01:
  pod_specs:
    - spec-a100-2
  script:
    - ./cmd01
    - ./cmd02

job02:
  pod_specs:
    - spec-a100-1
    - spec-a100-3
  script:
    - ./cmd01
    - ./cmd02

job03:
  script:
    - ./cmd01
    - ./cmd02
  services:
    - name: my-postgres:11.7
      alias: db-postgres
      entrypoint: ["/usr/local/bin/db-postgres"]
      command: ["start"]
      pod_specs:
        - spec-a100-1
        - spec-a100-3
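
As a reading aid, here is a sketch of what the pod spec for job02 might contain once both named specs are applied. It assumes that names are resolved across the pipeline file and config.toml (spec-a100-1 from the former, spec-a100-3 from the latter) and that patches are applied in the order listed. The proposal does not specify either behavior.

```yaml
# Fragment of the resulting pod spec for job02, under the assumptions above.
# Fields not touched by either patch (image, command, and so on) are omitted.
hostname: "my_host"            # from spec-a100-1, defined in .gitlab-ci.yml
nodeSelector:                  # from spec-a100-3, defined in config.toml
  cloud.google.com/gke-accelerator: "nvidia-tesla-a100"
containers:
- name: "build"
  resources:
    limits:
      nvidia.com/gpu: 2
```

job03 without a job-level pod_specs entry would keep the runner's default pod spec for its build container, which is what lets GPU and non-GPU jobs share one runner.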