Pilots failover

What does this MR do?

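Introduces a single USE_PILOT_RUNNERS toggle (default: "true") that makes all runner tags variable-driven. When the toggle is set to "false", jobs fall back to shared/instance runner tags. Because the pilot runners now have their own functions-pilot-* tags, shared/instance runners can be re-enabled, which restores GitLab Duo.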

Files Changed

1. .gitlab/ci/variables.gitlab-ci.yml — Toggle and pilot tag defaults:

variables:
  # Master toggle; the workflow rule compares it against the string "false"
  USE_PILOT_RUNNERS: "true"
  # Pilot runner tags, used unless the toggle is off
  RUNNER_TAG_DEFAULT: "functions-pilot-linux-amd64"
  RUNNER_TAG_DOCKER: "functions-pilot-linux-amd64"
  RUNNER_TAG_WINDOWS: "functions-pilot-windows-amd64"
  RUNNER_TAG_E2E_LINUX_AMD64: "functions-pilot-linux-amd64"
  RUNNER_TAG_E2E_LINUX_ARM64: "functions-pilot-linux-arm64"

2. .gitlab-ci.yml — Default tag uses variable, workflow rule overrides tags when toggle is off:

default:
  tags:
    - $RUNNER_TAG_DEFAULT

workflow:
  rules:
    - if: '$USE_PILOT_RUNNERS == "false"'
      variables:
        RUNNER_TAG_DEFAULT: "gitlab-org"
        RUNNER_TAG_DOCKER: "gitlab-org-docker"
        RUNNER_TAG_WINDOWS: "saas-windows-medium-amd64"
        RUNNER_TAG_E2E_LINUX_AMD64: "saas-linux-medium-amd64"
        RUNNER_TAG_E2E_LINUX_ARM64: "saas-linux-medium-arm64"
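
One caveat about the workflow excerpt above: GitLab only creates a pipeline when at least one workflow rule matches, so a rule that matches only when the toggle is off needs a companion rule for the default case. The following is a minimal sketch, assuming the file contains no other workflow rules (the real file may already have some). The final fall-through rule is an assumption for illustration and is not taken from this MR.

```yaml
workflow:
  rules:
    # Toggle off: replace the pilot defaults with shared/instance runner tags.
    - if: '$USE_PILOT_RUNNERS == "false"'
      variables:
        RUNNER_TAG_DEFAULT: "gitlab-org"
        RUNNER_TAG_DOCKER: "gitlab-org-docker"
        RUNNER_TAG_WINDOWS: "saas-windows-medium-amd64"
        RUNNER_TAG_E2E_LINUX_AMD64: "saas-linux-medium-amd64"
        RUNNER_TAG_E2E_LINUX_ARM64: "saas-linux-medium-arm64"
    # Assumed fall-through rule: with the toggle on, no variables are
    # overridden and the pilot defaults from variables.gitlab-ci.yml apply.
    - when: always
```

Rules are evaluated in order and the first match wins, so if the real file restricts pipelines by source or branch, those conditions would need to be combined with the toggle rule rather than placed after it.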

3. .gitlab/ci/build-docker.gitlab-ci.yml — Fixed hardcoded tag:

# before
docker image legacy:
  tags: [gitlab-org-docker]

# after
docker image legacy:
  tags: [$RUNNER_TAG_DOCKER]

4. .gitlab/ci/deploy.gitlab-ci.yml — Fixed hardcoded tag:

# before
legacy image:
  tags: [gitlab-org-docker]

# after
legacy image:
  tags: [$RUNNER_TAG_DOCKER]

5. .gitlab/ci/test.gitlab-ci.yml — Fixed hardcoded tag:

# before
go-test-windows:
  tags: [saas-windows-medium-amd64]

# after
go-test-windows:
  tags: [$RUNNER_TAG_WINDOWS]

6. .gitlab/ci/e2e.gitlab-ci.yml — Split into separate jobs with per-arch tag variables:

# before (matrix approach — broken: GitLab CI doesn't recursively expand variables in matrix values)
e2e:image-version:
  tags: [$RUNNER_TAG_E2E]
  parallel:
    matrix:
      - RUNNER_TAG_E2E: $RUNNER_TAG_E2E_LINUX_AMD64  # ← treated as literal string

# after (separate jobs — each references tag variable directly)
.e2e:image-version:
  # shared template

e2e:image-version:linux-amd64:
  extends: .e2e:image-version
  tags: [$RUNNER_TAG_E2E_LINUX_AMD64]

e2e:image-version:linux-arm64:
  extends: .e2e:image-version
  tags: [$RUNNER_TAG_E2E_LINUX_ARM64]

Tag Mapping

| Variable | Pilot (default) | Fallback (USE_PILOT_RUNNERS=false) |
| --- | --- | --- |
| RUNNER_TAG_DEFAULT | functions-pilot-linux-amd64 | gitlab-org |
| RUNNER_TAG_DOCKER | functions-pilot-linux-amd64 | gitlab-org-docker |
| RUNNER_TAG_WINDOWS | functions-pilot-windows-amd64 | saas-windows-medium-amd64 |
| RUNNER_TAG_E2E_LINUX_AMD64 | functions-pilot-linux-amd64 | saas-linux-medium-amd64 |
| RUNNER_TAG_E2E_LINUX_ARM64 | functions-pilot-linux-arm64 | saas-linux-medium-arm64 |

Why was this MR needed?

Previously, pilot runners were tagged with shared runner tags (e.g. gitlab-org) so that they would pick up existing jobs. To keep routing deterministic, shared/instance runners had to be disabled, which broke GitLab Duo (it needs instance runners with the gitlab--duo tag). Now that pilot runners have unique functions-pilot-* tags, this MR allows shared/instance runners to be re-enabled.

What's the best way to test this MR?

  1. Default behavior: verify that pipeline jobs are picked up by functions-pilot-* runners.
  2. Set USE_PILOT_RUNNERS to "false" as a CI/CD variable and verify that jobs fall back to shared/instance runners.
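
To make both checks easier to read from the job log, a temporary job along these lines could print the resolved tag variables next to the tags of the runner that actually picked it up. This is only a sketch: the job name debug:runner-tags is hypothetical and not part of this MR. CI_RUNNER_DESCRIPTION and CI_RUNNER_TAGS are GitLab predefined variables.

```yaml
# Hypothetical helper job, for manual verification only.
debug:runner-tags:
  stage: .pre
  # No tags here, so the job inherits default:tags ($RUNNER_TAG_DEFAULT).
  script:
    - echo "USE_PILOT_RUNNERS=$USE_PILOT_RUNNERS"
    - echo "RUNNER_TAG_DEFAULT=$RUNNER_TAG_DEFAULT"
    - echo "RUNNER_TAG_DOCKER=$RUNNER_TAG_DOCKER"
    - echo "RUNNER_TAG_WINDOWS=$RUNNER_TAG_WINDOWS"
    - echo "RUNNER_TAG_E2E_LINUX_AMD64=$RUNNER_TAG_E2E_LINUX_AMD64"
    - echo "RUNNER_TAG_E2E_LINUX_ARM64=$RUNNER_TAG_E2E_LINUX_ARM64"
    # Description and tags of the runner executing this job.
    - echo "runner=$CI_RUNNER_DESCRIPTION tags=$CI_RUNNER_TAGS"
```

With the toggle at its default, the printed runner tags should include functions-pilot-linux-amd64. With the toggle set to "false", they should include gitlab-org instead.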

What are the relevant issue numbers?

#425 (closed)

Edited by Georgi N. Georgiev | GitLab
