Pilots failover
What does this MR do?
Introduces a single USE_PILOT_RUNNERS toggle (default: true) that makes all runner tags variable-driven. When disabled, jobs fall back to shared/instance runner tags, restoring GitLab Duo.
Files Changed
1. .gitlab/ci/variables.gitlab-ci.yml — Toggle and pilot tag defaults:
USE_PILOT_RUNNERS: "true"
RUNNER_TAG_DEFAULT: "functions-pilot-linux-amd64"
RUNNER_TAG_DOCKER: "functions-pilot-linux-amd64"
RUNNER_TAG_WINDOWS: "functions-pilot-windows-amd64"
RUNNER_TAG_E2E_LINUX_AMD64: "functions-pilot-linux-amd64"
RUNNER_TAG_E2E_LINUX_ARM64: "functions-pilot-linux-arm64"
2. .gitlab-ci.yml — Default tag uses variable, workflow rule overrides tags when toggle is off:
default:
  tags:
    - $RUNNER_TAG_DEFAULT
workflow:
  rules:
    - if: '$USE_PILOT_RUNNERS == "false"'
      variables:
        RUNNER_TAG_DEFAULT: "gitlab-org"
        RUNNER_TAG_DOCKER: "gitlab-org-docker"
        RUNNER_TAG_WINDOWS: "saas-windows-medium-amd64"
        RUNNER_TAG_E2E_LINUX_AMD64: "saas-linux-medium-amd64"
        RUNNER_TAG_E2E_LINUX_ARM64: "saas-linux-medium-arm64"
3. .gitlab/ci/build-docker.gitlab-ci.yml — Fixed hardcoded tag:
# before
docker image legacy:
  tags: [gitlab-org-docker]
# after
docker image legacy:
  tags: [$RUNNER_TAG_DOCKER]
4. .gitlab/ci/deploy.gitlab-ci.yml — Fixed hardcoded tag:
# before
legacy image:
  tags: [gitlab-org-docker]
# after
legacy image:
  tags: [$RUNNER_TAG_DOCKER]
5. .gitlab/ci/test.gitlab-ci.yml — Fixed hardcoded tag:
# before
go-test-windows:
  tags: [saas-windows-medium-amd64]
# after
go-test-windows:
  tags: [$RUNNER_TAG_WINDOWS]
6. .gitlab/ci/e2e.gitlab-ci.yml — Split into separate jobs with per-arch tag variables:
# before (matrix approach, broken: GitLab CI doesn't recursively expand variables in matrix values)
e2e:image-version:
  tags: [$RUNNER_TAG_E2E]
  parallel:
    matrix:
      - RUNNER_TAG_E2E: $RUNNER_TAG_E2E_LINUX_AMD64  # ← treated as literal string
# after (separate jobs, each references its tag variable directly)
.e2e:image-version:
  # shared template
e2e:image-version:linux-amd64:
  extends: .e2e:image-version
  tags: [$RUNNER_TAG_E2E_LINUX_AMD64]
e2e:image-version:linux-arm64:
  extends: .e2e:image-version
  tags: [$RUNNER_TAG_E2E_LINUX_ARM64]
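Putting the pieces from the files above together, the toggle pattern can be sketched as one small, self-contained pipeline. This is an illustration, not the project's actual configuration: the job name `example-build` is hypothetical, only two of the five tag variables are shown, and the catch-all `when: always` rule is an assumption. The excerpt for .gitlab-ci.yml shows only the override rule, and if that were the sole workflow rule, a pipeline with the toggle left on would match no rule and would not be created, so the real file presumably carries further rules.

```yaml
variables:
  USE_PILOT_RUNNERS: "true"
  RUNNER_TAG_DEFAULT: "functions-pilot-linux-amd64"
  RUNNER_TAG_DOCKER: "functions-pilot-linux-amd64"

workflow:
  rules:
    # Toggle off: swap in the shared/instance runner tags for the whole pipeline.
    - if: '$USE_PILOT_RUNNERS == "false"'
      variables:
        RUNNER_TAG_DEFAULT: "gitlab-org"
        RUNNER_TAG_DOCKER: "gitlab-org-docker"
    # Assumed catch-all so pipelines are still created when the toggle is on.
    - when: always

default:
  tags:
    - $RUNNER_TAG_DEFAULT

# Hypothetical job, for illustration only.
example-build:
  tags: [$RUNNER_TAG_DOCKER]
  script:
    - echo "Requested runner tag is $RUNNER_TAG_DOCKER"
```

Because the override lives in `workflow:rules:variables`, the fallback values apply to every job in the pipeline, and individual jobs only ever reference the `RUNNER_TAG_*` variables.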
Tag Mapping
| Variable | Pilot (default) | Fallback (USE_PILOT_RUNNERS=false) |
|---|---|---|
| RUNNER_TAG_DEFAULT | functions-pilot-linux-amd64 | gitlab-org |
| RUNNER_TAG_DOCKER | functions-pilot-linux-amd64 | gitlab-org-docker |
| RUNNER_TAG_WINDOWS | functions-pilot-windows-amd64 | saas-windows-medium-amd64 |
| RUNNER_TAG_E2E_LINUX_AMD64 | functions-pilot-linux-amd64 | saas-linux-medium-amd64 |
| RUNNER_TAG_E2E_LINUX_ARM64 | functions-pilot-linux-arm64 | saas-linux-medium-arm64 |
Why was this MR needed?
Pilot runners were previously tagged with shared runner tags (e.g. gitlab-org) to match existing jobs. To keep routing deterministic, shared/instance runners were disabled, which broke GitLab Duo because it needs instance runners with the gitlab--duo tag. Now that pilot runners have unique functions-pilot-* tags, this MR allows shared/instance runners to be re-enabled.
What's the best way to test this MR?
- Default behavior: verify pipeline jobs are picked up by functions-pilot-* runners
- Set USE_PILOT_RUNNERS=false as a CI/CD variable and verify jobs fall back to shared/instance runners
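To make both checks easier to read from the job log, a throwaway job could print the values the pipeline actually resolved. This is only a sketch: the job name `print-runner-tags` is hypothetical and not part of this MR. It defines no tags of its own, so it is assumed to inherit the `default:` tags and run on whichever runner matches $RUNNER_TAG_DEFAULT.

```yaml
# Hypothetical debugging job; remove it before merging.
print-runner-tags:
  stage: .pre
  script:
    - echo "USE_PILOT_RUNNERS=$USE_PILOT_RUNNERS"
    - echo "RUNNER_TAG_DEFAULT=$RUNNER_TAG_DEFAULT"
    - echo "RUNNER_TAG_DOCKER=$RUNNER_TAG_DOCKER"
    - echo "RUNNER_TAG_WINDOWS=$RUNNER_TAG_WINDOWS"
    - echo "RUNNER_TAG_E2E_LINUX_AMD64=$RUNNER_TAG_E2E_LINUX_AMD64"
    - echo "RUNNER_TAG_E2E_LINUX_ARM64=$RUNNER_TAG_E2E_LINUX_ARM64"
```

With the toggle left at its default, the log should show the functions-pilot-* values. With USE_PILOT_RUNNERS=false set for the pipeline, it should show the fallback values from the tag mapping table.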