Mixed Kubernetes Windows runner stuck in runner_script_trap

Summary

On a mixed Kubernetes cluster (Linux/Windows), the runner that handles Windows pipelines hangs. It creates its pod with two containers (helper and build), and both come online. The helper then logs {"command_exit_code": 0, "script": "runner_script_trap"} while the build container does nothing. The job log in GitLab gains a few empty lines, and the job then waits until the timeout eventually kills the pipeline.

Steps to reproduce

  1. Create a mixed Kubernetes cluster. For local testing I have:
NAME              STATUS   ROLES                  AGE     VERSION
dev-builder       Ready    control-plane,master   33d     v1.21.0
dev-builder-win   Ready    <none>                 4d19h   v1.21.1

dev-builder doubles as the main server. It runs Ubuntu Server 20.04, with Kubernetes installed on top of CRI-O. dev-builder-win is a worker node running Windows Server 2019 (10.0.17763.1999), with Kubernetes on containerd. I joined the two into a cluster following the Kubernetes manual.

  1. Create a namespace in Kubernetes called gitlab.

  2. Deploy a GitLab runner to the cluster using Helm, following the instructions from the GitLab manual, with the following values.yaml:

name: "Runner-Windows"
gitlabUrl: https://git.mydomain.com
runnerRegistrationToken: thetoken
certsSecretName: certname
log-level: info
tags: "windows"
envVars:
- name: CI_SERVER_TLS_CA_FILE
  value: /home/gitlab-runner/.gitlab-runner/certs/mydomain.com.crt
clusterWideAccess: true
serviceAccountName: gitlab
rbac:
  create: true
  rules:
    - resources: ["pods", "secrets"]
      verbs: ["get", "list", "watch", "create", "patch", "delete"]
    - apiGroups: [""]
      resources: ["pods/exec", "configmaps", "pods/attach", "secrets"]
      verbs: ["create", "patch", "delete", "update"]
nodeSelector:
  kubernetes.io/os: "linux"
runners:
  environment: ["FF_USE_POWERSHELL_PATH_RESOLVER=1"]
  nodeSelector:
    kubernetes.io/os: "windows"
    node.kubernetes.io/windows-build: "10.0.17763"
    kubernetes.io/arch: "amd64"
  1. In GitLab, tag the new runner as 'windows'. (You might also need to increase the timeout; pulling Windows images tends to exceed the default.)
  2. Create a repository and add the following CI file:
image: mcr.microsoft.com/dotnet/sdk:5.0-windowsservercore-ltsc2019

stages:
    - hello

hello:
    tags:
        - windows
    stage: hello
    script:
        - "echo hello"
  1. Run the pipeline. It spins up the containers but then hangs until the timeout.
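
For anyone reproducing this, the runners.nodeSelector in the values.yaml above is what pins the build pod to the Windows node: every key/value pair has to match a node label exactly. A minimal Python sketch of that rule follows. The helper name selector_matches and the node label sets are assumptions for illustration, not output from my cluster.

```python
# Hypothetical helper: True if every selector pair is present on the node.
def selector_matches(node_labels, node_selector):
    return all(node_labels.get(key) == value for key, value in node_selector.items())


# Taken from runners.nodeSelector in the values.yaml above.
windows_selector = {
    "kubernetes.io/os": "windows",
    "node.kubernetes.io/windows-build": "10.0.17763",
    "kubernetes.io/arch": "amd64",
}

# Assumed label sets for the two nodes (not copied from the real cluster).
dev_builder = {"kubernetes.io/os": "linux", "kubernetes.io/arch": "amd64"}
dev_builder_win = {
    "kubernetes.io/os": "windows",
    "node.kubernetes.io/windows-build": "10.0.17763",
    "kubernetes.io/arch": "amd64",
}

print(selector_matches(dev_builder, windows_selector))      # False
print(selector_matches(dev_builder_win, windows_selector))  # True
```

Note that the selector uses the three-part build number (10.0.17763), not the full OS version 10.0.17763.1999. Since the pod does get scheduled and both containers start, the selector itself appears to be fine in my setup.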

Actual behavior

The pipeline starts a new pod with 2 containers (helper and build). The job registers the containers, and a few whitespace-only lines appear in the pipeline log. After that the build freezes until it is killed by the timeout (default 1h).

Logs for the pipeline and the kubectl logs are attached below.

There is no log output for the build container.

Expected behavior

The pipeline starts, echoes hello on a Windows image, and succeeds.

Relevant logs and/or screenshots

pipeline log
Running with gitlab-runner 14.2.0 (58ba2b95)
  on gitlab-runner-windows-gitlab-runner-64fcbbdf57-xdnkt USaiYpXT
  feature flags: FF_USE_POWERSHELL_PATH_RESOLVER:true
Preparing the "kubernetes" executor
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image mcr.microsoft.com/dotnet/sdk:5.0-windowsservercore-ltsc2019 ...
Using attach strategy to execute scripts...
Preparing environment
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
	ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
	ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
	ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
	ContainersNotInitialized: "containers with incomplete status: [init-permissions]"
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"
Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl to be running, status is Pending
	ContainersNotReady: "containers with unready status: [build helper]"
	ContainersNotReady: "containers with unready status: [build helper]"





ERROR: Job failed: execution took longer than 1h0m0s seconds

The empty lines containing just a \t are not a typo; they are written out once both containers are running.
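
To tell the tab-only lines apart from truly empty ones when looking at a raw copy of the job log, something like the following Python sketch can help. The function name classify_log_lines is hypothetical, and the sample is a shortened excerpt of the log above with two tab-only lines written out explicitly.

```python
# Hypothetical helper: counts the kinds of lines relevant to this report.
def classify_log_lines(log_text):
    counts = {"pending_polls": 0, "tab_only": 0, "empty": 0}
    for line in log_text.split("\n"):
        if line.startswith("Waiting for pod ") and line.endswith("status is Pending"):
            counts["pending_polls"] += 1
        elif line == "":
            counts["empty"] += 1
        elif line.strip("\t") == "":
            # Line consists of tab characters only.
            counts["tab_only"] += 1
    return counts


# Shortened excerpt of the pipeline log, with the tab-only lines made explicit.
sample = (
    "Waiting for pod gitlab/runner-usaiypxt-project-571-concurrent-0c6htl"
    " to be running, status is Pending\n"
    '\tContainersNotReady: "containers with unready status: [build helper]"\n'
    "\t\n"
    "\t\n"
    "ERROR: Job failed: execution took longer than 1h0m0s seconds"
)

print(classify_log_lines(sample))
```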

helper container logs
PS C:\git\gitlabrunnerconfig> kubectl logs runner-usaiypxt-project-571-concurrent-0b7n8l --namespace=gitlab -c helper
Running on RUNNER-USAIYPXT via
gitlab-runner-windows-gitlab-runner-64fcbbdf57-xdnkt...
{"command_exit_code": 0, "script": "runner_script_trap"}
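
The last helper line is a JSON object, so it can be picked out of the kubectl output mechanically. A small Python sketch, assuming the helper log has been saved as text; the function name parse_helper_status is hypothetical, and the interpretation of the fields is only what the field names suggest.

```python
import json


# Hypothetical helper: returns the last JSON status object in the helper log.
def parse_helper_status(log_text):
    status = None
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith("{") and line.endswith("}"):
            try:
                status = json.loads(line)
            except json.JSONDecodeError:
                continue  # Not a JSON status line, skip it.
    return status


# Helper container log as shown above.
helper_log = (
    "Running on RUNNER-USAIYPXT via\n"
    "gitlab-runner-windows-gitlab-runner-64fcbbdf57-xdnkt...\n"
    '{"command_exit_code": 0, "script": "runner_script_trap"}\n'
)

status = parse_helper_status(helper_log)
print(status["script"], status["command_exit_code"])
```

In this case the helper reports exit code 0 for runner_script_trap, so the helper side does not look like it failed; the job just never progresses past that point.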

Environment description

  • GitLab version: 14.2.3
  • Runner version: 14.2.0
  • cluster info:
NAME              STATUS   ROLES                  AGE     VERSION
dev-builder       Ready    control-plane,master   33d     v1.21.0
dev-builder-win   Ready    <none>                 4d19h   v1.21.1

A second GitLab runner that targets Linux is also active on the cluster. Those builds have no problems.

Edited by Nick Otten