Cannot run GitLab Runner jobs on Azure Virtual Node (ACI): "Pod already succeeded before it begins running"

Summary

When using ACI as an Azure Virtual Node, my jobs fail as soon as they start, with the following message: "Pod already succeeded before it begins running"

Steps to reproduce

  1. Follow the steps in the Azure documentation to enable Virtual Nodes: https://docs.microsoft.com/fr-fr/azure/aks/virtual-nodes-cli
  2. Configure your GitLab runners to use the virtual nodes (create a pull secret and add the relevant node_selector and node_tolerations entries)
  3. Launch a job (any job) from GitLab and wait: the pod is scheduled on the node, appears to pull the image correctly, and goes from Pending to Waiting to Terminated
  4. The job fails with the message: "Job failed (system failure): prepare environment: pod already succeeded before it begins running. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information"
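The error text appears to come from the step where the Kubernetes executor waits for the build pod to start. The Python sketch below is a simplified, hypothetical model of that wait loop and not the runner's actual code. It assumes the runner polls the pod phase and proceeds only once it sees Running. In Kubernetes, a pod whose containers all exit with code 0 and are not restarted reaches the Succeeded phase. If that happens before Running is observed, the model reports the error from step 4. The function name wait_for_pod_running, the max_polls parameter, and the Failed/timeout messages are illustrative assumptions.

```python
from typing import Iterable


def wait_for_pod_running(observed_phases: Iterable[str], max_polls: int = 120) -> str:
    """Hypothetical model: poll pod phases until 'Running', or fail.

    observed_phases stands in for successive reads of pod.status.phase.
    """
    for poll, phase in enumerate(observed_phases):
        if poll >= max_polls:
            break
        if phase == "Running":
            # Containers are up; the runner could now attach and run the job script.
            return phase
        if phase == "Succeeded":
            # All containers already terminated successfully (e.g. the shell exited
            # immediately), so the pod never appeared as Running to the poller.
            raise RuntimeError("pod already succeeded before it begins running")
        if phase == "Failed":
            # Illustrative message only; not taken from the runner source.
            raise RuntimeError("pod failed before it begins running")
        # "Pending" or "Unknown": keep polling.
    raise TimeoutError("timed out waiting for pod to start")


if __name__ == "__main__":
    # Behaviour seen on "normal" nodes: Pending, then Running.
    print(wait_for_pod_running(["Pending", "Pending", "Running"]))
    # Behaviour seen on the virtual node: the pod goes straight to Succeeded.
    try:
        wait_for_pod_running(["Pending", "Succeeded"])
    except RuntimeError as err:
        print(err)
```

Under this model, the question becomes why the build container on ACI terminates successfully before the runner sees it running, rather than why it fails.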

NB:

  • The jobs run fine on "normal" nodes (VirtualMachineScaleSets node pool)
  • If I run a stand-alone pod on the virtual node (e.g. a web server launched via kubectl rather than via the runner), it works fine

I am available for a call and screen share if you do not have an Azure environment!

Actual behavior

The job fails, and I cannot work out what exactly is happening.

Expected behavior

The job should run correctly on Azure ACI

Relevant logs and/or screenshots


Environment description

config.toml contents
concurrent = 48
log_format = "json"
check_interval = 3
listen_address = "0.0.0.0:9252"

[session_server]
  session_timeout = 1800

[[runners]]
  request_concurrency = 20
  name = "aks-runner-prd"
  url = "https://xxxxx.com/"
  token = "XXXXXX"
  executor = "kubernetes"
  environment = ["K8S_AUTH_KUBECONFIG=/home/ops/.kube/config"]
  [runners.custom_build_dir]
  [runners.cache]
    Type = "azure"
    [runners.cache.azure]
      StorageDomain = "blob.core.windows.net"
      AccountName = "XXXXX"
      AccountKey = "XXXXX"
      ContainerName = "gitlab-cache"
  [runners.kubernetes]
    namespace = "gitlab-jobs"
    namespace_overwrite_allowed = ""
    cpu_request = "200m"    # https://docs.gitlab.com/runner/executors/kubernetes.html#overwriting-build-resources
    cpu_request_overwrite_max_allowed = "2"
    memory_request = "200Mi"
    memory_request_overwrite_max_allowed = "4Gi"
    privileged = false
    poll_timeout = 360
    host = ""
    bearer_token_overwrite_allowed = false
    image = ""
    pull_policy = "always"
    service_account = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    image_pull_secrets = [ "acrpullsecret" ]
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.volumes]
    [runners.kubernetes.node_selector]
      "kubernetes.io/role" = "agent"
      "beta.kubernetes.io/os" = "linux"
      "type" = "virtual-kubelet"
    [runners.kubernetes.node_tolerations]
      "virtual-kubelet.io/provider" = "NoSchedule"
      "node.kubernetes.io/unreachable" = "NoExecute"
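One way to compare the runner-created pod with the stand-alone pod that works is to write down the scheduling constraints that the node_selector and node_tolerations tables above should produce. The Python sketch below does that with the values copied from this config.toml. It assumes our reading of the GitLab Runner Kubernetes executor documentation: a "key=value" = "Effect" entry becomes a toleration with operator Equal, and a bare "key" = "Effect" entry becomes one with operator Exists. The helper names to_tolerations and scheduling_spec are hypothetical, and this is not the runner's own implementation.

```python
import json

# Values copied from [runners.kubernetes.node_selector] and
# [runners.kubernetes.node_tolerations] in the config.toml above.
NODE_SELECTOR = {
    "kubernetes.io/role": "agent",
    "beta.kubernetes.io/os": "linux",
    "type": "virtual-kubelet",
}
NODE_TOLERATIONS = {
    "virtual-kubelet.io/provider": "NoSchedule",
    "node.kubernetes.io/unreachable": "NoExecute",
}


def to_tolerations(table):
    """Assumed mapping: "key=value" -> operator Equal, bare "key" -> operator Exists."""
    tolerations = []
    for key_value, effect in table.items():
        if "=" in key_value:
            key, value = key_value.split("=", 1)
            tolerations.append(
                {"key": key, "operator": "Equal", "value": value, "effect": effect}
            )
        else:
            tolerations.append({"key": key_value, "operator": "Exists", "effect": effect})
    return tolerations


def scheduling_spec(node_selector, node_tolerations):
    """Pod spec fragment (nodeSelector + tolerations) this config should yield."""
    return {
        "nodeSelector": dict(node_selector),
        "tolerations": to_tolerations(node_tolerations),
    }


if __name__ == "__main__":
    print(json.dumps(scheduling_spec(NODE_SELECTOR, NODE_TOLERATIONS), indent=2))
```

The printed fragment can be compared with the output of kubectl get pod <pod-name> -n gitlab-jobs -o json for both a failing job pod and the working stand-alone pod. Kubernetes may add default tolerations of its own, so extra entries in the live pod are expected.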

Used GitLab Runner version

Running with gitlab/gitlab-runner:v13.4.1
Using Kubernetes executor
GitLab 13.4.3-ce.0
AKS 1.18.8
Edited by vmignot