Cannot run GitLab Runner on Azure Virtual Node (ACI): "Pod already succeeded before it begins running"
Summary
When using ACI as an Azure Virtual Node, my jobs fail as soon as they start, with the following message: "Pod already succeeded before it begins running"
Steps to reproduce
- Follow the steps on Azure documentation to enable Virtual Nodes: https://docs.microsoft.com/fr-fr/azure/aks/virtual-nodes-cli
- Configure your GitLab runners to use the virtual nodes (create a pull secret and add the relevant node_selector and node_tolerations entries)
- Launch a job (any job) from GitLab and wait: the pod is scheduled on the node, seems to pull its image correctly, and goes from Pending to Waiting to Terminated
- The job fails with the message: "Job failed (system failure): prepare environment: pod already succeeded before it begins running. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information"
NB:
- The same jobs work fine on "normal" nodes (Virtual Machine Scale Set)
- If I run a stand-alone pod (such as a web server, launched via kubectl and not via the runner) on the virtual node, it works fine
I am available for a call with screen sharing if you do not have an Azure environment!
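For comparison, a stand-alone pod of the kind mentioned above can be pinned to the virtual node with the same selectors and tolerations as the runner. The manifest below is only a sketch, not the manifest actually used for the test: the pod name `aci-test-webserver` and the `nginx:stable` image are hypothetical, and the tolerations assume that the runner turns key-only `node_tolerations` entries into `operator: Exists`.

```yaml
# Hypothetical stand-alone test pod; name and image are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: aci-test-webserver
  namespace: gitlab-jobs          # same namespace as the runner's job pods
spec:
  containers:
    - name: web
      image: nginx:stable         # long-running process, so the pod stays Running
      resources:
        requests:
          cpu: 200m               # mirrors cpu_request in config.toml
          memory: 200Mi           # mirrors memory_request in config.toml
  imagePullSecrets:
    - name: acrpullsecret         # same pull secret as the runner
  nodeSelector:                   # mirrors [runners.kubernetes.node_selector]
    kubernetes.io/role: agent
    beta.kubernetes.io/os: linux
    type: virtual-kubelet
  tolerations:                    # assumed equivalent of [runners.kubernetes.node_tolerations]
    - key: virtual-kubelet.io/provider
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
```

Comparing the `kubectl describe pod` output of such a pod with that of a failing job pod in the same namespace may help show where the two diverge.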
Actual behavior
The job fails, and I cannot work out what exactly is going wrong
Expected behavior
The job should run correctly on the ACI virtual node, as it does on regular nodes
Relevant logs and/or screenshots
Environment description
config.toml contents
concurrent = 48
log_format = "json"
check_interval = 3
listen_address = "0.0.0.0:9252"
[session_server]
session_timeout = 1800
[[runners]]
request_concurrency = 20
name = "aks-runner-prd"
url = "https://xxxxx.com/"
token = "XXXXXX"
executor = "kubernetes"
environment = ["K8S_AUTH_KUBECONFIG=/home/ops/.kube/config"]
[runners.custom_build_dir]
[runners.cache]
Type = "azure"
[runners.cache.azure]
StorageDomain = "blob.core.windows.net"
AccountName = "XXXXX"
AccountKey = "XXXXX"
ContainerName = "gitlab-cache"
[runners.kubernetes]
namespace = "gitlab-jobs"
namespace_overwrite_allowed = ""
cpu_request = "200m" # https://docs.gitlab.com/runner/executors/kubernetes.html#overwriting-build-resources
cpu_request_overwrite_max_allowed = "2"
memory_request = "200Mi"
memory_request_overwrite_max_allowed = "4Gi"
privileged = false
poll_timeout = 360
host = ""
bearer_token_overwrite_allowed = false
image = ""
pull_policy = "always"
service_account = ""
service_account_overwrite_allowed = ""
pod_annotations_overwrite_allowed = ""
image_pull_secrets = [ "acrpullsecret" ]
[runners.kubernetes.pod_security_context]
[runners.kubernetes.volumes]
[runners.kubernetes.node_selector]
"kubernetes.io/role" = "agent"
"beta.kubernetes.io/os" = "linux"
"type" = "virtual-kubelet"
[runners.kubernetes.node_tolerations]
"virtual-kubelet.io/provider" = "NoSchedule"
"node.kubernetes.io/unreachable" = "NoExecute"
Used GitLab Runner version
Running with gitlab/gitlab-runner:v13.4.1
Using Kubernetes executor
GitLab 13.4.3-ce.0
AKS 1.18.8
Edited by vmignot
