Kubernetes GitLab runner: After upgrading k8s, jobs started failing.
Note: this report was machine-translated into English; please forgive any awkward phrasing.
Summary
We upgraded Kubernetes from 1.31.6 to 1.31.7. Since the upgrade, jobs run by the gitlab-runner Kubernetes executor fail with the following error.
job log
Running with gitlab-runner 17.9.1 (bbf75488)
on b3ci01-gitlab-runner-5bd48bf958-5rc5f 1hpHfz3M5, system ID: r_GU3Gog10EhKv
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: b3ci
Using Kubernetes executor with image 10.10.87.240/library/citest-runner-base:20250109 ...
Using attach strategy to execute scripts...
Preparing environment
00:05
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 30m0s...
Waiting for pod b3ci/runner-1hphfz3m5-project-8-concurrent-7-duh21aye to be running, status is Pending
ERROR: Job failed (system failure): prepare environment: setting up trapping scripts on emptyDir: unable to upgrade connection: <html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx/1.27.3</center>
</body>
</html>. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
Steps to reproduce
This happens from time to time on the Kubernetes runner. After restarting the runner, jobs succeed again.
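One workaround I am considering (an assumption on my part, not yet verified for this failure): the error occurs while the runner attaches to the build container ("Using attach strategy to execute scripts..."), so switching back to the older exec-based strategy via the FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY feature flag might sidestep the failing connection-upgrade path:

```toml
# Sketch only: the surrounding [[runners]] section is the same one shown
# in config.template.toml below; the feature flag is the only change.
[[runners]]
  [runners.feature_flags]
    # Unverified assumption: forces the legacy exec strategy instead of attach.
    FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY = true
```

I have not confirmed whether this avoids the 404, so advice either way would be welcome.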
Environment description
GitLab Kubernetes runner: 17.9.1
Kubernetes: 1.31.7 (upgraded from 1.31.6)
config.toml
config.template.toml: |
  [[runners]]
    [runners.kubernetes]
      namespace = "b3ci"
      image = "alpine"
      pull_policy = "if-not-present"
      allowed_pull_policies = ["always", "if-not-present"]
      cpu_request = "1600m"
      memory_request = "3Gi"
      poll_interval = 5
      poll_timeout = 3600
      [runners.kubernetes.node_tolerations]
        "dedicated=b3ci" = "NoExecute"
      [runners.kubernetes.affinity]
        [runners.kubernetes.affinity.node_affinity]
          [runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution]
            [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms]]
              [[runners.kubernetes.affinity.node_affinity.required_during_scheduling_ignored_during_execution.node_selector_terms.match_expressions]]
                key = "nodetype"
                operator = "In"
                values = ["b3ci"]
      [runners.kubernetes.pod_annotations]
        "pod-cleanup.gitlab.com/ttl" = "2h"
config.toml: |
  shutdown_timeout = 0
  concurrent = 38
  check_interval = 30
  log_level = "info"
  listen_address = ":9252"
If there is a workaround, I would appreciate it if you could let me know. Is there any other information you need?