GitLab Runner failed health check
On June 2nd, our GitLab Runner in all regions was automatically upgraded from 13.11.0 to 14.10.0. The new runner image kept failing its health check, so we rolled back manually. It was automatically upgraded again today.
kirbyr@MACC02F402FML85 iac % helm history gitlab-runner -n gitlab
REVISION  UPDATED                   STATUS      CHART                 APP VERSION  DESCRIPTION
2         Sat Mar 27 00:20:23 2021  superseded  gitlab-runner-0.25.0  13.8.0       Install complete
3         Sun Apr 18 03:52:34 2021  superseded  gitlab-runner-0.25.0  13.8.0       Install complete
4         Thu Apr 22 17:57:08 2021  superseded  gitlab-runner-0.25.0  13.8.0       Install complete
5         Tue Apr 27 16:08:17 2021  superseded  gitlab-runner-0.25.0  13.8.0       Upgrade complete
6         Fri Apr 30 23:39:31 2021  superseded  gitlab-runner-0.28.0  13.11.0      Install complete
7         Wed May  5 23:34:52 2021  superseded  gitlab-runner-0.28.0  13.11.0      Install complete
8         Tue Mar  8 04:49:28 2022  superseded  gitlab-runner-0.28.0  13.11.0      Install complete
9         Thu Jun  2 09:54:38 2022  superseded  gitlab-runner-0.40.0  14.10.0      Upgrade complete
10        Thu Jun  2 08:29:01 2022  superseded  gitlab-runner-0.28.0  13.11.0      Rollback to 8
11        Mon Jun  6 09:17:11 2022  deployed    gitlab-runner-0.40.0  14.10.0      Upgrade complete
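For reference, the manual rollback recorded as revision 10 in the history above can be sketched roughly as follows. The target revision 8 is taken from the "Rollback to 8" entry. The `run` helper and the `APPLY` variable are hypothetical names added for this sketch, so that the script only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch of the manual rollback; values are taken from the helm history output.
RELEASE="gitlab-runner"
NAMESPACE="gitlab"
GOOD_REVISION="8"   # last revision on chart 0.28.0 / app 13.11.0

# Hypothetical helper: print the command by default,
# execute it against the cluster only when APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

# Roll the release back to the known-good revision.
run helm rollback "$RELEASE" "$GOOD_REVISION" -n "$NAMESPACE"

# Confirm that the rollback shows up as the newest revision.
run helm history "$RELEASE" -n "$NAMESPACE" --max 3
```

Note that a rollback alone does not stop whatever performs the automatic upgrade, which is why the release moved back to chart 0.40.0 on June 6.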
`kirbyr@MACC02F402FML85 iac % kubectl describe po gitlab-runner-gitlab-runner-7955b7cc74-n9ptv -n gitlab
Name: gitlab-runner-gitlab-runner-7955b7cc74-n9ptv
Namespace: gitlab
Priority: 0
Node: ip-10-0-72-107.us-west-1.compute.internal/10.0.72.107
Start Time: Mon, 06 Jun 2022 02:17:13 -0700
Labels: app=gitlab-runner-gitlab-runner
chart=gitlab-runner-0.40.0
heritage=Helm
pod-template-hash=7955b7cc74
release=gitlab-runner
Annotations: checksum/configmap: 144348e8b3b166c10ae83e4e9f4574d85067fd6af9451c3b10ec4801e203f12d
checksum/secrets: 91169a65cc51540c7dd4bbef356585af741f88e68ee69c7b682d592b975507e4
kubernetes.io/psp: eks.privileged
Status: Running
IP: 10.2.8.68
IPs:
IP: 10.2.8.68
Controlled By: ReplicaSet/gitlab-runner-gitlab-runner-7955b7cc74
Init Containers:
configure:
Container ID: docker://cea378d9aebb75c9277f6f9613211c89389e364302e53419a45d496c5bc7b231
Image: gitlab/gitlab-runner:alpine-v14.10.0
Image ID: docker-pullable://gitlab/gitlab-runner@sha256:a78875f60bd23773b8d563b2565edeb34ba4be163dfd232a5d298dbc574f3654
Port:
Host Port:
Command:
sh
/configmaps/configure
State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 06 Jun 2022 02:17:21 -0700
Finished: Mon, 06 Jun 2022 02:17:21 -0700
Ready: True
Restart Count: 0
Environment:
CI_SERVER_URL: https://gitlab.com/
CLONE_URL:
RUNNER_EXECUTOR: kubernetes
REGISTER_LOCKED: true
RUNNER_TAG_LIST: comfy-packer-prod-aws-uswest1
AWS_DEFAULT_REGION: us-west-1
AWS_REGION: us-west-1
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/configmaps from configmaps (ro)
/init-secrets from init-runner-secrets (ro)
/secrets from runner-secrets (rw)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from gitlab-runner-gitlab-runner-token-q95rg (ro)
Containers:
gitlab-runner-gitlab-runner:
Container ID: docker://b0dcc5e178810e9b1f6e1e144a81b36ab387e1599672b9b3d7208438b76ee9db
Image: gitlab/gitlab-runner:alpine-v14.10.0
Image ID: docker-pullable://gitlab/gitlab-runner@sha256:a78875f60bd23773b8d563b2565edeb34ba4be163dfd232a5d298dbc574f3654
Port: 9252/TCP
Host Port: 0/TCP
Command:
/usr/bin/dumb-init
--
/bin/bash
/configmaps/entrypoint
State: Running
Started: Mon, 06 Jun 2022 15:13:33 -0700
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Mon, 06 Jun 2022 15:05:47 -0700
Finished: Mon, 06 Jun 2022 15:08:23 -0700
Ready: True
Restart Count: 104
Liveness: exec [/bin/bash /configmaps/check-live] delay=60s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/usr/bin/pgrep gitlab.*runner] delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
CI_SERVER_URL: https://gitlab.com/
CLONE_URL:
RUNNER_EXECUTOR: kubernetes
REGISTER_LOCKED: true
RUNNER_TAG_LIST: comfy-packer-prod-aws-uswest1
AWS_DEFAULT_REGION: us-west-1
AWS_REGION: us-west-1
AWS_ROLE_ARN: arn:aws:iam::153524503100:role/dev-us-west-2-gitlab-runner-packer
AWS_WEB_IDENTITY_TOKEN_FILE: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
Mounts:
/configmaps from configmaps (rw)
/home/gitlab-runner/.gitlab-runner from etc-gitlab-runner (rw)
/secrets from runner-secrets (rw)
/var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from gitlab-runner-gitlab-runner-token-q95rg (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
aws-iam-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 86400
runner-secrets:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit:
etc-gitlab-runner:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit:
init-runner-secrets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: gitlab-runner-gitlab-runner
SecretOptionalName:
configmaps:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: gitlab-runner-gitlab-runner
Optional: false
gitlab-runner-gitlab-runner-token-q95rg:
Type: Secret (a volume populated by a Secret)
SecretName: gitlab-runner-gitlab-runner-token-q95rg
Optional: false
QoS Class: BestEffort
Node-Selectors:
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
Normal Pulled 47m (x99 over 12h) kubelet Container image "gitlab/gitlab-runner:alpine-v14.10.0" already present on machine
Warning Unhealthy 7m41s (x1423 over 12h) kubelet Readiness probe failed:
Warning BackOff 2m40s (x2386 over 12h) kubelet Back-off restarting failed container `
The gitlab-runner:alpine-v14.10.0 image pulled from GitLab keeps crashing when the health check runs. It is deployed by Helm chart 0.40.0.
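To narrow down why the container exits with code 1, it may help to read the logs of the previous (crashed) container and to run the liveness script by hand. This is a minimal sketch, assuming the pod name from the describe output is still current (it changes if the pod is replaced). The `diag_commands` function is a hypothetical helper that only prints the commands and does not touch the cluster.

```shell
#!/bin/sh
# Names taken from the kubectl describe output; adjust if the pod was replaced.
POD="gitlab-runner-gitlab-runner-7955b7cc74-n9ptv"
NAMESPACE="gitlab"
CONFIGMAP="gitlab-runner-gitlab-runner"

# Hypothetical helper: print the diagnostic commands to run by hand.
diag_commands() {
  cat <<EOF
kubectl logs $POD -n $NAMESPACE --previous
kubectl exec $POD -n $NAMESPACE -- /bin/bash /configmaps/check-live
kubectl get configmap $CONFIGMAP -n $NAMESPACE -o yaml
EOF
}

diag_commands
```

The first command shows output from the last terminated container, the second runs the same script the liveness probe uses, and the third dumps the ConfigMap that provides `check-live` and `entrypoint`, so the scripts shipped with chart 0.40.0 can be compared against those from 0.28.0.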