GitLab Runner failed health check

On June 2nd, the GitLab Runner in all of our regions was automatically upgraded from 13.11.0 to 14.10.0. The new runner image kept failing its health check, so we rolled back manually. Today (June 6th) it was automatically upgraded again.

`kirbyr@MACC02F402FML85 iac % helm history gitlab-runner -n gitlab`

| REVISION | UPDATED | STATUS | CHART | APP VERSION | DESCRIPTION |
|---|---|---|---|---|---|
| 2 | Sat Mar 27 00:20:23 2021 | superseded | gitlab-runner-0.25.0 | 13.8.0 | Install complete |
| 3 | Sun Apr 18 03:52:34 2021 | superseded | gitlab-runner-0.25.0 | 13.8.0 | Install complete |
| 4 | Thu Apr 22 17:57:08 2021 | superseded | gitlab-runner-0.25.0 | 13.8.0 | Install complete |
| 5 | Tue Apr 27 16:08:17 2021 | superseded | gitlab-runner-0.25.0 | 13.8.0 | Upgrade complete |
| 6 | Fri Apr 30 23:39:31 2021 | superseded | gitlab-runner-0.28.0 | 13.11.0 | Install complete |
| 7 | Wed May 5 23:34:52 2021 | superseded | gitlab-runner-0.28.0 | 13.11.0 | Install complete |
| 8 | Tue Mar 8 04:49:28 2022 | superseded | gitlab-runner-0.28.0 | 13.11.0 | Install complete |
| 9 | Thu Jun 2 09:54:38 2022 | superseded | gitlab-runner-0.40.0 | 14.10.0 | Upgrade complete |
| 10 | Thu Jun 2 08:29:01 2022 | superseded | gitlab-runner-0.28.0 | 13.11.0 | Rollback to 8 |
| 11 | Mon Jun 6 09:17:11 2022 | deployed | gitlab-runner-0.40.0 | 14.10.0 | Upgrade complete |
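Reading the history row by row, the app version changes four times, and the upgrade/rollback cycle is easier to see when only those transitions are listed. A small Python sketch, using rows copied by hand from the `helm history` output (the names `HISTORY` and `version_changes` are illustrative only, not part of Helm):

```python
# (revision, chart, app_version, description), copied from the helm history output.
HISTORY = [
    (2, "gitlab-runner-0.25.0", "13.8.0", "Install complete"),
    (3, "gitlab-runner-0.25.0", "13.8.0", "Install complete"),
    (4, "gitlab-runner-0.25.0", "13.8.0", "Install complete"),
    (5, "gitlab-runner-0.25.0", "13.8.0", "Upgrade complete"),
    (6, "gitlab-runner-0.28.0", "13.11.0", "Install complete"),
    (7, "gitlab-runner-0.28.0", "13.11.0", "Install complete"),
    (8, "gitlab-runner-0.28.0", "13.11.0", "Install complete"),
    (9, "gitlab-runner-0.40.0", "14.10.0", "Upgrade complete"),
    (10, "gitlab-runner-0.28.0", "13.11.0", "Rollback to 8"),
    (11, "gitlab-runner-0.40.0", "14.10.0", "Upgrade complete"),
]


def version_changes(history):
    """Return (revision, old_app_version, new_app_version, description) for
    every revision whose app version differs from the previous revision."""
    changes = []
    for prev, cur in zip(history, history[1:]):
        if prev[2] != cur[2]:
            changes.append((cur[0], prev[2], cur[2], cur[3]))
    return changes


if __name__ == "__main__":
    for rev, old, new, desc in version_changes(HISTORY):
        print(f"revision {rev}: {old} -> {new} ({desc})")
```

Run as-is, it prints one line each for revisions 6, 9, 10 and 11, which shows the 13.11.0 -> 14.10.0 upgrade, the manual rollback, and the second upgrade.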

```
kirbyr@MACC02F402FML85 iac % kubectl describe po gitlab-runner-gitlab-runner-7955b7cc74-n9ptv -n gitlab
Name:         gitlab-runner-gitlab-runner-7955b7cc74-n9ptv
Namespace:    gitlab
Priority:     0
Node:         ip-10-0-72-107.us-west-1.compute.internal/10.0.72.107
Start Time:   Mon, 06 Jun 2022 02:17:13 -0700
Labels:       app=gitlab-runner-gitlab-runner
              chart=gitlab-runner-0.40.0
              heritage=Helm
              pod-template-hash=7955b7cc74
              release=gitlab-runner
Annotations:  checksum/configmap: 144348e8b3b166c10ae83e4e9f4574d85067fd6af9451c3b10ec4801e203f12d
              checksum/secrets: 91169a65cc51540c7dd4bbef356585af741f88e68ee69c7b682d592b975507e4
              kubernetes.io/psp: eks.privileged
Status:       Running
IP:           10.2.8.68
IPs:
  IP:           10.2.8.68
Controlled By:  ReplicaSet/gitlab-runner-gitlab-runner-7955b7cc74
Init Containers:
  configure:
    Container ID:  docker://cea378d9aebb75c9277f6f9613211c89389e364302e53419a45d496c5bc7b231
    Image:         gitlab/gitlab-runner:alpine-v14.10.0
    Image ID:      docker-pullable://gitlab/gitlab-runner@sha256:a78875f60bd23773b8d563b2565edeb34ba4be163dfd232a5d298dbc574f3654
    Port:
    Host Port:
    Command:
      sh
      /configmaps/configure
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 06 Jun 2022 02:17:21 -0700
      Finished:     Mon, 06 Jun 2022 02:17:21 -0700
    Ready:          True
    Restart Count:  0
    Environment:
      CI_SERVER_URL:                https://gitlab.com/
      CLONE_URL:
      RUNNER_EXECUTOR:              kubernetes
      REGISTER_LOCKED:              true
      RUNNER_TAG_LIST:              comfy-packer-prod-aws-uswest1
      AWS_DEFAULT_REGION:           us-west-1
      AWS_REGION:                   us-west-1
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /configmaps from configmaps (ro)
      /init-secrets from init-runner-secrets (ro)
      /secrets from runner-secrets (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from gitlab-runner-gitlab-runner-token-q95rg (ro)
Containers:
  gitlab-runner-gitlab-runner:
    Container ID:  docker://b0dcc5e178810e9b1f6e1e144a81b36ab387e1599672b9b3d7208438b76ee9db
    Image:         gitlab/gitlab-runner:alpine-v14.10.0
    Image ID:      docker-pullable://gitlab/gitlab-runner@sha256:a78875f60bd23773b8d563b2565edeb34ba4be163dfd232a5d298dbc574f3654
    Port:          9252/TCP
    Host Port:     0/TCP
    Command:
      /usr/bin/dumb-init
      --
      /bin/bash
      /configmaps/entrypoint
    State:          Running
      Started:      Mon, 06 Jun 2022 15:13:33 -0700
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Mon, 06 Jun 2022 15:05:47 -0700
      Finished:     Mon, 06 Jun 2022 15:08:23 -0700
    Ready:          True
    Restart Count:  104
    Liveness:       exec [/bin/bash /configmaps/check-live] delay=60s timeout=1s period=10s #success=1 #failure=3
    Readiness:      exec [/usr/bin/pgrep gitlab.*runner] delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      CI_SERVER_URL:                https://gitlab.com/
      CLONE_URL:
      RUNNER_EXECUTOR:              kubernetes
      REGISTER_LOCKED:              true
      RUNNER_TAG_LIST:              comfy-packer-prod-aws-uswest1
      AWS_DEFAULT_REGION:           us-west-1
      AWS_REGION:                   us-west-1
      AWS_ROLE_ARN:                 arn:aws:iam::153524503100:role/dev-us-west-2-gitlab-runner-packer
      AWS_WEB_IDENTITY_TOKEN_FILE:  /var/run/secrets/eks.amazonaws.com/serviceaccount/token
    Mounts:
      /configmaps from configmaps (rw)
      /home/gitlab-runner/.gitlab-runner from etc-gitlab-runner (rw)
      /secrets from runner-secrets (rw)
      /var/run/secrets/eks.amazonaws.com/serviceaccount from aws-iam-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from gitlab-runner-gitlab-runner-token-q95rg (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  aws-iam-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  86400
  runner-secrets:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:
  etc-gitlab-runner:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:
  init-runner-secrets:
    Type:                Projected (a volume that contains injected data from multiple sources)
    SecretName:          gitlab-runner-gitlab-runner
    SecretOptionalName:
  configmaps:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      gitlab-runner-gitlab-runner
    Optional:  false
  gitlab-runner-gitlab-runner-token-q95rg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  gitlab-runner-gitlab-runner-token-q95rg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                     From     Message
  Normal   Pulled     47m (x99 over 12h)      kubelet  Container image "gitlab/gitlab-runner:alpine-v14.10.0" already present on machine
  Warning  Unhealthy  7m41s (x1423 over 12h)  kubelet  Readiness probe failed:
  Warning  BackOff    2m40s (x2386 over 12h)  kubelet  Back-off restarting failed container
```
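To put the probe settings next to the restart timestamps, the sketch below does the arithmetic in Python. It assumes every liveness probe fails from the first one onward and ignores probe jitter and the termination grace period, so it gives only a rough lower bound; the helper names `earliest_liveness_restart` and `seconds_between` are hypothetical.

```python
from datetime import datetime

# Timestamp format used by `kubectl describe` in the output above.
FMT = "%a, %d %b %Y %H:%M:%S %z"


def earliest_liveness_restart(initial_delay_s: int, period_s: int, failure_threshold: int) -> int:
    """Seconds after container start at which the liveness probe can reach
    failure_threshold consecutive failures, assuming every probe fails.
    Ignores probe jitter and the termination grace period."""
    return initial_delay_s + (failure_threshold - 1) * period_s


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two `kubectl describe` timestamps."""
    return (datetime.strptime(end, FMT) - datetime.strptime(start, FMT)).total_seconds()


# Liveness: delay=60s period=10s #failure=3
print(earliest_liveness_restart(60, 10, 3))  # 80
# Last State: Started 15:05:47, Finished 15:08:23
print(seconds_between("Mon, 06 Jun 2022 15:05:47 -0700", "Mon, 06 Jun 2022 15:08:23 -0700"))  # 156.0
# Gap until the current State: Running, Started 15:13:33
print(seconds_between("Mon, 06 Jun 2022 15:08:23 -0700", "Mon, 06 Jun 2022 15:13:33 -0700"))  # 310.0
```

The last run lasted 156 s, longer than the 80 s minimum, and the next start came 310 s after it ended; a gap just over five minutes is consistent with the kubelet's crash-loop back-off having reached its default 5-minute cap.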

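The `gitlab/gitlab-runner:alpine-v14.10.0` image, deployed with Helm chart `gitlab-runner-0.40.0`, keeps crashing when its health check runs. The container has restarted 104 times (last exit code 1), and the kubelet reports repeated readiness probe failures and back-off restarts.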