Resource Limitation Issue with Runner installed on Kubernetes, probably due to PLEG/NodeNotReady problems
Overview
We have a number of customers encountering issues while using the GitLab Kubernetes executor on different cloud providers (EKS & AKS have been reported).
One case shows that jobs intermittently fail after a couple of (or sometimes just one) successful jobs; the job output of the failing ones usually contains the log line below before timing out:
Waiting for pod default/runner-8yeawpmn-project-529-concurrent-02d7lz to be running, status is Pending
On investigation of the underlying pods, kubelet logs, and other sources, there appears to be a problem that is triggered at some point in time based on load. See the logs below:
Kubelet logs:
NAMESPACE LAST SEEN TYPE REASON KIND MESSAGE
gitlab-runner-support 48s Normal Scheduled Pod Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-08rsfk to aks-agentpool-25657597-0
gitlab-runner-support 47s Normal Pulled Pod Container image "node:latest" already present on machine
gitlab-runner-support 47s Normal Created Pod Created container
gitlab-runner-support 47s Normal Started Pod Started container
gitlab-runner-support 47s Normal Pulled Pod Container image "gitlab/gitlab-runner-helper:x86_64-de7731dd" already present on machine
gitlab-runner-support 46s Normal Created Pod Created container
gitlab-runner-support 46s Normal Started Pod Started container
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
gitlab-runner-support 0s Normal Scheduled pod/runner-kw4tq1u-project-1404-concurrent-05pd84 Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-05pd84 to aks-agentpool-25657597-0
gitlab-runner-support 0s Normal Scheduled pod/runner-kw4tq1u-project-1404-concurrent-188xgc Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-188xgc to aks-agentpool-25657597-0
gitlab-runner-support 0s Normal Scheduled pod/runner-kw4tq1u-project-1404-concurrent-2948kg Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-2948kg to aks-agentpool-25657597-0
gitlab-runner-support 0s Normal Scheduled pod/runner-kw4tq1u-project-1404-concurrent-3zr5r7 Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-3zr5r7 to aks-agentpool-25657597-0
gitlab-runner-support 0s Normal Scheduled pod/runner-kw4tq1u-project-1404-concurrent-44lkh6 Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-44lkh6 to aks-agentpool-25657597-0
gitlab-runner-support 0s Normal Scheduled pod/runner-kw4tq1u-project-1404-concurrent-546d2g Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-546d2g to aks-agentpool-25657597-0
gitlab-runner-support 0s Normal Killing pod/runner-kw4tq1u-project-1404-concurrent-08rsfk Killing container with id docker://build:Need to kill Pod
gitlab-runner-support 0s Normal Killing pod/runner-kw4tq1u-project-1404-concurrent-08rsfk Killing container with id docker://build:Need to kill Pod
gitlab-runner-support 0s Normal Killing pod/runner-kw4tq1u-project-1404-concurrent-08rsfk Killing container with id docker://helper:Need to kill Pod
gitlab-runner-support 0s Normal Killing pod/runner-kw4tq1u-project-1404-concurrent-08rsfk Killing container with id docker://helper:Need to kill Pod
gitlab-runner-support 0s Normal Pulled pod/runner-kw4tq1u-project-1404-concurrent-05pd84 Container image "node:latest" already present on machine
gitlab-runner-support 0s Warning FailedKillPod pod/runner-kw4tq1u-project-1404-concurrent-08rsfk error killing pod: failed to "KillPodSandbox" for "97a7ed0f-c4cd-11e9-9f63-e27977b905b8" with KillPodSandboxError: "rpc error: code = DeadlineExceeded desc = context deadline exceeded"
gitlab-runner-support 0s Warning FailedCreatePodSandBox pod/runner-kw4tq1u-project-1404-concurrent-188xgc Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "runner-kw4tq1u-project-1404-concurrent-188xgc": operation timeout: context deadline exceeded
gitlab-runner-support 0s Warning FailedCreatePodSandBox pod/runner-kw4tq1u-project-1404-concurrent-2948kg Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "runner-kw4tq1u-project-1404-concurrent-2948kg": operation timeout: context deadline exceeded
gitlab-runner-support 0s Warning FailedCreatePodSandBox pod/runner-kw4tq1u-project-1404-concurrent-3zr5r7 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "runner-kw4tq1u-project-1404-concurrent-3zr5r7": operation timeout: context deadline exceeded
gitlab-runner-support 0s Warning FailedCreatePodSandBox pod/runner-kw4tq1u-project-1404-concurrent-44lkh6 Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "runner-kw4tq1u-project-1404-concurrent-44lkh6": operation timeout: context deadline exceeded
gitlab-runner-support 0s Warning FailedCreatePodSandBox pod/runner-kw4tq1u-project-1404-concurrent-546d2g Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "runner-kw4tq1u-project-1404-concurrent-546d2g": operation timeout: context deadline exceeded
default 1s Normal NodeNotReady node/aks-agentpool-25657597-0 Node aks-agentpool-25657597-0 status is now: NodeNotReady
Another set of kubelet logs:
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.286521 1975 remote_runtime.go:282] ContainerStatus "c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.286539 1975 kuberuntime_container.go:397] ContainerStatus for c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1 error: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.286545 1975 kuberuntime_manager.go:875] getPodContainerStatuses for pod "runner-kw4tq1u-project-1404-concurrent-2r4spr_gitlab-runner-support(2d9c3321-c4d0-11e9-9f63-e27977b905b8)" failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.286555 1975 generic.go:247] PLEG: Ignoring events for pod runner-kw4tq1u-project-1404-concurrent-2r4spr/gitlab-runner-support: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.290505 1975 remote_runtime.go:282] ContainerStatus "c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.290522 1975 kuberuntime_container.go:397] ContainerStatus for c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1 error: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.290531 1975 kuberuntime_manager.go:875] getPodContainerStatuses for pod "runner-kw4tq1u-project-1404-concurrent-2r4spr_gitlab-runner-support(2d9c3321-c4d0-11e9-9f63-e27977b905b8)" failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.290543 1975 generic.go:277] PLEG: pod runner-kw4tq1u-project-1404-concurrent-2r4spr/gitlab-runner-support failed reinspection: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.294577 1975 remote_runtime.go:282] ContainerStatus "ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.294604 1975 kuberuntime_container.go:397] ContainerStatus for ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d error: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.294610 1975 kuberuntime_manager.go:875] getPodContainerStatuses for pod "runner-kw4tq1u-project-1404-concurrent-1k8d22_gitlab-runner-support(2d799e11-c4d0-11e9-9f63-e27977b905b8)" failed: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:46.294621 1975 generic.go:277] PLEG: pod runner-kw4tq1u-project-1404-concurrent-1k8d22/gitlab-runner-support failed reinspection: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.305651 1975 remote_runtime.go:282] ContainerStatus "ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.305680 1975 kuberuntime_container.go:397] ContainerStatus for ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d error: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.305692 1975 kuberuntime_manager.go:875] getPodContainerStatuses for pod "runner-kw4tq1u-project-1404-concurrent-1k8d22_gitlab-runner-support(2d799e11-c4d0-11e9-9f63-e27977b905b8)" failed: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.305703 1975 generic.go:247] PLEG: Ignoring events for pod runner-kw4tq1u-project-1404-concurrent-1k8d22/gitlab-runner-support: rpc error: code = Unknown desc = Error: No such container: ffd0dd52b276d79eca4a45244da644f47da10a465e53ba2dfd47efc9b5630e5d
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.309854 1975 remote_runtime.go:282] ContainerStatus "c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.309872 1975 kuberuntime_container.go:397] ContainerStatus for c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1 error: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.309877 1975 kuberuntime_manager.go:875] getPodContainerStatuses for pod "runner-kw4tq1u-project-1404-concurrent-2r4spr_gitlab-runner-support(2d9c3321-c4d0-11e9-9f63-e27977b905b8)" failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.309890 1975 generic.go:247] PLEG: Ignoring events for pod runner-kw4tq1u-project-1404-concurrent-2r4spr/gitlab-runner-support: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.313962 1975 remote_runtime.go:282] ContainerStatus "c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.313982 1975 kuberuntime_container.go:397] ContainerStatus for c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1 error: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
Aug 22 11:35:14 aks-agentpool-25657597-0 kubelet[1975]: E0822 11:34:47.313988 1975 kuberuntime_manager.go:875] getPodContainerStatuses for pod "runner-kw4tq1u-project-1404-concurrent-2r4spr_gitlab-runner-support(2d9c3321-c4d0-11e9-9f63-e27977b905b8)" failed: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
It is also worth noting that, while on a call with one of the customers, we ran a pipeline where the first stage's job completed successfully, while all the jobs in the second stage failed and the node went NodeNotReady, only reverting back once all the pods were terminated. See the description of one of the job pods below:
Name: runner-kw4tq1u-project-1404-concurrent-2r4spr
Namespace: gitlab-runner-support
Priority: 0
Node: aks-agentpool-25657597-0/10.115.21.66
Start Time: Thu, 22 Aug 2019 13:30:10 +0200
Labels: pod=runner-kw4tq1u-project-1404-concurrent-2
Annotations: <none>
Status: Pending
IP:
Containers:
build:
Container ID:
Image: node:latest
Image ID:
Port: <none>
Host Port: <none>
Command:
sh
-c
if [ -x /usr/local/bin/bash ]; then
exec /usr/local/bin/bash
elif [ -x /usr/bin/bash ]; then
exec /usr/bin/bash
elif [ -x /bin/bash ]; then
exec /bin/bash
elif [ -x /usr/local/bin/sh ]; then
exec /usr/local/bin/sh
elif [ -x /usr/bin/sh ]; then
exec /usr/bin/sh
elif [ -x /bin/sh ]; then
exec /bin/sh
elif [ -x /busybox/sh ]; then
exec /busybox/sh
else
echo shell not found
exit 1
fi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
FF_CMD_DISABLE_DELAYED_ERROR_LEVEL_EXPANSION: false
FF_USE_LEGACY_BUILDS_DIR_FOR_DOCKER: false
FF_USE_LEGACY_VOLUMES_MOUNTING_ORDER: false
DOCKER_HOST: tcp://localhost:2375
DOCKER_TLS_CERTDIR:
CI_BUILDS_DIR: /builds
CI_PROJECT_DIR: /builds/application-development-platform/software-innovation-lab-frontend
CI_CONCURRENT_ID: 2
CI_CONCURRENT_PROJECT_ID: 2
CI_SERVER: yes
...
CI_RUNNER_VERSION: 12.1.0
CI_RUNNER_REVISION: de7731dd
CI_RUNNER_EXECUTABLE_ARCH: linux/amd64
Mounts:
/builds from repo (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-fx949 (ro)
helper:
Container ID:
Image: gitlab/gitlab-runner-helper:x86_64-de7731dd
Image ID:
Port: <none>
Host Port: <none>
Command:
sh
-c
if [ -x /usr/local/bin/bash ]; then
exec /usr/local/bin/bash
elif [ -x /usr/bin/bash ]; then
exec /usr/bin/bash
elif [ -x /bin/bash ]; then
exec /bin/bash
elif [ -x /usr/local/bin/sh ]; then
exec /usr/local/bin/sh
elif [ -x /usr/bin/sh ]; then
exec /usr/bin/sh
elif [ -x /bin/sh ]; then
exec /bin/sh
elif [ -x /busybox/sh ]; then
exec /busybox/sh
else
echo shell not found
exit 1
fi
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment:
FF_CMD_DISABLE_DELAYED_ERROR_LEVEL_EXPANSION: false
FF_USE_LEGACY_BUILDS_DIR_FOR_DOCKER: false
FF_USE_LEGACY_VOLUMES_MOUNTING_ORDER: false
DOCKER_HOST: tcp://localhost:2375
DOCKER_TLS_CERTDIR:
CI_BUILDS_DIR: /builds
CI_PROJECT_DIR: /builds/application-development-platform/software-innovation-lab-frontend
CI_CONCURRENT_ID: 2
CI_CONCURRENT_PROJECT_ID: 2
CI_SERVER: yes
....
Mounts:
/builds from repo (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-fx949 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
repo:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
default-token-fx949:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-fx949
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 5m21s default-scheduler Successfully assigned gitlab-runner-support/runner-kw4tq1u-project-1404-concurrent-2r4spr to aks-agentpool-25657597-0
Normal Pulled 5m9s kubelet, aks-agentpool-25657597-0 Container image "node:latest" already present on machine
Warning Failed 3m9s kubelet, aks-agentpool-25657597-0 Error: context deadline exceeded
Normal Pulled 3m9s kubelet, aks-agentpool-25657597-0 Container image "gitlab/gitlab-runner-helper:x86_64-de7731dd" already present on machine
Warning Failed 45s kubelet, aks-agentpool-25657597-0 Error: context deadline exceeded
Warning FailedSync 7s (x3 over 44s) kubelet, aks-agentpool-25657597-0 error determining status: rpc error: code = Unknown desc = Error: No such container: c23114be3edcad3def36bb596e44379d417f10d1b0dc2b84c3af6aa870ccdaf1
On seeing PLEG in the logs, @WarheadsSE came across https://github.com/kubernetes/kubernetes/issues/45419#issuecomment-525669603, which points to a fix for the handling of PLEG issues in Kubernetes 1.16. However, the main concern is how to effectively manage the Runner's resource consumption to avoid these issues in the first place.
@WarheadsSE suggests configuring resource requests and limits when using the Helm chart, but we also have customers who deploy manually or use the GitLab Kubernetes integration to deploy runners.
Customer Tickets (Internal):
- https://gitlab.zendesk.com/agent/tickets/130560
- https://gitlab.zendesk.com/agent/tickets/129854
- https://gitlab.zendesk.com/agent/tickets/129111
There are more logs and information specific to each customer in the tickets, which I can't share here.
Root cause
The issue occurs because GitLab Runner tells Kubernetes to schedule a new Pod and then waits for that Pod to be running, but the Kubernetes cluster is saturated and cannot schedule it. GitLab Runner waits 3 minutes by default for the Pod to become available, and then fails the job.
Workaround/Prevention
When something like this happens, there are a few things you can do to make the cluster and jobs more resilient.
Set limits
GitLab Runner can set specific limits on the containers it creates, using the settings listed below. You might be hesitant to add limits, but they help with the stability of both the cluster and GitLab Runner, because they prevent situations where a single job consumes 80% of a node's CPU because of a mistake in the commit you are testing. Also, even if a job takes a few seconds or minutes longer because it is capped at a specific CPU level, the cap leaves space in the cluster to run more jobs concurrently.
- `cpu_limit`: The CPU allocation given to build containers
- `memory_limit`: The amount of memory allocated to build containers
- `service_cpu_limit`: The CPU allocation given to build service containers
- `service_memory_limit`: The amount of memory allocated to build service containers
- `helper_cpu_limit`: The CPU allocation given to build helper containers
- `helper_memory_limit`: The amount of memory allocated to build helper containers
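As a minimal sketch, these settings live under the `[runners.kubernetes]` section of `config.toml`; the name, URL, token, and all limit values below are illustrative placeholders, not recommendations:

```toml
concurrent = 10

[[runners]]
  name = "kubernetes-runner"            # placeholder
  url = "https://gitlab.example.com/"   # placeholder
  token = "RUNNER_TOKEN"                # placeholder
  executor = "kubernetes"
  [runners.kubernetes]
    # Build container that runs the job script
    cpu_limit = "500m"
    memory_limit = "1Gi"
    # Service containers (e.g. a database the job starts)
    service_cpu_limit = "500m"
    service_memory_limit = "1Gi"
    # gitlab-runner-helper container
    helper_cpu_limit = "250m"
    helper_memory_limit = "256Mi"
```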
There is no magic value for the limits; it depends on a lot of factors. For example: What scripts is the job running? How big is the repository? How heavy are the services: are you starting a small web server, or a large DB with a huge amount of data? These are all questions you should ask while setting them up.
There are also Kubernetes-level limits you can set. If you are sharing a Kubernetes cluster with other applications that are not GitLab Runner, it might be worth investigating creating a dedicated namespace and setting appropriate limits on it.
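If you go that route, you can point the Runner's job pods at the dedicated namespace (the name below is hypothetical); the ResourceQuota or LimitRange objects themselves are created on the namespace from the Kubernetes side:

```toml
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    # Run all job pods in a dedicated, quota-limited namespace
    # (hypothetical name; create the namespace and its ResourceQuota
    # or LimitRange separately, e.g. with kubectl)
    namespace = "gitlab-runner-jobs"
```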
You can also consider reducing the number of concurrent jobs GitLab Runner can run (the global `concurrent` setting), or capping how many jobs a single registered runner handles at once (the per-runner `limit` setting), as sketched below.
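Both knobs in `config.toml`, with illustrative values:

```toml
# Upper bound on jobs running at once across all runners in this config.toml
concurrent = 4

[[runners]]
  executor = "kubernetes"
  # Upper bound on jobs this particular runner will run at once
  limit = 2
```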
Configure your Kubernetes cluster to autoscale
Most managed Kubernetes services provide autoscaling that adds nodes when resources are saturated; you should look into enabling this.
Increase poll_timeout and poll_interval
- GitLab Runner provides `poll_timeout`, the amount of time, in seconds, that needs to pass before the Runner times out attempting to connect to the container it has just created (default = 180). This is useful for queueing more builds than the cluster can handle at a time. You can try bumping this up to 10 minutes or even longer; see the sketch after this list.
- GitLab Runner provides `poll_interval`, which defines how frequently, in seconds, the Runner polls the Kubernetes pod it has just created to check its status (default = 3). Polling every 10 seconds instead can relieve a little of the pressure on the Kubernetes API, if API load is something you are seeing in your cluster.
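A sketch of both settings under `[runners.kubernetes]` (600 and 10 are illustrative values):

```toml
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    # Wait up to 10 minutes for the job pod to become ready
    # before failing the job (default is 180 seconds)
    poll_timeout = 600
    # Check the pod status every 10 seconds instead of every 3,
    # reducing the number of calls made to the Kubernetes API
    poll_interval = 10
```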
Use Kubernetes scheduling policies
GitLab Runner supports node selectors as well as taints and tolerations, which can help you schedule the right jobs on the right nodes for more efficient use of your cluster.
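For illustration, assuming a dedicated node pool labelled `gitlab-runner=true` and tainted `ci-only=true:NoSchedule` (both hypothetical names), the Runner side could look like the following; note that `node_tolerations` requires a recent Runner version:

```toml
[[runners]]
  executor = "kubernetes"
  [runners.kubernetes]
    # Only schedule job pods on nodes carrying this (hypothetical) label
    [runners.kubernetes.node_selector]
      "gitlab-runner" = "true"
    # Tolerate the (hypothetical) taint that keeps other workloads off those nodes
    [runners.kubernetes.node_tolerations]
      "ci-only=true" = "NoSchedule"
```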
Action Items
- Create a new section `## Scale Kubernetes` in the Kubernetes executor documentation explaining the prevention/workarounds above
- Change the default poll_timeout to a larger value