Draft: Improve Kubernetes executor's pod ready detection (!4075) · Merge requests · GitLab.org / gitlab-runner

Arran Walker requested to merge ajwalker/fix-kubernetes-terminated-container into main May 04, 2023

What does this MR do?

Improves pod ready detection to handle cases where the Pod is reported as "ready" but actually has unready or terminated containers.

The reason the pod failed is now reported, rather than being silently ignored.

Why was this MR needed?

Fixes an issue where a pod is advertised as ready even though the build container failed to start or was terminated. I think there are a few cases where this can happen, but it can easily occur on Windows if you specify "pwsh" as the shell but use a job image that doesn't contain it.

What's the best way to test this MR?

On a Kubernetes cluster with Windows nodes, specify pwsh as the shell, but use a nanoserver image for the job (which doesn't include pwsh).

Before this MR, the error response is: ERROR: Job failed (system failure): prepare environment: unable to upgrade connection: 404 request not found.

After this MR, the error response is still rather cryptic, but it is the response containerd/docker returns when you try to start a container with an entrypoint that doesn't exist.

What are the relevant issue numbers?

Closes #29103
