Terminate job and display error when services are OOM-killed on the Kubernetes executor

  • Please check this box if this contribution uses AI-generated content (including content generated by GitLab Duo features) as outlined in the GitLab DCO & CLA

What does this MR do?

Resolves #27266 (closed) by checking, during runWithAttach, whether any container in the job pod has been OOM-killed. The job currently continues after the build container is OOM-killed because the helper container stays alive, so the pod remains running.
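To illustrate the kind of check described above, here is a minimal sketch. It is not the code in this MR. It assumes the container statuses have already been fetched from the pod, and it uses small local stand-in types that mirror the relevant fields of the Kubernetes `ContainerStatus` type (`State.Terminated.Reason`) so that it compiles on its own. The helper names `findOOMKilled` and `checkOOM` are hypothetical. It also assumes the container is not restarted, so the OOM shows up in the current state rather than in the last termination state.

```go
package main

import "fmt"

// Local stand-ins mirroring a few fields of the Kubernetes core/v1 types.
// A real implementation would use the types from k8s.io/api/core/v1 instead.
type ContainerStateTerminated struct {
	Reason   string
	ExitCode int32
}

type ContainerState struct {
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name  string
	State ContainerState
}

// oomKilledReason is the termination reason Kubernetes reports for a
// container that was killed for exceeding its memory limit.
const oomKilledReason = "OOMKilled"

// findOOMKilled (hypothetical helper) returns the names of all containers
// whose current state is "terminated" with reason OOMKilled.
func findOOMKilled(statuses []ContainerStatus) []string {
	var killed []string
	for _, s := range statuses {
		t := s.State.Terminated
		if t != nil && t.Reason == oomKilledReason {
			killed = append(killed, s.Name)
		}
	}
	return killed
}

// checkOOM (hypothetical helper) converts the result into an error that the
// caller could use to fail the job instead of waiting for the timeout.
func checkOOM(statuses []ContainerStatus) error {
	if killed := findOOMKilled(statuses); len(killed) > 0 {
		return fmt.Errorf("containers were OOM-killed: %v", killed)
	}
	return nil
}
```

A caller polling the pod could pass the pod's container statuses to `checkOOM` on each iteration and abort the job on the first non-nil error.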

An alternative solution: configure the job pod to fail if the build container is OOM-killed.

Note: I'm new to this codebase and am unaware of the reasons behind many decisions in the k8s executor. This is a starting point to build off of, certainly not the finished product.

Why was this MR needed?

Currently, when the build container (and potentially the helper container, untested) is OOM-killed on the Kubernetes executor, the runner does not terminate the job, so the job keeps running without making progress until it times out.

What's the best way to test this MR?

  • Trigger an OOM in the build container and make sure the job terminates
  • Trigger an OOM in the helper container and make sure the job terminates (if desired)

What are the relevant issue numbers?

Closes #27266 (closed) and #38244 (closed)
