Report deleted pods as a system failure with attach strategy (!2444) · Merge requests · GitLab.org / gitlab-runner

Georgi N. Georgiev requested to merge report_deleted_pod_as_system_failure into master Oct 01, 2020

What does this MR do?

Because the attach strategy monitors the pod's status for the whole duration of the job, we can detect when the pod is deleted and correctly mark the job as a system failure. This makes it possible to retry the job, for example when running on spot instances.

For context: #26856 (comment 410583524)

Why was this MR needed?

Previously, a deleted pod was reported as a script failure, which is incorrect because the job's script did not cause the failure.

What's the best way to test this MR?

Automated

Run the integration tests:

go test -v -run 'TestDeletedPodSystemFailureDuringExecution' ./executors/kubernetes

Or manually

Start a long-running job, e.g.:

sleep:
    script:
      - sleep 5000
    tags:
      - k8s

Get the pod from the job logs and delete it:

kubectl delete pod runner-l8gav8fn-project-15339497-concurrent-0jp6jq

The job should report it as a system failure:

ERROR: Job failed (system failure): pods "runner-l8gav8fn-project-15339497-concurrent-0jp6jq" not found

What are the relevant issue numbers?

Closes #26856 (closed)

Edited Jan 26, 2021 by Georgi N. Georgiev
