PWSH Executor fails to terminate on job failure

Summary

A customer reported, on zd-244015, that Kubernetes runners v14.x using the pwsh shell fail to terminate when a job fails with exit code 1. The job just stays running until the pipeline completes.

Steps to reproduce

Kubernetes runners, version 14.0 or later, using the pwsh shell

.gitlab-ci.yml
test1:
  stage: Test1
  script:
    - Write-Error "Job hits error but hangs until job duration timeout" -ErrorAction Stop
  timeout: 2m
  tags:
    - linux-pwsh

Actual behavior

When a job in the pipeline produces an error, the runner fails to terminate and stays waiting, in the running state, until the timeout is reached.

Expected behavior

The runner should terminate as soon as there is an error in the pipeline.

Relevant logs and/or screenshots

job log
$ Write-Error "Job hits error but hangs until job duration timeout" -ErrorAction Stop
Cleaning up project directory and file based variables
00:01
Write-Error:
Line |
237 | Write-Error "Job hits error but hangs until job duration timeout" -Er
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| Job hits error but hangs until job duration timeout
ERROR: Job failed: command terminated with exit code 1
runner log
Checking for jobs... received job=52324667 repo_url=https://gitlab.dell.com/Bhaskar_Todi/tests.git runner=RnEGW6Co
Checking for jobs... received job=52324668 repo_url=https://gitlab.dell.com/Bhaskar_Todi/tests.git runner=RnEGW6Co
Checking for jobs... received job=52324669 repo_url=https://gitlab.dell.com/Bhaskar_Todi/tests.git runner=RnEGW6Co
WARNING: Job failed: command terminated with exit code 1
  duration_s=60.77199956 job=52324667 project=55659 runner=RnEGW6Co
WARNING: Failed to process runner builds=2 error=command terminated with exit code 1 executor=kubernetes runner=RnEGW6Co
WARNING: Error while executing file based variables removal script error=context canceled job=52324668 project=55659 runner=RnEGW6Co
WARNING: Job failed: execution took longer than 2m0s seconds
  duration_s=120.012874996 job=52324668 project=55659 runner=RnEGW6Co
WARNING: Error while executing file based variables removal script error=context canceled job=52324669 project=55659 runner=RnEGW6Co
WARNING: Job failed: execution took longer than 2m0s seconds
  duration_s=120.007875294 job=52324669 project=55659 runner=RnEGW6Co
WARNING: Failed to process runner builds=1 error=execution took longer than 2m0s seconds executor=kubernetes runner=RnEGW6Co
WARNING: Failed to process runner builds=0 error=execution took longer than 2m0s seconds executor=kubernetes runner=RnEGW6Co
Environment description
config.toml contents
listen_address = ":9252"
concurrent = 50
check_interval = 5
log_level = "info"

[session_server]
session_timeout = 1800

[[runners]]
name = "shared-pks-s1-pwsh-main-6775489-gitlab-runner-6df75d48f6-bm597"
request_concurrency = 2
url = "https://gitlab.dell.com"
token = "XXXXXX"
executor = "kubernetes"
shell = "pwsh"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.kubernetes]
host = ""
bearer_token_overwrite_allowed = false
image = "harbor.dell.com/devops-images/infrastructure-devops:tools_v4.7.0"
namespace = "glr-shared"
namespace_overwrite_allowed = ""
privileged = false
helper_image = "artifacts.dell.com/gitlab-registry/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-v14.3.0-pwsh"
service_account_overwrite_allowed = ""
pod_annotations_overwrite_allowed = ""
[runners.kubernetes.affinity]
[runners.kubernetes.pod_security_context]
[runners.kubernetes.volumes]
[runners.kubernetes.dns_config]
[runners.kubernetes.container_lifecycle]

Used GitLab Runner version

Running with gitlab-runner 14.0.1 (c1edb478)
Preparing the "kubernetes" executor

Possible fixes

As a workaround, use the feature flag FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY, as mentioned in the docs.
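
A minimal sketch of how the flag might be applied to the reproduction job, assuming it is set as a job-level CI/CD variable and that the value "true" selects the legacy (exec-based) execution strategy. Whether this resolves the hang in this environment has not been verified here.

```yaml
# .gitlab-ci.yml (same reproduction job as above, with the feature flag added)
test1:
  stage: Test1
  variables:
    # Assumption: "true" switches the Kubernetes executor back to the
    # legacy execution strategy for this job only.
    FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY: "true"
  script:
    - Write-Error "Job hits error but hangs until job duration timeout" -ErrorAction Stop
  timeout: 2m
  tags:
    - linux-pwsh
```

Setting the variable on a single job keeps the change scoped to the reproduction case, so the result can be compared against jobs that still use the default strategy.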

Edited by Gerardo Gutierrez