[k8s] Terminate PowerShell script child processes when canceling the job through the UI
What does this MR do?
Based on user feedback in gitlab#462181 (comment 2151012522), the work done in !4813 (merged) and !4980 (merged) was not sufficient: there are still use cases where the job does not get canceled after cancelation through the UI.
After investigation, the issue appears to be related to child processes spawned by PowerShell.
In this MR, upon cancelation, the script terminates the stage process and all of its child processes.
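The idea of terminating a stage process together with its children can be illustrated with a small sketch. This is not the code from this MR (the actual change lives in the PowerShell script generated by the runner); it is a Python illustration under the assumption that a snapshot of `(pid, parent_pid)` pairs is available. The function name `kill_order` and the sample PIDs are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def kill_order(processes: Iterable[Tuple[int, int]], root_pid: int) -> List[int]:
    """Return root_pid and all of its descendants, children before parents.

    `processes` is a snapshot of (pid, parent_pid) pairs (hypothetical input,
    e.g. taken from a process listing). Terminating in the returned order
    means every child is handled before the process that spawned it.
    """
    # Index the snapshot by parent so children can be looked up directly.
    children: Dict[int, List[int]] = defaultdict(list)
    for pid, ppid in processes:
        children[ppid].append(pid)

    order: List[int] = []
    seen = set()  # guards against loops caused by stale or reused parent PIDs
    stack = [(root_pid, False)]
    while stack:
        pid, expanded = stack.pop()
        if expanded:
            # All descendants of this pid were already emitted.
            order.append(pid)
            continue
        if pid in seen:
            continue
        seen.add(pid)
        stack.append((pid, True))
        for child in children.get(pid, []):
            stack.append((child, False))
    return order
```

With a snapshot in which process 100 is the stage process, 200 and 400 are its children, and 300 is a child of 200, the function returns the descendants first and 100 last, while unrelated processes are left out. Each PID in that list would then be passed to whatever termination call the platform offers.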
Why was this MR needed?
To allow job cancelation through the UI.
What's the best way to test this MR?
gitlab-ci
variables:
  FF_KUBERNETES_HONOR_ENTRYPOINT: "true"
  FF_USE_POWERSHELL_PATH_RESOLVER: "true"
  FF_RETRIEVE_POD_WARNING_EVENTS: "true"
  FF_PRINT_POD_EVENTS: "true"
  FF_SCRIPT_SECTIONS: "true"
  CI_DEBUG_SERVICES: "true"
  GIT_DEPTH: 5
simple-job:
  image: mcr.microsoft.com/windows/servercore:ltsc2022
  timeout: 1h
  script:
    - echo $MY_TEST_VARIABLE_1
    - echo $MY_TEST_VARIABLE_2
    - $PSVersionTable.PSVersion
    - powershell.exe -C sleep 1800
  after_script:
    - echo "this is the after_script running"
config.toml
concurrent = 1
check_interval = 1
log_level = "info"
shutdown_timeout = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = ""
  url = "https://gitlab.com/"
  id = 0
  token = "glrt-REDACTED"
  token_obtained_at = 2024-05-07T16:42:27Z
  executor = "kubernetes"
  shell = "powershell"
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "mcr.microsoft.com/windows/servercore:ltsc2022"
    namespace = ""
    namespace_overwrite_allowed = ""
    node_selector_overwrite_allowed = ""
    helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-servercore21H2"
    poll_timeout = 3600
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    scripts_base_dir = "/dev"
    [runners.kubernetes.node_selector]
      "kubernetes.io/arch" = "amd64"
      "kubernetes.io/os" = "windows"
      "node.kubernetes.io/windows-build" = "10.0.20348"
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]
Cancel the job while it is running. The job should be canceled and the after_script should run as expected (job log):
job
Running with gitlab-runner development version (HEAD)
on REDACTED, system ID: s_b188029b2abb
feature flags: FF_USE_POWERSHELL_PATH_RESOLVER:true, FF_SCRIPT_SECTIONS:true, FF_KUBERNETES_HONOR_ENTRYPOINT:true, FF_PRINT_POD_EVENTS:true
Preparing the "kubernetes" executor
00:00
WARNING: Namespace is empty, therefore assuming 'default'.
Using Kubernetes namespace: default
Using Kubernetes executor with image mcr.microsoft.com/windows/servercore:ltsc2022 ...
Using attach strategy to execute scripts...
Preparing environment
00:47
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 10m0s...
Subscribing to Kubernetes Pod events...
Type Reason Message
Normal Scheduled Successfully assigned default/runner-REDACTED-project-25452826-concurrent-0-wvlk7q46 to gke-ab2d29-xncn
Normal Pulled Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-servercore21H2" already present on machine
Normal Created Created container init-permissions
Normal Started Started container init-permissions
Normal Pulled Container image "mcr.microsoft.com/windows/servercore:ltsc2022" already present on machine
Normal Created Created container build
Normal Started Started container build
Normal Pulled Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-servercore21H2" already present on machine
Normal Created Created container helper
Normal Started Started container helper
Running on RUNNER-REDACTED via ratchade-MBP...
Getting source from Git repository
00:18
Fetching changes with git depth set to 5...
Initialized empty Git repository in C:/builds/ra-group2/playground-bis/.git/
Created fresh repository.
Checking out 789941f4 as detached HEAD (ref is windows-tests)...
git-lfs/3.5.1 (GitHub; windows amd64; go 1.21.7; git e237bb3a)
Skipping Git submodules setup
Executing "step_script" stage of the job script
01:51
$ echo $MY_TEST_VARIABLE_1
project variable
$ echo $MY_TEST_VARIABLE_2
project variable
$ $PSVersionTable.PSVersion
Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      20348  2760
$ powershell.exe -C sleep 1800
WARNING: script canceled externally (UI, API)
Running after_script
00:15
Running after script...
$ echo "this is the after_script running"
this is the after_script running
Cleaning up project directory and file based variables
00:13
ERROR: Job failed: canceled