[k8s] Terminate PowerShell script child processes when canceling the job through the UI

What does this MR do?

Based on user feedback in gitlab#462181 (comment 2151012522), the work done in !4813 (merged) and !4980 (merged) was not sufficient: there are still cases where the job is not canceled after cancelation through the UI.

After investigation, the issue seems to be related to the child processes spawned by PowerShell.

With this MR, upon cancelation, the script terminates the stage process and all of its child processes.
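
The change itself lives in the script the runner executes for PowerShell jobs, and the sketch below is not that code. It only illustrates the general idea of terminating a process together with its children, assuming the process table is available as (pid, parent pid) pairs. The `kill_order` function and the sample PIDs are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def kill_order(processes, root_pid):
    """Return root_pid and all of its descendants, children before parents.

    `processes` is an iterable of (pid, parent_pid) pairs. Terminating in
    this order stops the deepest children first and the stage process last.
    """
    # Index the process table by parent so children can be looked up quickly.
    children = defaultdict(list)
    for pid, ppid in processes:
        children[ppid].append(pid)

    order = []
    seen = set()

    def visit(pid):
        # Guard against loops such as a process listed as its own parent.
        if pid in seen:
            return
        seen.add(pid)
        for child in children.get(pid, []):
            visit(child)
        # Post-order: a process is appended only after all of its children.
        order.append(pid)

    visit(root_pid)
    return order


if __name__ == "__main__":
    # Hypothetical table: 100 is the stage process, 500 is unrelated.
    table = [(100, 1), (200, 100), (300, 200), (400, 100), (500, 1)]
    print(kill_order(table, 100))
```

With the hypothetical table above, the unrelated process 500 is left alone, and the stage process 100 comes last in the returned list.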

Why was this MR needed?

To make cancelation through the UI reliably stop jobs that run PowerShell scripts on the Kubernetes executor.

What's the best way to test this MR?

gitlab-ci
variables:
  FF_KUBERNETES_HONOR_ENTRYPOINT: "true" 
  FF_USE_POWERSHELL_PATH_RESOLVER: "true"
  FF_RETRIEVE_POD_WARNING_EVENTS: "true"
  FF_PRINT_POD_EVENTS: "true"
  FF_SCRIPT_SECTIONS: "true"
  CI_DEBUG_SERVICES: "true"
  GIT_DEPTH: 5

simple-job:
  image: mcr.microsoft.com/windows/servercore:ltsc2022
  timeout: 1h
  script:
    - echo $MY_TEST_VARIABLE_1
    - echo $MY_TEST_VARIABLE_2
    - $PSVersionTable.PSVersion
    - powershell.exe -C sleep 1800

config.toml
concurrent = 1
check_interval = 1
log_level = "info"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = ""
  url = "https://gitlab.com/"
  id = 0
  token = "glrt-REDACTED"
  token_obtained_at = 2024-05-07T16:42:27Z
  executor = "kubernetes"
  shell = "powershell"
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "mcr.microsoft.com/windows/servercore:ltsc2022"
    namespace = ""
    namespace_overwrite_allowed = ""
    node_selector_overwrite_allowed = ""
    helper_image = "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-servercore21H2"
    poll_timeout = 3600
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    scripts_base_dir = "/dev"
    [runners.kubernetes.node_selector]
        "kubernetes.io/arch" = "amd64"
        "kubernetes.io/os" = "windows"
        "node.kubernetes.io/windows-build" = "10.0.20348"
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.volumes]
    [runners.kubernetes.dns_config]

Cancel the job while it is running. The job should be canceled and the after_script, if the job defines one, should still run as expected (job log):

job
Running with gitlab-runner development version (HEAD)
  on  REDACTED, system ID: s_b188029b2abb
  feature flags: FF_USE_POWERSHELL_PATH_RESOLVER:true, FF_SCRIPT_SECTIONS:true, FF_KUBERNETES_HONOR_ENTRYPOINT:true, FF_PRINT_POD_EVENTS:true
Preparing the "kubernetes" executor
00:00
WARNING: Namespace is empty, therefore assuming 'default'.
Using Kubernetes namespace: default
Using Kubernetes executor with image mcr.microsoft.com/windows/servercore:ltsc2022 ...
Using attach strategy to execute scripts...
Preparing environment
00:47
Using FF_USE_POD_ACTIVE_DEADLINE_SECONDS, the Pod activeDeadlineSeconds will be set to the job timeout: 10m0s...
Subscribing to Kubernetes Pod events...
Type     Reason      Message
Normal   Scheduled   Successfully assigned default/runner-REDACTED-project-25452826-concurrent-0-wvlk7q46 to gke-ab2d29-xncn
Normal   Pulled   Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-servercore21H2" already present on machine
Normal   Created   Created container init-permissions
Normal   Started   Started container init-permissions
Normal   Pulled   Container image "mcr.microsoft.com/windows/servercore:ltsc2022" already present on machine
Normal   Created   Created container build
Normal   Started   Started container build
Normal   Pulled   Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest-servercore21H2" already present on machine
Normal   Created   Created container helper
Normal   Started   Started container helper

Running on RUNNER-REDACTED via ratchade-MBP...
Getting source from Git repository
00:18
Fetching changes with git depth set to 5...
Initialized empty Git repository in C:/builds/ra-group2/playground-bis/.git/
Created fresh repository.
Checking out 789941f4 as detached HEAD (ref is windows-tests)...
git-lfs/3.5.1 (GitHub; windows amd64; go 1.21.7; git e237bb3a)
Skipping Git submodules setup
Executing "step_script" stage of the job script
01:51
$ echo $MY_TEST_VARIABLE_1
project variable
$ echo $MY_TEST_VARIABLE_2
project variable
$ $PSVersionTable.PSVersion
Major  Minor  Build  Revision
-----  -----  -----  --------
5      1      20348  2760    
$ powershell.exe -C sleep 1800
WARNING: script canceled externally (UI, API)
Running after_script
00:15
Running after script...
$ echo "this is the after_script running"
this is the after_script running
Cleaning up project directory and file based variables
00:13
ERROR: Job failed: canceled

What are the relevant issue numbers?

gitlab#462181 (closed)
