Add support for activeDeadlineSeconds on CI job pods with the Kubernetes executor
What does this MR do?
Add support for the Pod activeDeadlineSeconds field in the Kubernetes executor. The feature is behind a feature flag to guard against unforeseen problems while it is being tested. When the feature flag FF_USE_POD_ACTIVE_DEADLINE_SECONDS is enabled, activeDeadlineSeconds defaults to the job timeout + 1 second. This lets the job time out on the GitLab side first, instead of Kubernetes killing the pod and the job simply failing.
Why was this MR needed?
See issue description: #29279
What's the best way to test this MR?
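Run the following job on a runner that uses the configuration below. Its script (sleep 6000s) outlasts its 60-second timeout.
.gitlab-ci.yml
variables:
  FF_USE_POD_ACTIVE_DEADLINE_SECONDS: "true"

stages:
  - test

format:
  stage: test
  script:
    - sleep 6000s
  after_script:
    - echo $CI_JOB_STATUS
  timeout: 60 seconds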
config.toml
concurrent = 1
check_interval = 1
log_level = "debug"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = ""
  url = "https://gitlab.com/"
  id = 0
  token = "REDACTED"
  token_obtained_at = 0001-01-01T00:00:00Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "kubernetes"
  shell = "bash"
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "alpine"
    namespace = ""
    namespace_overwrite_allowed = ""
    node_selector_overwrite_allowed = ""
    pod_termination_grace_period_seconds = 0
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.pod_security_context]
      run_as_group = 100
      run_as_user = 100
    [runners.kubernetes.volumes]
    [[runners.kubernetes.services]]
      name = "alpine:latest"
      alias = "alpine-service"
      command = ["sleep 900s"]
      entrypoint = ["/bin/sh", "-c"]
    [runners.kubernetes.dns_config]
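With the test job above and this config.toml, the two pod spec fields involved in the deadline would look roughly like the output of the following sketch. It relies on a few assumptions:

- It uses a hypothetical trimmed-down struct (podSpecFragment) and helper (buildPodSpecFragment) instead of the real k8s.io/api types.
- Only the JSON field names activeDeadlineSeconds and terminationGracePeriodSeconds come from the Kubernetes PodSpec.
- It assumes that pod_termination_grace_period_seconds populates terminationGracePeriodSeconds.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// podSpecFragment is a hypothetical, trimmed-down stand-in for the Kubernetes
// PodSpec. Only two fields are modelled; their JSON names match the real PodSpec.
type podSpecFragment struct {
	ActiveDeadlineSeconds         *int64 `json:"activeDeadlineSeconds,omitempty"`
	TerminationGracePeriodSeconds *int64 `json:"terminationGracePeriodSeconds,omitempty"`
}

// buildPodSpecFragment (hypothetical) sets the deadline to job timeout + 1
// and copies the grace period configured in config.toml (assumed mapping).
func buildPodSpecFragment(jobTimeoutSeconds, gracePeriodSeconds int64) ([]byte, error) {
	deadline := jobTimeoutSeconds + 1
	return json.Marshal(podSpecFragment{
		ActiveDeadlineSeconds:         &deadline,
		TerminationGracePeriodSeconds: &gracePeriodSeconds,
	})
}

func main() {
	// timeout: 60 seconds (.gitlab-ci.yml), pod_termination_grace_period_seconds = 0 (config.toml)
	out, err := buildPodSpecFragment(60, 0)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	// {"activeDeadlineSeconds":61,"terminationGracePeriodSeconds":0}
}
```

Because omitempty only drops nil pointers, an explicit grace period of 0 is still serialized rather than silently omitted.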
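- The job times out as expected.
- The kubectl events show that the pod was terminated for exceeding its deadline 61 seconds (one minute + 1 second) after it was scheduled. The Scheduled event is at 20:36:29 and the DeadlineExceeded event is at 20:37:30: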
❯ kubectl get events --watch -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message --field-selector involvedObject.name=runner-xxx-project-xxx-concurrent-0pwhjq
2023-02-06T20:36:29Z 2023-02-06T20:36:29Z 1 default-scheduler Normal Scheduled Successfully assigned default/runner-xxx-project-xxx-concurrent-0pwhjq to gke-xxx-default-pool-xxx
(...)
2023-02-06T20:37:30Z 2023-02-06T20:37:30Z 1 kubelet Normal DeadlineExceeded Pod was active on the node longer than the specified deadline
Complete kubectl event log:
❯ kubectl get events --watch -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message --field-selector involvedObject.name=runner-xxx-project-xxx-concurrent-0pwhjq
FirstSeen LastSeen Count From Type Reason Message
2023-02-06T20:36:29Z 2023-02-06T20:36:29Z 1 default-scheduler Normal Scheduled Successfully assigned default/runner-xxx-project-xxx-concurrent-0pwhjq to gke-xxx-default-pool-xxx
2023-02-06T20:36:30Z 2023-02-06T20:36:30Z 1 kubelet Normal Pulled Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest" already present on machine
2023-02-06T20:36:30Z 2023-02-06T20:36:30Z 1 kubelet Normal Created Created container init-permissions
2023-02-06T20:36:30Z 2023-02-06T20:36:30Z 1 kubelet Normal Started Started container init-permissions
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Pulling Pulling image "alpine"
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Pulled Successfully pulled image "alpine" in 119.382703ms
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Created Created container build
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Started Started container build
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Pulled Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest" already present on machine
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Created Created container helper
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Started Started container helper
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Pulling Pulling image "alpine:latest"
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Pulled Successfully pulled image "alpine:latest" in 108.939773ms
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Created Created container svc-0
2023-02-06T20:36:31Z 2023-02-06T20:36:31Z 1 kubelet Normal Started Started container svc-0
2023-02-06T20:37:28Z 2023-02-06T20:37:28Z 1 kubelet Normal Killing Stopping container build
2023-02-06T20:37:28Z 2023-02-06T20:37:28Z 1 kubelet Normal Killing Stopping container svc-0
2023-02-06T20:37:28Z 2023-02-06T20:37:28Z 1 kubelet Normal Killing Stopping container helper
2023-02-06T20:37:30Z 2023-02-06T20:37:30Z 1 kubelet Normal DeadlineExceeded Pod was active on the node longer than the specified deadline
2023-02-06T20:37:28Z 2023-02-06T20:37:30Z 2 kubelet Normal Killing Stopping container helper
2023-02-06T20:37:28Z 2023-02-06T20:37:30Z 2 kubelet Normal Killing Stopping container build
2023-02-06T20:37:28Z 2023-02-06T20:37:30Z 2 kubelet Normal Killing Stopping container svc-0
What are the relevant issue numbers?
close #29279