Add support for activeDeadlineSeconds on CI Job Pod with k8s executor

Romuald Atchadé requested to merge k8s-support-active-deadline-seconds into main

What does this MR do?

Add support for Pod activeDeadlineSeconds in the Kubernetes executor. The feature is behind a feature flag to guard against unforeseen problems while it is being tested. When the feature flag FF_USE_POD_ACTIVE_DEADLINE_SECONDS is enabled, activeDeadlineSeconds defaults to the job timeout + 1 second, so that the job times out on the GitLab side first instead of being failed automatically when the Pod deadline expires.
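
For illustration only, here is a sketch of what the resulting Pod spec might look like for a job with a 60-second timeout when the feature flag is enabled. It is a hand-written fragment, not output captured from the runner. The pod name and the build container name are borrowed from the event log in this MR, and everything else in the spec is omitted.

```yaml
# Illustrative fragment, assuming a job timeout of 60 seconds and
# FF_USE_POD_ACTIVE_DEADLINE_SECONDS set to "true".
apiVersion: v1
kind: Pod
metadata:
  name: runner-xxx-project-xxx-concurrent-0pwhjq  # placeholder name from the event log
spec:
  activeDeadlineSeconds: 61  # job timeout (60 s) + 1 s
  containers:
    - name: build
      image: alpine
```

With this value, Kubernetes marks the Pod as failed with reason DeadlineExceeded once it has been active for 61 seconds, which should be shortly after GitLab has already timed the job out.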

Why was this MR needed?

See issue description: #29279 (closed)

What's the best way to test this MR?

gitlab-ci
variables:
  FF_USE_POD_ACTIVE_DEADLINE_SECONDS: "true"

stages:
  - test

format:
  stage: test
  script:
    - sleep 6000s
  after_script:
    - echo $CI_JOB_STATUS
  timeout: 60 seconds
config.toml
concurrent = 1
check_interval = 1
log_level = "debug"
shutdown_timeout = 0

[session_server]
  session_timeout = 1800

[[runners]]
  name = ""
  url = "https://gitlab.com/"
  id = 0
  token = "REDACTED"
  token_obtained_at = 0001-01-01T00:00:00Z
  token_expires_at = 0001-01-01T00:00:00Z
  executor = "kubernetes"
  shell = "bash"
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    image = "alpine"
    namespace = ""
    namespace_overwrite_allowed = ""
    node_selector_overwrite_allowed = ""
    pod_termination_grace_period_seconds = 0
    pod_labels_overwrite_allowed = ""
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    [runners.kubernetes.pod_security_context]
      run_as_group = 100
      run_as_user = 100
    [runners.kubernetes.volumes]

    [[runners.kubernetes.services]]
      name = "alpine:latest"
      alias = "alpine-service"
      command = ["sleep 900s"]
      entrypoint = ["/bin/sh", "-c"]
    [runners.kubernetes.dns_config]

1. The job times out as expected.
2. The kubectl events show a DeadlineExceeded event for the pod 61 seconds after it was scheduled (a minute + 1 second):
❯ kubectl get events --watch -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message --field-selector involvedObject.name=runner-xxx-project-xxx-concurrent-0pwhjq
2023-02-06T20:36:29Z   2023-02-06T20:36:29Z   1       default-scheduler   Normal   Scheduled   Successfully assigned default/runner-xxx-project-xxx-concurrent-0pwhjq to gke-xxx-default-pool-xxx
(...)
2023-02-06T20:37:30Z   2023-02-06T20:37:30Z   1       kubelet             Normal   DeadlineExceeded   Pod was active on the node longer than the specified deadline
complete kubectl event log
❯ kubectl get events --watch -o custom-columns=FirstSeen:.firstTimestamp,LastSeen:.lastTimestamp,Count:.count,From:.source.component,Type:.type,Reason:.reason,Message:.message --field-selector involvedObject.name=runner-xxx-project-xxx-concurrent-0pwhjq
FirstSeen              LastSeen               Count   From                Type     Reason      Message
2023-02-06T20:36:29Z   2023-02-06T20:36:29Z   1       default-scheduler   Normal   Scheduled   Successfully assigned default/runner-xxx-project-xxx-concurrent-0pwhjq to gke-xxx-default-pool-xxx
2023-02-06T20:36:30Z   2023-02-06T20:36:30Z   1       kubelet             Normal   Pulled      Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest" already present on machine
2023-02-06T20:36:30Z   2023-02-06T20:36:30Z   1       kubelet             Normal   Created     Created container init-permissions
2023-02-06T20:36:30Z   2023-02-06T20:36:30Z   1       kubelet             Normal   Started     Started container init-permissions
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Pulling     Pulling image "alpine"
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Pulled      Successfully pulled image "alpine" in 119.382703ms
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Created     Created container build
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Started     Started container build
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Pulled      Container image "registry.gitlab.com/gitlab-org/gitlab-runner/gitlab-runner-helper:x86_64-latest" already present on machine
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Created     Created container helper
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Started     Started container helper
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Pulling     Pulling image "alpine:latest"
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Pulled      Successfully pulled image "alpine:latest" in 108.939773ms
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Created     Created container svc-0
2023-02-06T20:36:31Z   2023-02-06T20:36:31Z   1       kubelet             Normal   Started     Started container svc-0
2023-02-06T20:37:28Z   2023-02-06T20:37:28Z   1       kubelet             Normal   Killing     Stopping container build
2023-02-06T20:37:28Z   2023-02-06T20:37:28Z   1       kubelet             Normal   Killing     Stopping container svc-0
2023-02-06T20:37:28Z   2023-02-06T20:37:28Z   1       kubelet             Normal   Killing     Stopping container helper
2023-02-06T20:37:30Z   2023-02-06T20:37:30Z   1       kubelet             Normal   DeadlineExceeded   Pod was active on the node longer than the specified deadline
2023-02-06T20:37:28Z   2023-02-06T20:37:30Z   2       kubelet             Normal   Killing            Stopping container helper
2023-02-06T20:37:28Z   2023-02-06T20:37:30Z   2       kubelet             Normal   Killing            Stopping container build
2023-02-06T20:37:28Z   2023-02-06T20:37:30Z   2       kubelet             Normal   Killing            Stopping container svc-0

What are the relevant issue numbers?

Closes #29279 (closed)
