[k8s] Do not wait poll timeout when container has terminated
What does this MR do?
Before this change, the runner waited for the poll timeout (3 minutes by default) to expire before giving up on streaming the container log. That wait is unnecessary when the container has already terminated. With this change, log capture aborts as soon as the container status is terminated.
Why was this MR needed?
To make jobs finish faster: a job whose container has already terminated no longer waits out the poll timeout.
What's the best way to test this MR?
gitlab-ci
variables:
  # KUBERNETES_NODE_SELECTOR_ARCH: 'kubernetes.io/arch=arm64'
  # FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY: "true"
  # FF_KUBERNETES_HONOR_ENTRYPOINT: "false"
  FF_USE_POWERSHELL_PATH_RESOLVER: "true"
  FF_RETRIEVE_POD_WARNING_EVENTS: "true"
  FF_PRINT_POD_EVENTS: "true"
  FF_SCRIPT_SECTIONS: "true"
  CI_DEBUG_SERVICES: "true"
date_debian:
  stage: build
  image: alpine
  # image: debian:bullseye
  # image: ubuntu
  script:
    - ls -la /tmp
    - sleep 120
    # - |
    #   while true; do date && sleep 3 ; done
  services:
    - name: "postgres:12.17-alpine3.19"
      variables:
        HEALTHCHECK_TCP_PORT: "5432"
config.toml
On the MR branch
On the MR branch, the job finishes (and fails, as expected) in 33 seconds because the service containers are OOMKilled.
Duration: 33 seconds
Finished: Oct 24, 2024, 4:18 p.m.
Queued: 0 seconds
Timeout: 10m (from runner)
On main branch
On the main branch, the job runs for more than 6 minutes and is retried only twice before it is canceled manually, because the service containers are OOMKilled.
Duration: 6 minutes 26 seconds
Finished: Oct 24, 2024, 4:26 p.m.
Queued: 1 second
Timeout: 10m (from runner)
On 17-3-stable branch
On the 17-3-stable branch, the job finishes (and fails, as expected) after 33 seconds because the service containers are OOMKilled.
Duration: 33 seconds
Finished: Oct 24, 2024, 4:29 p.m.
Queued: 1 second
Timeout: 10m (from runner)
What are the relevant issue numbers?
None
