[k8s] Do not wait for the poll timeout when the container has terminated

What does this MR do?

Before this change, we would wait for the poll timeout (3 minutes by default) to expire before giving up on streaming the container log. This wait is unnecessary when the container has already terminated. With this change, log capture aborts as soon as the container status is terminated.

Why was this MR needed?

To make jobs finish faster when a container has already terminated, instead of waiting for the poll timeout to expire.

What's the best way to test this MR?

gitlab-ci
variables:
  # KUBERNETES_NODE_SELECTOR_ARCH: 'kubernetes.io/arch=arm64'
  # FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY: "true"
  # FF_KUBERNETES_HONOR_ENTRYPOINT: "false" 
  FF_USE_POWERSHELL_PATH_RESOLVER: "true"
  FF_RETRIEVE_POD_WARNING_EVENTS: "true"
  FF_PRINT_POD_EVENTS: "true"
  FF_SCRIPT_SECTIONS: "true"
  CI_DEBUG_SERVICES: "true"

date_debian:
  stage: build
  image: alpine
  # image: debian:bullseye
  # image: ubuntu
  script:
    - ls -la /tmp
    - sleep 120
    # - | 
    #   while true; do date && sleep 3 ; done
  services:
    - name: "postgres:12.17-alpine3.19"
      variables:
        HEALTHCHECK_TCP_PORT: "5432"
config.toml

On the MR branch

On the MR branch, the job finishes (and fails as expected) in 33 seconds because the service containers are OOMKilled.

Screenshot (Screenshot_2024-10-24_at_4.18.02_PM): OOMKilled services

Duration: 33 seconds
Finished: Oct 24, 2024, 4:18 p.m.
Queued:   0 seconds
Timeout:  10m (from runner) 

On the main branch

On the main branch, the job runs for more than 6 minutes and is retried twice before being canceled manually, because the service containers are OOMKilled.

Duration: 6 minutes 26 seconds
Finished: Oct 24, 2024, 4:26 p.m.
Queued:   1 second
Timeout:  10m (from runner) 

On the 17-3-stable branch

On the 17-3-stable branch, the job finishes (and fails as expected) after 33 seconds because the service containers are OOMKilled.

Duration: 33 seconds
Finished: Oct 24, 2024, 4:29 p.m.
Queued:   1 second
Timeout:  10m (from runner) 

What are the relevant issue numbers?

None
