Kubernetes attach strategy hangs when log file is deleted

Romuald Atchadé requested to merge kube-attach-strategy-hangs into master

What does this MR do?

This MR deletes the pod when gitlab-runner is no longer able to stream the logs from the log file.

Why was this MR needed?

This MR is needed to avoid the case where the job hangs after the log file is deleted, leaving the end user without any feedback.

What's the best way to test this MR?

The following configuration is needed to test this MR:

  1. Generate the new helper image
eval $(minikube -p minikube docker-env) #if using minikube
make helper-dockerarchive-host
  2. Push the new image out/binaries/gitlab-runner-helper/gitlab-runner-helper.x86_64 to your personal Docker Hub account. The Docker Hub link to this helper image will be needed in the config.toml.
.gitlab-ci.yml
hello:
  image: alpine
  script:
    - sleep 5000
config.toml
[[runners]]
  name = "kubernetes"
  url = "https://gitlab.com/"
  token = "YOUR_TOKEN_HERE"
  executor = "kubernetes"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.feature_flags]
    FF_USE_LEGACY_KUBERNETES_EXECUTION_STRATEGY = false
  [runners.kubernetes]
    host = ""
    bearer_token_overwrite_allowed = false
    namespace = ""
    helper_image = "NEW_HELPER_IMAGE"
    namespace_overwrite_allowed = ""
    privileged = false
    service_account_overwrite_allowed = ""
    pod_annotations_overwrite_allowed = ""
    idle_count = 2
    idle_time = 60
    [runners.kubernetes.affinity]
    [runners.kubernetes.pod_security_context]
    [runners.kubernetes.volumes]

As described in the related issue, follow the steps below:

  1. Run the above .gitlab-ci.yml file.
  2. Retrieve the name of the pod running the job (POD_RUNNING_JOB) with the command kubectl get pods. The pod age is a good indicator if you have more than one pod running.
  3. When the job log starts to output the date-time, run the following command to delete the log file: kubectl exec -it -c helper POD_RUNNING_JOB -- sh -c 'rm /logs-PROJECT_ID-JOB_RESPONSE_ID/output.log'.
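The log file path in step 3 can be assembled from the project ID and the job ID. The sketch below builds the path and the kubectl arguments; the helper names logFilePath and deleteLogArgs are hypothetical, and the sample IDs are the ones that appear in the warning message shown in this MR.

```go
package main

import "fmt"

// logFilePath follows the /logs-PROJECT_ID-JOB_RESPONSE_ID/output.log pattern from step 3.
func logFilePath(projectID, jobID int64) string {
	return fmt.Sprintf("/logs-%d-%d/output.log", projectID, jobID)
}

// deleteLogArgs returns the kubectl arguments from step 3 for the given pod.
// The whole "rm ..." string is one argument, matching the quoted sh -c command.
func deleteLogArgs(pod string, projectID, jobID int64) []string {
	return []string{
		"exec", "-it", "-c", "helper", pod,
		"--", "sh", "-c", "rm " + logFilePath(projectID, jobID),
	}
}

func main() {
	// POD_RUNNING_JOB is a placeholder for the pod name from kubectl get pods.
	args := deleteLogArgs("POD_RUNNING_JOB", 24422682, 1268232072)
	fmt.Printf("kubectl %q\n", args)
}
```

The resulting slice could be passed to exec.Command("kubectl", args...) from os/exec if the steps are to be scripted rather than typed by hand.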

Once the log file is deleted, the job will display an error message (after a few seconds) about the log deletion:

WARNING: output log file deleted, cannot continue streaming logs default/runner-lr33aybb-project-24422682-concurrent-0dc2sw/helper:/logs-24422682-1268232072/output.log: command terminated with exit code 100
Cleaning up file based variables
ERROR: Job failed: command terminated with exit code 100

The expected log will look like the following:

Log after log file deletion (screenshots: Screen_Shot_2021-05-17_at_12.59.26_PM, Screen_Shot_2021-05-17_at_1.00.27_PM)

The integration test TestLogDeletionFeatureFlag can also be used to test this change. To do so, comment out the following line:

t.Skip("Log deletion test temporary skipped: issue https://gitlab.com/gitlab-org/gitlab-runner/-/issues/27755")

Then add the following line just after the initialization of the build variable:

build.Runner.RunnerSettings.Kubernetes.HelperImage = "gitlab/gitlab-runner-helper:XXXX"

XXXX should be replaced with the tag generated by the make helper-dockerarchive-host command for the helper image.

What are the relevant issue numbers?

closes: #26032 (closed)

Edited by Georgi N. Georgiev