Set `FF_USE_POD_ACTIVE_DEADLINE_SECONDS` default value to `true`

Romuald Atchadé requested to merge k8s-enable-pod-active-seconds into main

What does this MR do?

When the Runner Manager Pod is deleted while using the Kubernetes executor, all the job Pods it started become orphans and can remain stuck until they are explicitly deleted. The current recommendation for this situation is to use GitLab Runner Pod Cleanup.

In %15.10 we introduced a new feature flag, FF_USE_POD_ACTIVE_DEADLINE_SECONDS, which makes sure that the job Pod is forcibly deleted when the job times out. Using it seems to considerably reduce the number of orphan Pods in a cluster (see gitlab#390645 (comment 1498163139)). This feature flag is currently set to false by default.

With this MR, we set the default value of FF_USE_POD_ACTIVE_DEADLINE_SECONDS to true.
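
Clusters that depend on the previous behavior should still be able to opt out by setting the flag explicitly. A minimal sketch of a `config.toml` fragment, assuming the usual `[runners.feature_flags]` table is used for runner-level flags; the runner name is hypothetical and not part of this MR:

```toml
[[runners]]
  name = "kubernetes-runner"   # hypothetical name, for illustration only
  executor = "kubernetes"
  [runners.feature_flags]
    # Explicitly restore the pre-MR default for this runner.
    FF_USE_POD_ACTIVE_DEADLINE_SECONDS = false
```

Runners that do not set the flag would pick up the new default of true.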

Why was this MR needed?

To avoid orphaned job Pods when the GitLab Runner Pod is deleted.

What's the best way to test this MR?

See the steps described in !3897 (merged).

What are the relevant issue numbers?

Fixes gitlab#390645 (closed) and #25378 (closed)

Edited by Romuald Atchadé