
Allow specifying a retry limit for Kubernetes API calls

What does this MR do?

It makes the retry limit for Kubernetes API calls configurable, replacing the limit that was hardcoded in a const.

Why was this MR needed?

We run our jobs in EKS Kubernetes and use Karpenter to scale nodes.

When using the default limit, our jobs often fail with the following error: prepare environment: setting up trapping scripts on emptyDir: error dialing backend: remote error: tls: internal error

When we bumped defaultTries to 35, the issue disappeared. We therefore made this limit configurable, defaulting to the old limit so that a higher limit is not forced on everyone.

We have been running this code in production since 6.12.2023 and have successfully processed more than 6000 jobs since then.

What's the best way to test this MR?

What are the relevant issue numbers?

Edited by Michał Skibicki
