Make Kubernetes API retries configurable
What does this MR do?
It makes the retry limit for Kubernetes API calls configurable, replacing the limit that was hardcoded in a constant.
Why was this MR needed?
We run our jobs on EKS Kubernetes and use Karpenter to scale nodes.
With the default limit, our jobs often fail with: prepare environment: setting up trapping scripts on emptyDir: error dialing backend: remote error: tls: internal error
When we bumped defaultTries to 35, the issue disappeared. We therefore made this limit configurable, defaulting to the old value so that a higher limit is not forced on everyone.
We have been running this code in production since 6.12.2023 and have successfully processed more than 6000 jobs since then.