Adds configurable value probeTimeoutSeconds (!306) · GitLab.org / charts / GitLab Runner

Kyle Wetzler requested to merge kwetzler1/gitlab-runner:add-configurable-probe-timeouts into main on Aug 12, 2021

What does this MR do?

This MR makes the timeoutSeconds of the runner's liveness and readiness probes configurable. The default is the current value, 1s. The new key added to values.yaml is probeTimeoutSeconds, and the same value is applied to both the liveness and the readiness probe.
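A minimal sketch of how the new key might be wired in, assuming the chart's Deployment template sets both probes directly. The template path and the omitted probe fields are illustrative assumptions, not copied from the chart:

```yaml
# values.yaml: the default matches the previous hard-coded timeout (1s)
probeTimeoutSeconds: 1

# templates/deployment.yaml (hypothetical excerpt; exec command and other probe fields omitted)
livenessProbe:
  timeoutSeconds: {{ .Values.probeTimeoutSeconds }}
readinessProbe:
  timeoutSeconds: {{ .Values.probeTimeoutSeconds }}
```

Because both probes read the same key, raising it loosens the liveness and readiness deadlines together.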

Why was this MR needed?

On Kubernetes versions before 1.20, timeoutSeconds was not respected for exec probes, so probe calls that ran longer than 1s never failed because of this timeout. Starting with Kubernetes 1.20, the timeout is enforced, and the runner's probes now fail intermittently, which causes pod restarts.

The Kubernetes documentation mentions this change in a note:

Before Kubernetes 1.20, the field timeoutSeconds was not respected for exec probes: probes continued running indefinitely, even past their configured deadline, until a result was returned.

As a result, we need a way to set timeoutSeconds to a value greater than 1s.
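For example, an operator on a 1.20+ cluster could raise the timeout in their own values file. The file name and the 5-second value below are illustrative assumptions, not a recommendation from this MR:

```yaml
# my-runner-values.yaml (hypothetical file name)
# Allow each exec probe up to 5 seconds before Kubernetes 1.20+ counts it as failed
probeTimeoutSeconds: 5
```

The file would then be passed to Helm with `-f my-runner-values.yaml` (or the value set directly with `--set probeTimeoutSeconds=5`) on install or upgrade.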

What's the best way to test this MR?

We deployed two runners using the Kubernetes executor to two clusters, one running Kubernetes 1.20 and one running an older version (1.17). The runner on 1.20 began experiencing intermittent probe failures after a short time (~15 minutes), whereas the runner on 1.17 never experienced them.

Please see the referenced issue for more details and debugging efforts. Thanks!

What are the relevant issue numbers?

#304 (closed)
