
Fixes ISSUE-52445: Makes cluster timeout threshold configurable

Description

In a cluster environment, the instance in charge of a service is expected to send a ping at a regular interval (the timeout, in milliseconds) so that the other instances know it is still in charge of the service. Until now, a fixed threshold of 10% of the timeout was added on top of it to avoid false positives and changing service owners too soon.

This threshold has proven to be too small (e.g. if the timeout is 10s, another instance can take charge of a service after 11s with no ping). The threshold is now configurable in the same window as the timeout and defaults to 100% (meaning that if the timeout is 10s, no other instance will try to take charge of its processes until at least 20s pass without a ping).

The maximum threshold (as an absolute value in ms, not a percentage) has been increased from 5s to 60s. A log message is now also written when the timeout and threshold are calculated.

Log example with timeout set to 1s and threshold to 100%: Cluster timeout threshold defined to 1000ms. After 2000ms with no ping, another cluster instance will take charge
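
For reviewers, the arithmetic described above can be sketched as follows. This is only an illustration in Python, not the code changed in this merge request: the function names and the constant `MAX_THRESHOLD_MS` are hypothetical, and it assumes that the 60s maximum is applied as a cap on the threshold in milliseconds after the percentage has been applied.

```python
# Illustrative sketch only; names are hypothetical and not taken from the codebase.

# Assumption: the maximum threshold (60s) caps the computed threshold in ms.
MAX_THRESHOLD_MS = 60_000


def threshold_ms(timeout_ms: int, threshold_pct: int) -> int:
    """Threshold in ms: a percentage of the timeout, capped at the maximum."""
    return min(timeout_ms * threshold_pct // 100, MAX_THRESHOLD_MS)


def takeover_delay_ms(timeout_ms: int, threshold_pct: int) -> int:
    """Time without a ping after which another instance may take charge."""
    return timeout_ms + threshold_ms(timeout_ms, threshold_pct)


def describe(timeout_ms: int, threshold_pct: int) -> str:
    """Build a message in the same shape as the log example in this description."""
    return (
        f"Cluster timeout threshold defined to {threshold_ms(timeout_ms, threshold_pct)}ms. "
        f"After {takeover_delay_ms(timeout_ms, threshold_pct)}ms with no ping, "
        "another cluster instance will take charge"
    )


if __name__ == "__main__":
    print(describe(1_000, 100))
```

With a 10s timeout, the previous fixed 10% gives a takeover after 11s under this sketch, and the new default of 100% gives 20s, matching the figures in the description.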


Edited by Carlos Aristu
