Circuitbreaker backoff and retries
What does this MR do?
In this MR we try to avoid one-off failures of the circuit breaker and improve its robustness.
It will now perform the stat check multiple times within the timeout. The number of attempts can be specified using the circuitbreaker_access_retries application setting.
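As a rough illustration of retrying the stat check within a timeout, here is a minimal sketch. It is not the code from this MR: the method name `storage_accessible?`, the `check` parameter, and the choice to split the timeout evenly across attempts are all assumptions made for the example.

```ruby
require 'timeout'

# Hypothetical helper, not the MR's implementation.
# Tries the check up to `retries` times and returns true on the first success.
# Assumption: the overall timeout is divided evenly between the attempts.
def storage_accessible?(path, retries: 3, timeout: 1.0, check: ->(p) { File.stat(p) })
  per_attempt = timeout.to_f / retries

  retries.times do
    begin
      Timeout.timeout(per_attempt) { check.call(path) }
      return true
    rescue Timeout::Error, SystemCallError
      # A single failed or slow attempt is not fatal; try again.
      next
    end
  end

  false
end
```

Here `retries` plays the role of circuitbreaker_access_retries, so a single slow or failed stat no longer counts as a storage failure on its own.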
There are also 2 failure modes now:
- As soon as we reach circuitbreaker_backoff_threshold failures for a shard on a host, access to the shard on that host will be blocked for circuitbreaker_failure_wait_time.
- When we reach circuitbreaker_failure_count_threshold, we will block all access until the information is reset manually or after circuitbreaker_failure_reset_time.
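The two failure modes can be sketched as a small decision function. This is only an illustration under stated assumptions, not the MR's code: the `access_state` method, the `SETTINGS` values, the returned symbols, and measuring both time windows from the last recorded failure are hypothetical.

```ruby
# Hypothetical values, for illustration only. The keys mirror the
# circuitbreaker_* application settings named above; times are in seconds.
SETTINGS = {
  backoff_threshold: 3,
  failure_count_threshold: 10,
  failure_wait_time: 30,
  failure_reset_time: 3600
}.freeze

# Returns :allowed, :backing_off or :blocked for one shard on one host.
# Assumption: both windows are measured from the time of the last failure.
def access_state(failure_count, last_failure_at, now: Time.now, settings: SETTINGS)
  return :allowed if failure_count.zero? || last_failure_at.nil?

  elapsed = now - last_failure_at

  # After failure_reset_time the failure information is considered stale.
  return :allowed if elapsed >= settings[:failure_reset_time]

  # Second mode: too many failures, block until a manual or timed reset.
  return :blocked if failure_count >= settings[:failure_count_threshold]

  # First mode: back off for failure_wait_time after the last failure.
  if failure_count >= settings[:backoff_threshold] && elapsed < settings[:failure_wait_time]
    return :backing_off
  end

  :allowed
end
```

With these example values, a shard with 3 failures is retried again once 30 seconds have passed since the last failure, while a shard with 10 failures stays blocked until the information is reset.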
While I was at it, I also made the circuit breaker methods that aren't used publicly private.
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Review
  - Has been reviewed by Backend
What are the relevant issue numbers?
Closes #37383 (closed)
Closes #38231 (closed)
Edited by Bob Van Landuyt