Circuitbreaker backoff and retries
What does this MR do?
In this MR we try to avoid one-off failures of the circuitbreaker and improve its robustness:
The circuitbreaker now performs the stat check multiple times within the timeout. The number of attempts can be specified using the
circuitbreaker_access_retries application setting.
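For reviewers who want a mental model of the retry behaviour, here is a minimal sketch. It is not the code in this MR: the method name `storage_accessible?`, its keyword arguments and the injectable `checker` are hypothetical, with `retries` standing in for circuitbreaker_access_retries. It also assumes a failed stat surfaces as a `SystemCallError`, and it does not guard against a single stat call that hangs.

```ruby
# Hypothetical helper, for illustration only (not the MR's implementation).
# Tries the check up to `retries` times, but never starts a new attempt
# once `timeout` seconds have elapsed.
def storage_accessible?(path, retries:, timeout:, checker: ->(p) { File.stat(p); true })
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout

  retries.times do
    # Stop retrying once the overall time budget is used up.
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    begin
      return true if checker.call(path)
    rescue SystemCallError
      # Treat a single failed stat as a possible one-off and try again.
    end
  end

  false
end
```

With this shape, one failed stat no longer counts as a failure by itself; only exhausting all attempts (or the time budget) does.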
There are also 2 failure modes now:
- As soon as we reach circuitbreaker_backoff_threshold failures for a shard on a host, access to the shard on that host will be blocked for
- When we reach circuitbreaker_failure_count_threshold, we will block all access until information is reset manually or after
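The two failure modes can be pictured roughly like this. This is a sketch under assumptions, not the code in this MR: the class `ShardFailureState`, its method names and the `backoff_seconds` parameter are hypothetical. The real backoff duration and the automatic expiry of the hard block are not modelled here, so only the manual reset is shown.

```ruby
# Hypothetical model of the two thresholds, for illustration only.
class ShardFailureState
  def initialize(backoff_threshold:, failure_count_threshold:, backoff_seconds:)
    @backoff_threshold = backoff_threshold
    @failure_count_threshold = failure_count_threshold
    @backoff_seconds = backoff_seconds # assumed parameter, not from the MR
    @failures = 0
    @last_failure = nil
  end

  # Record one failed access for this shard on this host.
  def track_failure(now = Time.now)
    @failures += 1
    @last_failure = now
  end

  # Manual reset of the failure information.
  def reset!
    @failures = 0
    @last_failure = nil
  end

  def status(now = Time.now)
    # Second mode: too many failures, block until reset.
    return :open if @failures >= @failure_count_threshold

    # First mode: temporary block shortly after a recent failure.
    if @failures >= @backoff_threshold && now - @last_failure < @backoff_seconds
      return :backing_off
    end

    :closed
  end
end
```

In this model the backoff block lifts by itself once enough time has passed since the last failure, while the hard block stays until `reset!` is called.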
While I was doing that, I also made private the circuit breaker methods that aren't used publicly.
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Has been reviewed by Backend
What are the relevant issue numbers?
Closes #37383 (closed)
Closes #38231 (closed)