Circuitbreaker backoff and retries

Bob Van Landuyt requested to merge bvl-circuitbreaker-backoff into master

What does this MR do?

In this MR we try to avoid one-off failures of the circuitbreaker and improve its robustness:

The circuitbreaker now performs the stat check multiple times within the timeout. The number of attempts is set with the circuitbreaker_access_retries application setting.
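As a rough illustration of retrying the stat check within a fixed timeout, here is a minimal Python sketch. It is not the code in this MR: the function name `storage_accessible`, the `check` parameter, and the choice to split the timeout evenly across attempts are all assumptions made for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def storage_accessible(path, timeout=1.0, access_retries=3, check=os.stat):
    """Return True if `check(path)` succeeds in any of `access_retries` attempts.

    Assumption: the total `timeout` is split evenly across the attempts,
    so a single hung or failed attempt does not use up the whole budget.
    """
    per_attempt = timeout / access_retries
    # One worker per attempt, so a hung attempt cannot block the next one.
    executor = ThreadPoolExecutor(max_workers=access_retries)
    try:
        for _ in range(access_retries):
            future = executor.submit(check, path)
            try:
                future.result(timeout=per_attempt)
                return True
            except (FutureTimeout, OSError):
                # The attempt hung or failed; try again if attempts remain.
                continue
    finally:
        # Do not wait for hung attempts to finish.
        executor.shutdown(wait=False)
    return False
```

With this shape, a single slow or failed stat call no longer counts as a storage failure on its own; only exhausting every attempt does.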

There are also 2 failure modes now:

  • As soon as we reach circuitbreaker_backoff_threshold failures for a shard on a host, access to the shard on that host will be blocked for circuitbreaker_failure_wait_time.
  • When we reach circuitbreaker_failure_count_threshold failures, we will block all access until the failure information is reset manually or circuitbreaker_failure_reset_time has passed.
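The two failure modes can be sketched as a single decision function. This is a hypothetical Python illustration, not the implementation in this MR: the `Settings` class, the `access_blocked` function, and the numeric defaults are invented for the example, and it assumes both the wait time and the reset time are measured from the last recorded failure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    # Illustrative values only; the real defaults are not stated here.
    backoff_threshold: int = 3
    failure_wait_time: float = 30.0
    failure_count_threshold: int = 5
    failure_reset_time: float = 600.0


def access_blocked(failure_count: int, last_failure: Optional[float],
                   now: float, settings: Settings) -> bool:
    """Decide whether access to a shard on a host should be blocked."""
    if failure_count == 0 or last_failure is None:
        return False
    elapsed = now - last_failure
    # Assumption: failure information expires after failure_reset_time.
    if elapsed >= settings.failure_reset_time:
        return False
    # Hard failure: blocked until a manual reset or the reset time passes.
    if failure_count >= settings.failure_count_threshold:
        return True
    # Backoff: blocked only for failure_wait_time after the last failure.
    if failure_count >= settings.backoff_threshold:
        return elapsed < settings.failure_wait_time
    return False
```

For example, with the illustrative settings above, three failures block access for 30 seconds after the last failure, while five failures keep it blocked until the 600-second reset time has passed.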

While I was doing that, I also made the methods in the circuit breaker that aren't used publicly private.

Does this MR meet the acceptance criteria?

What are the relevant issue numbers?

Closes #37383 (closed)

Closes #38231 (closed)

Edited by Bob Van Landuyt