Circuitbreaker backoff and retries
What does this MR do?
In this MR we try to avoid one-off failures of the circuitbreaker and improve its robustness:
It will now perform the stat check multiple times within the timeout. The number of attempts can be specified using the circuitbreaker_access_retries application setting.
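To illustrate the retry idea, here is a minimal Python sketch (not the actual GitLab implementation, which is written in Ruby). The function name `storage_accessible`, its parameters, and the even split of the timeout across attempts are all assumptions made for illustration. Note that this sketch only retries a stat call that fails; it does not interrupt a stat call that hangs.

```python
import os
import time


def storage_accessible(path, timeout=1.0, access_retries=3, stat=os.stat):
    """Return True if stat(path) succeeds on any of `access_retries` attempts.

    Hypothetical helper: `timeout` and `access_retries` stand in for the
    storage timeout and the circuitbreaker_access_retries setting.
    """
    per_attempt = timeout / access_retries  # assumed: spread attempts evenly
    deadline = time.monotonic() + timeout
    for attempt in range(access_retries):
        try:
            stat(path)
            return True  # a single success is enough
        except OSError:
            remaining = deadline - time.monotonic()
            # Stop when attempts are used up or the time budget is spent.
            if attempt == access_retries - 1 or remaining <= 0:
                break
            time.sleep(min(per_attempt, remaining))
    return False
```

With a setup like this, a single transient stat error no longer counts as a failed check; only a storage that stays unreachable for all attempts does.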
There are also 2 failure modes now:
- As soon as we reach circuitbreaker_backoff_threshold failures for a shard on a host, access to the shard on that host will be blocked for circuitbreaker_failure_wait_time.
- When we reach circuitbreaker_failure_count_threshold, we will block all access until the information is reset manually or after circuitbreaker_failure_reset_time.
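The two failure modes can be sketched roughly as follows. This is an illustrative Python model, not the code in this MR: the class names `Settings` and `FailureInfo`, the default values, and the choice to measure both wait times from the last recorded failure are assumptions. One `FailureInfo` would be kept per shard and host.

```python
from dataclasses import dataclass


@dataclass
class Settings:
    # Placeholder values for illustration only, not GitLab's defaults.
    backoff_threshold: int = 2
    failure_wait_time: float = 10.0
    failure_count_threshold: int = 4
    failure_reset_time: float = 100.0


class FailureInfo:
    """Failure bookkeeping for one shard on one host (hypothetical)."""

    def __init__(self, settings):
        self.settings = settings
        self.failure_count = 0
        self.last_failure = None

    def track_failure(self, now):
        self.failure_count += 1
        self.last_failure = now

    def reset(self):
        # Corresponds to a manual reset of the failure information.
        self.failure_count = 0
        self.last_failure = None

    def access_allowed(self, now):
        if self.last_failure is None:
            return True
        elapsed = now - self.last_failure  # assumed reference point
        if elapsed >= self.settings.failure_reset_time:
            self.reset()  # automatic reset after failure_reset_time
            return True
        if self.failure_count >= self.settings.failure_count_threshold:
            return False  # mode 2: blocked until reset
        if self.failure_count >= self.settings.backoff_threshold:
            # mode 1: back off for failure_wait_time, then try again
            return elapsed >= self.settings.failure_wait_time
        return True
```

Passing `now` in explicitly keeps the sketch deterministic; a real implementation would read the clock itself.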
While I was at it, I also made the circuit breaker methods that aren't used publicly private.
Does this MR meet the acceptance criteria?
- Changelog entry added, if necessary
- Documentation created/updated
- API support added
- Tests added for this feature/bug
- Review
  - Has been reviewed by Backend
What are the relevant issue numbers?
Closes #37383 (closed)
Closes #38231 (closed)
Activity
mentioned in issue #37383 (closed)
added 4 commits
marked the checklist item Documentation created/updated as completed
marked the checklist item Changelog entry added, if necessary as completed
added 5 commits
- 90305f3b - Add new circuitbreaker properties to application_settings
- 01078367 - Allow configuring new circuitbreaker settings from the UI and API
- 56a7c77c - Implement backoff for the circuitbreaker
- c675f646 - Perform the stat check multiple times when checking a storage
- 87256fbf - Allow enabling the circuitbreaker using an env variable
@nick.thomas Do you feel like reviewing this?
assigned to @nick.thomas
@reprazent the source branch is missing, so there's nothing for me to review :/ can you fix?
assigned to @reprazent
Woops, sorry @nick.thomas
assigned to @nick.thomas
- Resolved by Bob Van Landuyt