Block a zoekt node and reassign namespaces when there are consistent failures
As a follow-up to the node backoff feature introduced in !136346 (merged): when a namespace consistently fails when interacting with a node, that node should be blocked (or put into a cooldown period) and the namespace should be reassigned to a different node.
We should keep the Elasticsearch fallback for cases where a node has a temporary error or zoekt is completely unavailable.
Proposal
- 1-5 errors: node backoff and temporary fallback to Elasticsearch
- 5+ errors: node is blocked and affected namespaces are reassigned to another node (👈 this issue)
- errors occurring across several nodes simultaneously: fallback to Elasticsearch (out of scope for this issue)
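The thresholds in the proposal can be sketched as a small per-node failure counter. This is a minimal, hypothetical Python sketch, not GitLab's implementation. `NodeHealth`, `Action`, `BLOCK_THRESHOLD`, and `reassign_namespaces` are illustrative names. The sketch assumes failures are counted consecutively per node, that a node is blocked once it reaches 5 failures, and that namespaces on a blocked node are spread round-robin over the remaining healthy nodes, which is only one possible placement strategy.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum

# Assumption: a node is blocked once it reaches 5 consecutive failures.
BLOCK_THRESHOLD = 5


class Action(Enum):
    BACKOFF_AND_FALLBACK = "backoff_and_fallback"  # temporary: search via Elasticsearch
    BLOCK_AND_REASSIGN = "block_and_reassign"      # block node, move its namespaces


@dataclass
class NodeHealth:
    """Hypothetical per-node failure tracker (illustrative, not GitLab's model)."""
    consecutive_failures: int = 0
    blocked: bool = False

    def record_success(self) -> None:
        # A successful request resets the counter, so only consistent failures block.
        self.consecutive_failures = 0

    def record_failure(self) -> Action:
        self.consecutive_failures += 1
        if self.consecutive_failures >= BLOCK_THRESHOLD:
            self.blocked = True
            return Action.BLOCK_AND_REASSIGN
        return Action.BACKOFF_AND_FALLBACK


def reassign_namespaces(
    assignments: dict[str, str], blocked_node: str, healthy_nodes: list[str]
) -> dict[str, str]:
    """Move every namespace on blocked_node to a healthy node, round-robin (assumed strategy)."""
    if not healthy_nodes:
        # No healthy node left: keep assignments unchanged; callers fall back to Elasticsearch.
        return dict(assignments)
    result: dict[str, str] = {}
    next_index = 0
    for namespace, node in assignments.items():
        if node == blocked_node:
            result[namespace] = healthy_nodes[next_index % len(healthy_nodes)]
            next_index += 1
        else:
            result[namespace] = node
    return result
```

With this sketch, four failures in a row each return `BACKOFF_AND_FALLBACK` and the fifth returns `BLOCK_AND_REASSIGN`. A success in between resets the counter, so occasional errors never block a node. Calling `reassign_namespaces({"ns-a": "node-1", "ns-b": "node-2", "ns-c": "node-1"}, "node-1", ["node-2", "node-3"])` moves `ns-a` to `node-2` and `ns-c` to `node-3`. Failures across several nodes at once are not handled here, matching the scope of this issue.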