Block a zoekt node and reassign namespaces when there are consistent failures

As a continuation of the node backoff feature introduced in !136346 (merged): when a namespace consistently fails while interacting with a node, that node should be blocked or put on a cool-down period, and the namespace should be reassigned to a different node.

We should keep the Elasticsearch fallback for when there is either a temporary error for a node or a complete zoekt outage.

Proposal

  • 1-5 errors: node backoff and temporary fallback to Elasticsearch
  • 5+ errors: node is blocked and affected namespaces are reassigned to a different node (👈 this issue)
  • errors occurring across several nodes simultaneously: fallback to Elasticsearch (out of scope for this issue)
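
The tiers above could be sketched roughly as follows. This is only an illustration, not the actual GitLab implementation: every name here (`Action`, `classify_node`, `plan`) is hypothetical, the boundary at exactly 5 errors is ambiguous in the proposal and is treated as still in backoff range, "several nodes" is assumed to mean two or more failing nodes, and round-robin reassignment is an arbitrary choice.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"
    BACKOFF = "backoff"              # node backoff + temporary Elasticsearch fallback
    BLOCK_AND_REASSIGN = "block"     # block node, move its namespaces elsewhere
    FALLBACK = "fallback"            # errors on several nodes: Elasticsearch fallback


# Assumption: the proposal lists "1-5" and "5+", so exactly 5 is ambiguous;
# this sketch blocks only when the count is strictly greater than 5.
BLOCK_THRESHOLD = 5
# Assumption: "several nodes" means at least two nodes reporting errors.
MULTI_NODE_THRESHOLD = 2


def classify_node(error_count):
    """Map a single node's error count to a tier from the proposal."""
    if error_count <= 0:
        return Action.OK
    if error_count <= BLOCK_THRESHOLD:
        return Action.BACKOFF
    return Action.BLOCK_AND_REASSIGN


def plan(errors_by_node, namespaces_by_node):
    """Return (actions per node, namespace -> new node reassignments)."""
    failing = [n for n, c in errors_by_node.items() if c > 0]
    if len(failing) >= MULTI_NODE_THRESHOLD:
        # Errors on several nodes at once: fall back instead of blocking.
        actions = {
            n: (Action.FALLBACK if c > 0 else Action.OK)
            for n, c in errors_by_node.items()
        }
        return actions, {}

    actions = {n: classify_node(c) for n, c in errors_by_node.items()}
    blocked = sorted(n for n, a in actions.items() if a is Action.BLOCK_AND_REASSIGN)
    healthy = sorted(n for n, a in actions.items() if a is Action.OK)

    reassignments = {}
    if healthy:  # with no healthy node left, nothing can be reassigned
        i = 0
        for node in blocked:
            for namespace in namespaces_by_node.get(node, []):
                # Round-robin over healthy nodes (arbitrary choice for the sketch).
                reassignments[namespace] = healthy[i % len(healthy)]
                i += 1
    return actions, reassignments
```

For example, `plan({"node-a": 7, "node-b": 0}, {"node-a": ["ns-1"]})` marks `node-a` as blocked and maps `ns-1` to `node-b`, while error counts on two nodes at once yield only Elasticsearch fallback and no reassignment.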