Zoekt: Node health evaluation and search routing based on process health
## Summary
Add logic to evaluate node health based on the new process health data and surface a per-node health status that can be used for search routing decisions. Unhealthy nodes are fully excluded from search routing, except in the "all nodes unhealthy" fallback described below.
## Health Evaluation Rules
### Crashloop Detection
- `restarts_15m >= N` (configurable threshold, default: 2) for **either** process (indexer or webserver) → node is **unhealthy**
- A single restart is forgiven (transient OOM, deploy, etc.)
- Recovery is automatic: once the restarts age out of the 15m window, the node becomes healthy again
- Add a new Zoekt application setting for the restart threshold (tunable from Rails without redeploying)
- Indexer crashlooping also affects search — stale indices mean outdated results, and the heartbeat (including relayed webserver metrics) becomes unreliable
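A minimal plain-Ruby sketch of the crashloop rule, assuming the heartbeat exposes one `restarts_15m` count per process. The keyword names `indexer_restarts_15m` / `webserver_restarts_15m` are hypothetical, and the threshold is a plain argument standing in for the proposed application setting.

```ruby
# Default from the issue; in the real feature this would be read from the
# new Zoekt application setting (hypothetical wiring, not shown here).
DEFAULT_RESTART_THRESHOLD = 2

# Returns true when either process restarted at least `threshold` times in the
# trailing 15-minute window. A single restart stays below the default threshold
# and is therefore forgiven. Assumption: nil (no data yet) counts as 0 restarts.
def crashlooping?(indexer_restarts_15m:, webserver_restarts_15m:, threshold: DEFAULT_RESTART_THRESHOLD)
  [indexer_restarts_15m, webserver_restarts_15m].any? { |restarts| restarts.to_i >= threshold }
end

crashlooping?(indexer_restarts_15m: 1, webserver_restarts_15m: 0) # => false
crashlooping?(indexer_restarts_15m: 0, webserver_restarts_15m: 2) # => true
```

Because the check only looks at a rolling window, recovery needs no extra state: once the restarts age out of the 15m counter, the predicate returns false again.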
### mmap Exhaustion
- `mmap_current / mmap_max >= 0.95` (95%) for **either** process → node is **unhealthy**, stop serving search traffic
- `0.80 <= mmap_current / mmap_max < 0.95` (80% up to 95%) → warning only (operators should investigate; the node stays in rotation)
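The two mmap thresholds can be sketched as a small classifier that takes the worse status of the two processes. This is an illustration only: the `[mmap_current, mmap_max]` pair shape is an assumption, and treating a missing or zero `mmap_max` as "no data" (rather than dividing by zero) is a design choice, not something the issue specifies.

```ruby
MMAP_UNHEALTHY_RATIO = 0.95 # stop serving search traffic
MMAP_WARNING_RATIO = 0.80   # warn, keep the node in rotation

# Classifies one process's mmap usage as :unhealthy, :warning or :ok.
# Assumption: nil values or a zero mmap_max mean "no data" and yield :ok.
def mmap_status(mmap_current, mmap_max)
  return :ok if mmap_current.nil? || mmap_max.nil? || mmap_max.zero?

  ratio = mmap_current.to_f / mmap_max
  if ratio >= MMAP_UNHEALTHY_RATIO
    :unhealthy
  elsif ratio >= MMAP_WARNING_RATIO
    :warning
  else
    :ok
  end
end

SEVERITY = { ok: 0, warning: 1, unhealthy: 2 }.freeze

# The node inherits the worse status of its two processes ("either process").
def node_mmap_status(indexer:, webserver:)
  [mmap_status(*indexer), mmap_status(*webserver)].max_by { |status| SEVERITY.fetch(status) }
end

node_mmap_status(indexer: [100, 1000], webserver: [960, 1000]) # => :unhealthy
```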
### Webserver Staleness
- `webserver_last_seen_at` older than `ONLINE_DURATION_THRESHOLD` (30s) → webserver is unresponsive, node is **unhealthy** for search
- Simple scope: `scope :webserver_online, -> { where(webserver_last_seen_at: ONLINE_DURATION_THRESHOLD.ago..) }`
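For reasoning about the staleness rule outside the database, here is a plain-Ruby predicate that mirrors the scope's condition. It assumes the threshold is held as plain seconds because the sketch does not load ActiveSupport (so there is no `30.seconds.ago`). Like the endless-range `where` (`ONLINE_DURATION_THRESHOLD.ago..`), it is inclusive at the boundary and excludes nodes whose webserver has never reported (SQL `NULL`).

```ruby
# 30s threshold from the issue, as plain seconds (no ActiveSupport here).
ONLINE_DURATION_THRESHOLD = 30

# True when the webserver heartbeat is within the threshold. A nil timestamp
# (never reported) is treated as offline, matching how `where(col: x..)`
# excludes NULL rows.
def webserver_online?(webserver_last_seen_at, now: Time.now)
  return false if webserver_last_seen_at.nil?

  webserver_last_seen_at >= now - ONLINE_DURATION_THRESHOLD
end
```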
## Search Routing Changes
### Updated `.online` Scope
Update the existing `.online` scope to require both processes to be reporting when the node supports it:
- Guard with `all_at_least_version?(MIN_PROCESS_HEALTH_VERSION)` — same pattern used for offset pagination in !225523
- When all nodes support process health: `.online` checks both `last_seen_at` and `webserver_last_seen_at`
- When any node is on an older version: fall back to current behavior (only check `last_seen_at`)
- Since `.searchable` is an alias for `.online`, search routing automatically benefits
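The version gate can be illustrated over an in-memory list of nodes. This is a hypothetical sketch, not the ActiveRecord scope: the `Node` struct, the `MIN_PROCESS_HEALTH_VERSION` value, and version strings parseable by `Gem::Version` are all assumptions. It shows the key property that a single older node switches every node back to the `last_seen_at`-only check.

```ruby
require "rubygems" # Gem::Version; already loaded in most Ruby processes

# Hypothetical value: the issue does not state the minimum version.
MIN_PROCESS_HEALTH_VERSION = Gem::Version.new("1.0.0")
ONLINE_DURATION_THRESHOLD = 30 # seconds

Node = Struct.new(:version, :last_seen_at, :webserver_last_seen_at, keyword_init: true)

def seen_recently?(time, now)
  !time.nil? && time >= now - ONLINE_DURATION_THRESHOLD
end

# Mirrors the proposed `.online` behaviour: require the webserver heartbeat
# only when every node is new enough to report it.
def online(nodes, now: Time.now)
  all_support_process_health = nodes.all? do |node|
    Gem::Version.new(node.version) >= MIN_PROCESS_HEALTH_VERSION
  end

  nodes.select do |node|
    next false unless seen_recently?(node.last_seen_at, now)

    # Older fleet: fall back to checking last_seen_at only.
    !all_support_process_health || seen_recently?(node.webserver_last_seen_at, now)
  end
end
```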
### `search_healthy` Scope
Add a `search_healthy` scope that combines `.online` with process health checks (crashloop, mmap) for the load balancer.
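A minimal sketch of the combined check, written as a plain-Ruby filter rather than an ActiveRecord scope. The `NodeHealth` struct and its fields (an `online` flag plus per-process arrays of restart counts and mmap ratios) are illustrative assumptions, not the real schema. Only the 95% mmap level excludes a node; the 80% warning level does not.

```ruby
# Illustrative fields: restarts_15m and mmap_ratios hold one value per process
# ([indexer, webserver]).
NodeHealth = Struct.new(:online, :restarts_15m, :mmap_ratios, keyword_init: true)

RESTART_THRESHOLD = 2       # default from the crashloop rule
MMAP_UNHEALTHY_RATIO = 0.95 # exclusion level from the mmap rule

# Online, not crashlooping, and not at or above 95% of the mmap limit.
def search_healthy?(node, restart_threshold: RESTART_THRESHOLD)
  node.online &&
    node.restarts_15m.none? { |restarts| restarts >= restart_threshold } &&
    node.mmap_ratios.none? { |ratio| ratio >= MMAP_UNHEALTHY_RATIO }
end

def search_healthy(nodes, **opts)
  nodes.select { |node| search_healthy?(node, **opts) }
end
```

In the Rails implementation this would presumably be chained onto `.online` so the database does the filtering; the in-memory form just makes the combined rule easy to test.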
### "All Nodes Unhealthy" Fallback
**Important**: When all nodes are marked unhealthy, fall back to all online nodes rather than hard-failing. This avoids the cascade failure pattern that led to the removal of the previous circuit breaker (!190464). The health exclusion should only activate when a *subset* of nodes is failing.
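The fallback reduces to "filter, but never filter down to nothing". A sketch, assuming `online_nodes` is the result of `.online` and the health predicate is passed as a block; the method name `routable_nodes` is hypothetical.

```ruby
# Returns the healthy subset of online nodes, or every online node when none
# are healthy, so a fleet-wide health signal never turns into a hard failure.
def routable_nodes(online_nodes, &healthy)
  healthy_nodes = online_nodes.select(&healthy)
  healthy_nodes.empty? ? online_nodes : healthy_nodes
end

nodes = [{ id: 1, healthy: true }, { id: 2, healthy: false }]
routable_nodes(nodes) { |node| node[:healthy] } # => [{ id: 1, healthy: true }]
```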
## Related
- Connects to the `last_search_failure_at` proposal (https://gitlab.com/gitlab-org/gitlab/-/issues/593206) — proactive health signals complement reactive failure tracking
- Previous circuit breaker history: !136346 (added), !190464 (removed)