Zoekt: Webserver-to-indexer process health relay
Summary
Add a new internal HTTP endpoint on the indexer (POST /indexer/internal/process_health) that the webserver can call on localhost to push its process health data. The indexer stores the latest webserver report in memory and includes it in the next heartbeat payload to Rails.
Repository: https://gitlab.com/gitlab-org/gitlab-zoekt-indexer
Details
- Webserver discovers the indexer's address via a new env variable ZOEKT_INDEXER_INTERNAL_URL, defaulting to http://localhost:6065. This matches the indexer port in the gitlab-zoekt Helm chart (values.yaml: indexer.listen.port: 6065), so it works out of the box in production with zero configuration, since both containers run in the same pod
- Webserver pushes health data on a periodic timer (e.g., every 5–10s, aligned with heartbeat frequency)
- Endpoint is unauthenticated: localhost-only binding is sufficient (in Kubernetes, containers in the same pod share a network namespace, so localhost is truly local)
- If the env variable is not set or empty, the webserver skips health reporting (graceful degradation for non-standard deployments)
- Indexer stores the latest webserver report in an atomic in-memory struct (no disk I/O)
- If the webserver hasn't reported within a timeout (e.g., 30s matching ONLINE_DURATION_THRESHOLD), the indexer omits the webserver section from the heartbeat process_health payload. The absence itself is a signal of webserver failure
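
The indexer side of the list above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the type names (ProcessHealth, HealthStore), the postJSON helper, the 64 KiB body limit, and the 204 response code are all assumptions, and the 30s timeout is taken from the example value above. It uses atomic.Pointer (Go 1.19+) so the handler and the heartbeat loop can share the latest report without a mutex and without any disk I/O.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// ProcessHealth mirrors the webserver -> indexer payload fields.
type ProcessHealth struct {
	MmapCurrent   int64 `json:"mmap_current"`
	MmapMax       int64 `json:"mmap_max"`
	Restarts1m    int64 `json:"restarts_1m"`
	Restarts5m    int64 `json:"restarts_5m"`
	Restarts15m   int64 `json:"restarts_15m"`
	RSSBytes      int64 `json:"rss_bytes"`
	UptimeSeconds int64 `json:"uptime_seconds"`
	ShardsLoaded  int64 `json:"shards_loaded"`
}

// report pairs a payload with its arrival time (hypothetical helper type).
type report struct {
	health     ProcessHealth
	receivedAt time.Time
}

// HealthStore keeps only the most recent webserver report, in memory.
type HealthStore struct {
	latest atomic.Pointer[report]
	maxAge time.Duration // reports older than this are treated as missing
}

// Fresh returns the latest report, or ok=false if there is none or it is stale.
func (s *HealthStore) Fresh(now time.Time) (ProcessHealth, bool) {
	r := s.latest.Load()
	if r == nil || now.Sub(r.receivedAt) > s.maxAge {
		return ProcessHealth{}, false
	}
	return r.health, true
}

// ServeHTTP would be mounted at POST /indexer/internal/process_health.
func (s *HealthStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var h ProcessHealth
	body := http.MaxBytesReader(w, r.Body, 1<<16) // assumed size cap
	if err := json.NewDecoder(body).Decode(&h); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	s.latest.Store(&report{health: h, receivedAt: time.Now()})
	w.WriteHeader(http.StatusNoContent)
}

// postJSON exercises the handler in-process and returns the status code.
func postJSON(s *HealthStore, body string) int {
	req := httptest.NewRequest(http.MethodPost,
		"/indexer/internal/process_health", strings.NewReader(body))
	rec := httptest.NewRecorder()
	s.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	store := &HealthStore{maxAge: 30 * time.Second}
	code := postJSON(store, `{"mmap_current":5678,"shards_loaded":142}`)
	h, ok := store.Fresh(time.Now())
	fmt.Println(code, ok, h.ShardsLoaded)

	// One minute later the same report counts as stale.
	_, ok = store.Fresh(time.Now().Add(time.Minute))
	fmt.Println(ok)
}
```

The heartbeat builder would call Fresh once per heartbeat and include the webserver section only when ok is true.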
Metrics
For both processes:
| Metric | Source | Notes |
|---|---|---|
| mmap_current | /proc/self/maps (Linux) | Already collected by webserver; add to indexer (see #593554 (closed)) |
| mmap_max | /proc/sys/vm/max_map_count (Linux) | Already collected by webserver; add to indexer (see #593554 (closed)) |
| restarts_1m, restarts_5m, restarts_15m | Filesystem marker files (see #593553 (closed)) | Restart counts in sliding windows |
| rss_bytes | Go ProcessCollector (process_resident_memory_bytes) | Already available via standard Go runtime |
| uptime_seconds | Computed from Go ProcessCollector (process_start_time_seconds) | Trivially cheap to compute |
For the webserver additionally:
| Metric | Source | Notes |
|---|---|---|
| shards_loaded | zoekt_shards_loaded (upstream Prometheus gauge) | Already registered by upstream zoekt search package |
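
As a rough illustration of the Linux-specific sources in the tables above, the sketch below derives mmap_current, mmap_max, and uptime_seconds using only the standard library. It assumes mmap_current is the number of lines in /proc/self/maps (one line per mapping), which may differ from how the webserver currently collects it. The function names are hypothetical, and the start time is passed in as a plain Unix timestamp rather than read from the Prometheus collector.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

// countMappings counts non-empty lines; each line of /proc/self/maps
// describes one memory mapping of the current process.
func countMappings(maps string) int {
	n := 0
	for _, line := range strings.Split(maps, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

// parseMaxMapCount parses the single integer stored in
// /proc/sys/vm/max_map_count.
func parseMaxMapCount(s string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(s))
}

// uptimeSeconds derives uptime from a start timestamp in Unix seconds,
// such as the value of process_start_time_seconds. Clock skew that would
// produce a negative result is clamped to zero.
func uptimeSeconds(startUnix, nowUnix float64) int64 {
	if nowUnix < startUnix {
		return 0
	}
	return int64(nowUnix - startUnix)
}

func main() {
	start := float64(time.Now().Unix()) // stand-in for the real start time

	maps, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		fmt.Println("mmap metrics unavailable (not Linux?):", err)
		return
	}
	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		fmt.Println("max_map_count unavailable:", err)
		return
	}
	maxMaps, err := parseMaxMapCount(string(raw))
	if err != nil {
		fmt.Println("unexpected max_map_count content:", err)
		return
	}
	fmt.Println("mmap_current:", countMappings(string(maps)))
	fmt.Println("mmap_max:", maxMaps)
	fmt.Println("uptime_seconds:", uptimeSeconds(start, float64(time.Now().Unix())))
}
```

On non-Linux systems the reads fail and the sketch simply reports that the metrics are unavailable.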
Payload Structure
Webserver -> Indexer (localhost):
{
  "mmap_current": 5678,
  "mmap_max": 65530,
  "restarts_1m": 0,
  "restarts_5m": 0,
  "restarts_15m": 1,
  "rss_bytes": 536870912,
  "uptime_seconds": 86400,
  "shards_loaded": 142
}
Indexer -> Rails (in heartbeat, combining both):
{
  "process_health": {
    "indexer": {
      "mmap_current": 1234,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 1,
      "restarts_15m": 2,
      "rss_bytes": 268435456,
      "uptime_seconds": 172800
    },
    "webserver": {
      "mmap_current": 5678,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 0,
      "restarts_15m": 1,
      "rss_bytes": 536870912,
      "uptime_seconds": 86400,
      "shards_loaded": 142
    }
  }
}
If the webserver hasn't reported recently, the webserver key is omitted entirely.
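
One way to get the "key omitted entirely" behavior in Go is a pointer field with omitempty, so a nil webserver report never appears in the JSON. The sketch below assumes hypothetical type names (Health, ProcessHealthSection, Heartbeat) and shows only the process_health part of the heartbeat. The real payload presumably carries other fields as well.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Health holds the per-process metrics. ShardsLoaded is a pointer so the
// indexer section can leave it out while the webserver section can still
// report an explicit zero.
type Health struct {
	MmapCurrent   int64  `json:"mmap_current"`
	MmapMax       int64  `json:"mmap_max"`
	Restarts1m    int64  `json:"restarts_1m"`
	Restarts5m    int64  `json:"restarts_5m"`
	Restarts15m   int64  `json:"restarts_15m"`
	RSSBytes      int64  `json:"rss_bytes"`
	UptimeSeconds int64  `json:"uptime_seconds"`
	ShardsLoaded  *int64 `json:"shards_loaded,omitempty"`
}

// ProcessHealthSection combines both processes. A nil Webserver is dropped
// from the output because of omitempty.
type ProcessHealthSection struct {
	Indexer   Health  `json:"indexer"`
	Webserver *Health `json:"webserver,omitempty"`
}

// Heartbeat shows only the process_health portion of the payload.
type Heartbeat struct {
	ProcessHealth ProcessHealthSection `json:"process_health"`
}

// encodeHeartbeat builds the JSON body; pass nil when the webserver report
// is missing or stale.
func encodeHeartbeat(indexer Health, webserver *Health) (string, error) {
	b, err := json.Marshal(Heartbeat{
		ProcessHealth: ProcessHealthSection{Indexer: indexer, Webserver: webserver},
	})
	return string(b), err
}

func main() {
	indexer := Health{MmapCurrent: 1234, MmapMax: 65530, UptimeSeconds: 172800}

	// Webserver has reported recently: both sections are present.
	shards := int64(142)
	web := &Health{MmapCurrent: 5678, MmapMax: 65530, ShardsLoaded: &shards}
	withWeb, _ := encodeHeartbeat(indexer, web)
	fmt.Println(withWeb)

	// Webserver report missing or stale: the key disappears.
	withoutWeb, _ := encodeHeartbeat(indexer, nil)
	fmt.Println(withoutWeb)
}
```

With this shape, Rails can treat a missing webserver key as the failure signal without needing a separate status field.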