Zoekt: Webserver-to-indexer process health relay

Summary

Add a new internal HTTP endpoint on the indexer (POST /indexer/internal/process_health) that the webserver can call on localhost to push its process health data. The indexer stores the latest webserver report in memory and includes it in the next heartbeat payload to Rails.

Repository: https://gitlab.com/gitlab-org/gitlab-zoekt-indexer

Details

  • Webserver discovers the indexer's address via a new environment variable, ZOEKT_INDEXER_INTERNAL_URL, which defaults to http://localhost:6065. This matches the indexer port in the gitlab-zoekt Helm chart (values.yaml: indexer.listen.port: 6065), so it works in production with zero configuration because both containers run in the same pod
  • Webserver pushes health data on a periodic timer (e.g., every 5–10s, aligned with heartbeat frequency)
  • Endpoint is unauthenticated: localhost-only binding is sufficient (in Kubernetes, containers in the same pod share a network namespace, so localhost is reachable only from within the pod)
  • If the env variable is not set or empty, the webserver skips health reporting (graceful degradation for non-standard deployments)
  • Indexer stores the latest webserver report in an atomic in-memory struct (no disk I/O)
  • If the webserver hasn't reported within a timeout (e.g., 30s, matching ONLINE_DURATION_THRESHOLD), the indexer omits the webserver section from the heartbeat process_health payload. The absence itself signals a webserver failure
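
The indexer side of the bullets above could look roughly like the following. This is a minimal sketch, not the actual implementation: the names healthStore, entry, Set, Fresh, and staleAfter are hypothetical, the 64 KiB body limit is an arbitrary illustrative choice, and atomic.Pointer assumes Go 1.19 or later. The report is kept as raw JSON so the sketch does not depend on the final payload schema.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync/atomic"
	"time"
)

// staleAfter is the example timeout from the proposal (30s).
const staleAfter = 30 * time.Second

// entry is one webserver report plus the time the indexer received it.
type entry struct {
	report     json.RawMessage
	receivedAt time.Time
}

// healthStore holds only the most recent report; nothing is written to disk.
type healthStore struct {
	latest atomic.Pointer[entry]
}

// Set atomically replaces the stored report.
func (s *healthStore) Set(report json.RawMessage, now time.Time) {
	s.latest.Store(&entry{report: report, receivedAt: now})
}

// Fresh returns the latest report, or ok=false when there is none or it is
// older than staleAfter, in which case the heartbeat should omit "webserver".
func (s *healthStore) Fresh(now time.Time) (report json.RawMessage, ok bool) {
	e := s.latest.Load()
	if e == nil || now.Sub(e.receivedAt) > staleAfter {
		return nil, false
	}
	return e.report, true
}

// ServeHTTP accepts POST requests carrying a JSON report.
func (s *healthStore) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	var body json.RawMessage
	// Cap the body size (64 KiB is an arbitrary illustrative limit).
	if err := json.NewDecoder(http.MaxBytesReader(w, r.Body, 64<<10)).Decode(&body); err != nil {
		http.Error(w, "invalid JSON", http.StatusBadRequest)
		return
	}
	s.Set(body, time.Now())
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	store := &healthStore{}
	mux := http.NewServeMux()
	mux.Handle("/indexer/internal/process_health", store)

	// Exercise the handler in-process instead of opening a socket.
	req := httptest.NewRequest(http.MethodPost, "/indexer/internal/process_health",
		strings.NewReader(`{"shards_loaded": 142}`))
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, req)
	fmt.Println("status:", rec.Code)

	_, ok := store.Fresh(time.Now())
	fmt.Println("fresh now:", ok)
	_, ok = store.Fresh(time.Now().Add(staleAfter + time.Second))
	fmt.Println("fresh after timeout:", ok)
}
```

Passing the current time into Set and Fresh keeps the staleness rule testable without sleeping. In the indexer, the heartbeat builder would call Fresh once per heartbeat and drop the webserver section when ok is false.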

Metrics

For both processes:

| Metric | Source | Notes |
|---|---|---|
| mmap_current | /proc/self/maps (Linux) | Already collected by webserver; add to indexer (see #593554 (closed)) |
| mmap_max | /proc/sys/vm/max_map_count (Linux) | Already collected by webserver; add to indexer (see #593554 (closed)) |
| restarts_1m, restarts_5m, restarts_15m | Filesystem marker files (see #593553 (closed)) | Restart counts in sliding windows |
| rss_bytes | Go ProcessCollector (process_resident_memory_bytes) | Already available via standard Go runtime |
| uptime_seconds | Computed from Go ProcessCollector (process_start_time_seconds) | Trivially cheap to compute |

For the webserver additionally:

| Metric | Source | Notes |
|---|---|---|
| shards_loaded | zoekt_shards_loaded (upstream Prometheus gauge) | Already registered by upstream zoekt search package |
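
For the two mmap metrics, one plausible way to read them on Linux is sketched below. It assumes that mmap_current is the number of lines in /proc/self/maps (the kernel prints one line per mapping) and that mmap_max is the integer in /proc/sys/vm/max_map_count. The webserver's existing collector may do this differently, and the function names countMappings and parseMaxMapCount are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// countMappings counts the non-empty lines of /proc/self/maps content.
// Each line describes one memory mapping of the process.
func countMappings(maps string) int {
	n := 0
	for _, line := range strings.Split(maps, "\n") {
		if strings.TrimSpace(line) != "" {
			n++
		}
	}
	return n
}

// parseMaxMapCount parses the content of /proc/sys/vm/max_map_count,
// which is a single integer followed by a newline.
func parseMaxMapCount(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

func main() {
	maps, err := os.ReadFile("/proc/self/maps")
	if err != nil {
		// These files only exist on Linux; report and skip elsewhere.
		fmt.Println("mmap metrics unavailable:", err)
		return
	}
	raw, err := os.ReadFile("/proc/sys/vm/max_map_count")
	if err != nil {
		fmt.Println("mmap_max unavailable:", err)
		return
	}
	limit, err := parseMaxMapCount(string(raw))
	if err != nil {
		fmt.Println("unexpected max_map_count content:", err)
		return
	}
	fmt.Printf("mmap_current=%d mmap_max=%d\n", countMappings(string(maps)), limit)
}
```

Keeping the parsing separate from the file reads lets the same helpers be shared by the webserver and the indexer and tested with fixed input.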

Payload Structure

Webserver -> Indexer (localhost):

{
  "mmap_current": 5678,
  "mmap_max": 65530,
  "restarts_1m": 0,
  "restarts_5m": 0,
  "restarts_15m": 1,
  "rss_bytes": 536870912,
  "uptime_seconds": 86400,
  "shards_loaded": 142
}
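
On the webserver side, building and sending this payload might look like the sketch below. The struct, the helper names (endpointFor, encode, push), and the 2-second client timeout are assumptions for illustration; only the JSON field names and the endpoint path come from this proposal. The httptest server stands in for the indexer so the example runs on its own.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// ProcessHealth mirrors the webserver payload shown above.
type ProcessHealth struct {
	MmapCurrent   int64 `json:"mmap_current"`
	MmapMax       int64 `json:"mmap_max"`
	Restarts1m    int64 `json:"restarts_1m"`
	Restarts5m    int64 `json:"restarts_5m"`
	Restarts15m   int64 `json:"restarts_15m"`
	RSSBytes      int64 `json:"rss_bytes"`
	UptimeSeconds int64 `json:"uptime_seconds"`
	ShardsLoaded  int64 `json:"shards_loaded"`
}

const healthPath = "/indexer/internal/process_health"

// endpointFor joins the configured base URL with the endpoint path.
func endpointFor(base string) string {
	return strings.TrimRight(base, "/") + healthPath
}

// encode serializes a report; fields appear in struct declaration order.
func encode(h ProcessHealth) (string, error) {
	b, err := json.Marshal(h)
	return string(b), err
}

// push sends one report to the indexer and treats any non-2xx status as an error.
func push(ctx context.Context, client *http.Client, base string, h ProcessHealth) error {
	body, err := json.Marshal(h)
	if err != nil {
		return err
	}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpointFor(base), bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode < 200 || resp.StatusCode > 299 {
		return fmt.Errorf("indexer returned status %d", resp.StatusCode)
	}
	return nil
}

func main() {
	// Stand-in for the indexer endpoint.
	indexer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var got ProcessHealth
		if err := json.NewDecoder(r.Body).Decode(&got); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}
		fmt.Printf("%s received shards_loaded=%d\n", r.URL.Path, got.ShardsLoaded)
		w.WriteHeader(http.StatusNoContent)
	}))
	defer indexer.Close()

	client := &http.Client{Timeout: 2 * time.Second}
	report := ProcessHealth{MmapCurrent: 5678, MmapMax: 65530, Restarts15m: 1,
		RSSBytes: 536870912, UptimeSeconds: 86400, ShardsLoaded: 142}
	if err := push(context.Background(), client, indexer.URL, report); err != nil {
		fmt.Println("push failed:", err)
	}
}
```

In the webserver, push would be called from a time.Ticker loop with the base URL taken from ZOEKT_INDEXER_INTERNAL_URL. A failed push would only need to be logged, since the indexer treats a missing report as the failure signal.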

Indexer -> Rails (in heartbeat, combining both):

{
  "process_health": {
    "indexer": {
      "mmap_current": 1234,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 1,
      "restarts_15m": 2,
      "rss_bytes": 268435456,
      "uptime_seconds": 172800
    },
    "webserver": {
      "mmap_current": 5678,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 0,
      "restarts_15m": 1,
      "rss_bytes": 536870912,
      "uptime_seconds": 86400,
      "shards_loaded": 142
    }
  }
}

If the webserver hasn't reported within the timeout, the webserver key is omitted entirely.
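
One way to get the omitted key in Go is a pointer field tagged omitempty, as in the sketch below. The type names and buildHeartbeat are hypothetical, and only the process_health part of the heartbeat is modeled; the real heartbeat carries other fields that are not shown here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CommonHealth holds the metrics reported for both processes.
type CommonHealth struct {
	MmapCurrent   int64 `json:"mmap_current"`
	MmapMax       int64 `json:"mmap_max"`
	Restarts1m    int64 `json:"restarts_1m"`
	Restarts5m    int64 `json:"restarts_5m"`
	Restarts15m   int64 `json:"restarts_15m"`
	RSSBytes      int64 `json:"rss_bytes"`
	UptimeSeconds int64 `json:"uptime_seconds"`
}

// WebserverHealth adds the webserver-only metric. The embedded struct's
// fields are flattened into the same JSON object.
type WebserverHealth struct {
	CommonHealth
	ShardsLoaded int64 `json:"shards_loaded"`
}

// processHealth omits the "webserver" key when the pointer is nil.
type processHealth struct {
	Indexer   CommonHealth     `json:"indexer"`
	Webserver *WebserverHealth `json:"webserver,omitempty"`
}

type heartbeat struct {
	ProcessHealth processHealth `json:"process_health"`
}

// buildHeartbeat serializes the process_health section. Pass a nil
// webserver when no report arrived within the timeout.
func buildHeartbeat(indexer CommonHealth, webserver *WebserverHealth) (string, error) {
	b, err := json.Marshal(heartbeat{
		ProcessHealth: processHealth{Indexer: indexer, Webserver: webserver},
	})
	return string(b), err
}

func main() {
	idx := CommonHealth{MmapCurrent: 1234, MmapMax: 65530, Restarts5m: 1,
		Restarts15m: 2, RSSBytes: 268435456, UptimeSeconds: 172800}
	web := &WebserverHealth{
		CommonHealth: CommonHealth{MmapCurrent: 5678, MmapMax: 65530, Restarts15m: 1,
			RSSBytes: 536870912, UptimeSeconds: 86400},
		ShardsLoaded: 142,
	}

	both, _ := buildHeartbeat(idx, web)
	indexerOnly, _ := buildHeartbeat(idx, nil)
	fmt.Println(both)
	fmt.Println(indexerOnly)
}
```

A nil pointer makes the absence explicit in the type. A zero-valued struct would instead serialize as a webserver object full of zeros, which Rails could not distinguish from a real report.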
