Zoekt: Extend heartbeat API to accept process health data

Summary

Extend the POST /internal/search/zoekt/:uuid/heartbeat endpoint to accept the new process_health data from the indexer, covering both the indexer's own health metrics and the relayed webserver health metrics. Add a webserver_last_seen_at column to the zoekt_nodes table.

Details

Heartbeat API Changes

Add new optional parameters to the heartbeat Grape API (backward compatible — old indexers without the new fields continue to work):

{
  "process_health": {
    "indexer": {
      "mmap_current": 1234,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 1,
      "restarts_15m": 2,
      "rss_bytes": 268435456,
      "uptime_seconds": 172800
    },
    "webserver": {
      "mmap_current": 5678,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 0,
      "restarts_15m": 1,
      "rss_bytes": 536870912,
      "uptime_seconds": 86400,
      "shards_loaded": 142
    }
  }
}

Metrics Accepted

For both processes:

Field           Type     Description
mmap_current    integer  Current memory-mapped region count (Linux only, 0 on other platforms)
mmap_max        integer  Max mmap limit from /proc/sys/vm/max_map_count
restarts_1m     integer  Process restart count in the last 1 minute
restarts_5m     integer  Process restart count in the last 5 minutes
restarts_15m    integer  Process restart count in the last 15 minutes
rss_bytes       integer  Resident Set Size: physical memory usage
uptime_seconds  integer  Seconds since process start

For webserver additionally:

Field           Type     Description
shards_loaded   integer  Number of search index shards currently loaded
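
The field lists above could be checked with something like the following plain-Ruby sketch. It is illustrative only: the real endpoint would declare these as Grape params, and the names process_health_errors, COMMON_FIELDS, and ALLOWED_FIELDS are hypothetical. Rejecting unknown fields and negative values is an assumption, not something this issue specifies.

```ruby
# Field names are taken from the tables above; the constants and the
# method name are hypothetical.
COMMON_FIELDS = %w[
  mmap_current mmap_max restarts_1m restarts_5m restarts_15m rss_bytes uptime_seconds
].freeze

ALLOWED_FIELDS = {
  'indexer' => COMMON_FIELDS,
  'webserver' => COMMON_FIELDS + %w[shards_loaded]
}.freeze

# Returns a list of error strings; an empty list means the payload is acceptable.
def process_health_errors(payload)
  health = payload['process_health']
  return [] if health.nil? # old indexers omit the key entirely

  return ['process_health must be an object'] unless health.is_a?(Hash)

  health.flat_map do |process, metrics|
    allowed = ALLOWED_FIELDS[process]
    next ["unknown process: #{process}"] if allowed.nil?
    next ["#{process} must be an object"] unless metrics.is_a?(Hash)

    metrics.filter_map do |field, value|
      if !allowed.include?(field)
        "#{process}.#{field} is not a known field"
      elsif !value.is_a?(Integer) || value.negative?
        # Assumption: counts and sizes are never negative.
        "#{process}.#{field} must be a non-negative integer"
      end
    end
  end
end
```

All fields are treated as optional here, so a heartbeat with a partial indexer or webserver hash still passes.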

Database Migration

Add webserver_last_seen_at column to zoekt_nodes:

  • Type: timestamptz, default: epoch (1970-01-01 00:00:00+00)
  • Add B-tree index (mirrors the existing last_seen_at pattern)
  • Rails sets webserver_last_seen_at = Time.zone.now whenever the heartbeat includes the process_health.webserver key — this indicates the webserver is alive and reporting through the indexer relay
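
The timestamp rule in the last bullet can be sketched in plain Ruby as follows. Node is a hypothetical stand-in for the real model, touch_webserver is a made-up helper name, and Time.now.utc stands in for Time.zone.now so the snippet runs without Rails.

```ruby
# Hypothetical stand-in for a zoekt_nodes row; not the real model.
Node = Struct.new(:webserver_last_seen_at, keyword_init: true)

# Column default described above: the Unix epoch.
EPOCH = Time.at(0).utc

# Bumps webserver_last_seen_at only when the heartbeat carries
# process_health.webserver. Heartbeats from old indexers leave it untouched.
def touch_webserver(node, params, now: Time.now.utc)
  health = params['process_health']
  node.webserver_last_seen_at = now if health.is_a?(Hash) && health.key?('webserver')
  node
end
```

With this rule, a node whose webserver has never reported keeps the epoch default and therefore falls outside any recency window.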

Storage

  • The webserver_last_seen_at timestamp is a top-level column for efficient scoping (e.g., scope :webserver_online, -> { where(webserver_last_seen_at: THRESHOLD.ago..) })
  • The rest of the process health data (mmap, restarts, rss, uptime, shards) is stored in the existing metadata JSONB column under the process_health key — no migration needed for this part
  • Update zoekt_node_metadata.json JSON schema to validate the new process_health structure
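
As a rough illustration of the schema update, the snippet below builds a possible process_health fragment as a Ruby hash and prints it as JSON. The layout of the existing zoekt_node_metadata.json is not shown in this issue, so where the fragment would be placed, the additionalProperties: false setting, and the minimum: 0 constraint are all assumptions.

```ruby
require 'json'

# Metric names come from the "Metrics Accepted" section.
COMMON_METRICS = %w[
  mmap_current mmap_max restarts_1m restarts_5m restarts_15m rss_bytes uptime_seconds
].freeze

# Builds the schema for one process: every metric is an optional integer.
def process_schema(fields)
  {
    'type' => 'object',
    'additionalProperties' => false, # assumption: reject unknown metrics
    'properties' => fields.to_h { |name| [name, { 'type' => 'integer', 'minimum' => 0 }] }
  }
end

PROCESS_HEALTH_SCHEMA = {
  'type' => 'object',
  'additionalProperties' => false,
  'properties' => {
    'indexer' => process_schema(COMMON_METRICS),
    'webserver' => process_schema(COMMON_METRICS + ['shards_loaded'])
  }
}.freeze

# Print the fragment that would sit under the schema's "properties" key.
puts JSON.pretty_generate('process_health' => PROCESS_HEALTH_SCHEMA)
```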
Edited by Dmitry Gruzd