Zoekt: Extend heartbeat API to accept process health data
Summary
Extend the POST /internal/search/zoekt/:uuid/heartbeat endpoint to accept the new process_health data from the indexer, covering both the indexer's own health metrics and the relayed webserver health metrics. Add a webserver_last_seen_at column to the zoekt_nodes table.
Details
Heartbeat API Changes
Add new optional parameters to the heartbeat Grape API (backward compatible — old indexers without the new fields continue to work):
{
  "process_health": {
    "indexer": {
      "mmap_current": 1234,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 1,
      "restarts_15m": 2,
      "rss_bytes": 268435456,
      "uptime_seconds": 172800
    },
    "webserver": {
      "mmap_current": 5678,
      "mmap_max": 65530,
      "restarts_1m": 0,
      "restarts_5m": 0,
      "restarts_15m": 1,
      "rss_bytes": 536870912,
      "uptime_seconds": 86400,
      "shards_loaded": 142
    }
  }
}
Metrics Accepted
For both processes:
| Field | Type | Description |
|---|---|---|
| mmap_current | integer | Current memory-mapped region count (Linux only, 0 on other platforms) |
| mmap_max | integer | Max mmap limit from /proc/sys/vm/max_map_count |
| restarts_1m | integer | Process restart count in the last 1 minute |
| restarts_5m | integer | Process restart count in the last 5 minutes |
| restarts_15m | integer | Process restart count in the last 15 minutes |
| rss_bytes | integer | Resident Set Size (physical memory usage), in bytes |
| uptime_seconds | integer | Seconds since process start |
For webserver additionally:
| Field | Type | Description |
|---|---|---|
| shards_loaded | integer | Number of search index shards currently loaded |
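
The field rules in the tables above can be sketched as a plain-Ruby check. This is only an illustration, not the Grape parameter declaration itself: the constant and method names are hypothetical, and it assumes that every metric is individually optional and that values must be non-negative (the tables only state that they are integers).

```ruby
# Hypothetical validator; the real endpoint would declare these fields as
# optional params in the Grape API rather than checking them by hand.
COMMON_FIELDS = %w[
  mmap_current mmap_max restarts_1m restarts_5m restarts_15m rss_bytes uptime_seconds
].freeze

# Fields allowed per process; only the webserver reports shards_loaded.
ALLOWED_FIELDS = {
  'indexer' => COMMON_FIELDS,
  'webserver' => COMMON_FIELDS + %w[shards_loaded]
}.freeze

# Returns a list of error strings; an empty list means the payload is acceptable.
def process_health_errors(payload)
  health = payload['process_health']
  return [] if health.nil? # old indexers omit the key entirely

  return ['process_health must be an object'] unless health.is_a?(Hash)

  health.flat_map do |process, metrics|
    allowed = ALLOWED_FIELDS[process]
    next ["unknown process: #{process}"] if allowed.nil?
    next ["#{process} must be an object"] unless metrics.is_a?(Hash)

    metrics.filter_map do |field, value|
      if !allowed.include?(field)
        "#{process}.#{field} is not a known metric"
      elsif !value.is_a?(Integer) || value.negative? # non-negativity is an assumption
        "#{process}.#{field} must be a non-negative integer"
      end
    end
  end
end
```

A heartbeat body without process_health yields no errors, which matches the backward-compatibility requirement for old indexers.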
Database Migration
Add webserver_last_seen_at column to zoekt_nodes:
- Type: timestamptz, default: epoch (1970-01-01 00:00:00+00)
- Add B-tree index (mirrors the existing last_seen_at pattern)
- Rails sets webserver_last_seen_at = Time.zone.now whenever the heartbeat includes the process_health.webserver key; this indicates the webserver is alive and reporting through the indexer relay
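
The update rule for the new column can be sketched in plain Ruby, with a Struct standing in for a zoekt_nodes row. Node, EPOCH and record_heartbeat are hypothetical names used only for illustration, and Time.now.utc stands in for Time.zone.now so the snippet runs without Rails.

```ruby
# Hypothetical stand-in for a zoekt_nodes row; only the new timestamp matters here.
Node = Struct.new(:webserver_last_seen_at, keyword_init: true)

# Column default described above: 1970-01-01 00:00:00 UTC.
EPOCH = Time.at(0).utc

def record_heartbeat(node, payload, now: Time.now.utc)
  health = payload['process_health']
  # Key presence, not the metric values, signals that the webserver reported in.
  node.webserver_last_seen_at = now if health.is_a?(Hash) && health.key?('webserver')
  node
end
```

With this rule, a heartbeat that carries only indexer metrics, or no process_health at all, leaves the previous webserver_last_seen_at value in place.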
Storage
- The webserver_last_seen_at timestamp is a top-level column for efficient scoping (e.g., scope :webserver_online, -> { where(webserver_last_seen_at: THRESHOLD.ago..) })
- The rest of the process health data (mmap, restarts, rss, uptime, shards) is stored in the existing metadata JSONB column under the process_health key; no migration is needed for this part
- Update the zoekt_node_metadata.json JSON schema to validate the new process_health structure
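
The storage split described above can be illustrated with a minimal in-memory sketch. The method names and the 60-second threshold are hypothetical (the issue refers to THRESHOLD without giving a value), and keeping the previously stored process_health when a heartbeat omits it is an assumption rather than a stated requirement.

```ruby
# Hypothetical threshold in seconds; the issue does not state the real value.
WEBSERVER_ONLINE_THRESHOLD = 60

# Returns a new metadata hash with process_health stored under its own key,
# leaving any other metadata entries untouched.
def merge_process_health(metadata, payload)
  health = payload['process_health']
  return metadata if health.nil? # assumption: keep whatever was stored before

  metadata.merge('process_health' => health)
end

# In-memory equivalent of where(webserver_last_seen_at: THRESHOLD.ago..).
def webserver_online?(webserver_last_seen_at, now: Time.now.utc)
  webserver_last_seen_at >= now - WEBSERVER_ONLINE_THRESHOLD
end
```

Because the column defaults to the epoch, a node whose webserver has never reported is treated as offline by the range check without any special handling for NULL.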
Edited by Dmitry Gruzd