feat: telemetry reporter for monitoring instances

Implements the reporter side of https://gitlab.com/postgres-ai/infra/-/work_items/52 — a small TS+Bun service that runs on each PostgresAI monitoring host and posts an hourly mini-healthcheck to the platform.

Summary

New top-level telemetry/ directory with:

  • Collectors (each split into a pure parser + a thin runner so tests don't need root or docker):
    • oom: OOM kernel events from journalctl -k --since "<lookback>"
    • containers: faulty (exited / dead / restarting / unhealthy) docker containers via docker ps -a --format '{{json .}}'
    • memory: free RAM from /proc/meminfo MemAvailable, fallback to MemFree on very old kernels
    • disk: free bytes for the configured mount via fs.statfs
  • Reporter: POSTs to <platform>/rpc/monitoring_instance_telemetry_report with the existing PostgresAI API token format
  • Entry script: runs once on startup, then every PGAI_TELEMETRY_INTERVAL_SEC (default 3600s), with clean SIGTERM/SIGINT shutdown
  • Dockerfile mounts /proc and a host disk path read-only

Configuration

env var required default
PGAI_PLATFORM_API_URL yes
PGAI_API_TOKEN yes
PGAI_MONITORING_INSTANCE_ID yes
PGAI_TELEMETRY_DISK_PATH no /
PGAI_TELEMETRY_MEMINFO_PATH no /proc/meminfo
PGAI_TELEMETRY_OOM_LOOKBACK no 24 hours ago
PGAI_TELEMETRY_INTERVAL_SEC no 3600 (min 60)

Tests (TDD)

39 pass across 7 files, tsc --noEmit clean.

CI

telemetry:tests job in .gitlab-ci.yml, path-scoped to telemetry/**.

Companion

The platform-side hypertable, insert RPC, health evaluator, and alert dispatcher are in postgres-ai/platform-all!365.

Test plan

  • bun test — 39/39 pass
  • bun run typecheck — clean
  • After companion MR merges and platform settings are configured: deploy one reporter container, observe at least one successful report in monitoring_instance_telemetry_latest
  • Verify alert path: stop a container by hand and confirm Slack/email fires within the cron sweep window

Related: https://gitlab.com/postgres-ai/infra/-/work_items/52 (along with postgres-ai/platform-all!365)

🤖 Generated with Claude Code

Edited by Nikolay Samokhvalov

Merge request reports

Loading