
raft: Add GetRaftClusterInfo RPC for cluster monitoring and debugging

During Raft cluster operations and troubleshooting, there's currently no straightforward way to inspect the cluster's current state. When debugging issues like partition leadership problems, replica health, or routing table inconsistencies, operators need to dig through logs or query internal databases directly.

We need a unified RPC that exposes the cluster's topology and state, including:

  • Partition distribution across storages
  • Current leadership status for each Raft group
  • Replica health and synchronization state
  • Routing table entries and cluster metadata

This would enable better monitoring, faster debugging, and more reliable cluster health checks. The RPC should integrate with the existing RaftService and follow patterns from admin RPCs like ServerInfo.

Edited by Quang-Minh Nguyen