raft: Implement fundamental Raft cluster management CLI commands
For #6907 (closed)
This MR implements a complete CLI for managing and inspecting Gitaly Raft clusters. Operators currently have limited visibility into cluster topology, partition distribution, and health status when troubleshooting Raft-enabled Gitaly instances.
All data is currently fetched from the persistent routing table, so it may not be fully up to date. We'll add more data to the gRPC cluster handlers to provide more real-time information. This MR lays the foundation for future iterations.
What this solves
Operators managing Gitaly Raft clusters need tools to:
- View cluster-wide health and partition distribution
- Identify which storage nodes are serving as leaders for specific partitions
- Find which partition contains a specific repository
- Troubleshoot repository placement and replica distribution
- Monitor cluster health with clear visual indicators
Implementation approach
The solution provides two complementary CLI commands with different levels of detail:
- gitaly cluster info - High-level cluster statistics and overview
- gitaly cluster get-partition - Detailed partition-specific information
Key technical changes
RPC layer restructuring
- Renamed GetClusterInfo to GetPartitions - The original name was misleading since this RPC streams detailed partition data rather than high-level cluster information. The rename clarifies its purpose.
- Added new unary GetClusterInfo RPC - Separating cluster statistics from partition details improves performance for monitoring use cases that only need aggregate metrics. This avoids streaming potentially thousands of partition records when only summary data is needed.
- Extended GetPartitions with path filtering - Enables repository-to-partition mapping so operators can find which partition contains a specific repository without knowing the partition key.
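As a rough illustration of the path filtering idea, the sketch below selects the partitions that contain a given repository path. It is not the MR's handler code: the Partition type, its fields, the filterByRelativePath helper, the partition keys, and the second repository path are all hypothetical and exist only for this example.

```go
package main

import "fmt"

// Partition is a hypothetical, simplified view of one routing table entry.
// The real message returned by GetPartitions may look different.
type Partition struct {
	Key           string
	Leader        string
	RelativePaths []string
}

// filterByRelativePath returns the partitions that contain relativePath.
// An empty relativePath is treated as "no filter" (an assumption made for
// this sketch).
func filterByRelativePath(partitions []Partition, relativePath string) []Partition {
	if relativePath == "" {
		return partitions
	}
	var matched []Partition
	for _, p := range partitions {
		for _, path := range p.RelativePaths {
			if path == relativePath {
				matched = append(matched, p)
				break // one match is enough for this partition
			}
		}
	}
	return matched
}

func main() {
	// Illustrative data only; keys and the second path are made up.
	partitions := []Partition{
		{Key: "partition-a", Leader: "storage-1", RelativePaths: []string{"@hashed/ab/cd/repo1.git"}},
		{Key: "partition-b", Leader: "storage-2", RelativePaths: []string{"@hashed/ef/gh/repo2.git"}},
	}
	for _, p := range filterByRelativePath(partitions, "@hashed/ab/cd/repo1.git") {
		fmt.Printf("partition=%s leader=%s\n", p.Key, p.Leader)
	}
}
```

With a filter like this applied on the server side, a client only receives the matching partition instead of the full stream.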
CLI command structure
The implementation splits functionality into two commands for progressive information disclosure:
- cluster info shows cluster statistics, per-storage metrics, and an optional partition overview
- cluster get-partition provides detailed partition topology, replica health, and repository listings
Usage examples
Basic cluster overview
$ gitaly cluster info --config config.toml
=== Gitaly Cluster Information ===
=== Cluster Health Summary ===
Partitions: ✓ Healthy (2/2)
Replicas: ✓ Healthy (6/6)
=== Cluster Statistics ===
Total Partitions: 2
Total Replicas: 6
Healthy Partitions: 2
Healthy Replicas: 6
=== Per-Storage Statistics ===
STORAGE LEADER COUNT REPLICA COUNT
------- ------------ -------------
storage-1 1 2
storage-2 1 2
storage-3 0 2
Use --list-partitions to display partition overview table.
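The aggregate numbers in this output can be derived from the partition list in a single pass. The sketch below shows one way to do that. The types and the summarize function are hypothetical, and the rule that a partition counts as healthy only when all of its replicas are healthy is an assumption of this example, not something the MR states.

```go
package main

import (
	"fmt"
	"sort"
)

// Replica and Partition are hypothetical types used only in this sketch.
type Replica struct {
	Storage  string
	IsLeader bool
	Healthy  bool
}

type Partition struct {
	Key      string
	Replicas []Replica
}

// StorageStats holds the per-storage counters shown in the table.
type StorageStats struct {
	LeaderCount  int
	ReplicaCount int
}

// ClusterStats holds the cluster-wide counters.
type ClusterStats struct {
	TotalPartitions   int
	HealthyPartitions int
	TotalReplicas     int
	HealthyReplicas   int
	PerStorage        map[string]StorageStats
}

// summarize aggregates statistics over all partitions.
// Assumption: a partition is healthy only if every replica is healthy.
func summarize(partitions []Partition) ClusterStats {
	stats := ClusterStats{PerStorage: make(map[string]StorageStats)}
	for _, p := range partitions {
		stats.TotalPartitions++
		partitionHealthy := len(p.Replicas) > 0
		for _, r := range p.Replicas {
			stats.TotalReplicas++
			s := stats.PerStorage[r.Storage]
			s.ReplicaCount++
			if r.IsLeader {
				s.LeaderCount++
			}
			stats.PerStorage[r.Storage] = s
			if r.Healthy {
				stats.HealthyReplicas++
			} else {
				partitionHealthy = false
			}
		}
		if partitionHealthy {
			stats.HealthyPartitions++
		}
	}
	return stats
}

func main() {
	// Two partitions with three replicas each, mirroring the example output.
	partitions := []Partition{
		{Key: "partition-a", Replicas: []Replica{
			{Storage: "storage-1", IsLeader: true, Healthy: true},
			{Storage: "storage-2", Healthy: true},
			{Storage: "storage-3", Healthy: true},
		}},
		{Key: "partition-b", Replicas: []Replica{
			{Storage: "storage-1", Healthy: true},
			{Storage: "storage-2", IsLeader: true, Healthy: true},
			{Storage: "storage-3", Healthy: true},
		}},
	}
	stats := summarize(partitions)
	fmt.Printf("Partitions: %d/%d healthy\n", stats.HealthyPartitions, stats.TotalPartitions)
	fmt.Printf("Replicas: %d/%d healthy\n", stats.HealthyReplicas, stats.TotalReplicas)

	// Sort storage names so the output order is stable.
	names := make([]string, 0, len(stats.PerStorage))
	for name := range stats.PerStorage {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		s := stats.PerStorage[name]
		fmt.Printf("%s leaders=%d replicas=%d\n", name, s.LeaderCount, s.ReplicaCount)
	}
}
```

Because the result is a handful of counters rather than one record per partition, it fits a unary response.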
Cluster with partition overview
$ gitaly cluster info --config config.toml --list-partitions
=== Gitaly Cluster Information ===
=== Cluster Health Summary ===
Partitions: ✓ Healthy (2/2)
Replicas: ✓ Healthy (6/6)
=== Cluster Statistics ===
Total Partitions: 2
Total Replicas: 6
Healthy Partitions: 2
Healthy Replicas: 6
=== Per-Storage Statistics ===
STORAGE LEADER COUNT REPLICA COUNT
------- ------------ -------------
storage-1 1 2
storage-2 1 2
storage-3 0 2
=== Partition Overview ===
PARTITION KEY LEADER REPLICAS HEALTH LAST INDEX MATCH INDEX REPOSITORIES
------------- ------ -------- ------ ---------- ----------- ------------
1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad storage-1 storage-1, storage-2, storage-3 3/3 100 100 1 repos
ae3928eb528786e728edb0583f06ec25d4d0f41f3ad6105a8c2777790d8cfc98 storage-2 storage-1, storage-2, storage-3 3/3 150 150 1 repos
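Column-aligned tables like the ones in these examples can be rendered with Go's standard text/tabwriter package. Whether the MR uses tabwriter is not stated here, so treat the following as a minimal sketch under that assumption. StorageRow and renderStorageTable are hypothetical names.

```go
package main

import (
	"bytes"
	"fmt"
	"text/tabwriter"
)

// StorageRow is a hypothetical row of the per-storage statistics table.
type StorageRow struct {
	Storage      string
	LeaderCount  int
	ReplicaCount int
}

// renderStorageTable formats rows as a column-aligned table.
// Cells are separated by tabs; tabwriter pads each column to a common width.
func renderStorageTable(rows []StorageRow) string {
	var buf bytes.Buffer
	// minwidth=0, tabwidth=8, padding=2, padchar=' ', no flags.
	w := tabwriter.NewWriter(&buf, 0, 8, 2, ' ', 0)
	fmt.Fprintln(w, "STORAGE\tLEADER COUNT\tREPLICA COUNT")
	fmt.Fprintln(w, "-------\t------------\t-------------")
	for _, r := range rows {
		fmt.Fprintf(w, "%s\t%d\t%d\n", r.Storage, r.LeaderCount, r.ReplicaCount)
	}
	// Flush writes the buffered, aligned output to buf.
	w.Flush()
	return buf.String()
}

func main() {
	fmt.Print(renderStorageTable([]StorageRow{
		{Storage: "storage-1", LeaderCount: 1, ReplicaCount: 2},
		{Storage: "storage-2", LeaderCount: 1, ReplicaCount: 2},
		{Storage: "storage-3", LeaderCount: 0, ReplicaCount: 2},
	}))
}
```

Returning a string instead of writing straight to stdout keeps the rendering easy to check in unit tests.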
Storage-specific filtering
$ gitaly cluster info --config config.toml --storage storage-1
=== Gitaly Cluster Information ===
=== Cluster Health Summary ===
Partitions: ✓ Healthy (2/2)
Replicas: ✓ Healthy (6/6)
=== Cluster Statistics ===
Total Partitions: 2
Total Replicas: 6
Healthy Partitions: 2
Healthy Replicas: 6
=== Per-Storage Statistics ===
STORAGE LEADER COUNT REPLICA COUNT
------- ------------ -------------
storage-1 1 2
=== Partition Overview ===
PARTITION KEY LEADER REPLICAS HEALTH LAST INDEX MATCH INDEX REPOSITORIES
------------- ------ -------- ------ ---------- ----------- ------------
1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad storage-1 storage-1, storage-2, storage-3 (filtered: storage-1) 3/3 100 100 1 repos
ae3928eb528786e728edb0583f06ec25d4d0f41f3ad6105a8c2777790d8cfc98 storage-2 storage-1, storage-2, storage-3 (filtered: storage-1) 3/3 150 150 1 repos
Repository-to-partition mapping
$ gitaly cluster get-partition --config config.toml --relative-path @hashed/ab/cd/repo1.git
=== Partition Details for Repository: @hashed/ab/cd/repo1.git ===
Partition: 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad
STORAGE ROLE HEALTH LAST INDEX MATCH INDEX
------- ---- ------ ---------- -----------
storage-1 Leader Healthy 100 100
storage-2 Follower Healthy 100 100
storage-3 Follower Healthy 100 100
Repositories:
REPOSITORY PATH
---------------
@hashed/ab/cd/repo1.git
Direct partition lookup
$ gitaly cluster get-partition --config config.toml --partition-key 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad
=== Partition Details for Key: 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad ===
Partition: 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad
STORAGE ROLE HEALTH LAST INDEX MATCH INDEX
------- ---- ------ ---------- -----------
storage-1 Leader Healthy 100 100
storage-2 Follower Healthy 100 100
storage-3 Follower Healthy 100 100
Repositories:
REPOSITORY PATH
---------------
@hashed/ab/cd/repo1.git