
raft: Implement fundamental Raft cluster management CLI commands

For #6907 (closed)

This MR implements a CLI for managing and inspecting Gitaly Raft clusters. Operators currently have limited visibility into cluster topology, partition distribution, and health status when troubleshooting Raft-enabled Gitaly instances.

All data is currently fetched from the persistent routing table, so it may lag behind the live cluster state. We'll add more data to the gRPC cluster handlers to provide more real-time information. This MR lays the foundation for future iterations.

What this solves

Operators managing Gitaly Raft clusters need tools to:

  • View cluster-wide health and partition distribution
  • Identify which storage nodes are serving as leaders for specific partitions
  • Find which partition contains a specific repository
  • Troubleshoot repository placement and replica distribution
  • Monitor cluster health with clear visual indicators

Implementation approach

The solution provides two complementary CLI commands with different levels of detail:

  1. gitaly cluster info - High-level cluster statistics and overview
  2. gitaly cluster get-partition - Detailed partition-specific information

Key technical changes

RPC layer restructuring

  1. Renamed GetClusterInfo to GetPartitions - The original name was misleading since this RPC streams detailed partition data rather than high-level cluster information. The rename clarifies its purpose.

  2. Added new unary GetClusterInfo RPC - Separating cluster statistics from partition details improves performance for monitoring use cases that only need aggregate metrics. This avoids streaming potentially thousands of partition records when only summary data is needed.

  3. Extended GetPartitions with path filtering - Enables repository-to-partition mapping so operators can find which partition contains a specific repository without knowing the partition key.
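The split between the two RPCs can be pictured with a minimal Go sketch. It is not the code in this MR: the `Partition`, `Replica`, and `ClusterSummary` types, the `Summarize` and `FindByRelativePath` helpers, and the second repository path are all hypothetical stand-ins, and the rule that a partition is healthy only when every replica is healthy is an assumption. The sketch only shows why a summary can be computed once and returned as a few counters, while path filtering narrows the partition records down to the ones holding a given repository.

```go
package main

import "fmt"

// Replica and Partition are hypothetical stand-ins for the records that
// GetPartitions streams; the real protobuf messages may differ.
type Replica struct {
	Storage string
	Leader  bool
	Healthy bool
}

type Partition struct {
	Key           string
	Replicas      []Replica
	RelativePaths []string
}

// StorageStats mirrors the columns of the per-storage statistics table.
type StorageStats struct {
	LeaderCount  int
	ReplicaCount int
}

// ClusterSummary is a hypothetical shape for the unary GetClusterInfo reply.
type ClusterSummary struct {
	TotalPartitions, HealthyPartitions int
	TotalReplicas, HealthyReplicas     int
	PerStorage                         map[string]StorageStats
}

// Summarize folds partition records into aggregate counters.
// Assumption: a partition is healthy only if all of its replicas are healthy.
func Summarize(parts []Partition) ClusterSummary {
	s := ClusterSummary{PerStorage: map[string]StorageStats{}}
	for _, p := range parts {
		s.TotalPartitions++
		allHealthy := len(p.Replicas) > 0
		for _, r := range p.Replicas {
			s.TotalReplicas++
			st := s.PerStorage[r.Storage]
			st.ReplicaCount++
			if r.Leader {
				st.LeaderCount++
			}
			s.PerStorage[r.Storage] = st
			if r.Healthy {
				s.HealthyReplicas++
			} else {
				allHealthy = false
			}
		}
		if allHealthy {
			s.HealthyPartitions++
		}
	}
	return s
}

// FindByRelativePath returns the partitions that contain the given repository.
func FindByRelativePath(parts []Partition, relativePath string) []Partition {
	var out []Partition
	for _, p := range parts {
		for _, path := range p.RelativePaths {
			if path == relativePath {
				out = append(out, p)
				break
			}
		}
	}
	return out
}

// replicasLedBy builds three healthy replicas with the given leader.
func replicasLedBy(leader string) []Replica {
	var rs []Replica
	for _, name := range []string{"storage-1", "storage-2", "storage-3"} {
		rs = append(rs, Replica{Storage: name, Leader: name == leader, Healthy: true})
	}
	return rs
}

// exampleParts mimics the two-partition cluster from the usage examples.
// The second repository path is made up for illustration.
func exampleParts() []Partition {
	return []Partition{
		{Key: "1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad",
			Replicas: replicasLedBy("storage-1"), RelativePaths: []string{"@hashed/ab/cd/repo1.git"}},
		{Key: "ae3928eb528786e728edb0583f06ec25d4d0f41f3ad6105a8c2777790d8cfc98",
			Replicas: replicasLedBy("storage-2"), RelativePaths: []string{"@hashed/ef/gh/repo2.git"}},
	}
}

func main() {
	parts := exampleParts()
	fmt.Printf("%+v\n", Summarize(parts))
	for _, p := range FindByRelativePath(parts, "@hashed/ab/cd/repo1.git") {
		fmt.Println("found in partition", p.Key)
	}
}
```

In this sketch a monitoring client would only ever see the small `ClusterSummary` value, which is the motivation given above for keeping the aggregate RPC unary.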

CLI command structure

The implementation splits functionality into two commands for progressive information disclosure:

  • cluster info shows cluster statistics, per-storage metrics, and optional partition overview
  • cluster get-partition provides detailed partition topology, replica health, and repository listings

Usage examples

Basic cluster overview

$ gitaly cluster info --config config.toml
=== Gitaly Cluster Information ===

=== Cluster Health Summary ===

  Partitions: ✓ Healthy (2/2)
  Replicas: ✓ Healthy (6/6)

=== Cluster Statistics ===
  Total Partitions: 2
  Total Replicas: 6
  Healthy Partitions: 2
  Healthy Replicas: 6

=== Per-Storage Statistics ===

STORAGE    LEADER COUNT  REPLICA COUNT
-------    ------------  -------------
storage-1  1             2
storage-2  1             2
storage-3  0             2

Use --list-partitions to display partition overview table.

Cluster with partition overview

$ gitaly cluster info --config config.toml --list-partitions
=== Gitaly Cluster Information ===

=== Cluster Health Summary ===

  Partitions: ✓ Healthy (2/2)
  Replicas: ✓ Healthy (6/6)

=== Cluster Statistics ===
  Total Partitions: 2
  Total Replicas: 6
  Healthy Partitions: 2
  Healthy Replicas: 6

=== Per-Storage Statistics ===

STORAGE    LEADER COUNT  REPLICA COUNT
-------    ------------  -------------
storage-1  1             2
storage-2  1             2
storage-3  0             2

=== Partition Overview ===

PARTITION KEY                                                     LEADER     REPLICAS                         HEALTH  LAST INDEX  MATCH INDEX  REPOSITORIES
-------------                                                     ------     --------                         ------  ----------  -----------  ------------
1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad  storage-1  storage-1, storage-2, storage-3  3/3     100         100          1 repos
ae3928eb528786e728edb0583f06ec25d4d0f41f3ad6105a8c2777790d8cfc98  storage-2  storage-1, storage-2, storage-3  3/3     150         150          1 repos

Storage-specific filtering

$ gitaly cluster info --config config.toml --storage storage-1
=== Gitaly Cluster Information ===

=== Cluster Health Summary ===

  Partitions: ✓ Healthy (2/2)
  Replicas: ✓ Healthy (6/6)

=== Cluster Statistics ===
  Total Partitions: 2
  Total Replicas: 6
  Healthy Partitions: 2
  Healthy Replicas: 6

=== Per-Storage Statistics ===

STORAGE    LEADER COUNT  REPLICA COUNT
-------    ------------  -------------
storage-1  1             2

=== Partition Overview ===

PARTITION KEY                                                     LEADER     REPLICAS                                               HEALTH  LAST INDEX  MATCH INDEX  REPOSITORIES
-------------                                                     ------     --------                                               ------  ----------  -----------  ------------
1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad  storage-1  storage-1, storage-2, storage-3 (filtered: storage-1)  3/3     100         100          1 repos
ae3928eb528786e728edb0583f06ec25d4d0f41f3ad6105a8c2777790d8cfc98  storage-2  storage-1, storage-2, storage-3 (filtered: storage-1)  3/3     150         150          1 repos

Repository-to-partition mapping

$ gitaly cluster get-partition --config config.toml --relative-path @hashed/ab/cd/repo1.git
=== Partition Details for Repository: @hashed/ab/cd/repo1.git ===

Partition: 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad

STORAGE    ROLE      HEALTH   LAST INDEX  MATCH INDEX
-------    ----      ------   ----------  -----------
storage-1  Leader    Healthy  100         100
storage-2  Follower  Healthy  100         100
storage-3  Follower  Healthy  100         100

Repositories:

REPOSITORY PATH
---------------
@hashed/ab/cd/repo1.git

Direct partition lookup

$ gitaly cluster get-partition --config config.toml --partition-key 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad
=== Partition Details for Key: 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad ===

Partition: 1ae75994b13cfe1d19983e0d7eeac7b4a7077bd9c4a26e3421c1acd3d683a4ad

STORAGE    ROLE      HEALTH   LAST INDEX  MATCH INDEX
-------    ----      ------   ----------  -----------
storage-1  Leader    Healthy  100         100
storage-2  Follower  Healthy  100         100
storage-3  Follower  Healthy  100         100

Repositories:

REPOSITORY PATH
---------------
@hashed/ab/cd/repo1.git
Edited by Quang-Minh Nguyen
