Decouple health checking logic from sqlElector and NodeManager

Sami Hiltunen requested to merge smh-health-manager into master

NodeManager and the sqlElector have grown to cover connection management, health checking, leader election and request routing. Bundling these responsibilities together prevents the logic from being reused and makes the individual pieces difficult to test in isolation. As a first step in decoupling these functionalities from the NodeManager and the sqlElector, this commit splits the health checking logic out into its own component.

The new HealthManager runs periodic health checks against the configured nodes. The results are recorded in the database, and the cluster's view of each node's health is determined by a consensus of the votes cast by all Praefect nodes in the cluster.

The only difference from sqlElector's health checking logic is that the success threshold of three consecutive health checks is dropped. A node is considered healthy if it has had a successful health check within the last 10 seconds, a condition the current implementation also checks. This simplifies the logic and ensures that all Praefects have an accurate view of how every other Praefect sees the health of a given Gitaly node.

ErrorTracker is not part of the HealthManager. It will be implemented in a follow-up behind the HealthClient interface, as it does not need to be part of the core logic.

The HealthManager is not yet wired up for use; that will be done in a follow-up commit.

This refactoring is preparation for #2971, which implements the per-repository primary elector.
