Add SQL-based election for shard primaries
This commit adds the following strategy to enable redundant Praefect nodes to run simultaneously:
- Every Praefect node periodically (every second) performs a health check RPC with a Gitaly node.
- For each node, Praefect updates a row in a new table (`node_status`) with the following information:
  - The name of the Praefect instance (`praefect_name`)
  - The name of the virtual storage (`shard_name`)
  - The name of the Gitaly storage (`storage_name`)
  - The timestamp of the last time Praefect tried to reach that node (`last_contact_attempt_at`)
  - The timestamp of the last successful health check (`last_seen_active_at`)
- Periodically, every Praefect node does a `SELECT` from `node_status` to determine healthy nodes. A node is considered healthy when:
  - It has a recent successful health check (e.g. one in the last 10 s).
  - A majority of the available Praefect nodes have entries that satisfy the condition above.
- To determine the majority, we use a lightweight service discovery protocol: a Praefect node is deemed a voting member if the `praefect_name` has a recent `last_contact_attempt_at` in the `node_status` table. The name is derived from a combination of the hostname and listening port/socket.
- The primary of each shard is listed in the `shard_primaries` table. If the current primary is in the healthy node list, then no election needs to be done.
- Otherwise, if there is no primary or it is unhealthy, any Praefect node can elect a new primary by choosing a candidate from the healthy node list and inserting a row into the `shard_primaries` table.
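
The steps above can be sketched end to end as follows. This is an illustration only, not the code in this commit: it is written in Python with SQLite standing in for the real database. The `node_status` column names come from the list above; the column types, the primary keys, the `shard_primaries` columns (`shard_name`, `node_name`), the function names, and the rule for picking a candidate are all assumptions.

```python
import sqlite3

# SQLite stands in for the real database. Column names in node_status come
# from the description above; the types, keys, and the shard_primaries
# columns are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE node_status (
    praefect_name           TEXT NOT NULL,
    shard_name              TEXT NOT NULL,
    storage_name            TEXT NOT NULL,
    last_contact_attempt_at REAL NOT NULL,  -- seconds since epoch
    last_seen_active_at     REAL,           -- NULL until the first success
    PRIMARY KEY (praefect_name, shard_name, storage_name)
);
CREATE TABLE shard_primaries (
    shard_name TEXT PRIMARY KEY,
    node_name  TEXT NOT NULL
);
"""


def record_health_check(db, praefect, shard, storage, now, ok):
    """Upsert this Praefect's row for one Gitaly node after a health check."""
    db.execute(
        """
        INSERT INTO node_status VALUES (:praefect, :shard, :storage, :now, :seen)
        ON CONFLICT (praefect_name, shard_name, storage_name) DO UPDATE SET
            last_contact_attempt_at = excluded.last_contact_attempt_at,
            last_seen_active_at = COALESCE(excluded.last_seen_active_at,
                                           node_status.last_seen_active_at)
        """,
        {"praefect": praefect, "shard": shard, "storage": storage,
         "now": now, "seen": now if ok else None},
    )


def healthy_nodes(db, shard, now, window=10.0):
    """Storages seen active within `window` seconds by a majority of voters."""
    rows = db.execute(
        """
        SELECT storage_name FROM node_status
        WHERE shard_name = :shard AND last_seen_active_at >= :cutoff
        GROUP BY storage_name
        HAVING COUNT(DISTINCT praefect_name) * 2 > (
            -- voting members: Praefect nodes with a recent contact attempt
            SELECT COUNT(DISTINCT praefect_name) FROM node_status
            WHERE last_contact_attempt_at >= :cutoff
        )
        ORDER BY storage_name
        """,
        {"shard": shard, "cutoff": now - window},
    ).fetchall()
    return [name for (name,) in rows]


def elect_primary(db, shard, now):
    """Keep a healthy primary; otherwise pick a healthy candidate and store it."""
    healthy = healthy_nodes(db, shard, now)
    row = db.execute(
        "SELECT node_name FROM shard_primaries WHERE shard_name = ?", (shard,)
    ).fetchone()
    if row and row[0] in healthy:
        return row[0]  # current primary is healthy: no election needed
    if not healthy:
        return None  # no candidate available
    candidate = healthy[0]  # arbitrary choice; the text does not say how to pick
    db.execute(
        """
        INSERT INTO shard_primaries VALUES (?, ?)
        ON CONFLICT (shard_name) DO UPDATE SET node_name = excluded.node_name
        """,
        (shard, candidate),
    )
    return candidate
```

In this sketch a failed check still advances `last_contact_attempt_at`, so the Praefect node keeps counting as a voting member, while `last_seen_active_at` keeps its previous value. The sketch does not address two Praefect nodes running an election at the same moment; a real implementation would need to make the check and the write safe against that race.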