WIP: Gitaly Cluster database design review

Problem to solve

Gitaly Cluster provides improved fault tolerance, performance, and scalability for Git repository storage by allowing multiple Gitaly nodes to store hot replicas of Git repositories.

Gitaly Cluster requires a highly available data store that is used for managing write transactions across the cluster, scheduling asynchronous replication when quorum, but not consensus, is reached, and monitoring the health of each Gitaly node.
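The distinction between quorum and consensus can be illustrated with a minimal sketch. This is not the actual Gitaly Cluster implementation: the `Outcome` type, the `Classify` function, and the node names are hypothetical, and it assumes each node votes with a hash of the state it would have after the write.

```go
package main

import "sort"

// Outcome classifies the result of a write vote across replicas.
// The type and its values are hypothetical, for illustration only.
type Outcome int

const (
	Failed    Outcome = iota // no strict majority agreed; the write is rejected
	Quorum                   // a strict majority agreed, but some nodes did not
	Consensus                // every node agreed
)

// Classify takes a map of node name to the hash that node voted for.
// It returns the outcome and the sorted list of nodes that disagreed
// with the majority and therefore need asynchronous replication.
func Classify(votes map[string]string) (Outcome, []string) {
	// Count how many nodes voted for each hash.
	counts := make(map[string]int)
	for _, hash := range votes {
		counts[hash]++
	}

	// Find the hash with the most votes.
	winner, best := "", 0
	for hash, n := range counts {
		if n > best {
			winner, best = hash, n
		}
	}

	// Without a strict majority there is no quorum.
	if best*2 <= len(votes) {
		return Failed, nil
	}

	// Nodes that voted differently from the majority are outdated.
	var outdated []string
	for node, hash := range votes {
		if hash != winner {
			outdated = append(outdated, node)
		}
	}
	sort.Strings(outdated)

	if len(outdated) == 0 {
		return Consensus, nil
	}
	return Quorum, outdated
}
```

In this sketch, a vote of two matching hashes and one differing hash yields `Quorum` together with the name of the dissenting node. That is the case in which a replication job would need to be recorded in the data store.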

Currently, this is a separate database from the main GitLab application database.

This issue is to formally confirm the design decisions made prior to the process documented in https://about.gitlab.com/handbook/engineering/development/enablement/database/doc/strategy.html#process-for-proposing-a-separate-database

Further details

Presently, all Gitaly nodes within a cluster contain identical repositories, but in the future we plan to allow more elastic scaling and automatic rebalancing with zero downtime. This means that Gitaly Cluster will promise a specific replication factor, but which nodes contain a given Git repository will vary based on load.
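As a rough illustration of a fixed replication factor with load-dependent placement, the sketch below picks the least-loaded nodes for a repository. The placement strategy is not specified in this issue, so everything here is an assumption: the `Node` type, its `Load` field, and the `PlaceReplicas` function are hypothetical names.

```go
package main

import (
	"errors"
	"sort"
)

// Node is a hypothetical view of a Gitaly node as seen by the cluster.
type Node struct {
	Name string
	Load float64 // assumed load metric; lower means more spare capacity
}

// PlaceReplicas returns the names of the `factor` least-loaded nodes.
// It does not modify the input slice.
func PlaceReplicas(nodes []Node, factor int) ([]string, error) {
	if factor < 1 || factor > len(nodes) {
		return nil, errors.New("replication factor must be between 1 and the number of nodes")
	}

	// Sort a copy so the caller's slice keeps its order.
	sorted := append([]Node(nil), nodes...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Load < sorted[j].Load
	})

	names := make([]string, 0, factor)
	for _, n := range sorted[:factor] {
		names = append(names, n.Name)
	}
	return names, nil
}
```

Whatever the real strategy turns out to be, the mapping from repository to nodes becomes cluster state that has to be stored somewhere, which is part of what the Gitaly Cluster database holds.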

The Gitaly Cluster database was created as a separate database because:

The cluster state stored in the database is the state of the physical cluster. It should not be replicated to any Geo replicas.
Managing the database from a different code base is impractical and creates diffuse responsibilities.

This database can run in the same PostgreSQL cluster as the gitlabhq_production database.

Proposal

The Gitaly Cluster database should remain outside the GitLab application database.

Edited Jun 02, 2020 by Zeger-Jan van de Weg