
raft: Add auto compaction support

In the first iteration (#6303), the Raft manager skipped snapshotting (part of auto compaction) entirely.

In theory, the application creates "snapshots" periodically. A snapshot is a point-in-time persistent copy of the state machine's state. Once a snapshot exists, the application can remove all log entries up to the point where the snapshot was taken. A newly joining node loads the latest snapshot and replays the WAL entries that follow it.
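
The snapshot-then-truncate cycle described above can be sketched in Go as follows. This is a minimal illustration, not Gitaly's implementation: the Entry, Snapshot, and Log types and their methods are hypothetical, and the state machine's state is modeled as a plain list of applied payloads so the effect of compaction and replay is easy to see.

```go
package main

import "fmt"

// Entry is a hypothetical WAL entry: its LSN plus an opaque payload.
type Entry struct {
	LSN  uint64
	Data string
}

// Snapshot is a point-in-time copy of the state machine after applying
// every entry up to and including AppliedLSN.
type Snapshot struct {
	AppliedLSN uint64
	State      []string
}

// Log holds the latest snapshot plus the entries that come after it.
// Illustrative only; the state is modeled as the list of applied payloads.
type Log struct {
	snapshot Snapshot
	entries  []Entry // ordered by LSN, all > snapshot.AppliedLSN
}

// Append adds a new entry to the end of the log.
func (l *Log) Append(e Entry) { l.entries = append(l.entries, e) }

// Compact installs s as the latest snapshot and drops every entry the
// snapshot already covers (LSN <= s.AppliedLSN).
func (l *Log) Compact(s Snapshot) {
	kept := l.entries[:0]
	for _, e := range l.entries {
		if e.LSN > s.AppliedLSN {
			kept = append(kept, e)
		}
	}
	l.entries = kept
	l.snapshot = s
}

// Restore rebuilds the state as a newly joining node would: load the
// snapshot, then replay the remaining entries on top of it.
func (l *Log) Restore() []string {
	state := append([]string(nil), l.snapshot.State...)
	for _, e := range l.entries {
		state = append(state, e.Data)
	}
	return state
}

func main() {
	var l Log
	for i, d := range []string{"a", "b", "c"} {
		l.Append(Entry{LSN: uint64(i + 1), Data: d})
	}
	// Snapshot taken after applying LSN 2, so it already contains "a" and "b".
	l.Compact(Snapshot{AppliedLSN: 2, State: []string{"a", "b"}})
	fmt.Println(len(l.entries), l.Restore()) // 1 [a b c]
}
```

After compaction only the entry at LSN 3 remains, yet replaying it on top of the snapshot reproduces the full state.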

We skipped snapshotting in the first iteration because the current node acts as the only primary in a single-node Gitaly cluster, so it has not replicated log entries anywhere else. In later iterations, this will no longer hold.

Fortunately, Gitaly's WAL snapshotting is surprisingly compatible with Raft. Gitaly has an efficient snapshotting system based on hard links, which can create a point-in-time snapshot of a partition almost instantly. This fits Raft auto compaction well: we can create and keep the latest snapshot at the latest committed LSN, with no need for manual snapshot creation. In fact, every new transaction already depends on the latest snapshot available at the time it begins.

A snapshot is materialized and sent over the network only when it is requested. Upon such a request, the Raft manager creates a snapshot at committedLSN and removes it once the transfer is done.

This issue includes two parts:

  • Update the snapshot manager to facilitate this process.
  • Materialize and send a snapshot over the network on demand.
Edited by Emily Chui