
POC: Handle WriteRef in praefect with strong consistency

John Cai requested to merge jc-writeref-strong-consistency into master

This takes a naive approach to strong consistency that avoids a full three-phase commit (3PC). The idea is as follows:

  1. Praefect sends an update-ref to the primary. If it succeeds, Praefect returns success to the client; if it fails, Praefect returns an error to the client.
  2. If the write to the primary succeeds, Praefect attempts to relay the same update-ref call to all of the secondary nodes. For each node where the call fails, Praefect schedules a replication job so that the node can catch up.
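A minimal Go sketch of the two steps above. `Node`, `ReplicationJob`, `HandleWriteRef`, and the in-memory `memNode` are hypothetical names made up for illustration, not Praefect's or Gitaly's actual types. `WriteRef` is assumed to succeed only when the ref currently holds `oldValue`, the way `git update-ref <ref> <newvalue> <oldvalue>` does.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical stand-in for a Gitaly storage node (not the real
// Gitaly client API). WriteRef succeeds only if ref currently equals oldValue.
type Node interface {
	Name() string
	WriteRef(ref, newValue, oldValue string) error
}

// ReplicationJob is a hypothetical record of "copy ref from Source to Target".
type ReplicationJob struct {
	Source, Target, Ref string
}

// HandleWriteRef returns the client-visible error plus any replication
// jobs it scheduled for secondaries that rejected the relayed write.
func HandleWriteRef(primary Node, secondaries []Node, ref, newValue, oldValue string) ([]ReplicationJob, error) {
	// Step 1: the primary alone decides success or failure for the client.
	if err := primary.WriteRef(ref, newValue, oldValue); err != nil {
		return nil, fmt.Errorf("primary %s: %w", primary.Name(), err)
	}

	// Step 2: relay the same update to every secondary. A failure here is
	// not reported to the client; it only schedules a replication job.
	var jobs []ReplicationJob
	for _, s := range secondaries {
		if err := s.WriteRef(ref, newValue, oldValue); err != nil {
			jobs = append(jobs, ReplicationJob{Source: primary.Name(), Target: s.Name(), Ref: ref})
		}
	}
	return jobs, nil
}

// memNode is an in-memory fake used only for this example.
type memNode struct {
	name string
	refs map[string]string
}

var errStale = errors.New("ref is not at the expected old value")

func (n *memNode) Name() string { return n.name }

// WriteRef mimics update-ref's old-value check.
func (n *memNode) WriteRef(ref, newValue, oldValue string) error {
	if n.refs[ref] != oldValue {
		return errStale
	}
	n.refs[ref] = newValue
	return nil
}

func main() {
	primary := &memNode{"gitaly_node_1", map[string]string{"refs/heads/branchA": "dead"}}
	upToDate := &memNode{"gitaly_node_2", map[string]string{"refs/heads/branchA": "dead"}}
	lagging := &memNode{"gitaly_node_3", map[string]string{"refs/heads/branchA": "0000"}}

	jobs, err := HandleWriteRef(primary, []Node{upToDate, lagging}, "refs/heads/branchA", "cafe", "dead")
	fmt.Println(err)  // <nil>
	fmt.Println(jobs) // [{gitaly_node_1 gitaly_node_3 refs/heads/branchA}]
}
```

The client only ever sees the primary's result; a lagging secondary turns into a replication job rather than an error.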

This approach is simple, and it takes advantage of update-ref being an atomic update: the primary is always up to date, and even if requests to the internal Gitaly nodes race each other, replication will bring the secondaries back up to date.
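`git update-ref <ref> <newvalue> <oldvalue>` only writes after verifying that the ref currently holds `<oldvalue>`, so it behaves like a compare-and-swap. The sketch below models that with a hypothetical in-memory `refStore` (not Gitaly code): when several writers race to move a ref away from the same old value, exactly one of them wins, which is why concurrent writes cannot silently overwrite each other on the primary.

```go
package main

import (
	"fmt"
	"sync"
)

// refStore is a hypothetical in-memory model of one repository's refs.
// Update mimics update-ref's old-value check: the check and the write
// happen under one lock, so they are atomic with respect to other writers.
type refStore struct {
	mu   sync.Mutex
	refs map[string]string
}

// Update sets ref to newValue only if it currently equals oldValue.
func (s *refStore) Update(ref, newValue, oldValue string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.refs[ref] != oldValue {
		return false
	}
	s.refs[ref] = newValue
	return true
}

// race lets n writers concurrently try to move ref away from oldValue
// and reports how many of them succeeded.
func race(s *refStore, ref, oldValue string, n int) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	wins := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			if s.Update(ref, fmt.Sprintf("commit-%d", i), oldValue) {
				mu.Lock()
				wins++
				mu.Unlock()
			}
		}(i)
	}
	wg.Wait()
	return wins
}

func main() {
	s := &refStore{refs: map[string]string{"refs/heads/branchA": "dead"}}
	fmt.Println(race(s, "refs/heads/branchA", "dead", 10)) // 1
}
```

Every loser sees a ref that no longer holds `dead` and is rejected, just as update-ref would reject a stale old value.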

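In this sequence diagram, gitaly_node_1 is the primary. Praefect1 moves refs/heads/branchA from dead to cafe while Praefect2 concurrently moves it from cafe to beef: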

```mermaid
sequenceDiagram
  GitalyClient->>Praefect1: WriteRef('refs/heads/branchA', 'cafe', 'dead')
  GitalyClient->>Praefect2: WriteRef('refs/heads/branchA', 'beef', 'cafe')
  Praefect1->>gitaly_node_1: WriteRef('refs/heads/branchA', 'cafe', 'dead')
  gitaly_node_1->>Praefect1: OK
  Praefect1->>gitaly_node_2: WriteRef('refs/heads/branchA', 'cafe', 'dead')
  gitaly_node_2->>Praefect1: OK
  Praefect2->>gitaly_node_3: WriteRef('refs/heads/branchA', 'beef', 'cafe')
  gitaly_node_3->>Praefect2: Error (since refs/heads/branchA is still at dead, not cafe)
  Praefect1->>gitaly_node_3: WriteRef('refs/heads/branchA', 'cafe', 'dead')
  gitaly_node_3->>Praefect1: OK
  Praefect2->>gitaly_node_1: WriteRef('refs/heads/branchA', 'beef', 'cafe')
  gitaly_node_1->>Praefect2: OK
  Praefect2->>gitaly_node_2: WriteRef('refs/heads/branchA', 'beef', 'cafe')
  gitaly_node_2->>Praefect2: OK
  Praefect1->>GitalyClient: OK
  Praefect2->>GitalyClient: Error
```

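In this scenario, Praefect2's write should succeed, because Praefect1 has already moved refs/heads/branchA to cafe on the primary. It fails because its request reaches gitaly_node_3 before Praefect1's does, so gitaly_node_3 still has the ref at dead. However, the primary is still up to date, which means the secondaries will catch up via replication.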
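The scenario can be replayed as a small simulation, assuming each node applies a WriteRef with update-ref's compare-and-swap semantics and that a replication job simply copies the primary's value onto a lagging node. `write`, `replay`, and `replicate` are hypothetical helpers for illustration only, not Praefect code.

```go
package main

import "fmt"

// write is one WriteRef call from the sequence diagram: a Praefect asks one
// Gitaly node to move refs/heads/branchA from oldValue to newValue.
type write struct {
	praefect, node     string
	newValue, oldValue string
}

// replay applies the writes in diagram order. Each node accepts a write
// only if it currently holds oldValue. It returns the nodes that rejected one.
func replay(nodes map[string]string, writes []write) []string {
	var rejected []string
	for _, w := range writes {
		if nodes[w.node] == w.oldValue {
			nodes[w.node] = w.newValue
		} else {
			rejected = append(rejected, w.node)
		}
	}
	return rejected
}

// replicate stands in for the replication jobs Praefect would schedule:
// each lagging secondary is overwritten with the primary's current value.
func replicate(nodes map[string]string, primary string, lagging []string) {
	for _, n := range lagging {
		if n != primary {
			nodes[n] = nodes[primary]
		}
	}
}

func main() {
	nodes := map[string]string{"gitaly_node_1": "dead", "gitaly_node_2": "dead", "gitaly_node_3": "dead"}
	writes := []write{
		{"Praefect1", "gitaly_node_1", "cafe", "dead"},
		{"Praefect1", "gitaly_node_2", "cafe", "dead"},
		{"Praefect2", "gitaly_node_3", "beef", "cafe"}, // rejected: node 3 is still at dead
		{"Praefect1", "gitaly_node_3", "cafe", "dead"},
		{"Praefect2", "gitaly_node_1", "beef", "cafe"},
		{"Praefect2", "gitaly_node_2", "beef", "cafe"},
	}

	lagging := replay(nodes, writes)
	fmt.Println(lagging, nodes) // [gitaly_node_3] map[gitaly_node_1:beef gitaly_node_2:beef gitaly_node_3:cafe]

	replicate(nodes, "gitaly_node_1", lagging)
	fmt.Println(nodes) // map[gitaly_node_1:beef gitaly_node_2:beef gitaly_node_3:beef]
}
```

After the writes, only gitaly_node_3 is behind the primary, and a single replication job from gitaly_node_1 brings all three nodes to beef.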

fixes: #2528

Edited by John Cai
