Increment generations of up to date storages only

Sami Hiltunen requested to merge smh-increment-only-up-to-date into master

Praefect currently increments the generation of the primary and of the secondaries that are on the same generation as the primary. Importantly, Praefect doesn't check whether the primary is on the latest generation before incrementing its generation. This can cause data loss after a failover if the new primary accepts a write while a write request to the old primary is still in flight. While this is likely a rare occurrence, it can happen if only a minority of the Praefect nodes can communicate with the current primary, which triggers a failover while some Praefect nodes can still write to the old primary. As Praefect doesn't check whether the primary was fully up to date before setting its generation to the latest, the primary might skip over some generations it was missing. This can then lead to losing some already acknowledged writes as they get replicated over from the new primary node.

To avoid replicating over acknowledged writes, this MR modifies the query to increment the generation only of storages that are on the latest generation, rather than of storages that are on the same generation as the transaction's primary. If the write wasn't persisted to any fully up-to-date node, an error is returned to let the client know the write may not have been persisted.
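The rule above can be illustrated with a minimal in-memory sketch. This is not the actual SQL query or Praefect code: the `incrementGeneration` signature, the map-based state and the `errWriteToOutdatedNodes` name are all hypothetical and only model the intended behavior.

```go
package main

import "errors"

// errWriteToOutdatedNodes is a hypothetical error name used only in this sketch.
var errWriteToOutdatedNodes = errors.New("write was not persisted to any up-to-date storage")

// incrementGeneration models the new rule in memory. generations maps a
// storage name to its current generation for a single repository, and
// written lists the storages that successfully applied the write.
// Only written storages that are already on the latest generation are
// bumped. If there are none, the state is left untouched and an error is
// returned.
func incrementGeneration(generations map[string]int, written []string) error {
	if len(generations) == 0 {
		return errWriteToOutdatedNodes
	}

	// Find the latest generation across all storages, not just the primary.
	latest, first := 0, true
	for _, gen := range generations {
		if first || gen > latest {
			latest, first = gen, false
		}
	}

	// Collect the written storages that were fully up to date before the write.
	var upToDate []string
	for _, storage := range written {
		if gen, ok := generations[storage]; ok && gen == latest {
			upToDate = append(upToDate, storage)
		}
	}
	if len(upToDate) == 0 {
		return errWriteToOutdatedNodes
	}

	for _, storage := range upToDate {
		generations[storage] = latest + 1
	}
	return nil
}
```

In this model, a write that only reaches an old primary still on generation 2 while another storage is already on generation 3 returns the error and leaves the old primary on generation 2, instead of letting it jump ahead and appear up to date.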

Historically, IncrementGeneration also created records in the repositories and storage_repositories tables. Creation is nowadays handled by another query in the CreateRepository method, and IncrementGeneration is not called on any path that creates new repositories or replicas. As such, the new query no longer creates these records.

Closes #2969 (closed)
Related to #3183 (closed), as this fixes the remaining problem of creating records in the repositories table.
