coordinator: Only schedule replication for differing error states
When finalizing a transaction, we always schedule replication jobs in case the primary has returned an error. Given that there are many RPCs which are expected to return errors in a controlled way, e.g. if a commit is missing, this causes us to create replication jobs in many contexts where they are not necessary at all.

Thinking about the issue, what we really care about is not whether an RPC failed or not. It's that primary and secondary nodes behaved the same. If both primary and secondaries succeeded, we're good. But if both failed with the same error, then we're good too as long as all transactions have been committed: quorum was reached on all votes and nodes failed in the same way, so we can assume that nodes did indeed perform the same changes.

This commit thus relaxes the error condition: we no longer schedule replication jobs just because the primary failed, but only schedule replication jobs to any node which has a different error than the primary. This has two advantages: we only need to selectively schedule jobs for disagreeing nodes instead of targeting all secondaries, and we avoid scheduling jobs in many cases where we do hit errors.

Changelog: performance
Showing
- internal/praefect/coordinator.go: 8 additions, 13 deletions
- internal/praefect/coordinator_pg_test.go: 3 additions, 8 deletions
- internal/praefect/coordinator_test.go: 71 additions, 4 deletions