[Feature flag] Enable synchronizing votes for hooks
What
Enable the `:gitaly_synchronize_hook_executions` feature flag, which will cause Gitaly Cluster to synchronize the execution of hooks on primary and secondary nodes.
As secondary nodes never execute hook logic, they forge ahead with the post-hook logic even though the primary is still busy executing the hook logic. While this usually works, it creates lock contention around the `packed-refs` file: the secondaries may have already locked the file while the primary is still authenticating the changes in the hook. The file can thus stay locked for many seconds, which may cause concurrent RPCs to time out while trying to acquire the lock.
By synchronizing hook execution across nodes, we make sure that secondaries wait for the primary to have executed the hook logic before any references are locked. This fixes the lock contention, so we should see significantly fewer errors in reference-deleting RPCs like `DeleteRefs` in large high-activity repositories.
Rolls out Investigate whether we can avoid locking `packe... (#5353 - closed).
Owners
- Team: Gitaly
- Most appropriate Slack channel to reach out to: #g_gitaly
- Best individual to reach out to: pks-t
Expectations
What release does this feature occur in first?
What are we expecting to happen?
No user-observable change in behaviour should occur during normal operations. We should see a decrease in errors when trying to lock the `packed-refs` file.
What might happen if this goes wrong?
Voting may start to fail, which can cause transactions and thus mutating changes to stop working in RPCs that use the hook infrastructure. Most importantly, this includes most of the RPCs in the Operations service.
What can we monitor to detect problems with this?
Roll Out Steps
- Enable on staging
  - Is the required code deployed on staging? (howto)
  - Enable on staging (howto)
  - Add featureflagstaging to this issue (howto)
  - Test on staging (howto)
  - Verify the feature flag was used by checking Prometheus metric `gitaly_feature_flag_checks_total`
- Enable on production
  - Is the required code deployed on production? (howto)
  - Progressively enable in production (howto)
  - Add featureflagproduction to this issue
  - Verify the feature flag was used by checking Prometheus metric `gitaly_feature_flag_checks_total`
- Default-enable the feature flag (optional, only required if backwards-compatibility concerns exist)
  - Wait for release containing default-disabled feature flag.
  - Change the feature flag to default-enabled (howto)
  - Wait for release containing default-enabled feature flag.
- Remove feature flag
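
The verification steps above ask for a check of the `gitaly_feature_flag_checks_total` metric. As a rough sketch, the following sums that counter for one flag from Prometheus text-format output. The label names `flag` and `enabled`, the flag label value and the sample numbers are assumptions made for illustration and should be checked against what the Gitaly nodes actually export. `sumFlagChecks` is a hypothetical helper.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumFlagChecks scans Prometheus text-format output and sums the samples of
// gitaly_feature_flag_checks_total for the given flag, split by the value of
// the "enabled" label. Both label names are assumptions.
func sumFlagChecks(metrics, flag string) (enabled, disabled float64, err error) {
	const prefix = "gitaly_feature_flag_checks_total{"

	scanner := bufio.NewScanner(strings.NewReader(metrics))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if !strings.HasPrefix(line, prefix) {
			continue // comment, blank line, or a different metric
		}
		end := strings.LastIndex(line, "}")
		if end < 0 {
			continue
		}
		labels := line[len(prefix):end]
		if !strings.Contains(labels, `flag="`+flag+`"`) {
			continue
		}
		fields := strings.Fields(line[end+1:])
		if len(fields) == 0 {
			continue
		}
		value, perr := strconv.ParseFloat(fields[0], 64)
		if perr != nil {
			return 0, 0, fmt.Errorf("parsing %q: %w", line, perr)
		}
		if strings.Contains(labels, `enabled="true"`) {
			enabled += value
		} else {
			disabled += value
		}
	}
	return enabled, disabled, scanner.Err()
}

func main() {
	// Made-up sample output; real label values and counts will differ.
	sample := `# TYPE gitaly_feature_flag_checks_total counter
gitaly_feature_flag_checks_total{enabled="true",flag="synchronize_hook_executions"} 12
gitaly_feature_flag_checks_total{enabled="false",flag="synchronize_hook_executions"} 3
gitaly_feature_flag_checks_total{enabled="true",flag="some_other_flag"} 7
`
	enabled, disabled, err := sumFlagChecks(sample, "synchronize_hook_executions")
	if err != nil {
		panic(err)
	}
	fmt.Printf("enabled=%v disabled=%v\n", enabled, disabled)
}
```

A non-zero enabled count after the rollout step suggests that the flag is being evaluated as enabled. A count that stays at zero would point to the code not being deployed yet or the flag not having been set.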
Please refer to the documentation of feature flags for further information.