Intermittent high hook-side delay for secondary in transactions
When sifting through Grafana dashboards for the hook-side delay introduced by voting in transactions, every now and then there's a spike from the usual 1ms to 10s or 30s:
There doesn't seem to be a clear pattern, but notably the delay only ever affects the secondary node (blue dots). The primary (purple dots) always completes in less than a millisecond.
Because the secondary currently often doesn't take part in transactions due to a mismatch in repository generations, it calls out to Praefect a lot less frequently. My current theory is thus that the pooled connection has timed out, or that connection tracking has lost track of the connection, causing the RPC call to hit a timeout. While we do have keepalive pings enabled for these connections, there are sometimes multiple hours of inactivity, which may cause the connection to be dropped anyway.