redis-registry-cache experiences latency bursts triggering apdex violations

Summary

The Redis leader node temporarily entered NodeNotReady status, which caused an Apdex SLO violation. Preliminary analysis found a calico pod restart on the node, but no evidence in logs or events suggesting that the VM was migrated or moved.

We need to investigate this further to find the root cause and determine whether it is something under our control.

Related Incident(s)

production#7905 (closed)
production#7918 (closed)
production#7919 (closed)

Edited Oct 25, 2022 by Igor