Revisit Redis Sidekiq scalability
We recently experienced redis-sidekiq saturation during an incident, production#16348 (comment 1555170289), and we've seen an increasing trend in CPU utilization.
For this reason it's worth revisiting the topic of Redis Sidekiq scalability. This was previously explored by @qmnguyen0711 in #1406 (closed), and we had more or less settled on Sidekiq Zonal Clusters as the scaling approach.
Quite a few things have changed since that analysis:
- We have rolled out Redis Cluster for many workloads, and our approach here is now more focused on the long term.
- We moved several workloads from redis-sidekiq elsewhere, e.g. &431 (closed).
- The application as a whole has become a year older.
Thus it makes sense to check our assumptions, decide whether Sidekiq Zonal Clusters are still the best way forward, and see whether any additional options are available to us.
A few questions to get things started:
- What is the urgency for scaling Redis Sidekiq?
- Has the complexity of implementation changed?
- How do we rate potential options in terms of risk, lead time, and long-term scaling (e.g. ability to scale 10x)?