Sidekiq Redis experiment: reduce number of clients by X%
Background
This is an experiment extracted from #956 (closed). Several factors contribute to our CPU usage problems on the Redis instance for Sidekiq, but we don't know their relative weightings:
- Number of clients performing BRPOP with ...
- ... a very long argument list (for the catchall shard) where ...
- ... some of those arguments represent frequently-used lists (Sidekiq queues).
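To make the shape of that command concrete, here is a minimal sketch of how a BRPOP argument list grows with the number of queues a client listens on. The `queue:` key prefix and the 2-second timeout are assumptions based on Sidekiq's default fetcher, and the queue names and the `build_brpop_command` helper are hypothetical; none of this is taken from our actual configuration.

```python
def build_brpop_command(queue_names, timeout_seconds=2):
    """Return the raw BRPOP command as a list of arguments.

    BRPOP takes one or more list keys followed by a timeout, and Redis
    checks the keys in the order they are given.
    """
    if not queue_names:
        raise ValueError("BRPOP needs at least one key")
    keys = [f"queue:{name}" for name in queue_names]  # assumed key prefix
    return ["BRPOP", *keys, str(timeout_seconds)]


# Hypothetical queue names: a dedicated shard listens on one queue,
# while a catchall shard listens on many.
dedicated = build_brpop_command(["urgent"])
catchall = build_brpop_command([f"queue_{i}" for i in range(200)])

# Subtract the command name and the timeout to count only the keys.
print(len(dedicated) - 2, "key(s) for the dedicated shard")
print(len(catchall) - 2, "key(s) for the catchall shard")
```

Every client on the catchall shard sends the full key list each time it blocks, which is why the client count and the argument length are hard to separate as causes.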
Experiment
This simulates &423 by acting as if X% of our Sidekiq workload were running in a zonal cluster and therefore hitting a different Redis instance.
If the first factor above (the number of clients performing BRPOP) is significant, then we should see improvements here.
Summary Results
Redis CPU usage (%) for each combination of worker count and number of generators. A worker count of 100% represents a 'full' workload based on production, with data from #956 (closed):
| Worker % | Base/Idle | 1 Generator | 2 Generators | 3 Generators | Notes |
|---|---|---|---|---|---|
| 100% | 11% | 67% | 87% | 95% | From #956 (closed) |
| 66% | 8% | 60% | 86% | 92% | |
| 50% | 6% | 61% | 63% | 70% | |
| 40% | 5% | 60% | 55% | N/A | |
| 33% | 4% | 50% | 52% | N/A | |
| 25% | 3% | 36% | N/A | N/A | |
Conclusions
The reduction in Redis CPU usage is not linear in the number of workers.
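One way to see the non-linearity is to express each measurement as a percentage of the 100%-worker figure in the same column. The sketch below does that using only the numbers from the summary table; the `relative_usage` helper is hypothetical and the rounding is for readability.

```python
# Redis CPU usage (%) from the summary table, keyed by worker percentage.
# Columns: base/idle, 1 generator, 2 generators, 3 generators (None = N/A).
results = {
    100: (11, 67, 87, 95),
    66: (8, 60, 86, 92),
    50: (6, 61, 63, 70),
    40: (5, 60, 55, None),
    33: (4, 50, 52, None),
    25: (3, 36, None, None),
}
labels = ("base/idle", "1 generator", "2 generators", "3 generators")


def relative_usage(results, column):
    """Map worker % to CPU usage as a % of the 100%-worker figure."""
    full = results[100][column]
    return {
        workers: round(100 * row[column] / full)
        for workers, row in results.items()
        if row[column] is not None  # skip runs that were not measured
    }


for column, label in enumerate(labels):
    print(label, relative_usage(results, column))
```

On these numbers the idle column tracks the worker count fairly closely (50% of the workers gives about 55% of the idle CPU usage), whereas with one generator the same halving still leaves about 91% of the CPU usage.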
If we make some stretchy assumptions, namely that we split our workload into 3 clusters (&423) and that for safety we initially size each at 50% of the original cluster (by number of workers), then we will likely see Redis CPU usage drop to perhaps 60-65% (absolute). At that scale the number of workers appears to have more impact than the rate of jobs being pushed into the queues, which is an interesting result.
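As a rough check on that last observation, the sketch below compares two differences taken from the summary table. It assumes the number of generators is a reasonable proxy for the rate of jobs being pushed, which is an interpretation rather than something measured here.

```python
# CPU usage (%) taken from the summary table.
cpu = {
    # (worker %, generators): Redis CPU %
    (100, 1): 67,
    (100, 3): 95,
    (50, 1): 61,
    (50, 3): 70,
}

# Effect of going from 1 to 3 generators while holding workers at 50%.
job_rate_effect = cpu[(50, 3)] - cpu[(50, 1)]

# Effect of halving the workers while holding the load at 3 generators.
worker_effect = cpu[(100, 3)] - cpu[(50, 3)]

print(f"1 -> 3 generators at 50% workers: +{job_rate_effect} points")
print(f"100% -> 50% workers at 3 generators: -{worker_effect} points")
```

With these figures, tripling the generators at 50% workers adds 9 percentage points, while halving the workers at 3 generators removes 25.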