Sidekiq Redis experiment: split catchall by volume
Background
This is an experiment extracted from #956 (closed). Several factors contribute to the high CPU usage on our Redis instance for Sidekiq, but we don't know how much weight each one carries:
- Number of clients performing BRPOP with ...
- ... a very long argument list (for the catchall shard) where ...
- ... some of those arguments represent frequently-used lists (Sidekiq queues).
Experiment
https://log.gprd.gitlab.net/goto/1493471c48275132c8cfe0ea11983607 shows that the top 6 queues on catchall account for over 50% of the jobs by volume. Those queues are:
- update_namespace_statistics:namespaces_schedule_aggregation
- web_hook
- project_import_schedule
- repository_update_mirror
- pipeline_background:ci_build_trace_chunk_flush
- projects_git_garbage_collect
If we moved those to a hypothetical 'catchsome' shard, we would get another shard with a short BRPOP list, and we would reduce the job volume flowing through the very long BRPOP list on the remaining catchall shard. If the third factor above (frequently-used lists among the BRPOP arguments) contributes significantly, this might help.
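As a rough illustration of the split, the sketch below partitions a queue list into the hypothetical 'catchsome' and remaining catchall sets and builds the BRPOP argument list each shard's workers would send. The `queue:` key prefix, the 2-second timeout, and the `hypothetical_queue_N` filler names are assumptions made for illustration, not values taken from our configuration; only the six high-volume queue names come from this issue.

```python
# The six high-volume queues named above.
CATCHSOME_QUEUES = [
    "update_namespace_statistics:namespaces_schedule_aggregation",
    "web_hook",
    "project_import_schedule",
    "repository_update_mirror",
    "pipeline_background:ci_build_trace_chunk_flush",
    "projects_git_garbage_collect",
]


def split_queues(all_queues, hot_queues):
    """Partition all_queues into (catchsome, catchall), preserving order."""
    hot = set(hot_queues)
    catchsome = [q for q in all_queues if q in hot]
    catchall = [q for q in all_queues if q not in hot]
    return catchsome, catchall


def brpop_args(queues, timeout=2):
    """Build the BRPOP command a worker would issue for these queues.

    Assumption: each queue is stored under the key "queue:<name>" and the
    worker blocks for `timeout` seconds; adjust both to match the real setup.
    """
    return ["BRPOP", *[f"queue:{q}" for q in queues], str(timeout)]


# Hypothetical stand-in for the rest of the catchall queues; the real list
# is much longer and is not reproduced here.
other_queues = [f"hypothetical_queue_{i}" for i in range(100)]
all_queues = CATCHSOME_QUEUES + other_queues

catchsome, catchall = split_queues(all_queues, CATCHSOME_QUEUES)
catchsome_cmd = brpop_args(catchsome)
catchall_cmd = brpop_args(catchall)

print(f"catchsome BRPOP keys: {len(catchsome_cmd) - 2}")
print(f"catchall BRPOP keys:  {len(catchall_cmd) - 2}")
```

The point of the sketch is only the shape of the two commands: the catchsome workers block on a handful of keys, while the catchall workers still block on a long list, but one that no longer contains the busiest queues.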
Results
Conclusions
Splitting by volume has a larger effect than simply splitting the queues into two sets, unless the experimental changes (multiple workers chosen randomly for each shard) have had some effect. However, I think the results can be explained by Redis spending much less time on dequeue book-keeping for 50% of the catchall workload, because that half is now served from a BRPOP list of only 6 queues.