Set up Redis/Sidekiq test environment
We want to validate our deductions and guesses about how Redis will behave under various Sidekiq reconfigurations before committing to any of them, so that we can confidently choose the best option(s) for keeping Redis scalable on gitlab.com.
Test environment
We will set up a single Redis instance (replication is not believed to be relevant), sized the same as in production, plus one or more client nodes. Each client node will run either Sidekiq itself or a mock client that performs similar Redis operations. If it runs Sidekiq, the jobs will be simple 'sleep' jobs so that we can simulate a high number of workers on limited resources. The initial validation will confirm that, when we replicate production job volumes, Redis CPU usage behaves similarly to production. If it does not, we can adjust the workload (most likely upward, not downward) until it reaches a suitable level. Complete accuracy is not required, as long as Redis is busy enough that we can easily detect the effect of each change.
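A mock client only needs to reproduce the Redis traffic a Sidekiq fetcher generates, chiefly BRPOP calls with the full list of queue keys. The sketch below builds those requests in RESP, the Redis wire protocol, so they can be written directly to a socket connected to the test Redis. It assumes Sidekiq's `queue:<name>` key naming and a 2-second fetch timeout; both are assumptions for illustration, as are the helper names `encode_command` and `brpop_command`.

```python
from typing import Sequence

# Assumption: Sidekiq keeps each queue in a Redis list named "queue:<name>"
# and its basic fetcher blocks for 2 seconds per BRPOP call.
QUEUE_PREFIX = "queue:"
FETCH_TIMEOUT_SECONDS = 2


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (the Redis wire format)."""
    parts = [f"*{len(args)}\r\n".encode()]
    for arg in args:
        data = str(arg).encode()
        parts.append(f"${len(data)}\r\n".encode())
        parts.append(data + b"\r\n")
    return b"".join(parts)


def brpop_command(queue_names: Sequence[str], timeout: int = FETCH_TIMEOUT_SECONDS) -> bytes:
    """Build the BRPOP request a Sidekiq-like fetcher would send for these queues.

    The keys are listed in the order given; Redis checks them in that order.
    """
    keys = [QUEUE_PREFIX + name for name in queue_names]
    return encode_command("BRPOP", *keys, str(timeout))
```

A mock client would send `brpop_command(catchall_queue_names)` in a loop from many connections, where `catchall_queue_names` is a hypothetical list of the catchall shard's queues. Generating raw RESP keeps the client cheap, so a small number of client nodes can emulate a large number of Sidekiq threads.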
Initial deployment will be Redis 5.0.9 (current production); experiments should be repeated with Redis 6.
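To compare CPU usage between the Redis 5.0.9 and Redis 6 runs, and against production, each run can record two `INFO` snapshots and derive CPU utilisation from the `used_cpu_sys` and `used_cpu_user` counters (cumulative seconds of process CPU time). The following is a minimal sketch that assumes the raw `INFO` text has already been captured. It refuses to compare snapshots whose `redis_version` differs, so results from the two Redis versions cannot be mixed by accident.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse the text returned by the Redis INFO command into a flat dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# Section" headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def cpu_seconds(info: Dict[str, str]) -> float:
    """Total process CPU time (system + user) reported by INFO, in seconds."""
    return float(info["used_cpu_sys"]) + float(info["used_cpu_user"])


def cpu_utilisation(before: str, after: str, wall_seconds: float) -> float:
    """Fraction of one core used by Redis between two INFO snapshots."""
    b, a = parse_info(before), parse_info(after)
    if b["redis_version"] != a["redis_version"]:
        raise ValueError("snapshots come from different Redis versions")
    return (cpu_seconds(a) - cpu_seconds(b)) / wall_seconds
```

A value close to 1.0 means the single Redis main thread is close to saturation, which is the kind of level we want the test workload to reach.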
Experiments
Several factors contribute to the CPU usage we see from BRPOP, and we currently don't know how much weight to give each of them:
- Number of clients performing BRPOP with ...
- ... a very long argument list (for the catchall shard) where ...
- ... some of those arguments represent frequently-used lists (Sidekiq queues).
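One way to reason about how these factors interact is a deliberately simplified cost model. Redis checks BRPOP's keys in argument order and pops from the first non-empty list. If every list is empty, the client is registered as blocked on every key and later unregistered from every key when it is woken or times out. The sketch below assumes each of those per-key steps costs one unit of work. That unit cost, and the helper names, are hypothetical, so the numbers are only useful for comparing configurations, not for predicting real CPU.

```python
from typing import Sequence, Set, Tuple


def brpop_key_work(keys: Sequence[str], non_empty: Set[str]) -> Tuple[int, bool]:
    """Hypothetical per-call work of one BRPOP, counted in per-key steps.

    Returns (steps, blocked). Assumption: scanning, registering and
    unregistering each cost one step per key.
    """
    for position, key in enumerate(keys, start=1):
        if key in non_empty:
            # Served immediately: only the keys up to this one were scanned.
            return position, False
    # All lists empty: scan every key, register on every key, and later
    # unregister from every key.
    return 3 * len(keys), True


def estimated_work(clients: int, keys: Sequence[str], non_empty: Set[str]) -> int:
    """Total steps if every client issues one BRPOP with the same key list."""
    steps, _ = brpop_key_work(keys, non_empty)
    return clients * steps
```

Under this model, work grows with the number of clients multiplied by the length of the argument list. Frequently-used lists only help when they appear early in the argument order; a busy queue near the end still costs a scan of every key before it.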
We may add further experiments, but only when we need to test a specific proposal. For example, we may wish to test a patched Redis 6 with a workaround for the BRPOP regression, if one becomes available in time and we have not already gained enough from the other changes. The experiments defined so far are:
- Split catchall by volume: #959 (closed)
- Split catchall by number of queues: #960 (closed)
- Have a single queue per shard: #961 (closed) - mostly completed, awaiting any final feedback
- Reduce the number of clients by X%: #962 (closed)
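The two catchall split experiments need concrete queue groupings to deploy. The sketch below shows one possible way to generate each kind. It splits by number of queues by dealing sorted queue names round-robin into shards. It splits by volume by greedily assigning the busiest remaining queue to the shard with the least total volume so far. The balancing interpretation of "by volume", the function names, and the example volumes in the test are all assumptions; the linked issues may define the splits differently.

```python
from typing import Dict, List


def split_by_count(queues: List[str], shards: int) -> List[List[str]]:
    """Split queues into `shards` groups containing (nearly) equal numbers of queues."""
    ordered = sorted(queues)
    return [ordered[i::shards] for i in range(shards)]


def split_by_volume(volumes: Dict[str, int], shards: int) -> List[List[str]]:
    """Greedy split: give the busiest remaining queue to the least-loaded shard.

    `volumes` maps queue name to job volume (e.g. jobs per second) and is
    assumed to come from production metrics.
    """
    groups: List[List[str]] = [[] for _ in range(shards)]
    loads = [0] * shards
    # Busiest first; ties broken by name so the result is deterministic.
    for name, volume in sorted(volumes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = loads.index(min(loads))
        groups[target].append(name)
        loads[target] += volume
    return groups
```

Each resulting group becomes the BRPOP argument list for one shard's clients, so the count-based split shortens argument lists evenly, while the volume-based split spreads frequently-used lists across shards.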