Add Sidekiq supervision tree for scale adjustment
Description
We have been having issues with Sidekiq queues for quite a while. On the infrastructure team, we developed a way to scale Sidekiq up by spawning a tmux session in which we run Sidekiq manually.
We kept doing this until we finally reached a balance of manual Sidekiq runners. If I remember correctly, we have been running something like this:
- 3 tmux sessions processing gitlab_shell with 2 threads each.
- 1 tmux session processing pipeline and project_cache queues with 2 threads each.
- 1 tmux session processing project_service, post_receive and build with 3 threads each.
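For reference, the layout above can be written down as data and turned into the Sidekiq command lines that each tmux session would roughly be running. This is only a sketch: the exact commands we run are not recorded here, the names `RUNNERS`, `sidekiq_argv` and `all_commands` are made up for illustration, and environment or config flags are left out. The only Sidekiq options used are `-q` (queue) and `-c` (concurrency, i.e. threads per process).

```ruby
# Hypothetical description of the manual runners listed above.
RUNNERS = [
  { sessions: 3, queues: %w[gitlab_shell], threads: 2 },
  { sessions: 1, queues: %w[pipeline project_cache], threads: 2 },
  { sessions: 1, queues: %w[project_service post_receive build], threads: 3 },
].freeze

# Build the argv for one Sidekiq process.
# -q selects a queue to process, -c sets the number of threads.
def sidekiq_argv(queues, threads)
  ["bundle", "exec", "sidekiq"] +
    queues.flat_map { |queue| ["-q", queue] } +
    ["-c", threads.to_s]
end

# Expand the layout into one command per tmux session.
def all_commands(runners)
  runners.flat_map do |runner|
    Array.new(runner[:sessions]) { sidekiq_argv(runner[:queues], runner[:threads]) }
  end
end

all_commands(RUNNERS).each { |argv| puts argv.join(" ") }
```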
This has shown that, at our current scale, the standard Sidekiq setup just does not work. We also tried scaling Sidekiq up to 50 threads per worker before, with no luck.
This is what the queues look like with all the runners keeping them down:
Given this data, I think we need better tooling to deal with this. I also think that large customers could benefit from it if it is used together with gitlab-monitor.
Proposal
We should have a way of scaling Sidekiq worker processes up and down from the application itself.
The way I picture this is some form of process supervisor through which we can spawn new processes to scale queues up and down, even if manually, adding capacity when we actually need more processing power.
The idea is to just have a screen where we can enter a queue name (or names) and the number of threads we want running for it (possibly just allowing arbitrary arguments to be passed would be good enough). This would spawn as many processes as we request for the specific queues we need.
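To make the idea a bit more concrete, here is a minimal sketch of what such a supervisor could look like. It is not a proposed implementation: `SidekiqSupervisor`, `scale`, `running` and `shutdown` are hypothetical names, the default command assumes Sidekiq is started with `bundle exec sidekiq` plus `-q` and `-c`, and the command builder is injectable so the sketch can be tried without Sidekiq installed.

```ruby
# Minimal sketch of a supervisor that scales worker processes per queue set.
# All names here are hypothetical; this is not an existing GitLab or Sidekiq API.
class SidekiqSupervisor
  # command_builder: callable taking (queues, threads) and returning an argv array.
  def initialize(command_builder: nil)
    @command_builder = command_builder || method(:default_command)
    @pids = Hash.new { |hash, key| hash[key] = [] }
  end

  # Bring the number of processes serving `queues` to `count`.
  # Newly spawned processes use `threads` threads; existing ones are left as they are.
  def scale(queues, threads:, count:)
    pids = @pids[key_for(queues)]
    pids << Process.spawn(*@command_builder.call(queues, threads)) while pids.size < count
    stop(pids.pop) while pids.size > count
    pids.size
  end

  # PIDs currently tracked for this set of queues.
  def running(queues)
    @pids[key_for(queues)].dup
  end

  # Stop every tracked process.
  def shutdown
    @pids.each_value { |pids| stop(pids.pop) until pids.empty? }
  end

  private

  def key_for(queues)
    queues.sort.join(",")
  end

  # Assumed default: one Sidekiq process with -q per queue and -c for threads.
  def default_command(queues, threads)
    ["bundle", "exec", "sidekiq"] +
      queues.flat_map { |queue| ["-q", queue] } +
      ["-c", threads.to_s]
  end

  # TERM asks the process to shut down; wait afterwards so no zombie is left.
  def stop(pid)
    Process.kill("TERM", pid)
    Process.wait(pid)
  rescue Errno::ESRCH, Errno::ECHILD
    nil # the process was already gone
  end
end
```

With something like this, the screen would only need to call `scale(%w[gitlab_shell], threads: 2, count: 3)` to add runners and call it again with a lower `count` to remove them. The sketch does not restart crashed processes or persist its state, which a real supervisor would need to do.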
I know this is scary and hard, so I could also accept a simpler solution. What I would like is to stop running Sidekiq from tmux in production and get it inside the application itself.
