Scale Sidekiq HPA on queue size
Summary
We frequently have problems with Sidekiq failing to scale horizontally because the HPA is only configured to scale on CPU usage. This doesn't work for memory-bound workers.
In the past we used custom metrics to scale pubsubbeat on queue length. This helps scale the number of pods according to the amount of work that needs to be done.
Proposal
We should do the same thing for Sidekiq so that it scales out horizontally not just on CPU but also on queue length.
A few things to consider:
- We need to upstream this change to the GitLab Helm chart, since that is where the HPA definition lives.
- We need to figure out whether we can make Prometheus metrics available to the HPA in GKE.
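To make the queue-length idea concrete, here is a rough sketch of the replica count an HPA would aim for when given an average-value target on queue length. It follows the documented HPA rule of dividing the total metric by the per-pod target and rounding up, then clamping to the min/max replica bounds. The function name, the parameters, and the example target of 100 jobs per pod are all assumptions for illustration, not values from our configuration, and the sketch ignores the HPA tolerance and stabilization behavior.

```python
import math


def desired_replicas(queue_length: int, target_per_pod: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate the replica count for an average-value queue target.

    All names here are hypothetical. The real HPA also applies a
    tolerance band and stabilization windows, which are left out.
    """
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    # Total queued jobs divided by the jobs we want each pod to own,
    # rounded up so a partial pod's worth of work still gets a pod.
    wanted = math.ceil(queue_length / target_per_pod)
    # Clamp to the bounds configured on the HPA.
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # Hypothetical numbers: 950 queued jobs, 100 jobs per pod.
    print(desired_replicas(950, 100, min_replicas=1, max_replicas=20))
```

With this kind of target, an idle queue falls back to the minimum replica count and a large backlog is capped at the maximum, regardless of how much CPU the workers are using.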
Props to @mwasilewski-gitlab for suggesting this
Originating incidents
Edited by Jason Plum