Scale sidekiq HPA on queue size

Summary

Sidekiq frequently fails to scale horizontally because its HPA targets CPU usage only. This does not work for memory-bound workers: their CPU usage can stay low while work piles up in the queue, so the HPA never scales out.

We have used custom metrics in the past to scale pubsubbeat on queue length. This scales the number of pods with the amount of work that needs to be done.

Proposal

We should do the same for Sidekiq, so that it scales out horizontally on queue length as well as on CPU.
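The HPA can already evaluate several metrics at once. It computes a desired replica count for each metric as ceil(currentReplicas × currentValue / targetValue), scales to the largest of those, and clamps the result to the min/max replica bounds. The sketch below models that rule to show why a queue-length metric would trigger a scale-out even when CPU is idle. It is simplified: it ignores the HPA's default 10% tolerance, scale-down stabilization and unready pods, and the metric name `sidekiq_queue_size` and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class MetricReading:
    """One HPA metric: the current per-pod value and the per-pod target."""
    name: str
    current_per_pod: float
    target_per_pod: float


def desired_replicas(current_replicas: int, readings: list[MetricReading],
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA rule: one proposal per metric, take the max, clamp.

    Ignores the default 10% tolerance, scale-down stabilization and
    unready pods, which the real controller also takes into account.
    """
    proposals = [
        math.ceil(current_replicas * r.current_per_pod / r.target_per_pod)
        for r in readings
    ]
    return max(min_replicas, min(max_replicas, max(proposals)))


# Hypothetical memory-bound workers: CPU is low, but the queue is deep.
cpu = MetricReading("cpu_utilization_percent", current_per_pod=30, target_per_pod=75)
queue = MetricReading("sidekiq_queue_size", current_per_pod=500, target_per_pod=100)

print(desired_replicas(10, [cpu], 2, 100))         # CPU only: 4
print(desired_replicas(10, [cpu, queue], 2, 100))  # CPU + queue length: 50
```

With CPU alone the HPA would actually scale these 10 pods down to 4; adding the queue metric makes it scale out to 50, because the HPA always follows the metric that asks for the most replicas.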

A few things to consider:

  1. We need to upstream this change to the GitLab Helm chart, since that is where the HPA definition lives.
  2. We need to figure out whether Prometheus metrics can be made available to the HPA in GKE.
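As a rough illustration of what the chart would need to render, here is a sketch that builds an `autoscaling/v2` HorizontalPodAutoscaler combining the existing CPU `Resource` metric with an `External` queue-length metric, and prints it as JSON (which `kubectl apply -f` accepts). Older clusters use `autoscaling/v2beta2`, which has the same `metrics` structure. The Deployment name `gitlab-sidekiq`, the metric name `sidekiq_queue_size` and the thresholds are hypothetical placeholders; the real metric name depends on how the Prometheus metric is exposed to the GKE metrics API, and the real template lives in the GitLab Helm chart.

```python
import json


def build_sidekiq_hpa(deployment: str, min_replicas: int, max_replicas: int,
                      cpu_target_percent: int, queue_metric: str,
                      queue_target_per_pod: str) -> dict:
    """Sketch of an autoscaling/v2 HPA that scales on CPU *and* queue length.

    All names and numbers passed in are illustrative placeholders.
    """
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": deployment},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [
                # Existing behaviour: scale on average CPU utilization.
                {
                    "type": "Resource",
                    "resource": {
                        "name": "cpu",
                        "target": {"type": "Utilization",
                                   "averageUtilization": cpu_target_percent},
                    },
                },
                # Proposed addition: scale on queue length. With AverageValue,
                # the HPA divides the metric total by the current pod count.
                {
                    "type": "External",
                    "external": {
                        "metric": {"name": queue_metric},
                        "target": {"type": "AverageValue",
                                   "averageValue": queue_target_per_pod},
                    },
                },
            ],
        },
    }


hpa = build_sidekiq_hpa("gitlab-sidekiq", 2, 50, 75, "sidekiq_queue_size", "100")
print(json.dumps(hpa, indent=2))
```

Using `AverageValue` on the queue metric means the target reads as "jobs per pod", so the HPA adds pods until the backlog per pod drops below the threshold.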

Props to @mwasilewski-gitlab for suggesting this

Originating incidents

Edited by Jason Plum