Configure Horizontal Pod Autoscaling for pubsubbeat deployments based on PubSub metrics
Summary
Currently pubsubbeat is configured with a fixed number of replicas for each deployment: https://gitlab.com/gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles/-/blob/master/releases/pubsubbeat/gprd.yaml.gotmpl#L2-39
Because of this, pubsubbeat sometimes saturates during periods of higher traffic than usual, or during/after deployments (migrations, deprecation warnings...), and PubSub messages queue up, with a backlog of tens of millions of unacked messages that can take a few hours to drain. We increase the number of replicas to avoid those situations, but that only pushes the problem away without solving it. The rest of the time, when traffic is low, the pubsubbeat deployments are over-provisioned and waste resources (#15187 (closed)).
GKE can autoscale deployments based on PubSub metrics using the Custom Metrics Adapter (see https://cloud.google.com/kubernetes-engine/docs/tutorials/autoscaling-metrics#pubsub). We can use that to configure an HPA for the pubsubbeat deployments instead of hardcoding their number of replicas, which would help avoid both service saturation during high traffic and over-provisioning during low traffic.
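As a rough sketch of what this could look like, the HPA below scales a deployment on the `num_undelivered_messages` external metric exposed through the Custom Metrics Stackdriver Adapter (which must be installed in the cluster, per the linked tutorial). The deployment name, subscription id, replica bounds, and target backlog value are all placeholders that would need to be set per pubsubbeat deployment and tuned:

```yaml
# Hypothetical HPA for one pubsubbeat deployment.
# Scales on the PubSub subscription backlog (unacked messages),
# averaged across replicas.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: pubsubbeat-rails        # placeholder deployment name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: pubsubbeat-rails      # placeholder deployment name
  minReplicas: 2                # example bounds, to be tuned
  maxReplicas: 15
  metrics:
    - type: External
      external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              # placeholder subscription id
              resource.labels.subscription_id: pubsub-rails-inf-gprd
        target:
          type: AverageValue
          averageValue: "100000"  # example target backlog per replica
```

With `AverageValue`, the HPA divides the total backlog by the current replica count, so the deployment scales out whenever the per-replica backlog exceeds the target and scales back in as the queue drains.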
Related Incident(s)
Originating issue(s):
- production#6278 (closed)
- production#6583 (closed)
- and more...
Desired Outcome/Acceptance Criteria
pubsubbeat deployments have a HorizontalPodAutoscaler based on the number of unacknowledged messages remaining in the PubSub subscription each deployment is subscribed to.
Associated Services
Corrective Action Issue Checklist
- Link the incident(s) this corrective action arose out of
- Give context for what problem this corrective action is trying to prevent from re-occurring
- Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
- Assign a priority (this will default to 'priority::4')