Investigate `PubSub queueing high` incidents
Over the last couple of months we've seen recurring `PubSub queueing high` incidents:
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7787
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7795
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7771
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7569
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7384
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7238
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7805
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7921
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/7990
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8052
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/8053
- https://gitlab.com/gitlab-com/gl-infra/production/-/issues/15969
Some of them we've linked to specific events. For the rest, investigation by the EOC hasn't found a root cause, and the related runbooks currently don't include many remediation paths to apply. As a result, the current strategy is to let ES churn through the load spikes and self-recover, which can take a couple of hours. We should invest some time in looking for an underlying pattern to these events.