Separate pgbouncer connection pools for latency-sensitive Sidekiq workloads

Right now, we only have separate connection pools for Puma and for Sidekiq. For Sidekiq, this means that workers on a shard with high concurrency can use many of these shared connections simultaneously. This can easily starve a shard with lower concurrency and cause incidents like production#6880.

But some Sidekiq workloads are also latency-sensitive: the urgent shards process jobs that users might be waiting for. If these jobs slow down, this might show up as pushes not appearing in the interface or CI jobs not starting.

We should create a new connection pool for these urgent shards and provision it so that it is never saturated. If connection pool saturation does occur, it should happen on the Sidekiq shards that process the less latency-sensitive jobs.
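To make "never saturated" concrete, a minimal sizing check can compare the worst-case connection demand of the urgent shards with the capacity of the dedicated pool. This is a sketch under stated assumptions, not how the pool was actually sized: `ShardSpec`, `required_pool_size`, and `is_never_saturated` are hypothetical names. It assumes that each Sidekiq thread holds at most one server connection from this pool at a time and that traffic spreads evenly across pgbouncer processes. pgbouncer's `pool_size` applies per database/user pair within each process.

```python
from dataclasses import dataclass


@dataclass
class ShardSpec:
    """Hypothetical worst-case footprint of one Sidekiq shard."""
    name: str
    replicas: int     # assumed maximum pod count for the shard
    concurrency: int  # Sidekiq threads per pod

    def peak_connections(self) -> int:
        # Assumption: each thread holds at most one server connection
        # from this pool at any moment.
        return self.replicas * self.concurrency


def required_pool_size(shards: list[ShardSpec]) -> int:
    """Worst-case number of simultaneous connections the shards can request."""
    return sum(s.peak_connections() for s in shards)


def is_never_saturated(pool_size: int, pgbouncer_processes: int,
                       shards: list[ShardSpec]) -> bool:
    """True if every thread could hold a connection at once.

    Assumption: pool_size is per pgbouncer process, and clients are spread
    evenly, so total capacity is pool_size * pgbouncer_processes.
    """
    capacity = pool_size * pgbouncer_processes
    return capacity >= required_pool_size(shards)


# Hypothetical shards and numbers, for illustration only.
urgent = [
    ShardSpec("urgent-a", replicas=10, concurrency=5),
    ShardSpec("urgent-b", replicas=20, concurrency=10),
]
print(required_pool_size(urgent))                    # 250
print(is_never_saturated(100, 3, urgent))            # True  (300 >= 250)
print(is_never_saturated(80, 3, urgent))             # False (240 < 250)
```

Sizing to the worst case trades some idle server connections for the guarantee that urgent jobs never queue for a connection. The less latency-sensitive shards keep the shared pool, so any saturation lands on them instead.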

Closing summary

We have separated the pools in pgbouncer-sidekiq and pgbouncer-sidekiq-ci. We also introduced a new set of K8s releases for the urgent shards to configure the database name for the new pgbouncer pool (#3424 (closed)).

The CR for the rollout is at production#18674. On the pgbouncer/pgbouncer-ci dashboards, we can now track the new gitlabhq_production_sidekiq_urgent pool.


Edited Oct 29, 2024 by Sylvester Chin