Review PgBouncer Pool configuration and architecture
Related issue 1014, from July 31st.
We are observing that the current database pool configuration may be exhausting the CPU and memory capacity of the current Datanode. That is, the number of connections allowed from each node and pool can potentially harm the leader's performance.
Right now, max_connections in the cluster is set to 300, a conservative number yet well above the theoretical peak of the connection/throughput relation. We calculate the theoretical connection capacity as: (cores / % effective usage) * scale_factor. Effective usage is the client busy percentage, which can be estimated at around 95% because this is queue processing; scale_factor is a coefficient between 2 and 4. Under these assumptions, the most suitable maximum number of active connections would be between 130 and 200.
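As a rough illustration of the formula above, here is a minimal Python sketch. The function name and the core count are hypothetical (the issue does not state how many cores the leader has), so the printed range is only an example and is not expected to match the 130 to 200 figure quoted above.

```python
def theoretical_connection_capacity(cores, effective_usage, scale_factor):
    """Return (cores / effective usage) * scale_factor.

    effective_usage is given as a fraction, e.g. 0.95 for 95%.
    """
    if not 0 < effective_usage <= 1:
        raise ValueError("effective_usage must be a fraction in (0, 1]")
    return (cores / effective_usage) * scale_factor


# HYPOTHETICAL core count, for illustration only; not taken from the issue.
cores = 32
effective_usage = 0.95  # client busy percentage from the issue text

# scale_factor is a coefficient between 2 and 4, so evaluate both ends.
low = theoretical_connection_capacity(cores, effective_usage, 2)
high = theoretical_connection_capacity(cores, effective_usage, 4)

print(f"{cores} cores: between {low:.0f} and {high:.0f} active connections")
```

Because scale_factor spans 2 to 4, the upper bound is always exactly twice the lower bound for a given core count.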
On the PgBouncer side, however, we currently have two nodes, each with 3 pools (2 of them intensive: production and sidekiq). The production pool size is 50 and the sidekiq pool size is 75 on each node, meaning that there are potentially 250 active connections.
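The arithmetic behind the 250 figure can be checked with a short Python sketch. All numbers come from this issue; the variable names are only illustrative, and the third (non-intensive) pool is left out because its size is not given here.

```python
# Values taken from the issue text.
MAX_CONNECTIONS = 300             # max_connections in the cluster
RECOMMENDED_RANGE = (130, 200)    # suggested maximum active connections
PGBOUNCER_NODES = 2
POOL_SIZES = {"production": 50, "sidekiq": 75}  # per node, intensive pools only

# Potential active connections from the two intensive pools.
per_node = sum(POOL_SIZES.values())
total = PGBOUNCER_NODES * per_node

# How far the potential total sits above the recommended upper bound.
excess = total - RECOMMENDED_RANGE[1]

print(f"per node: {per_node}, total: {total}")
print(f"above recommended upper bound by: {excess}")
print(f"below max_connections by: {MAX_CONNECTIONS - total}")
```

So the pools fit within max_connections, but they can exceed the upper end of the recommended range.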
From OnGres we need to:
- Define the pool sizes for sidekiq and production to reduce the performance degradation incidents seen over the last few days.
- Splitting pools can offer better resilience when one of the pools is generating waits that affect other queues in the node.
- Revisit other PgBouncer configuration. For instance:
- Current: min_pool_size = 0, recommended: min_pool_size = 20. This keeps a minimum number of persistent connections open, reducing startup time when new connections are issued.
- Current:
- If the split is necessary, establish the number of nodes needed for each pool and their configuration.
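One possible starting point for the pool size discussion, sketched in Python: scale the current per-node sizes down proportionally until the total fits a target budget. Proportional scaling and the scale_pools helper are assumptions made for illustration, not a recommendation from OnGres; the real sizes should come from the review.

```python
from typing import Dict


def scale_pools(current: Dict[str, int], nodes: int, target_total: int) -> Dict[str, int]:
    """Scale per-node pool sizes proportionally to fit a total budget.

    Integer floor division is used so that nodes * sum(result) never
    exceeds target_total.
    """
    current_total = nodes * sum(current.values())
    return {
        name: size * target_total // current_total
        for name, size in current.items()
    }


# Current per-node sizes and node count from the issue text.
current_pools = {"production": 50, "sidekiq": 75}
nodes = 2

# Both ends of the 130 to 200 range mentioned in the issue.
for target in (130, 200):
    sizes = scale_pools(current_pools, nodes, target)
    print(f"target {target}: {sizes}, total {nodes * sum(sizes.values())}")
```

This keeps the current 50:75 ratio between production and sidekiq; if the pools are split across dedicated nodes, the same budget would need to be divided per pool instead.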
Related graphs:
https://prometheus.gprd.gitlab.net/graph?g0.range_input=2w&g0.expr=sum(sidekiq_queue_size)%20by%20(fqdn)&g0.tab=0
https://dashboards.gitlab.com/d/9GOIu9Siz/sidekiq-stats?orgId=1&fullscreen&panelId=71&from=now-10d&to=now
IOWait on PgBouncer: https://dashboards.gitlab.com/d/PwlB97Jmk/pgbouncer-overview?from=1564563814425&to=1564571034282&fullscreen&panelId=6