Investigate Postgres active session spikes (LWLock) on replicas in the Main cluster
(moved from production#8156 (comment 1209576745))
We constantly have spikes in active sessions on all Postgres nodes in the "Main" cluster, here is the node patroni-main-2004--01, last 2 days (Thanos):
- The number of active sessions usually jumps to 200-300 (which is not good and should affect performance already – we have only 96 vCPUs),
- And sometimes even to ~400 (which is dangerous –
max_connections = 500) - It happens every day multiple times
- It happens on all replicas
- They are waiting on some LWLock, based on the Marginalia dashboard, but unfortunately, we don't see there which exactly lwlock and what queries contribute to it.
This is something that is definitely worth a separate analysis; pg_wait_sampling graphs (or at least having it installed) would definitely help with troubleshooting here.
If this remains unsolved, it can lead to performance degradation noticeable to all users of the Main cluster.
Edited by Nikolay Samokhvalov
