how to identify and measure database performance bottlenecks
Summary
A customer raised an emergency call with support (GitLab team members can read more in the tickets - 1, 2) as they were seeing very poor performance on their scaled out GitLab deployment.
The environment comprises
Rails / Sidekiq servers -> PgBouncer servers -> Database server
All the builds except for the database server are Omnibus, which is why I've raised this in this project against groupdistribution. Tools and techniques for PostgreSQL itself would be applicable to an Omnibus PostgreSQL build, though in this specific case, it was a cloud platform's build.
Proposal
We need effective tools to identify bottlenecks in customer environments.
A non exhaustive list of topics is as follows, detailed in comment threads below:
- Database query times (from Rails' perspective)
- TCP connections - Rails to PgBouncer
- PgBouncer throughput
- TCP connections - PgBouncer to PostgreSQL
- PostgreSQL