how to identify and measure database performance bottlenecks

Summary

A customer raised an emergency call with support (GitLab team members can read more in the tickets - 1, 2) as they were seeing very poor performance on their scaled out GitLab deployment.

The environment comprises

Rails / Sidekiq servers ->  PgBouncer servers -> Database server

All the builds except for the database server are Omnibus, which is why I've raised this in this project against groupdistribution. Tools and techniques for PostgreSQL itself would be applicable to an Omnibus PostgreSQL build, though in this specific case, it was a cloud platform's build.

Proposal

We need effective tools to identify bottlenecks in customer environments.

A non exhaustive list of topics is as follows, detailed in comment threads below:

References

Edited by Ben Prescott (ex-GitLab)