Add alerts to detect query optimiser issues in Postgres
Adds an alert to detect the type of incident we saw in gitlab-com/gl-infra/production#2885 and gitlab-com/gl-infra/production#3875.
Also adds a dashboard with visualizations that may help diagnose the problem. The alert will include a link to the dashboard, with its template variables pre-set to the values that triggered the alert.
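A minimal sketch of how such a link could be built, assuming the dashboard is a Grafana dashboard (Grafana reads template variables from `var-<name>` query parameters). The base URL, dashboard UID, and variable names (`environment`, `relname`) below are hypothetical placeholders, not the ones used in this MR.

```python
from __future__ import annotations

from urllib.parse import urlencode


def dashboard_link(base_url: str, dashboard_uid: str, variables: dict[str, str]) -> str:
    """Build a Grafana dashboard URL with template variables pre-set.

    Hypothetical helper: each template variable is passed as `var-<name>=<value>`.
    """
    params = {f"var-{name}": value for name, value in variables.items()}
    return f"{base_url.rstrip('/')}/d/{dashboard_uid}?{urlencode(params)}"


# Hypothetical values for illustration only.
link = dashboard_link(
    "https://dashboards.example.com",
    "pg-tuple-fetches",
    {"environment": "gprd", "relname": "ci_builds"},
)
print(link)
```

In practice the variable values would come from the labels on the firing alert, so whoever is on call lands on the dashboard already filtered to the offending table.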
The logic for this alert is as follows:
- If more than 50% of tuple fetches on the primary instance are for a single table for more than 3 minutes, alert.
- If more than 50% of tuple fetches across all replica instances, aggregated together, are for a single table for more than 5 minutes, alert.
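The two rules above can be sketched as a small evaluator over per-minute samples. This is an illustration of the logic only, not the actual alert rule: it assumes one sample per minute holding per-table tuple-fetch rates, and treats "held for N minutes" roughly like a Prometheus `for:` clause (the same table must stay dominant across the whole window). The table names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Sample:
    minute: int                 # minutes since the start of observation (assumed 1 sample/min)
    fetches: dict[str, float]   # per-table tuple-fetch rate


def dominant_table(fetches: dict[str, float], threshold: float = 0.5) -> str | None:
    """Return the table with MORE than `threshold` of all fetches, if any."""
    total = sum(fetches.values())
    if total <= 0:
        return None
    table, value = max(fetches.items(), key=lambda kv: kv[1])
    return table if value / total > threshold else None


def aggregate(replicas: list[dict[str, float]]) -> dict[str, float]:
    """Sum per-table fetch rates across all replicas."""
    combined: dict[str, float] = {}
    for fetches in replicas:
        for table, value in fetches.items():
            combined[table] = combined.get(table, 0.0) + value
    return combined


def should_alert(samples: list[Sample], hold_minutes: int, threshold: float = 0.5) -> str | None:
    """Return the table that stayed dominant for `hold_minutes`, else None."""
    current, since = None, None
    for s in samples:
        table = dominant_table(s.fetches, threshold)
        if table is None:
            current, since = None, None      # condition broken: reset the timer
            continue
        if table != current:
            current, since = table, s.minute  # a new table became dominant
        if s.minute - since >= hold_minutes:
            return table
    return None


# Primary: one table accounts for 90% of fetches, checked with a 3-minute hold.
primary = [Sample(m, {"ci_builds": 900.0, "projects": 100.0}) for m in range(5)]
print(should_alert(primary, hold_minutes=3))

# Replicas: aggregate first, then apply the same check with a 5-minute hold.
replica_samples = [
    Sample(m, aggregate([{"ci_builds": 300.0, "users": 200.0},
                         {"ci_builds": 400.0, "users": 100.0}]))
    for m in range(6)
]
print(should_alert(replica_samples, hold_minutes=5))
```

Aggregating the replicas before computing the ratio means a table that dominates read traffic overall still trips the alert even when load is spread unevenly across individual replicas.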
Edited by Andrew Newdigate
