Corrective action: PostgreSQL_ExporterErrors on postgres-dr-main-delayed-2004-01-db-gprd

Summary

To fix production#14366 (closed) we need to choose between:

Option A

Install the pg_stat_kcache and pg_wait_sampling extensions in our DR PostgreSQL databases in GPRD.

However, pg_wait_sampling is not included in the default PostgreSQL package, and we only included these extensions in the gitlab-patroni cookbook (gitlab-cookbooks/gitlab-patroni!97 (merged)). GitLab.com PostgreSQL DR nodes don't use gitlab-patroni; they use gitlab-server only.

Blocker: this is very hard to implement because Omnibus doesn't use apt to install PostgreSQL packages, so we would apparently need to compile the extensions and add them to our Omnibus package, and we don't want to manage extension configuration and updates.

Option B

Modify the gitlab-exporters cookbook so that postgres_exporter checks for the required extension before running a query that needs it. For example, it can look at shared_preload_libraries to see whether pg_stat_kcache is listed before calling the query.

If the PostgreSQL service is up and an extension is listed in shared_preload_libraries but a query fails, the extension was not created in that database (with CREATE EXTENSION). We want to alert on that as well, because these extensions should be created.
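The decision logic for Option B could be sketched roughly as follows. This is only an illustration in Python, not the cookbook change itself: the names Decision, parse_preload_libraries, should_run_query and classify are hypothetical, and the sketch assumes the value of shared_preload_libraries has already been read from the server (for instance with SHOW shared_preload_libraries) as a comma-separated string.

```python
from enum import Enum


class Decision(Enum):
    """Hypothetical outcomes for a query that depends on an extension."""
    SKIP = "skip"    # extension not preloaded: do not run the query, no error
    OK = "ok"        # extension preloaded and the query succeeded
    ALERT = "alert"  # extension preloaded but the query failed


def parse_preload_libraries(setting):
    """Split a comma-separated shared_preload_libraries value into a set of names."""
    return {name.strip().strip('"') for name in setting.split(",") if name.strip()}


def should_run_query(required_extension, preload_setting):
    """Run the query only if the extension it needs is listed as preloaded."""
    return required_extension in parse_preload_libraries(preload_setting)


def classify(required_extension, preload_setting, query_succeeded):
    """Map the preload check and the query result to one of the three outcomes."""
    if not should_run_query(required_extension, preload_setting):
        return Decision.SKIP
    # Preloaded but failing suggests CREATE EXTENSION was never run in this database.
    return Decision.OK if query_succeeded else Decision.ALERT


if __name__ == "__main__":
    # Example values only; the real settings on each node may differ.
    without_ext = "pg_stat_statements"
    with_ext = "pg_stat_statements, pg_stat_kcache, pg_wait_sampling"
    print(classify("pg_stat_kcache", without_ext, query_succeeded=False))
    print(classify("pg_stat_kcache", with_ext, query_succeeded=True))
    print(classify("pg_stat_kcache", with_ext, query_succeeded=False))
```

With this split, a node that does not preload the extension produces no exporter error, while a node that preloads it but cannot run the query still raises an alert.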

We are working on this option.

Related Incident(s)

Originating issue(s): production#14366 (closed)

Desired Outcome/Acceptance Criteria

Associated Services

ServicePostgresArchive ServicePostgresDelayed

Corrective Action Issue Checklist

  • Link the incident(s) this corrective action arose out of
  • Give context for what problem this corrective action is trying to prevent from re-occurring
  • Assign a severity label (this is the highest sev of related incidents, defaults to 'severity::4')
  • Assign a priority (this will default to 'Reliability::P4')
Edited by Rafael Henchen