Add a PagerDuty Prometheus exporter
Part of our OKR for Q3 will be to start monitoring the number of outstanding pages to create an escalation when there are multiple simultaneous open incidents.
We could use the information we already have for alerts, but I think we should source it directly from PD, since pages can be created outside of Alertmanager notifications.
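To illustrate what sourcing the count directly from PD could look like, here is a minimal sketch that counts incidents in the `triggered` or `acknowledged` state through the PagerDuty REST API `/incidents` endpoint. It is not the exporter deployed by this issue; the function names `fetch_incident_page` and `count_open_incidents` are hypothetical, and the API key is assumed to be a REST API key like the one stored in Vault.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable

PD_INCIDENTS_URL = "https://api.pagerduty.com/incidents"


def fetch_incident_page(api_key: str, offset: int, limit: int = 100) -> dict:
    """Fetch one page of triggered or acknowledged incidents (hypothetical helper)."""
    query = urllib.parse.urlencode(
        [
            ("statuses[]", "triggered"),
            ("statuses[]", "acknowledged"),
            ("limit", limit),
            ("offset", offset),
        ]
    )
    request = urllib.request.Request(
        f"{PD_INCIDENTS_URL}?{query}",
        headers={
            "Authorization": f"Token token={api_key}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def count_open_incidents(fetch_page: Callable[[int], dict]) -> int:
    """Count incidents across all pages; fetch_page maps an offset to one page."""
    total = 0
    offset = 0
    while True:
        page = fetch_page(offset)
        incidents = page.get("incidents", [])
        total += len(incidents)
        # Stop when PagerDuty reports no further pages (or a page comes back empty).
        if not page.get("more") or not incidents:
            return total
        offset += len(incidents)
```

With a real key this would be called as `count_open_incidents(lambda offset: fetch_incident_page(api_key, offset))`. Passing the page fetcher in as a callable keeps the pagination logic testable without network access.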
- Create a PD API key and add it to 1pass
- Create a helmchart charts!239 (merged)
- Create secrets https://ops.gitlab.net/gitlab-com/gl-infra/config-mgmt/-/merge_requests/7103
- Add the PD API key to vault https://vault.gitlab.net/ui/vault/secrets/k8s/kv/ops-gitlab-gke%2Fpagerduty-exporter%2Fauth/details?version=1
- Create a helmfiles deployment in the ops cluster gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!3475 (merged)
  - Apply failure, reverting in gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!3477 (merged)
  - Fix key path gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!3504 (merged)
  - Add pod monitors charts!240 (merged)
  - Version bump gitlab-com/gl-infra/k8s-workloads/gitlab-helmfiles!3506 (merged)
- Confirm we have PD metrics
Edited by John Jarvis