Expose webhook notification metrics in Grafana
Context
Related to &8628 (closed). Follow-up from &8628 (comment 1162358305).
Partially related to #496.
Problem
We have no visibility into the reliability and performance of the webhook notifications feature on GitLab.com.
We need that visibility to assess how well and how reliably the feature performs over time, and to use that assessment to decide how urgent it is to work on Webhook notifications with at-least-once delive... (&9161).
Solution
The registry already emits metrics for this (source). Querying Thanos, we can see the registry_notifications_* metrics. However, they are not yet gathered or exposed in any Grafana dashboard.
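A minimal sketch of how the `registry_notifications_*` families could be enumerated, assuming access to the registry's Prometheus text exposition output. It inspects only the `# HELP` and `# TYPE` comment lines. The metric names and help strings in `SAMPLE` are illustrative assumptions, not a confirmed list of what the registry emits, and `to_markdown` is a hypothetical helper whose output could seed the table in docs-gitlab/metrics.md.

```python
import re
from dataclasses import dataclass

# Prefix taken from the issue; everything else about the metrics is assumed.
PREFIX = "registry_notifications_"

# Matches "# HELP <name> <text>" and "# TYPE <name> <type>" comment lines.
_META = re.compile(r"^# (HELP|TYPE) (\S+) ?(.*)$")


@dataclass
class MetricFamily:
    name: str
    type: str = "untyped"
    help: str = ""


def enumerate_families(exposition: str, prefix: str = PREFIX) -> list:
    """Return metric families whose name starts with `prefix`, sorted by name.

    Only HELP/TYPE comment lines are inspected; sample lines are ignored.
    """
    families = {}
    for line in exposition.splitlines():
        match = _META.match(line)
        if not match:
            continue
        kind, name, rest = match.groups()
        if not name.startswith(prefix):
            continue
        family = families.setdefault(name, MetricFamily(name))
        if kind == "HELP":
            family.help = rest
        else:
            family.type = rest.strip()
    return sorted(families.values(), key=lambda f: f.name)


def to_markdown(families: list) -> str:
    """Hypothetical helper: render the families as a Markdown table."""
    rows = ["| Metric | Type | Description |", "| --- | --- | --- |"]
    rows += [f"| `{f.name}` | {f.type} | {f.help} |" for f in families]
    return "\n".join(rows)


# Illustrative exposition output; these metric names are assumptions.
SAMPLE = """\
# HELP registry_notifications_events_total Total notification events (illustrative)
# TYPE registry_notifications_events_total counter
registry_notifications_events_total{endpoint="a"} 3
# HELP registry_notifications_pending Events waiting in the queue (illustrative)
# TYPE registry_notifications_pending gauge
registry_notifications_pending{endpoint="a"} 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
"""

if __name__ == "__main__":
    print(to_markdown(enumerate_families(SAMPLE)))
```

Filtering on the prefix keeps unrelated families (such as the Go runtime metrics in the sample) out of the generated table.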
Tasks
- Enumerate and document the existing Prometheus metrics for webhook notifications in a new document at docs-gitlab/metrics.md. This will also be the first step towards #496.
- Create new graphs on the Grafana Application detail dashboard for all metrics identified above. Place them under a new Webhook notifications row/section.
- Identify observability gaps, if any (are we missing any metrics to fully understand the reliability/performance of this feature?), and raise follow-up issues.
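The dashboard task could be sketched as follows, assuming the enumerated families are available as name/type pairs. It builds plain Grafana dashboard JSON: a `row` panel titled Webhook notifications followed by one `timeseries` panel per metric. This is not necessarily how the Application detail dashboard is actually maintained, and the `Thanos` datasource name, the 5-minute rate window, and the p90 quantile for histograms are all assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class MetricFamily:
    name: str
    type: str  # Prometheus type: "counter", "gauge", "histogram", ...


def panel_query(family: MetricFamily, window: str = "5m") -> str:
    """Pick a PromQL expression suited to the metric type."""
    if family.type == "counter":
        # Counters only go up, so graph their per-second rate.
        return f"sum(rate({family.name}[{window}]))"
    if family.type == "histogram":
        # Histograms expose <name>_bucket series; graph an assumed p90.
        return (
            f"histogram_quantile(0.9, sum by (le) "
            f"(rate({family.name}_bucket[{window}])))"
        )
    # Gauges (and anything else) are graphed as their current value.
    return f"sum({family.name})"


def webhook_row(families, start_y: int = 0, datasource: str = "Thanos") -> list:
    """Build a 'Webhook notifications' row plus one timeseries panel per family.

    Panels are laid out two per line on Grafana's 24-column grid. For an
    expanded row, the child panels follow the row in the top-level list.
    """
    panels = [{
        "type": "row",
        "title": "Webhook notifications",
        "collapsed": False,
        "gridPos": {"x": 0, "y": start_y, "w": 24, "h": 1},
        "panels": [],
    }]
    for i, family in enumerate(families):
        panels.append({
            "type": "timeseries",
            "title": family.name,
            "datasource": datasource,
            "gridPos": {"x": (i % 2) * 12, "y": start_y + 1 + (i // 2) * 8,
                        "w": 12, "h": 8},
            "targets": [{"expr": panel_query(family), "refId": "A"}],
        })
    return panels


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only.
    example = [
        MetricFamily("registry_notifications_events_total", "counter"),
        MetricFamily("registry_notifications_pending", "gauge"),
    ]
    print(json.dumps(webhook_row(example), indent=2))
```

Graphing counters as rates rather than raw totals avoids plots that only climb and drop on restart; gauges such as a pending-queue size are more useful as current values.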
Edited by Jaime Martinez