
chore(notifications): enhance metrics with action and artifacts

Context

This is a follow-up to Expose webhook notification metrics in Grafana (#828 - closed), in which we exposed the metrics that the notification system emits for webhook notifications.

Problem

We identified some gaps in the current implementation that we want to close to improve the observability of the system.

Solution

We should aim to complete the following:

  • Consolidate the Failures and Errors event types, perhaps into one. At the moment it is not easy to understand what each one means, so the result should make it easier to tell when the notification system fails.
  • Enhance the notification metrics by adding the following labels to the registry_notifications_events_total counter:
    • action=push|pull|delete
    • artifact=manifest|tag|blob
    • Any others?
  • Update the Grafana dashboard with the new labels. Add registry webhook notifications dashboard (gitlab-com/runbooks!5164 - merged) can be used as an example. This will be done as part of gitlab-com/runbooks#106 (closed).
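
To make the proposed labels concrete, here is a minimal, standard-library-only sketch of how registry_notifications_events_total could be keyed by action and artifact. It is an illustration, not the registry's actual implementation: the LabelledCounter class, the `type` label name, and the "Events" value are assumptions made for the example. Only the metric name and the action/artifact values come from this issue.

```python
from collections import Counter

# Metric name and label values taken from this issue.
METRIC_NAME = "registry_notifications_events_total"
# "type" is a hypothetical label name standing in for the existing event type.
LABEL_NAMES = ("type", "action", "artifact")
ALLOWED = {
    "action": {"push", "pull", "delete"},
    "artifact": {"manifest", "tag", "blob"},
}


class LabelledCounter:
    """Hypothetical stand-in for a labelled Prometheus counter."""

    def __init__(self, name, label_names, allowed):
        self.name = name
        self.label_names = tuple(label_names)
        self.allowed = allowed
        self._values = Counter()

    def inc(self, **labels):
        # Require exactly the declared label set, as labelled counters do.
        if set(labels) != set(self.label_names):
            raise ValueError(f"expected labels {self.label_names}")
        # Reject values outside the agreed enumerations to bound cardinality.
        for name, value in labels.items():
            permitted = self.allowed.get(name)
            if permitted is not None and value not in permitted:
                raise ValueError(f"unsupported {name}: {value!r}")
        self._values[tuple(labels[n] for n in self.label_names)] += 1

    def value(self, **labels):
        return self._values[tuple(labels[n] for n in self.label_names)]

    def render(self):
        # Emit one text-exposition-style sample line per label combination.
        lines = []
        for key in sorted(self._values):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.label_names, key))
            lines.append(f"{self.name}{{{pairs}}} {self._values[key]}")
        return lines


if __name__ == "__main__":
    events = LabelledCounter(METRIC_NAME, LABEL_NAMES, ALLOWED)
    events.inc(type="Events", action="push", artifact="manifest")
    events.inc(type="Events", action="delete", artifact="tag")
    print("\n".join(events.render()))
```

Restricting both labels to the enumerated values keeps cardinality bounded: three actions times three artifacts gives at most nine series per event type. Any additional label from the "Any others?" item should be checked against the same concern.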
Edited by Jaime Martinez