refactor(notifications): consolidate errors and failures metrics into one

Jaime Martinez requested to merge 851-notif-metric-errors into master

As discussed in chore(notifications): enhance metrics with acti... (#851 - closed), we want to split event Failures and Errors into separate metrics: Errors are internal errors that happen on the registry side, and Failures are bad responses we get from the endpoints (response codes != 200).

Sample notification metrics before and after:

Before:

# HELP registry_notifications_events_total The number of total events
# TYPE registry_notifications_events_total counter
registry_notifications_events_total{type="Events"} 10
registry_notifications_events_total{type="Failures"} 13
registry_notifications_events_total{type="Successes"} 10
# HELP registry_notifications_pending_total The gauge of pending events in queue
# TYPE registry_notifications_pending_total gauge
registry_notifications_pending_total 0
# HELP registry_notifications_status_total The number of status code
# TYPE registry_notifications_status_total counter
registry_notifications_status_total{code="200 OK"} 4
registry_notifications_status_total{code="202 Accepted"} 6
registry_notifications_status_total{code="400 Bad Request"} 3
registry_notifications_status_total{code="401 Unauthorized"} 10
  • Errors happen less often (unmarshalling problems or bad response bodies), so they are harder to reproduce for these tests.

After:

# HELP registry_notifications_errors_total The number of events that were not sent due to internal errors
# TYPE registry_notifications_errors_total counter
registry_notifications_errors_total 17
# HELP registry_notifications_events_total The number of total events
# TYPE registry_notifications_events_total counter
registry_notifications_events_total{type="Events"} 6
registry_notifications_events_total{type="Failures"} 3
registry_notifications_events_total{type="Successes"} 6
# HELP registry_notifications_pending_total The gauge of pending events in queue
# TYPE registry_notifications_pending_total gauge
registry_notifications_pending_total 0
# HELP registry_notifications_status_total The number of status code
# TYPE registry_notifications_status_total counter
registry_notifications_status_total{code="200 OK"} 3
registry_notifications_status_total{code="202 Accepted"} 3
registry_notifications_status_total{code="400 Bad Request"} 3

Related to #851 (closed)

Edited by Jaime Martinez