Add customer-facing Zuora errors to Grafana
Problem
We were able to track customer-facing Zuora errors in the backend by creating a log-based metric in GCP. However, GCP is not our single source of truth (SSOT) for availability, so we should move this metric to our SSOT for availability.
Proposal
Create a metric and graph in the group::fulfillment platform Grafana dashboard that shows the count and type of customer-facing Zuora errors over time.
One way to instrument this metric is described in this comment.
To achieve this, we need to identify the places in the CustomersDot codebase where we want to capture Zuora errors. For example, the following `log_error` call, made when creating a Zuora order fails:
# app/jobs/zuora/orders/create.rb:9-12
log_error(
'Failed to create order',
extra: { error_message: e.message, params: params }
)
could be rewritten as:
log_and_capture_error(
'Failed to create order',
prometheus_label: :failed_to_create_order,
extra: { error_message: e.message, params: params }
)
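A minimal sketch of what the proposed `log_and_capture_error` helper could look like, assuming it wraps the existing `log_error` and increments a single counter whose `error` label carries the failure type. `ZuoraErrorReporting`, `InMemoryCounter`, `ZuoraOrderError` and `CreateOrderJob` are hypothetical names; the in-memory counter only stands in for whatever Prometheus client CustomersDot actually uses, and the `log_error` shown is a stub, not the real helper.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for a Prometheus counter; CustomersDot would use its
# real metrics client here instead.
class InMemoryCounter
  def initialize
    @values = Hash.new(0)
  end

  # Adds 1 to the series identified by the given label set.
  def increment(labels:)
    @values[labels] += 1
  end

  # Returns the current value of one series (0 if it was never incremented).
  def get(labels:)
    @values[labels]
  end
end

# Hypothetical mixin providing the proposed helper to Zuora jobs.
module ZuoraErrorReporting
  # One counter for all customer-facing Zuora errors; the error type is a label.
  ZUORA_ERRORS = InMemoryCounter.new

  # Stub of the existing log_error helper, for this sketch only.
  def log_error(message, extra: {})
    logger.error("#{message}: #{extra.inspect}")
  end

  # Logs exactly as before, then counts the failure under its Prometheus label.
  def log_and_capture_error(message, prometheus_label:, extra: {})
    log_error(message, extra: extra)
    ZUORA_ERRORS.increment(labels: { error: prometheus_label.to_s })
  end
end

# Hypothetical error raised when Zuora rejects an order.
class ZuoraOrderError < StandardError; end

# Hypothetical job, loosely modeled on the rescue in app/jobs/zuora/orders/create.rb.
class CreateOrderJob
  include ZuoraErrorReporting

  attr_reader :logger

  def initialize(io = $stdout)
    @logger = Logger.new(io)
  end

  def perform(params)
    # Simulated Zuora failure so the rescue path runs.
    raise ZuoraOrderError, 'Zuora rejected the order'
  rescue ZuoraOrderError => e
    log_and_capture_error(
      'Failed to create order',
      prometheus_label: :failed_to_create_order,
      extra: { error_message: e.message, params: params }
    )
  end
end
```

Because the helper still calls `log_error` first, the existing log-based metric in GCP would keep receiving the same log lines while the Prometheus counter is introduced.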
We would then maintain a list of Prometheus labels, each identifying a specific Zuora-related failure, and use them in the same Grafana visualization.
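One way to keep that list in one place is a frozen registry that every call site validates against, which also keeps the metric's label cardinality bounded. In this sketch only `:failed_to_create_order` comes from this issue; the other labels, the `ZuoraErrorLabels` module and the metric name `zuora_customer_facing_errors_total` are hypothetical. The `exposition` method only illustrates the Prometheus text format a scrape would see; a real Prometheus client library would normally render this itself.

```ruby
# Hypothetical central registry of Zuora failure labels.
module ZuoraErrorLabels
  # Only :failed_to_create_order is taken from the issue; the rest are examples.
  LABELS = {
    failed_to_create_order: 'Zuora order creation failed',
    failed_to_update_subscription: 'Zuora subscription update failed (example)',
    failed_to_fetch_invoice: 'Zuora invoice lookup failed (example)'
  }.freeze

  # Returns the label as a string, raising for anything not in the registry so
  # a typo cannot silently create a new, unexpected series.
  def self.validate!(label)
    key = label.to_sym
    raise ArgumentError, "unknown Zuora error label: #{label}" unless LABELS.key?(key)

    key.to_s
  end

  # Renders counts in the Prometheus text exposition format, one sample per
  # registered label. Missing labels are written as 0 so every series exists
  # in Grafana even before its first error.
  def self.exposition(counts, metric: 'zuora_customer_facing_errors_total')
    lines = ["# TYPE #{metric} counter"]
    LABELS.each_key do |label|
      lines << %(#{metric}{error="#{label}"} #{counts.fetch(label, 0)})
    end
    lines.join("\n")
  end
end
```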
Result
This gives us a single source of truth for CustomersDot (CDot) availability as part of our performance indicators (PIs).