feat(billing): add observability metrics and tracing
Summary
Adds Prometheus counters and a tracing span for the Snowplow billing event pipeline (introduced in !937 (merged)). The counters will back alerts on dropped events.
All counter series are pre-registered at zero on startup, so rate(...) == 0 alerts compare against zero rather than against an absent series. Drop and rejection log lines now include correlation_id for cross-domain joining.
Related to https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/488 (rebased on !1162 (merged)).
Counters added
| OTel name | Prometheus name | Labels |
|---|---|---|
| gkg.billing.events.emitted | gkg_billing_events_emitted_total | (none) |
| gkg.billing.events.dropped | gkg_billing_events_dropped_total | reason |
| gkg.billing.events.rejected | gkg_billing_events_rejected_total | (none) |
reason values on events.dropped: realm_missing, realm_unrecognized, event_build_failed.
events.rejected fires when the Snowplow tracker refuses an event at enqueue (queue full or shutdown). HTTP delivery failures to the collector are not surfaced through this counter.
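To inspect drops by reason during local testing, the /metrics output can be reduced to one line per reason. The following is a minimal sketch: the helper name billing_drops_by_reason is hypothetical (it is not part of this MR), and it assumes the label layout shown in the sample output of this description, with the sample value at the end of each line.

```shell
# Hypothetical helper, not part of the MR.
# Reads Prometheus text exposition on stdin and prints "<reason> <value>"
# for every gkg_billing_events_dropped_total series, in input order.
billing_drops_by_reason() {
  sed -n 's/^gkg_billing_events_dropped_total{.*reason="\([^"]*\)".*} \([^ ]*\)$/\1 \2/p'
}

# Example use against a locally running instance (assumes the metrics port used in this MR):
#   curl -s localhost:9100/metrics | billing_drops_by_reason
```

Lines for other metrics are ignored, so the helper can be fed the full /metrics payload.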
How to verify locally
GKG_ENABLE_METRICS=true \
GKG_BILLING__ENABLED=true \
GKG_BILLING__COLLECTOR_URL=http://localhost:9090 \
mise run dev
Step 1 (registration check, no query needed):
curl -s localhost:9100/metrics | grep ^gkg_billing
All five series should appear at zero, including each labelled drop reason:
gkg_billing_events_emitted_total{otel_scope_name="gkg"} 0
gkg_billing_events_dropped_total{reason="event_build_failed",otel_scope_name="gkg"} 0
gkg_billing_events_dropped_total{reason="realm_missing",otel_scope_name="gkg"} 0
gkg_billing_events_dropped_total{reason="realm_unrecognized",otel_scope_name="gkg"} 0
gkg_billing_events_rejected_total{otel_scope_name="gkg"} 0
Step 2 (drive a query and confirm the increment and the Snowplow event):
Issue any authenticated ExecuteQuery request (via the Orbit explore UI at http://localhost:3000/dashboard/orbit/explore, the Duo MCP integration, or grpcurl with a Rails-minted JWT). Then:
curl -s localhost:9100/metrics | grep ^gkg_billing_events_emitted # → 1
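The two manual checks above could also be scripted. This is a sketch, not part of the MR: the function names check_billing_registered and billing_emitted_value are hypothetical, and the patterns assume the series names and label layout shown in the Step 1 output, with no trailing timestamp on each sample line.

```shell
# Hypothetical helpers, not part of the MR.

# Step 1: reads /metrics text on stdin and succeeds only if all five
# gkg_billing series are present with value 0.
check_billing_registered() {
  local metrics
  local missing
  local pattern
  metrics="$(cat)"
  missing=0
  # One extended-regex pattern per expected series.
  for pattern in \
    '^gkg_billing_events_emitted_total(\{[^}]*\})? 0$' \
    '^gkg_billing_events_dropped_total\{[^}]*reason="event_build_failed"[^}]*\} 0$' \
    '^gkg_billing_events_dropped_total\{[^}]*reason="realm_missing"[^}]*\} 0$' \
    '^gkg_billing_events_dropped_total\{[^}]*reason="realm_unrecognized"[^}]*\} 0$' \
    '^gkg_billing_events_rejected_total(\{[^}]*\})? 0$'
  do
    if ! printf '%s\n' "$metrics" | grep -Eq "$pattern"; then
      echo "missing or non-zero: $pattern" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Step 2: prints the current value of the emitted counter (last field of its line).
billing_emitted_value() {
  awk '/^gkg_billing_events_emitted_total/ { print $NF }'
}

# Example use against a locally running instance:
#   curl -s localhost:9100/metrics | check_billing_registered && echo "all series at zero"
#   curl -s localhost:9100/metrics | billing_emitted_value
```

Run check_billing_registered right after startup, before any query is issued. Once traffic has flowed the counters are expected to be non-zero and the check will fail by design.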