feat(billing): add observability metrics and tracing

Summary

Adds Prometheus counters and a tracing span for the Snowplow billing event pipeline (initially introduced in !937 (merged)). The counters will be used to set alerts on dropped events.

All counter series are pre-registered at zero on startup, so rate(...) == 0 alerts compare against zero rather than against an absent series. Drop and rejection log lines now include correlation_id for cross-domain joining.

Related to https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/488 (rebased on !1162 (merged)).

Counters added

OTel name                   | Prometheus name                   | Labels
gkg.billing.events.emitted  | gkg_billing_events_emitted_total  | (none)
gkg.billing.events.dropped  | gkg_billing_events_dropped_total  | reason
gkg.billing.events.rejected | gkg_billing_events_rejected_total | (none)

reason values on events.dropped: realm_missing, realm_unrecognized, event_build_failed.
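To see which reason accounts for drops during local testing, the dropped counter can be broken down by its reason label. The following is a minimal sketch, not part of this MR: the function name drops_by_reason is hypothetical, and it assumes the /metrics output carries no trailing timestamps, so the last field on each sample line is the value.

```shell
# Hypothetical helper (not part of this MR).
# Reads Prometheus text exposition from stdin and prints "<reason> <value>"
# for every gkg_billing_events_dropped_total sample.
drops_by_reason() {
  awk '
    /^gkg_billing_events_dropped_total[{]/ {
      # Extract the reason label value; label order is not assumed.
      if (match($0, /reason="[^"]*"/)) {
        reason = substr($0, RSTART + 8, RLENGTH - 9)
        # Assumes no timestamp column, so the last field is the sample value.
        print reason, $NF
      }
    }
  '
}
```

Usage: curl -s localhost:9100/metrics | drops_by_reason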

events.rejected fires when the Snowplow tracker refuses an event at enqueue (queue full or shutdown). HTTP delivery failures to the collector are not surfaced through this counter.

How to verify locally

GKG_ENABLE_METRICS=true \
GKG_BILLING__ENABLED=true \
GKG_BILLING__COLLECTOR_URL=http://localhost:9090 \
mise run dev

Step 1 — registration check (no query needed):

curl -s localhost:9100/metrics | grep ^gkg_billing

All five series should appear at zero, including each labelled drop reason:

gkg_billing_events_emitted_total{otel_scope_name="gkg"} 0
gkg_billing_events_dropped_total{reason="event_build_failed",otel_scope_name="gkg"} 0
gkg_billing_events_dropped_total{reason="realm_missing",otel_scope_name="gkg"} 0
gkg_billing_events_dropped_total{reason="realm_unrecognized",otel_scope_name="gkg"} 0
gkg_billing_events_rejected_total{otel_scope_name="gkg"} 0
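The registration check can also be scripted instead of eyeballed. The sketch below is one possible approach, assuming the five series listed above and no trailing timestamps in the exposition output; check_billing_series_at_zero is a hypothetical name, not something this MR adds.

```shell
# Hypothetical helper (not part of this MR).
# Reads Prometheus text exposition from stdin and verifies that each of the
# five expected billing series is present with value 0. Prints any missing
# or non-zero series and returns a non-zero status on mismatch.
check_billing_series_at_zero() {
  awk '
    BEGIN {
      bad = 0
      # Keys are "<metric name>|<reason label or empty>".
      expected["gkg_billing_events_emitted_total|"] = 1
      expected["gkg_billing_events_dropped_total|event_build_failed"] = 1
      expected["gkg_billing_events_dropped_total|realm_missing"] = 1
      expected["gkg_billing_events_dropped_total|realm_unrecognized"] = 1
      expected["gkg_billing_events_rejected_total|"] = 1
    }
    /^gkg_billing_events_/ {
      # Metric name is everything before the label set or the value.
      name = $0
      sub(/[{ ].*/, "", name)
      reason = ""
      if (match($0, /reason="[^"]*"/)) {
        reason = substr($0, RSTART + 8, RLENGTH - 9)
      }
      key = name "|" reason
      if (key in expected) {
        seen[key] = 1
        if ($NF + 0 != 0) { print "non-zero: " $0; bad = 1 }
      }
    }
    END {
      for (k in expected) {
        if (!(k in seen)) { print "missing: " k; bad = 1 }
      }
      exit bad
    }
  '
}
```

Usage on a freshly started server: curl -s localhost:9100/metrics | check_billing_series_at_zero && echo "all five series registered at zero"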

Step 2 — drive a query and confirm increment + Snowplow event:

Issue any authenticated ExecuteQuery, for example through the Orbit explore UI at http://localhost:3000/dashboard/orbit/explore, the Duo MCP integration, or grpcurl with a Rails-minted JWT. Then:

curl -s localhost:9100/metrics | grep ^gkg_billing_events_emitted    # → 1
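If the server has already handled queries, the counter will not read exactly 1, so comparing a before and after reading is more robust. A minimal sketch, assuming integer counter values and no trailing timestamps; emitted_total is a hypothetical helper name.

```shell
# Hypothetical helper (not part of this MR).
# Sums every gkg_billing_events_emitted_total sample read from stdin and
# prints the total (0 if the series is absent).
emitted_total() {
  awk '
    /^gkg_billing_events_emitted_total[{ ]/ { sum += $NF }
    END { print sum + 0 }
  '
}

# Example session against the local server from the steps above:
#   before=$(curl -s localhost:9100/metrics | emitted_total)
#   ... issue one authenticated ExecuteQuery ...
#   after=$(curl -s localhost:9100/metrics | emitted_total)
#   echo "emitted delta: $((after - before))"
```

A delta equal to the number of queries issued suggests each query produced one emitted event.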
Edited by Sharmad Nachnolkar
