feat(billing): emit Snowplow billing events on successful ExecuteQuery

What does this MR do and why?

This MR adds billing event emission for every successful ExecuteQuery gRPC call, using labkit-events to emit Snowplow structured events conforming to the billable_usage/jsonschema/1-0-2 Iglu schema.

Why: Orbit (Knowledge Graph) queries are a billable feature. This MR wires up the consumption-based billing pipeline so that every successful graph query emits an orbit_workflow_completion event to a Snowplow collector, which feeds into CustomersDot for usage-based billing.

Key changes

New crates/gkg-server/src/billing/ module (compliance-auditable, isolated for CODEOWNERS):

  • BillingTracker trait — abstraction over the Snowplow tracker, enabling in-memory testing without a real collector.
  • SnowplowBillingTracker — production implementation wrapping labkit_events::Tracker. Built with batch_size(1) on the Tracker builder, so the Emitter's background task drains and HTTP-POSTs each event immediately without requiring a manual flush() call.
  • BillingObserver — implements PipelineObserver. Emits a billing event in finish() only for successful queries. Uses an internal errored: Cell<bool> guard set by record_error(), so even if the pipeline ever calls finish() after an error, no billing event is emitted.
  • constants module: fixed billing identifiers (CATEGORY = "orbit", EVENT_TYPE = "orbit_workflow_completion", UNIT_OF_MEASURE = "request", APP_ID = "gkg-server") plus normalize_realm(&str), which maps "saas" | "SaaS" → "SaaS" and "SM" | "self-managed" → "SM". Unknown or missing realm values are dropped with a tracing::warn! that includes the raw value for diagnosability.
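
As a rough illustration of the realm mapping described in the last bullet, here is a minimal sketch. The `Option<&'static str>` return type is an assumption, and `eprintln!` stands in for the `tracing::warn!` call so the snippet needs no external crates; the real signature in billing/constants.rs may differ.

```rust
/// Maps a raw realm claim onto the two values used in billing events.
/// Returns None for anything unrecognized so the caller can drop the field.
pub fn normalize_realm(raw: &str) -> Option<&'static str> {
    match raw {
        "saas" | "SaaS" => Some("SaaS"),
        "SM" | "self-managed" => Some("SM"),
        other => {
            // The MR logs via tracing::warn!; eprintln! is a stand-in here.
            eprintln!("dropping unknown billing realm: {other:?}");
            None
        }
    }
}
```

With an optional claim, a call site could look like `claims.realm.as_deref().and_then(normalize_realm)` (the field name is assumed), so a missing realm and an unknown realm both end up as `None`.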

New MultiObserver in query-engine/pipeline:

  • Generic PipelineObserver that wraps a Vec<Box<dyn PipelineObserver>> and forwards every callback to each inner observer.
  • Replaces an earlier two-slot CompositeObserver<A, B> (introduced and removed during review) with an extensible alternative — adding a new observer is now vec![..., Box::new(NewObserver::new(...))] with no changes elsewhere.
  • Unit-tested for empty-vec no-op, 2-observer forwarding, and 3+ observer composition.
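
The forwarding behaviour, combined with the success-only guard that BillingObserver uses, can be sketched roughly as below. The two-method `PipelineObserver` trait and the `GuardedObserver` type are simplified stand-ins invented for this example (the real trait likely has more callbacks and different signatures); only the `Vec<Box<dyn PipelineObserver>>` shape and the `errored: Cell<bool>` guard come from the description above.

```rust
use std::cell::Cell;
use std::sync::{Arc, Mutex};

/// Hypothetical two-callback version of the observer trait.
pub trait PipelineObserver {
    fn record_error(&self, message: &str);
    fn finish(&self);
}

/// Forwards every callback to each inner observer, in insertion order.
pub struct MultiObserver {
    observers: Vec<Box<dyn PipelineObserver>>,
}

impl MultiObserver {
    pub fn new(observers: Vec<Box<dyn PipelineObserver>>) -> Self {
        Self { observers }
    }
}

impl PipelineObserver for MultiObserver {
    fn record_error(&self, message: &str) {
        for observer in &self.observers {
            observer.record_error(message);
        }
    }

    fn finish(&self) {
        for observer in &self.observers {
            observer.finish();
        }
    }
}

/// Stand-in for BillingObserver: records one event on finish()
/// unless record_error() was called first.
pub struct GuardedObserver {
    sink: Arc<Mutex<Vec<&'static str>>>,
    errored: Cell<bool>,
}

impl GuardedObserver {
    pub fn new(sink: Arc<Mutex<Vec<&'static str>>>) -> Self {
        Self { sink, errored: Cell::new(false) }
    }
}

impl PipelineObserver for GuardedObserver {
    fn record_error(&self, _message: &str) {
        self.errored.set(true);
    }

    fn finish(&self) {
        if !self.errored.get() {
            self.sink.lock().unwrap().push("orbit_workflow_completion");
        }
    }
}
```

An empty `MultiObserver::new(Vec::new())` is a no-op for every callback, which matches the empty-vec case listed in the unit tests.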

JWT Claims extended with 8 optional fields to be populated by Rails, via gitlab!232123: instance_id, unique_instance_id, instance_version, global_user_id, host_name, root_namespace_id, deployment_type, realm. All use #[serde(default)] so existing JWTs without these fields continue to validate. These map directly to optional fields in the billable_usage schema, and realm is taken straight from the claim (not derived from deployment_type) at Rails's request.

BillingConfig added to gkg-server-config and config/default.yaml:

  • Only enabled: bool and collector_url: String are configurable.
  • Fixed identifiers (category, event_type, unit_of_measure, app_id) deliberately live in billing/constants.rs, not config — they are not environment-specific.
  • realm lives on JWT claims (per-request), not config (per-instance).
  • #[serde(default)] applied on AppConfig.billing so existing deployments without a billing: section still parse.

Pipeline wiring:

  • QueryPipelineService.with_billing(Arc<dyn BillingTracker>) follows the existing builder-chain pattern from with_resolver_registry / with_cache_broker.
  • run_query constructs MultiObserver::new(vec![Box::new(OTelPipelineObserver::start()), Box::new(BillingObserver::new(...))]) — a drop-in replacement for the previous single OTelPipelineObserver, with zero changes to pipeline stages.
  • GrpcServer and KnowledgeGraphServiceImpl thread the tracker down through the same builder pattern.
  • main.rs initializes the Tracker at webserver startup when billing.enabled = true. Fails fast with a clear error at startup if billing.enabled = true but billing.collector_url is empty.
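
The two-field config and the fail-fast startup check might look roughly like this. It is a sketch under assumptions: the `validate` method name and the `String` error type are hypothetical, and the derived `Default` stands in for the `#[serde(default)]` behaviour so the snippet stays dependency-free.

```rust
/// Only the two environment-specific settings are configurable.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct BillingConfig {
    pub enabled: bool,
    pub collector_url: String,
}

impl BillingConfig {
    /// Startup check: an enabled config must name a collector.
    /// (Hypothetical helper; the MR performs this check in main.rs.)
    pub fn validate(&self) -> Result<(), String> {
        if self.enabled && self.collector_url.is_empty() {
            return Err(
                "billing.enabled = true but billing.collector_url is empty".to_string(),
            );
        }
        Ok(())
    }
}
```

A deployment with no billing: section gets the default (disabled, empty URL), which passes validation because the check only applies when billing is enabled.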

labkit-events dependency bumped from 594f02c to 4082a42a to include MR !43 (billing event tracking API).

Billing event wire format

Each successful ExecuteQuery emits a Snowplow structured event (e=se) carrying a single billable_usage/1-0-2 context. The event has two parts:

1. Top-level Snowplow structured-event fields (on the outer event):

| Field | Value |
|---|---|
| e | se (structured event) |
| se_ca (se_category) | orbit |
| se_ac (se_action) | orbit_workflow_completion |
| se_la (se_label) | the billing event's UUID event_id |
| se_va (se_value) | 1 |
| aid (app_id) | gkg-server |
| tv (tracker_version) | rust-<labkit-events-version> |
| eid | Snowplow event UUID (auto-generated by labkit-events) |
| dtm | epoch ms timestamp (auto-generated by labkit-events) |

2. Fields inside the iglu:com.gitlab/billable_usage/jsonschema/1-0-2 context (nested under co.data[0].data):

| Field | Value | Source |
|---|---|---|
| event_id | UUID | generated by labkit-events |
| timestamp | RFC3339 | generated by labkit-events |
| event_type | orbit_workflow_completion | constants::EVENT_TYPE |
| realm | SaaS or SM | JWT claim, normalized via normalize_realm |
| unit_of_measure | request | constants::UNIT_OF_MEASURE |
| quantity | 1.0 (1 request = 1 billable unit) | hard-coded |
| organization_id | integer | JWT claim |
| subject | <user_id> as a string, no prefix | JWT claim |
| correlation_id | string | labkit tracing context |
| instance_id | string | JWT claim |
| unique_instance_id | string | JWT claim |
| instance_version | string | JWT claim |
| global_user_id | string | JWT claim |
| host_name | string | JWT claim |
| deployment_type | ".com" / "dedicated" / "self-managed" | JWT claim |
| metadata | {"query_type": "<search/traversal/aggregation/...>", "feature_qualified_name": "<orbit-rest/orbit-mcp/orbit-frontend>"} | |

Note: root_namespace_id also needs to be passed for billing. How to source it is still being discussed in https://gitlab.com/gitlab-org/orbit/knowledge-graph/-/work_items/471, and the field will be added once that discussion concludes.

Closely tied to gitlab!232123 ("Add instance and deployment claims to Knowledge Graph JWT") — that MR populates the 8 new claim fields on the Rails side. The GKG side accepts the fields as #[serde(default)], so this MR can merge independently; billing events will still emit, but without those optional attributes until the Rails MR is deployed.
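
To illustrate why events can still be emitted before the Rails MR is deployed, here is a small sketch of how absent optional claims simply drop out of the context. The `Claims` subset, the `context_fields` helper, and the flat name/value representation are all hypothetical simplifications; numeric fields such as quantity and organization_id are left out, and the real code builds the context through labkit-events.

```rust
/// Hypothetical subset of the JWT claims relevant to billing.
#[derive(Default)]
pub struct Claims {
    pub user_id: u64,
    pub instance_id: Option<String>,
    pub host_name: Option<String>,
    pub deployment_type: Option<String>,
}

/// Collects string-valued context fields, skipping absent optional claims.
pub fn context_fields(claims: &Claims) -> Vec<(&'static str, String)> {
    let mut fields = vec![
        ("event_type", "orbit_workflow_completion".to_string()),
        ("unit_of_measure", "request".to_string()),
        // subject is the user id as a plain string, with no prefix.
        ("subject", claims.user_id.to_string()),
    ];
    let optional = [
        ("instance_id", &claims.instance_id),
        ("host_name", &claims.host_name),
        ("deployment_type", &claims.deployment_type),
    ];
    for (name, value) in optional {
        if let Some(v) = value {
            fields.push((name, v.clone()));
        }
    }
    fields
}
```

A JWT minted by an older Rails version yields only the always-present fields; once Rails sends the new claims, the same code path picks them up without further changes.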

End-to-end local test

This exercises the full flow — GDK/Rails mints a JWT → gRPC reaches GKG → pipeline emits a billing event → Snowplow Micro receives it.

Prerequisites

  1. GDK running locally with GitLab Rails.

  2. Pull Rails MR gitlab!232123 into your local GDK so Rails includes the new JWT claims:

    cd $GDK_ROOT/gitlab
    git fetch origin merge-requests/232123/head
    git checkout FETCH_HEAD
    gdk restart rails-web
  3. Start Snowplow Micro (mock Snowplow collector) in Docker:

    docker run -p 9090:9090 snowplow/snowplow-micro:latest
    # sanity-check: curl http://localhost:9090/micro/all → {"total":0,"good":0,"bad":0}
  4. Enable billing in GKG — set in config/default.yaml (or equivalent GKG_BILLING__* env vars):

    billing:
      enabled: true
      collector_url: "http://localhost:9090"
  5. Run GKG locally, connected to your GDK

Verify

  1. Reset Snowplow Micro before each run:

    curl -X POST http://localhost:9090/micro/reset
  2. Trigger a successful query — either via the GDK Orbit UI, or directly via grpcurl with a Rails-minted JWT. Example:

To get a GDK JWT, open a Rails console and mint a token:

```bash
gdk rails console
# Then, inside the console (Ruby):
user = User.find_by_username('root')                # or any user you want to test as
token = Analytics::KnowledgeGraph::JwtAuth.generate_token(user: user, source_type: 'rest')
puts token
```

Then call ExecuteQuery with that token:

```bash
TOKEN='<rails-minted-jwt>'   # paste the token printed by the Rails console

grpcurl -plaintext \
  -H "authorization: Bearer $TOKEN" \
  -import-path crates/gkg-server/proto -proto gkg.proto \
  -d '{"request": {"query": "{\"query_type\":\"search\",\"node\":{\"id\":\"u\",\"entity\":\"User\",\"filters\":{\"username\":\"nonexistent\"}},\"limit\":5}", "format": "RESPONSE_FORMAT_RAW"}}' \
  127.0.0.1:50054 gkg.v1.KnowledgeGraphService/ExecuteQuery
```

  3. Confirm the billing event landed in Snowplow Micro's good bucket:

    curl -s http://localhost:9090/micro/good | jq '.[0].event | {se_category, se_action, se_value, app_id}'
    # Expected: {"se_category":"orbit","se_action":"orbit_workflow_completion","se_value":"1.0","app_id":"gkg-server"}
  4. Confirm bad is empty (no schema-validation failures):

    curl -s http://localhost:9090/micro/bad
    # Expected: []
  5. Inspect the full payload to verify that the billable_usage/1-0-2 context includes realm, organization_id, subject, instance_id, host_name, and deployment_type populated from the JWT claims, plus the metadata.query_type and metadata.feature_qualified_name fields:

    curl -s http://localhost:9090/micro/good | jq '.[0].rawEvent.parameters.co'

What is NOT in this MR (follow-ups)

Related to https://gitlab.com/gitlab-org/gitlab/-/work_items/593192

Edited by Sharmad Nachnolkar
