feat(billable-events): add billable events module to ai gateway

What does this merge request do and why?

https://gitlab.com/gitlab-org/architecture/usage-billing/design-doc/-/issues/4

https://gitlab.com/gitlab-org/architecture/usage-billing/design-doc/-/merge_requests/5

This MR explores what billing events should look like in the AI Gateway.

Call site:
billable_client.track_billing_event(
    event_type="billable_event_type",  # → action + event_type
    category=__name__,                 # → where event happened
    metadata={                         # → billable context metadata
        "workflow_id": "wf_123456",
        "execution_environment": "ci_pipeline",
        "resource_consumption": {
            "cpu_seconds": 324.5,
            "memory_mb_seconds": 1024.7,
            "storage_operations": 55
        },
        "llm_operations": {
            "token_count": 4328,
            "model_id": "claude-3-sonnet-20240229",
            "prompt_tokens": 3150,
            "completion_tokens": 2178
        },
        "commands_executed": 17
    },
    unit_of_measure="tokens",
    quantity=2300,
)

Major Concerns

| Concern Category | Description | Status/Resolution |
| --- | --- | --- |
| Security & Authentication | No authentication for the tracking endpoint; users could potentially spoof headers (`instance_id`, `root_namespace_id`) and send fake billing events | Proposed: IP whitelisting for AI Gateway access to the Snowplow collector (update the DIP ingress to allow requests only from the AI Gateway IP) |
| Event Duplication | Snowplow can send duplicate events days later (one example showed the same `event_id` received 4 times across different dates) | Solution: create a unique `event_id` in the billing module, plus DIP-side validation |
| Analytics Events Mapping | Need a reliable 1:1 mapping between billing and analytics events for data warehouse reporting and attribution | Proposed: use a dedicated `event_id` sent with both event types |
| SLOs & Error Budgets | Need to define Service Level Objectives and error budgets for billing event reliability | Proposed: use Snowplow's 10,000-event buffer capacity threshold to calculate error budget consumption as a starting point; audit how many billing events are generated per day to arrive at an SLO count |
| User ID Privacy vs Billing | Internal Events don't pass `user_id` for privacy reasons, but it is essential for billing; need a mechanism to pass it from the GitLab instance to the AI Gateway | Requires implementation of a secure user ID passing mechanism |
| Legal Compliance | Collecting user data for billing purposes requires legal team verification and contract updates | Confirmed as verified with the legal team; inputs still needed from commercial (TODO) |
| Monitoring & Alerting | Need robust monitoring for event failures with proper alerting mechanisms | Use the AI Gateway's existing monitoring with Snowplow's `on_failure` callbacks |
| Retry Mechanism Reliability | Ensure failed events are properly retried without data loss | Snowplow provides an in-memory event store with a 10,000–25,000 event buffer capacity (!3025 (comment 2647827771)) |
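To make the Event Duplication resolution concrete, here is a sketch of the consumer-side check the unique `event_id` enables. The in-memory set stands in for whatever persistent store DIP would actually use; `make_deduplicator` is a hypothetical name for illustration.

```python
import uuid


def make_deduplicator():
    """Return a filter that drops events whose event_id was already seen.

    A plain set is enough to illustrate the idea; a real consumer (DIP)
    would back this with persistent storage.
    """
    seen: set[str] = set()

    def accept(event: dict) -> bool:
        event_id = event["event_id"]
        if event_id in seen:
            return False  # replayed delivery, drop it
        seen.add(event_id)
        return True

    return accept


accept = make_deduplicator()
event = {"event_id": str(uuid.uuid4()), "event_type": "llm_request"}
assert accept(event) is True    # first delivery accepted
assert accept(event) is False   # duplicate delivery dropped
```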


How to set up and validate locally

To set up locally:

  1. Set up Snowplow Micro.
  2. Set the following variables in .env:
AIGW_BILLING_EVENT__ENABLED=true
AIGW_BILLING_EVENT__ENDPOINT=http://127.0.0.1:9090
AIGW_BILLING_EVENT__APP_ID=gitlab_ai_gateway
AIGW_BILLING_EVENT__NAMESPACE=gl
AIGW_BILLING_EVENT__BATCH_SIZE=1
AIGW_BILLING_EVENT__THREAD_COUNT=1
  3. Apply the patch below:
diff --git a/ai_gateway/models/base.py b/ai_gateway/models/base.py
index 4b9b0737..884a4f78 100644
--- a/ai_gateway/models/base.py
+++ b/ai_gateway/models/base.py
@@ -13,6 +13,7 @@ from pydantic import BaseModel
 from ai_gateway.config import Config
 from ai_gateway.instrumentators.model_requests import ModelRequestInstrumentator
 from ai_gateway.structured_logging import can_log_request_data, get_request_logger
+from lib.billing_events import BillingEventsClient
 
 # TODO: The instrumentator needs the config here to know what limit needs to be
 # reported for a model. This would be nicer if we dependency inject the instrumentator
@@ -148,6 +149,10 @@ def grpc_connect_vertex(client_options: dict) -> PredictionServiceAsyncClient:
 
 
 async def log_request(request: httpx.Request):
+    from ai_gateway.async_dependency_resolver import get_billing_event_client
+
+    billing_event_client: BillingEventsClient = await get_billing_event_client()
+
     if can_log_request_data():
         request_log.info(
             "Request to LLM",
@@ -156,6 +161,18 @@ async def log_request(request: httpx.Request):
             request_url=request.url,
             request_content_json=json.loads(request.content.decode("utf8")),
         )
+
+        # Track billing event for LLM request
+        billing_event_client.track_billing_event(
+            event_type="llm_request",
+            unit_of_measure="tokens",
+            quantity=1.0,
+            metadata={
+                "request_method": str(request.method),
+                "request_url": str(request.url),
+            },
+            category=__name__,
+        )
     else:
         log.info(
             "Request to LLM",

Then make the following curl request:

curl --location 'http://localhost:5052/v1/prompts/chat/documentation_search' \
--header 'accept: application/json' \
--header 'Content-Type: application/json' \
--header 'x-gitlab-enabled-feature-flags: expanded_ai_logging,duo_chat_docs_qa_claude_3_7' \
--header 'Cookie: snowplow-micro=0ce94263-c827-4299-ba7c-f05808235302' \
--data '{
  "inputs": {
    "question": "How can I create an issue?",
    "content_id": "ATTRS",
    "documents": [
      { "content": "Issue is ...", "id": 1 }
    ],
    "stream": true
  }
}'
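To confirm the billing event reached Snowplow Micro, you can query Micro's REST API on the endpoint configured in `.env` above (port 9090 here; the jq filter is optional):

```shell
# Summary counts of received events (total/good/bad)
curl -s http://127.0.0.1:9090/micro/all

# Successfully validated events, including the billing event payload
curl -s http://127.0.0.1:9090/micro/good | jq .

# Events that failed validation, useful when debugging schema issues
curl -s http://127.0.0.1:9090/micro/bad | jq .
```

After the curl request above, the `good` count should increase and the billing event should appear in the `/micro/good` output.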


Merge request checklist

  • Tests added for new functionality. If not, please raise an issue to follow up.
  • Documentation added/updated, if needed.
Edited by Shinya Maeda
