feat(billable-events): add billable events module to ai gateway
What does this merge request do and why?
https://gitlab.com/gitlab-org/architecture/usage-billing/design-doc/-/issues/4
https://gitlab.com/gitlab-org/architecture/usage-billing/design-doc/-/merge_requests/5
This MR explores what billing events should look like in the AI Gateway.
Call site:

```python
billable_client.track_billing_event(
    event_type="billable_event_type",  # → action + event_type
    category=__name__,                 # → where the event happened
    metadata={                         # → billable context metadata
        "workflow_id": "wf_123456",
        "execution_environment": "ci_pipeline",
        "resource_consumption": {
            "cpu_seconds": 324.5,
            "memory_mb_seconds": 1024.7,
            "storage_operations": 55,
        },
        "llm_operations": {
            "token_count": 4328,
            "model_id": "claude-3-sonnet-20240229",
            "prompt_tokens": 3150,
            "completion_tokens": 2178,
        },
        "commands_executed": 17,
    },
    unit_of_measure="tokens",
    quantity=2300,
)
```
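`BillingEventsClient` itself is not shown in this excerpt; as a rough sketch of the shape such a client could take (all class, field, and method names here are assumptions for illustration, not the actual implementation):

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any


@dataclass
class BillingEvent:
    """Hypothetical event record matching the call-site signature above."""

    event_type: str
    category: str
    unit_of_measure: str
    quantity: float
    metadata: dict[str, Any]
    # Source-generated unique ID (see the Event Duplication concern below).
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: float = field(default_factory=time.time)


class BillingEventsClient:
    """Sketch: collect billing events in a buffer before emission.

    A real client would hand events to a Snowplow emitter rather than
    hold them in a plain list.
    """

    def __init__(self) -> None:
        self._buffer: list[BillingEvent] = []

    def track_billing_event(
        self,
        event_type: str,
        category: str,
        metadata: dict[str, Any],
        unit_of_measure: str,
        quantity: float,
    ) -> BillingEvent:
        event = BillingEvent(event_type, category, unit_of_measure, quantity, metadata)
        self._buffer.append(event)
        return event
```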
Major Concerns
| Concern Category | Description | Status/Resolution |
|---|---|---|
| Security & Authentication | No authentication for the tracking endpoint: users could spoof headers (`instance_id`, `root_namespace_id`) and send fake billing events | Proposed: IP allowlisting for AI Gateway access to the Snowplow collector (update the DIP ingress to allow requests only from AI Gateway IPs) |
| Event Duplication | Snowplow can send duplicate events days later (an example showed the same `event_id` received 4 times across different dates) | Proposed: create a unique `event_id` in the billing module, plus DIP-side validation |
| Analytics Events Mapping | Need a reliable 1:1 mapping between billing and analytics events for data warehouse reporting and attribution | Proposed: use a dedicated `event_id` sent with both event types |
| SLOs & Error Budgets | Need to define Service Level Objectives and error budgets for billing event reliability | Proposed: use Snowplow's 10,000-event buffer capacity threshold to calculate error budget consumption as a starting point; we need to audit how many billing events are generated per day to arrive at an SLO count |
| User ID Privacy vs Billing | Internal Events do not pass `user_id` for privacy reasons, but it is essential for billing; need a mechanism to pass it from the GitLab instance to the AI Gateway | Requires implementation of a secure user ID passing mechanism |
| Legal Compliance | Collecting user data for billing purposes requires legal team verification and contract updates | Confirmed as verified with the legal team; inputs still needed from commercial (TODO) |
| Monitoring & Alerting | Need robust monitoring for event failures with proper alerting mechanisms | Use the AI Gateway's existing monitoring system with Snowplow's `on_failure` callbacks |
| Retry Mechanism Reliability | Ensure failed events are properly retried without data loss | Snowplow provides an in-memory event store with a 10,000–25,000 event buffer capacity (!3025 (comment 2647827771)) |
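The Event Duplication row above proposes a source-generated unique `event_id` plus downstream validation. A minimal sketch of that idea (hypothetical names, not DIP's actual logic):

```python
import uuid


def new_billing_event(payload: dict) -> dict:
    """Attach a source-generated unique event_id at creation time."""
    return {"event_id": str(uuid.uuid4()), **payload}


class Deduplicator:
    """Downstream filter: drop events whose event_id was already seen.

    A real pipeline would persist seen IDs with a retention window,
    since Snowplow may redeliver the same event days later.
    """

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def accept(self, event: dict) -> bool:
        event_id = event["event_id"]
        if event_id in self._seen:
            return False  # duplicate delivery: do not bill twice
        self._seen.add(event_id)
        return True
```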
How to set up and validate locally
To set up locally:

- Set up Snowplow Micro.
- Set the following variables in `.env`:

```shell
AIGW_BILLING_EVENT__ENABLED=true
AIGW_BILLING_EVENT__ENDPOINT=http://127.0.0.1:9090
AIGW_BILLING_EVENT__APP_ID=gitlab_ai_gateway
AIGW_BILLING_EVENT__NAMESPACE=gl
AIGW_BILLING_EVENT__BATCH_SIZE=1
AIGW_BILLING_EVENT__THREAD_COUNT=1
```
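The `AIGW_BILLING_EVENT__*` variables follow the gateway's nested double-underscore env-var convention. A stdlib-only sketch of how such a prefix could be collected into a settings dict (the gateway uses its own config loader, so this is only illustrative):

```python
import os


def load_billing_settings(prefix: str = "AIGW_BILLING_EVENT__") -> dict:
    """Collect env vars with the given prefix into a lower-cased dict.

    E.g. AIGW_BILLING_EVENT__ENABLED=true -> {"enabled": "true"}.
    Values stay as strings; real config code would coerce types.
    """
    return {
        key[len(prefix):].lower(): value
        for key, value in os.environ.items()
        if key.startswith(prefix)
    }
```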
- Apply the patch below (it adds a billing event call to `log_request`):

```diff
diff --git a/ai_gateway/models/base.py b/ai_gateway/models/base.py
index 4b9b0737..884a4f78 100644
--- a/ai_gateway/models/base.py
+++ b/ai_gateway/models/base.py
@@ -13,6 +13,7 @@ from pydantic import BaseModel
 from ai_gateway.config import Config
 from ai_gateway.instrumentators.model_requests import ModelRequestInstrumentator
 from ai_gateway.structured_logging import can_log_request_data, get_request_logger
+from lib.billing_events import BillingEventsClient
 
 # TODO: The instrumentator needs the config here to know what limit needs to be
 # reported for a model. This would be nicer if we dependency inject the instrumentator
@@ -148,6 +149,10 @@ def grpc_connect_vertex(client_options: dict) -> PredictionServiceAsyncClient:
 async def log_request(request: httpx.Request):
+    from ai_gateway.async_dependency_resolver import get_billing_event_client
+
+    billing_event_client: BillingEventsClient = await get_billing_event_client()
+
     if can_log_request_data():
         request_log.info(
             "Request to LLM",
@@ -156,6 +161,18 @@ async def log_request(request: httpx.Request):
             request_url=request.url,
             request_content_json=json.loads(request.content.decode("utf8")),
         )
+
+        # Track billing event for LLM request
+        billing_event_client.track_billing_event(
+            event_type="llm_request",
+            unit_of_measure="tokens",
+            quantity=1.0,
+            metadata={
+                "request_method": str(request.method),
+                "request_url": str(request.url),
+            },
+            category=__name__,
+        )
     else:
         log.info(
             "Request to LLM",
```
- Make the below curl request:

```shell
curl --location 'http://localhost:5052/v1/prompts/chat/documentation_search' \
  --header 'accept: application/json' \
  --header 'Content-Type: application/json' \
  --header 'x-gitlab-enabled-feature-flags: expanded_ai_logging,duo_chat_docs_qa_claude_3_7' \
  --header 'Cookie: snowplow-micro=0ce94263-c827-4299-ba7c-f05808235302' \
  --data '{
    "inputs": {
      "question": "How can I create an issue?",
      "content_id": "ATTRS",
      "documents": [
        { "content": "Issue is ...", "id": 1 }
      ],
      "stream": true
    }
  }'
```
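After making the request, you can confirm the event reached Snowplow Micro via its REST API: `/micro/good` lists events that passed validation. A small stdlib helper for this check (assumes Micro is listening on `127.0.0.1:9090` as configured above):

```python
import json
import urllib.request

MICRO_BASE = "http://127.0.0.1:9090"


def good_events_url(base: str = MICRO_BASE) -> str:
    """Snowplow Micro endpoint listing events that passed validation."""
    return f"{base}/micro/good"


def fetch_good_events(base: str = MICRO_BASE) -> list:
    """Return the validated events currently held by Snowplow Micro."""
    with urllib.request.urlopen(good_events_url(base)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    events = fetch_good_events()
    print(f"{len(events)} validated event(s) in Snowplow Micro")
```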
Merge request checklist
- [ ] Tests added for new functionality. If not, please raise an issue to follow up.
- [ ] Documentation added/updated, if needed.
Edited by Shinya Maeda