Ensuring scalability of AI feature analytics
Problem
The AI ecosystem at GitLab is expanding rapidly with continuous introduction and enhancement of capabilities like code suggestions, Duo chat, root cause analysis, and more. While these innovations provide value to our customers, we currently lack a scalable approach to tracking their usage effectively.
The Optimize team has been solely responsible for implementing in-product tracking of feature usage across all AI initiatives. This presents significant challenges:
- We lack domain expertise for many of these features
- The pace of new feature deployment exceeds our capacity to implement tracking
- This responsibility diverts our resources from core priorities like VSA, DORA metrics, and demonstrating the ROI of AI's impact on the software development lifecycle.
This centralized approach to feature tracking is not sustainable and limits our ability to provide meaningful insights.
Proposal
We need to decentralize the responsibility for usage tracking by making it an integral part of the feature development process:
Responsibility Transition Plan
- Feature teams should take ownership of implementing in-product usage tracking for their specific AI capabilities
- Raw data collection should be a prerequisite for any new feature to reach GA status
- For existing GA features without in-product tracking, teams should prioritize implementing data collection in the upcoming releases
- The Optimize team will gradually transfer ownership of existing tracking implementations to their respective feature teams, providing documentation and support during the transition
This approach will distribute the workload appropriately, ensure domain expertise in data collection, and allow the Optimize team to focus on higher-level analysis and our core responsibilities.
The table below outlines data currently being collected and which team should take over ownership.
Feature | Summary | Group |
---|---|---|
Duo seat assignment | Current data is stored in PG. CH also stores all historic data. | group::fulfillment platform |
Code suggestions | Data is stored in PG for 3 months and in CH indefinitely. | group::editor extensions |
Duo chat | Data is stored in PG for 3 months and in CH indefinitely. | group::duo chat |
Root cause analysis | We currently track the /troubleshoot command in the context of a failed pipeline. Data is stored in PG for 3 months and in CH indefinitely. | group::pipeline execution |
The table below outlines features which potentially need tracking.
Feature | Group |
---|---|
Code Explanation | group::code creation |
Test Generation | group::code creation |
Refactor Code | group::code creation |
Fix Code | group::code creation |
Duo for CLI | group::code creation |
Automated Merge Commits | group::code creation |
Vulnerability Resolution | group::security insights |
Vulnerability Explanation | group::security insights |
Discussion Summary | group::project management |
Issue Description Summary | group::project management |
Duo Code Review | group::code creation |
If you notice any missing features in the list above, please add them. This list was pulled from https://gitlab.com/gitlab-com/packaging-and-pricing/pricing-handbook/-/issues/542#ask
Additional context around AI metrics
How to store AI event data in product
- Data which already exists in PG is synced to CH
- Data which doesn't exist in PG is stored in a Redis buffer and then flushed to CH
The Mermaid diagrams on the group::optimize team page illustrate this visually: https://handbook.gitlab.com/handbook/engineering/development/analytics/monitor/optimize/#ssot-for-data-flows-across-optimize-features
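To make the buffering pattern more concrete, here is a minimal sketch of how such a flow could look. This is an illustration only, assuming a Redis list as the buffer and a hypothetical `ai_usage_events` ClickHouse table; it is not GitLab's actual (Ruby-based) implementation, and the event shape, buffer key, and table name are all placeholders.

```python
# Illustrative sketch only: key, table, and event shape are assumptions,
# not GitLab's actual implementation.
import json

import redis
import clickhouse_connect

BUFFER_KEY = "ai_events_buffer"  # hypothetical Redis list used as the write buffer

r = redis.Redis(host="localhost", port=6379)
ch = clickhouse_connect.get_client(host="localhost")


def track_ai_event(user_id: int, event_name: str, namespace_id: int) -> None:
    """Append an AI usage event to the Redis buffer (cheap, done in the request path)."""
    event = {"user_id": user_id, "event": event_name, "namespace_id": namespace_id}
    r.rpush(BUFFER_KEY, json.dumps(event))


def flush_buffer_to_clickhouse(batch_size: int = 1000) -> None:
    """Drain the buffer and bulk-insert into ClickHouse (e.g. from a periodic job)."""
    raw_events = r.lpop(BUFFER_KEY, batch_size) or []  # LPOP with count needs Redis >= 6.2
    rows = [
        [e["user_id"], e["event"], e["namespace_id"]]
        for e in (json.loads(raw) for raw in raw_events)
    ]
    if rows:
        ch.insert(
            "ai_usage_events",  # hypothetical ClickHouse table
            rows,
            column_names=["user_id", "event", "namespace_id"],
        )
```

The key point is that the request path only performs a cheap Redis append, while the ClickHouse bulk insert happens asynchronously in batches.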
Available APIs
When adding or modifying data collection for new or existing features, the following three GraphQL endpoints need to be taken into consideration; a minimal query sketch follows the list. group::optimize can help identify which endpoints are most applicable for the respective event data.
- AiMetrics (requires ClickHouse)
  - Aggregated group metrics. Used for the AI Impact Analytics dashboard.
- AIUserMetrics (requires ClickHouse)
  - Aggregated per-user metrics.
- AIUsageData (does not require ClickHouse)
  - Raw data access for customers without CH. Limited to 3 months of retention and currently only supports code suggestion events.
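For orientation, below is a rough sketch of calling the group-level metrics endpoint through GitLab's GraphQL API from Python. The group path and token are placeholders, and the fields selected inside `aiMetrics` are assumptions for illustration; check the GraphQL schema (for example via the GraphiQL explorer) for the fields each endpoint actually exposes and their exact names.

```python
# Sketch of querying group-level AI metrics through GitLab's GraphQL API.
# The selected fields are illustrative assumptions; verify them against the schema.
import requests

GITLAB_GRAPHQL_URL = "https://gitlab.com/api/graphql"
TOKEN = "<personal-access-token>"        # placeholder
GROUP_PATH = "your-group/your-subgroup"  # placeholder

query = """
query($fullPath: ID!) {
  group(fullPath: $fullPath) {
    aiMetrics(startDate: "2024-01-01", endDate: "2024-03-31") {
      codeSuggestionsShownCount     # assumed field name
      codeSuggestionsAcceptedCount  # assumed field name
      duoChatContributorsCount      # assumed field name
    }
  }
}
"""

response = requests.post(
    GITLAB_GRAPHQL_URL,
    json={"query": query, "variables": {"fullPath": GROUP_PATH}},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

The per-user and raw-data endpoints can be queried the same way; only the selected field and its arguments change.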
Open questions
- Do we want to store data which we don't yet have a use case for? 🚧
- Which of the following requirements do we want to include in the responsibility checklist? 🚧
- If CH is available, should we still store 3 months of data in PG? 🚧
- How is existing tooling going to be migrated to DIP and the Instrumentation Layer? 🚧