Ensuring scalability of AI feature analytics
Problem
The AI ecosystem at GitLab is expanding rapidly with continuous introduction and enhancement of capabilities like code suggestions, Duo chat, root cause analysis, and more. While these innovations provide value to our customers, we currently lack a scalable approach to tracking their usage effectively.
The Optimize team has been solely responsible for implementing in-product tracking of feature usage across all AI initiatives. This presents significant challenges:
- We lack domain expertise for many of these features
- The pace of new feature deployment exceeds our capacity to implement tracking
- This responsibility diverts our resources from core priorities like VSA, DORA metrics, and demonstrating the ROI of AI's impact on the software development lifecycle.
This centralized approach to feature tracking is not sustainable and limits our ability to provide meaningful insights.
Proposal
We need to decentralize the responsibility for usage tracking by making it an integral part of the feature development process:
Responsibility Transition Plan
- Feature teams should take ownership of implementing in-product usage tracking for their specific AI capabilities
- Raw data collection should be a prerequisite for any new feature to reach GA status
- For existing GA features without in-product tracking, teams should prioritize implementing data collection in the upcoming releases
- The Optimize team will gradually transfer ownership of existing tracking implementations to their respective feature teams, providing documentation and support during the transition
This approach will distribute the workload appropriately, ensure domain expertise in data collection, and allow the Optimize team to focus on higher-level analysis and our core responsibilities.
The table below outlines data currently being collected and which team should take over ownership.
Feature | Summary | Group |
---|---|---|
Duo seat assignment | Current data is stored in PG. CH also stores all historic data. | group::fulfillment platform |
Code suggestions | Data is stored in PG for 3 months and in CH indefinitely. | group::editor extensions |
Duo chat | Data is stored in PG for 3 months and in CH indefinitely. | group::duo chat |
Root cause analysis | We currently track the /troubleshoot command in the context of a failed pipeline. Data is stored in PG for 3 months and in CH indefinitely. | group::pipeline execution |
The table below outlines features which potentially need tracking.
Feature | Group |
---|---|
Code Explanation | group::code creation |
Test Generation | group::code creation |
Refactor Code | group::code creation |
Fix Code | group::code creation |
Duo for CLI | group::code creation |
Automated Merge Commits | group::code creation |
Vulnerability Resolution | group::security insights |
Vulnerability Explanation | group::security insights |
Discussion Summary | group::project management |
Issue Description Summary | group::project management |
Duo Code Review | group::code creation |
If you notice any missing features in the list above, please add them. This list was pulled from https://gitlab.com/gitlab-com/packaging-and-pricing/pricing-handbook/-/issues/542#ask
Additional context around AI metrics
How to store AI event data in product
- Data which already exists in PG is synced to CH
- Data which doesn't exist in PG is stored in a Redis buffer and then flushed to CH
The Mermaid diagrams on the group::optimize team page illustrate this visually: https://handbook.gitlab.com/handbook/engineering/development/analytics/monitor/optimize/#ssot-for-data-flows-across-optimize-features
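To make the buffering pattern more concrete, here is a minimal sketch of how such a flow could look. This is an illustration only, assuming a Redis list as the buffer and a hypothetical `ai_usage_events` ClickHouse table; it is not GitLab's actual (Ruby-based) implementation, and the event shape, buffer key, and table name are all placeholders.

```python
# Illustrative sketch only: key, table, and event shape are assumptions,
# not GitLab's actual implementation.
import json

import redis
import clickhouse_connect

BUFFER_KEY = "ai_events_buffer"  # hypothetical Redis list used as the write buffer

r = redis.Redis(host="localhost", port=6379)
ch = clickhouse_connect.get_client(host="localhost")


def track_ai_event(user_id: int, event_name: str, namespace_id: int) -> None:
    """Append an AI usage event to the Redis buffer (cheap, done in the request path)."""
    event = {"user_id": user_id, "event": event_name, "namespace_id": namespace_id}
    r.rpush(BUFFER_KEY, json.dumps(event))


def flush_buffer_to_clickhouse(batch_size: int = 1000) -> None:
    """Drain the buffer and bulk-insert into ClickHouse (e.g. from a periodic job)."""
    raw_events = r.lpop(BUFFER_KEY, batch_size) or []  # LPOP with count needs Redis >= 6.2
    rows = [
        [e["user_id"], e["event"], e["namespace_id"]]
        for e in (json.loads(raw) for raw in raw_events)
    ]
    if rows:
        ch.insert(
            "ai_usage_events",  # hypothetical ClickHouse table
            rows,
            column_names=["user_id", "event", "namespace_id"],
        )
```

The key point is that the request path only performs a cheap Redis append, while the ClickHouse bulk insert happens asynchronously in batches.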
Available APIs
When adding or modifying data collection for new or existing features, the following three GraphQL endpoints need to be taken into consideration; a minimal query sketch follows the list. group::optimize can help identify which endpoints are most applicable for the respective event data.
- AiMetrics (requires ClickHouse)
  - Aggregated group metrics. Used for the AI Impact Analytics dashboard.
- AIUserMetrics (requires ClickHouse)
  - Aggregated per-user metrics.
- AIUsageData (does not require ClickHouse)
  - Raw data access for customers without CH. Limited to 3 months of retention and currently only supports code suggestion events.
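For orientation, below is a rough sketch of calling the group-level metrics endpoint through GitLab's GraphQL API from Python. The group path and token are placeholders, and the fields selected inside `aiMetrics` are assumptions for illustration; check the GraphQL schema (for example via the GraphiQL explorer) for the fields each endpoint actually exposes and their exact names.

```python
# Sketch of querying group-level AI metrics through GitLab's GraphQL API.
# The selected fields are illustrative assumptions; verify them against the schema.
import requests

GITLAB_GRAPHQL_URL = "https://gitlab.com/api/graphql"
TOKEN = "<personal-access-token>"        # placeholder
GROUP_PATH = "your-group/your-subgroup"  # placeholder

query = """
query($fullPath: ID!) {
  group(fullPath: $fullPath) {
    aiMetrics(startDate: "2024-01-01", endDate: "2024-03-31") {
      codeSuggestionsShownCount     # assumed field name
      codeSuggestionsAcceptedCount  # assumed field name
      duoChatContributorsCount      # assumed field name
    }
  }
}
"""

response = requests.post(
    GITLAB_GRAPHQL_URL,
    json={"query": query, "variables": {"fullPath": GROUP_PATH}},
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```

The per-user and raw-data endpoints can be queried the same way; only the selected field and its arguments change.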
Open questions
- Do we want to store data which we don't yet have a use case for? 🚧
- Which of the following requirements do we want to include in the responsibility checklist? 🚧
- If CH is available, should we still store 3 months of data in PG? 🚧
- How is existing tooling going to be migrated to DIP and the Instrumentation Layer? 🚧