Replace Unit Primitives / Features mapping in Duo analytics
Issue Status
Last update: 2024-12-11
ETA: before 2024-03-20 (should be completed before cloud connector's new component migration detailed in &14310 (closed))
- Share proposal
- Proposal Feedback/Validation
- Implementation - WIP
- Analytics instrumentation - WIP
  - add attributes definition to standard context https://gitlab.com/gitlab-org/iglu/-/merge_requests/131
  - populate attributes values from AI-related services #505197 (closed)
- Data insights implementation
Context
- Read the epic description &15625 (closed) for more context on Unit Primitive re-definition
Background on AI usage analytics
The AI gateway reporting dashboard in Tableau details Duo feature usage, based on events instrumented in the AI gateway.
Benefits
- Any AI feature request is tracked automatically using the Unit Primitive name (removing the manual steps of creating a new event definition and adding instrumentation code in Rails)
- Customers cannot block usage data collection as it comes from the cloud environment and not the instance
Trade offs:
- All features must follow the same event schema/naming
- As Unit Primitives are used for entitlements/permission checks as well (see here), the naming should work for both use cases.
Problem
Main problem:
When a Unit Primitive changes, for instance when it is decomposed (example: `code_suggestion` is split into `code_generation` and `code_completion`), it disrupts the ability to query feature usage over time, impacting historical reporting.
Other issues:
- Feature Confusion: "Feature" names are populated using Unit Primitives "Services" within the data processing pipeline (referenced here). The displayed list of “Features” mirrors the list of “Unit Primitives,” creating ambiguity. This confusion is partly due to workaround #2, which will be addressed by removing the “Services” object from Unit Primitives.
- Complex Mapping: The many-to-many mapping between Features and Unit Primitives adds further complexity to the data pipeline. As we remove the “Services” object, we have an opportunity to simplify by eliminating many-to-many mappings.
Related discussions: https://gitlab.com/groups/gitlab-org/-/epics/15212, #485012 (closed)
Unit primitives vs Features list:
"Cleaned" list of Primitives (without the _proxy workaround etc): https://docs.google.com/spreadsheets/d/1joRI5VSovG1r2EtU4B3xZZaHJheaQxQUw9vzGKMSmms/edit
Proposed solution
1/ Replace Features by new objects
Replace the existing `Feature` list with new objects: `Client type, name and version`, `Interface`, `Category` and `Team` (Group), to better align with evolving product and usage tracking needs.
Why?
- The concept or scope of a marketable "feature" will change over time, whereas the "interface" and "category" should be more stable.
- We use the term "feature" for multiple purposes, which creates ambiguity.
- Some capabilities started within a single interface but their context of usage is expanding.
Example 1: Duo Chat
- Duo Chat is an "interface" to interact with AI features that can be used in different "clients" (IDE, Web IDE, Web UI)...
- "Ask epic question", "Ask Build question", "Ask default question" are some of the capabilities (unit primitives) of this interface.
- These capabilities could later be accessed through other interfaces, like the Duo Workflow, an API, or standalone Web UI elements.
Example 2: Test Generation
- Currently available via Duo Chat only:
  - in an IDE or Web IDE
  - in the Web UI (looking at a file)
- In the future, Test Generation could be used outside of Duo Chat:
  - in a Duo Workflow execution
  - as a button/UI element that may commit to a file directly
  - via the API (request/response via a 3rd-party tool)
How?
- Event Attributes: Each object (Interface, Client *, Category...) could be added to events as standardized attributes, possibly aligned with the Unified Cloud Context framework, to ensure clear context in usage data (or inferred within the data processing pipeline)
- Filters and Dashboard Dimensions: These attributes could be used as both filters and dimensions in dashboards, allowing us to group and segment data more meaningfully based on the context in which a capability is used.
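As a rough illustration of why a stable attribute helps, the sketch below counts events per month, first by Unit Primitive and then by Category. The event shape, the `category` value `code_suggestions`, and the `usage_by` helper are assumptions made for this example only; they are not the actual pipeline schema.

```python
from collections import Counter

# Hypothetical events. Field names and the category value are illustrative
# assumptions; only the unit primitive names come from the issue text.
events = [
    {"month": "2024-01", "unit_primitive": "code_suggestion", "category": "code_suggestions"},
    {"month": "2024-01", "unit_primitive": "code_suggestion", "category": "code_suggestions"},
    # After the split, the same kind of usage is reported under two new names.
    {"month": "2024-02", "unit_primitive": "code_generation", "category": "code_suggestions"},
    {"month": "2024-02", "unit_primitive": "code_completion", "category": "code_suggestions"},
]


def usage_by(rows, dimension):
    """Count events per (month, value of `dimension`)."""
    return Counter((row["month"], row[dimension]) for row in rows)


by_unit_primitive = usage_by(events, "unit_primitive")
by_category = usage_by(events, "category")

# Grouping by unit primitive yields three separate series across the split,
# while grouping by category yields one continuous series.
print(sorted(by_unit_primitive.items()))
print(sorted(by_category.items()))
```

Grouped by Unit Primitive, the January series ends and two new series start in February. Grouped by Category, the same key appears in both months, so the trend remains queryable over time.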
What?
Proposed list of new attributes:
Attributes | Example | Usage example | Notes |
---|---|---|---|
Interface (or source) | | | |
Client type, name, version | | | |
Category | | | Related issue 2. Introduce a ‘feature category’ value to event_property in the standard context |
Group (Team) | | | Less useful, but the attribute is already in the Unit Primitive definition. Note: this could be inferred from the YML directly within the data pipeline and not added to the event (see comment) |
Request | | | We should be able to collect a unique ID that enables grouping queries by user requests or LLM requests (see #502457 (comment 2203976942)). Note: this may be more complex and moved to a separate issue. |
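A minimal sketch of how the proposed attributes could travel together on an event, assuming a flat key/value payload. The class name `AIEventContext`, every field name, and the example values are hypothetical; the real definitions would come from the standard context schema and the Unit Primitive definitions.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class AIEventContext:
    """Hypothetical container for the attributes proposed in the table."""

    unit_primitive: str
    interface: str
    client_type: str
    client_name: str
    client_version: str
    category: str
    # Could instead be inferred from the YML in the data pipeline.
    group: Optional[str] = None
    # Unique ID to group events by user or LLM request; may be a separate issue.
    request_id: Optional[str] = None

    def to_payload(self) -> dict:
        """Return the attributes as a dict, omitting unset optional fields."""
        return {key: value for key, value in asdict(self).items() if value is not None}


# Example values are made up for illustration.
context = AIEventContext(
    unit_primitive="duo_chat",
    interface="duo_chat",
    client_type="ide",
    client_name="example-editor",
    client_version="1.2.3",
    category="example_category",
)
payload = context.to_payload()
print(payload)
```

Keeping `group` and `request_id` optional reflects the notes in the table: both may end up being resolved outside the event itself.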
2/ Rename some unit primitives for clarity
This is a minor change, but renaming some UPs, especially the default/fallback ones, would help disambiguate them from their category.
- ex: `duo_chat` -> `ask_duo_chat_default` for default questions (which are not one of the specific `ask_*`)
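If a rename like this goes ahead, historical events would still carry the old name. One possible way to keep queries continuous, sketched here as an assumption rather than an agreed design, is a small alias table applied in the data pipeline. The names `UP_RENAMES` and `canonical_unit_primitive` are hypothetical.

```python
# Hypothetical alias table: old unit primitive name -> new name.
# The single entry is the rename example given in this issue.
UP_RENAMES = {
    "duo_chat": "ask_duo_chat_default",
}


def canonical_unit_primitive(name: str) -> str:
    """Map a possibly outdated unit primitive name to its current name."""
    return UP_RENAMES.get(name, name)


print(canonical_unit_primitive("duo_chat"))
print(canonical_unit_primitive("code_completion"))
```

Names without an entry in the table pass through unchanged, so the lookup can be applied to every event regardless of when it was emitted.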
Questions
Where to get these new objects/attributes from?
Some are already in the new UP structure, which should be used as the new Cloud Connector source. The Cloud Connector team will make sure it is updated during the development of the new CC Component (timeline).
"Client" and "Interface" would be generated from the front end or API endpoint and passed to the event-emitting service in the back end (AI gateway). No information should need to be stored.
How to migrate current version of the dashboard?
Currently the Features list is basically the same as the UP list. If we remove it, most of the "Features" will still be queryable as Unit Primitives, so I don't think removing that filter should be a problem for anybody. However, as we implement these new objects, we should enforce a process for when one of these newly defined objects/attributes is decomposed or updated.