
Replace Unit Primitives / Features mapping in Duo analytics

Issue Status

Last update: 2024-12-11

ETA: before 2024-03-20 (should be completed before cloud connector's new component migration detailed in &14310 (closed))

Context

  • Read the epic description &15625 (closed) for more context on Unit Primitive re-definition

Background on AI usage analytics

The AI gateway reporting dashboard in Tableau details Duo features usage, based on events instrumented in the AI gateway:

[image: AI gateway reporting dashboard in Tableau]

Benefits

  • Any AI feature request is tracked automatically using the Unit Primitive name (removing the manual steps of creating a new event definition and adding instrumentation code in Rails)
  • Customers cannot block usage data collection as it comes from the cloud environment and not the instance

Trade-offs:

  • It requires following the same event schema and naming for all features
  • As Unit Primitives are used for entitlements/permission checks as well (see here), the naming should work for both use cases.

Problem

Main problem:

When a Unit Primitive changes — for instance, when it is decomposed (example: code_suggestion is split into code_generation and code_completion) — it disrupts the ability to query feature usage over time, impacting historical reporting.
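To make the problem concrete, here is a minimal sketch with made-up events (the months and counts are hypothetical; only the Unit Primitive names come from the example above). Querying by the old name shows usage falling to zero after the split, even though the feature is still in use under the new names.

```python
from collections import Counter

# Hypothetical events as (month, unit_primitive) pairs; the data is illustrative only.
events = [
    ("2024-09", "code_suggestion"),
    ("2024-09", "code_suggestion"),
    ("2024-10", "code_suggestion"),
    ("2024-11", "code_generation"),  # after the split
    ("2024-11", "code_completion"),
    ("2024-12", "code_completion"),
]


def monthly_usage(events, unit_primitive):
    """Count events per month for a single Unit Primitive name."""
    return Counter(month for month, up in events if up == unit_primitive)


usage = monthly_usage(events, "code_suggestion")
months = sorted({month for month, _ in events})
# Months without any matching event are reported as 0.
series = [(month, usage.get(month, 0)) for month in months]
print(series)
```

The series for code_suggestion stops at the month of the split, so any dashboard keyed on that name silently loses its history.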

Other issues:

  • Feature Confusion: "Feature" names are populated using Unit Primitives "Services" within the data processing pipeline (referenced here). The displayed list of “Features” mirrors the list of “Unit Primitives,” creating ambiguity. This confusion is partly due to workaround #2, which will be addressed by removing the “Services” object from Unit Primitives.
  • Complex Mapping: The many-to-many mapping between Features and Unit Primitives adds further complexity to the data pipeline. As we remove the “Services” object, we have an opportunity to simplify by eliminating many-to-many mappings.

Related discussions: https://gitlab.com/groups/gitlab-org/-/epics/15212, #485012 (closed)

Unit primitives vs Features list:

[image: Unit Primitives vs Features list]

"Cleaned" list of Primitives (without the _proxy workaround etc): https://docs.google.com/spreadsheets/d/1joRI5VSovG1r2EtU4B3xZZaHJheaQxQUw9vzGKMSmms/edit

Proposed solution

1/ Replace Features with new objects

Replace the existing Feature list with new objects: Client (type, name, and version), Interface, Category, and Team (Group), to better align with evolving product and usage tracking needs.

Why?

  • The concept or scope of a marketable "feature" will change over time, whereas the "interface" and "category" should be more stable.
  • We are overloading the term "feature" to serve multiple purposes, which creates ambiguity.
  • Some capabilities started within a single interface but their context of usage is expanding.

Example 1: Duo Chat

  • Duo Chat is an "interface" to interact with AI features that can be used in different "clients" (IDE, Web IDE, Web UI)...
  • "Ask epic question", "Ask Build question", "Ask default question" are some of the capabilities (unit primitives) of this interface.
  • These capabilities could later be accessed through other interfaces, like Duo Workflow, an API, or standalone Web UI elements.

Example 2: Test Generation

  • Currently available via Duo Chat only
    • in an IDE or Web IDE
    • in the Web UI (looking at a file)
  • In the future, we can imagine Test Generation being used outside of Duo Chat:
    • in a Duo Workflow execution
    • as a button/UI element that may commit to a file directly
    • via the API (request/response with a 3rd-party tool)

How?

  • Event Attributes: Each object (Interface, Client *, Category...) could be added to events as a standardized attribute, possibly aligned with the Unified Cloud Context framework, or inferred within the data processing pipeline, to ensure clear context in usage data.
  • Filters and Dashboard Dimensions: These attributes could be used as both filters and dimensions in dashboards, allowing us to group and segment data more meaningfully based on the context in which a capability is used.
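As a rough illustration of how such attributes could act as both filters and dimensions, here is a minimal sketch. The DuoEvent class, the count_by helper, and the sample events (including the summarize_issue primitive name) are hypothetical and do not reflect the actual event schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class DuoEvent:
    """Hypothetical event shape; field names follow the attributes proposed in this issue."""
    unit_primitive: str
    interface: str
    client_type: str
    category: str
    user_id: int


# Made-up sample events for illustration.
events = [
    DuoEvent("duo_chat", "duo-chat", "ide", "duo_chat", 1),
    DuoEvent("duo_chat", "duo-chat", "web_browser", "duo_chat", 2),
    DuoEvent("code_generation", "duo-chat", "ide", "source_code", 1),
    DuoEvent("summarize_issue", "gitlab-web-ui", "web_browser", "project_management", 3),
]


def count_by(events, dimension, **filters):
    """Count events grouped by `dimension`, keeping only events that match every filter."""
    selected = (
        e for e in events
        if all(getattr(e, name) == value for name, value in filters.items())
    )
    return Counter(getattr(e, dimension) for e in selected)


# Dimension only: traffic per category.
print(count_by(events, "category"))
# Filter plus dimension: Duo Chat requests broken down by client type.
print(count_by(events, "client_type", interface="duo-chat"))
```

Because every event carries the same attributes, the same helper answers questions such as "IDE vs Web breakdown of Duo Chat usage" without a separate Feature mapping.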

What?

Proposed list of new attributes:

Columns: Attribute | Example | Usage example | Notes

Interface (or source)

Example:

  • interface
    • duo-chat
    • duo-workflow
    • gitlab-web-ui (from a page in the GitLab web app)
    • other (i.e. API / 3rd-party applications)

Usage example:

  • How many users are conversing with Duo Chat daily?
  • What's the number of questions (unique requests) asked to Duo Chat daily?
  • How many VSCode vs JetBrains active users?
  • How many summarized_issue requests are executed via Duo Chat, via the /summarize_issue feature in the Web UI, or via Duo Workflow?

Notes:

  • Only one interface value per event
  • Many-to-many relationship with primitives (an interface can be linked to any UP)
  • Interface names would be pre-defined in the standardized context.
  • Similar issue: #481539 (closed)

Client type, name, version

Example:

  • client_type (inferred from client name)
    • web_browser
    • ide
    • other
  • client_name (inferred from user agent)
    • chrome
    • firefox
    • vscode
    • jetbrains
  • client_version (inferred from user agent)
    • 1.4.5

Usage example:

  • How many IDE vs Web UI active users?
  • Breakdown of Duo Chat users: IDE vs Web

Notes:

  • Only one client type, name, and version value per event
  • Many-to-many relationship with primitives
  • Client would be based on the user agent, i.e. the initial program that starts the request (the browser, the IDE...)
  • Would require getting info like the user agent from the front-end (Rails) and passing it to the event-emitting service in the back-end (AI gateway). It seems that this is already done in the AI gateway (and in the IDE LSP).
  • If so, it may work only for GitLab.com or S-M instances above a certain version
  • Similar issue: https://gitlab.com/gitlab-data/product-analytics/-/issues/1808

Category

Example:

  • category (or feature_category)
    • project_management (features related to plan)
    • source_code (features related to code creation)
    • duo_chat (features uniquely related to Duo Chat)
    • ... see list in column D

Usage example:

  • How many users are using Duo features for planning vs writing code?
  • Which category generates the most traffic?

Notes:

  • Only one category per unit primitive "event"
  • 1-to-many relationship with primitives (a unit primitive can be part of only ONE category)
  • Should be a stable name that a PM of a given category can use over time to assess usage of their related AI features (and can expand to non-AI later)
  • Reuse the existing "Feature Category" attribute from the new UP configuration structure
  • Note: the "Duo Chat" category should contain the directly accessible features, such as ask_*
  • Contextual awareness features should be their own category, as they may be integrated into other interfaces.
  • Related issue: 2. Introduce a ‘feature category’ value to event_property in the standard context

Group (Team)

Usage example:

  • Which team generates the most Duo requests/traffic?

Notes:

  • Less useful, but the attribute is already in the Unit Primitive definition
  • Note: this could be inferred from the YML directly within the data pipeline and not added to the event (see comment)

Request

Example:

  • user_request_id
  • llm_request_id

Usage example:

  • How many questions are asked on Duo Chat daily?
  • How many LLM requests are sent on average per Duo Chat question?

Notes:

  • We should be able to collect a unique ID that makes it possible to group queries by user request or LLM request (see #502457 (comment 2203976942))
  • Note: this may be more complex and could be moved to a separate issue.
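For the Request attributes, a minimal sketch of how the two IDs could answer the usage questions, assuming each LLM call is recorded with both a user_request_id and an llm_request_id. The sample pairs and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical (user_request_id, llm_request_id) pairs, one per LLM call.
llm_calls = [
    ("u1", "l1"),
    ("u1", "l2"),
    ("u1", "l3"),
    ("u2", "l4"),
]


def llm_requests_per_question(calls):
    """Return (number of user requests, average number of LLM requests per user request)."""
    per_question = defaultdict(set)
    for user_request_id, llm_request_id in calls:
        per_question[user_request_id].add(llm_request_id)
    questions = len(per_question)
    total_llm_requests = sum(len(ids) for ids in per_question.values())
    average = total_llm_requests / questions if questions else 0.0
    return questions, average


questions, average = llm_requests_per_question(llm_calls)
print(questions, average)
```

Using a set per user request means a duplicated event for the same LLM call is counted only once.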

2/ Rename some unit primitives for clarity

This is minor, but renaming some UPs, especially the default/fallback ones, would help disambiguate them from their category.

  • e.g. duo_chat -> ask_duo_chat_default, for default questions (those that are not one of the specific ask_*)
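If a rename like this goes ahead, historical events would still carry the old name. One possible way to keep queries continuous, sketched here as an assumption and not as part of the proposal, is a small alias map applied in the data pipeline. The RENAMES table and canonical_name helper are hypothetical.

```python
# Hypothetical alias map: old Unit Primitive name -> new name.
# The single entry mirrors the rename example above.
RENAMES = {
    "duo_chat": "ask_duo_chat_default",
}


def canonical_name(unit_primitive: str) -> str:
    """Map a possibly outdated Unit Primitive name to its current name."""
    return RENAMES.get(unit_primitive, unit_primitive)


print(canonical_name("duo_chat"))
print(canonical_name("code_completion"))
```

Names without an entry pass through unchanged, so the map only needs to list primitives that were actually renamed.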

Questions

Where to get these new objects/attributes from?

Some are already in the new UP structure, which should be used as the new Cloud Connector source. The Cloud Connector team will make sure it is updated during the development of the new CC Component (timeline).

"Client" and "Interface" would be generated from the front-end or API endpoint and passed to the event-emitting service in the back-end (AI gateway). No information should need to be stored.
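A minimal sketch of how client_name, client_type, and client_version might be inferred from a user agent string. The browser patterns follow the common Firefox/x and Chrome/x tokens; the IDE token formats (vscode/x, jetbrains/x) are assumptions for illustration, and the user agents actually sent by the IDE extensions may look different.

```python
import re

VERSION = r"(\d+(?:\.\d+)*)"

# Ordered list of (client_name, pattern). The IDE patterns are assumed token formats.
CLIENT_PATTERNS = [
    ("vscode", re.compile(r"vscode/" + VERSION, re.IGNORECASE)),
    ("jetbrains", re.compile(r"jetbrains/" + VERSION, re.IGNORECASE)),
    ("firefox", re.compile(r"Firefox/" + VERSION)),
    ("chrome", re.compile(r"Chrome/" + VERSION)),
]

# client_type is inferred from client_name, as proposed in the attribute list.
CLIENT_TYPES = {
    "vscode": "ide",
    "jetbrains": "ide",
    "firefox": "web_browser",
    "chrome": "web_browser",
}


def infer_client(user_agent: str) -> dict:
    """Return client_type, client_name, and client_version for a user agent string."""
    for name, pattern in CLIENT_PATTERNS:
        match = pattern.search(user_agent)
        if match:
            return {
                "client_type": CLIENT_TYPES[name],
                "client_name": name,
                "client_version": match.group(1),
            }
    # Anything unrecognized falls back to "other".
    return {"client_type": "other", "client_name": "other", "client_version": None}


firefox_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0"
print(infer_client(firefox_ua))
print(infer_client("vscode/1.4.5"))  # hypothetical IDE user agent
```

Nothing is stored in this sketch: the values are derived per request and attached to the emitted event.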

How to migrate current version of the dashboard?

Currently the Features list is basically the same as the UP list. If we remove it, most of the "Features" will still be queryable as Unit Primitives, so I don't think removing that filter should be a problem for anybody. However, as we implement these new objects, we should enforce a process for when one of these newly defined objects/attributes is decomposed or updated.

Edited by Sacha Guyon