MVC for Data Analyst Agent (#19500) · Epics · GitLab.org

MVC for Data Analyst Agent

## Description

Build the first iteration of the **Data Analyst** Agent to enable customers to use Duo to self-service their analytical needs as described in https://gitlab.com/groups/gitlab-org/-/epics/19499+ while also taking into consideration the learnings from the [PoC](https://gitlab.com/gitlab-org/gitlab/-/issues/567743) https://gitlab.com/gitlab-org/gitlab/-/issues/567743#note_2781263604.

_This MVC assumes you have prior knowledge of [GLQL](https://docs.gitlab.com/user/glql/) and what data it [currently supports](https://docs.gitlab.com/user/glql/#supported-areas)._

### MVC proposal :hatching_chick:

**Goal**: Enable users to:

- Ask natural language questions about their GitLab data
- Receive accurate GLQL queries with visualized results
- Ask questions about the results.

For the MVC, we should aim for \>80% accuracy when Duo generates GLQL queries. Future iterations can work to improve Duo's accuracy and reliability.

```mermaid
graph TB
    USER[User] --> CHAT["Duo Chat (AI Catalog)"]
    CHAT --> AGENT["Analytics Agent (transform NatLang -> GLQL + analyze query results)"]
    AGENT --> VIZ_TOOL["GLQL Visualizer Done ✅"]
    AGENT --> ANALYZER_TOOL["Create 'run GLQL tool' on Duo Agent Platform 🚧"]
    VIZ_TOOL --> RENDER["Render chart / table from GLQL query"]
    ANALYZER_TOOL --> API["New GLQL API (query GLQL directly on the backend)"]
    API --> GQL["Execute as GraphQL (transpile GLQL to GraphQL on the backend)"]
    GQL --> DATA[("GitLab Data")]

    style USER fill:#6366f1,color:#fff
    style CHAT fill:#10b981,color:#fff
    style AGENT fill:#f59e0b,color:#fff
    style VIZ_TOOL fill:#8b5cf6,color:#fff
    style ANALYZER_TOOL fill:#8b5cf6,color:#fff
```

#### First iteration - AI Catalog (:white_check_mark: - https://gitlab.com/gitlab-org/gitlab/-/issues/573056+s)

Feature: Duo Analytics transforms a user's natural language questions to GLQL and visualizes the result.

This is the simplest yet useful first version of Duo Analytics we can deliver.
It would be made available on our [AI catalog](https://gitlab.com/explore/ai-catalog/agents) where users can add it to their projects, after which Duo Analytics will become available as an option in Duo Chat. Note that it wouldn't be able to understand the query yet, as that requires GLQL in the backend (see follow-up iterations).

```mermaid
flowchart LR
    A[User asks question] --> B[Duo generates GLQL query]
    B --> C[Visualize the result]
    C --> D[User can copy+paste GLQL in GitLab]

    style A fill:#6366f1,color:#fff
    style B fill:#f59e0b,color:#fff
    style C fill:#f59e0b,color:#fff
    style D fill:#8b5cf6,color:#fff
```

For this, we'll need to:

- Add a new Analytics agent to the [AI catalog](https://gitlab.com/explore/ai-catalog/agents) that can generate GLQL queries.
- Fix the current markdown render tool so that it can render GLQL in Duo chat.

Estimated weight: Low-Medium

<details>
<summary>Click to view demo</summary>

Demo video by @rob.hunt showing Duo Analytics POC answering a question using GLQL but unable to visualize the result.

![Screen_Recording_2025-09-26_at_16.16.52](/uploads/af94284e89464a1bda1de92f20146ca5/Screen_Recording_2025-09-26_at_16.16.52.mov)

</details>

#### Second iteration - Foundational agent with analysis (https://gitlab.com/groups/gitlab-org/-/epics/19546+s)

Feature: Duo Analytics returns summarized data, and is able to analyze the result for follow-up interaction with the user.

With the release of Foundational Agents in %18.6, we have the opportunity to pivot from an AI Catalog-driven agent to a foundational one. Using a foundational agent solves several problems:

- Git-driven agent changes, rather than UI-driven, improving collaboration and accountability.
- Enabled for all deployment types (unless disabled by the user's organization).
- Discoverability, more prominent than AI Catalog agents.

In conjunction with the pivot to a foundational agent, it also gives us an opportunity to add GLQL support to the agent by using a new tool.
This unlocks the ability for the agent to:

- Run its own GLQL queries
- Analyze the results, summarizing for the user
- Answer follow-up questions about the data, helping the user understand what this means for their business context.

```mermaid
flowchart LR
    A[User asks question] --> B[Duo fetches data]
    B --> C[Duo visualizes data]
    C --> D[Duo understands query + result]
    D --> E[Duo explains result]
    E --> A

    style A fill:#6366f1,color:#fff
    style E fill:#10b981,color:#fff
    style B fill:#f59e0b,color:#fff
    style C fill:#f59e0b,color:#fff
    style D fill:#8b5cf6,color:#fff
```

For this, we'll need to:

- Move the AI Catalog agent from the AI Catalog, and add it to foundational agents in Duo Workflow Service
- Create a new Ruby gem for processing GLQL queries
- Create a new REST API to take GLQL queries and run the resulting GraphQL queries
- Create a new tool to query the REST API and return the results/failures
- Update the agent to understand the response from the tool for summarization and follow-up

Estimated weight: Medium

### Follow-up iterations

- Export GLQL to a work item and/or MR using existing tools.
  - :ballot_box_with_check: Should be easy to accomplish.
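
As a rough illustration of the second-iteration "run GLQL" tool listed above (query the REST API and return the results/failures), here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `run_glql` function, the `GlqlToolResult` type, the injected `transport` callable, and the `data`/`error` response keys are hypothetical, and the real REST API path and payload are not defined in this epic. The point it shows is that the tool returns failures as structured data rather than raising, so the agent can read the error and adjust its query.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical transport: takes a GLQL string and returns
# (HTTP status, decoded JSON body). A real tool would wrap an HTTP call
# to the new GLQL REST API here; that API's shape is an assumption.
Transport = Callable[[str], Tuple[int, Dict[str, Any]]]


@dataclass
class GlqlToolResult:
    ok: bool
    data: Optional[Any] = None   # query rows on success
    error: Optional[str] = None  # message the agent can use to repair the query


def run_glql(query: str, transport: Transport) -> GlqlToolResult:
    """Run a GLQL query and always return a structured result."""
    if not query.strip():
        return GlqlToolResult(ok=False, error="empty GLQL query")
    try:
        status, body = transport(query)
    except Exception as exc:  # network or decoding failure
        return GlqlToolResult(ok=False, error=f"request failed: {exc}")
    if status != 200:
        # Surface the API's own message when present, else the status code.
        return GlqlToolResult(ok=False, error=str(body.get("error", f"HTTP {status}")))
    return GlqlToolResult(ok=True, data=body.get("data"))
```

Injecting the transport keeps the sketch independent of any particular HTTP client and makes the success and failure paths easy to exercise with a stub.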
### Future iterations

Iterations that depend on upcoming or suggested changes:

- ~~Mark the agent as Made by GitLab (not supported yet)~~
  * Done in the first iteration as already implemented https://gitlab.com/gitlab-org/gitlab/-/issues/578342
- ~~Automatically keep Duo Analytics up to date with GLQL without needing to use the UI (suggested change)~~
  * Being done as part of the [foundational agent](https://gitlab.com/gitlab-org/gitlab/-/issues/578342)
- Export to data explorer (https://gitlab.com/groups/gitlab-org/-/epics/19359+)
- Export to custom dashboard (https://gitlab.com/groups/gitlab-org/-/epics/19430+)
- Migrate the newly created tools from DAP (Python) to MCP Server (Rails), once MCP is not integrated with AI Catalog (https://gitlab.com/groups/gitlab-org/-/epics/18759+)
- ~~Switch from creating the agent in the UI to doing it with YAML config (not integrated with Duo chat at the moment)~~
  * Done in the first iteration as already implemented https://gitlab.com/gitlab-org/gitlab/-/issues/578342
- Switch from a single agent to a multi-agent flow (if needed, for instance when integrating with dashboards)
- Improve handling of unsupported queries or errors resulting from generated queries
- Improve the agent's handling of ambiguous follow-up questions

### Considerations

The [AI catalog](https://gitlab.com/explore/ai-catalog/agents) is a brand new feature that's under heavy development. Its current limitations are:

* Rapidly changing AI landscape, so we need to rely on two-way door decisions where we can.
* UI discoverability of our Analytics agent in Duo chat.
  * See the mockup flow images in this comment https://gitlab.com/gitlab-org/gitlab/-/issues/567743#note_2774083760.
  * Helped by shifting to [foundational agent](https://gitlab.com/gitlab-org/gitlab/-/issues/578342).
  * With the release of foundational agents in %18.6, we can release this foundational agent to all deployment types.
### Metrics of success

Agentic chat and custom agents don't yet have metrics we can plug into to quantitatively measure our success. We've begun conversations with the following groups to look at how we might integrate detailed metrics, allowing teams that are developing "Created by GitLab" agents to monitor their own work:

- ~"group::workflow catalog" - https://gitlab.com/gitlab-org/gitlab/-/issues/573757+
- ~"group::duo chat" - https://gitlab.com/gitlab-org/gitlab/-/issues/550897#note_2795866338

In the meantime, we will monitor success using a [feedback issue](https://gitlab.com/gitlab-org/gitlab/-/issues/574028), and by being _Customer 0_, using the feature for the development of the wider [dashboard foundations](https://gitlab.com/groups/gitlab-org/-/epics/18072).

We can also leverage Grafana dashboards, running something like the below to calculate success/failure:

```
sum (rate(agent_platform_tool_failure_total{tool_name="create_commit"}[1h]))
/
sum (rate(duo_workflow_tool_call_seconds_count{tool_name="create_commit"}[1h]))
* 100
```
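
To make the arithmetic in the query above concrete, here is a small Python sketch of the same ratio: the per-second failure rate divided by the per-second call rate, times 100. The function names and the sample counter values are illustrative assumptions, and `per_second_rate` is a simplification of what Prometheus `rate()` does (it ignores counter resets and extrapolation).

```python
from typing import Optional


def per_second_rate(first: float, last: float, window_seconds: float) -> float:
    """Average per-second increase of a counter over a window.

    Simplified stand-in for PromQL rate(): no counter-reset handling.
    """
    return (last - first) / window_seconds


def failure_percentage(failures_per_s: float, calls_per_s: float) -> Optional[float]:
    """Failure rate as a percentage of call rate, or None when there were no calls."""
    if calls_per_s == 0:
        return None
    return failures_per_s / calls_per_s * 100


# Hypothetical samples over a 1h window: failures go 10 -> 19, calls go 100 -> 280.
failures = per_second_rate(10, 19, 3600)
calls = per_second_rate(100, 280, 3600)
print(failure_percentage(failures, calls))  # about 5 (percent of calls failed)
```

Returning `None` for a window with no calls avoids a division by zero and keeps "no traffic" distinct from "0% failures".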
