# CodeSuggestion: end-to-end GLQL aggregation integration (#21212)

Enable CodeSuggestion analytics queries using GLQL with `mode: analytics` syntax, building on the existing CodeSuggestions aggregation engine.

The backend aggregation engine and ClickHouse table already exist (completed in !224204). This epic focuses on **GLQL integration** to make CodeSuggestion analytics accessible through the GLQL query language and the Data Analyst agent.

**Parent Epic:** &21207 Data Aggregation in GLQL

## Example GLQL Query

```yaml
mode: analytics
query: type = CodeSuggestion and timestamp >= -30d
display: table
displayConfig:
  showSparklines: true
  showTrends: true
dimensions: language as 'Language', timeSegment(1d) on timestamp
metrics: totalCount as 'Total Suggestions', acceptanceRate as 'Acceptance Rate'
sort: totalCount desc
limit: 20 # Initial page size, "Load more" button for additional rows
```

This query returns CodeSuggestion analytics grouped by programming language with time-series trends, showing acceptance rates and usage patterns for the last 30 days.

## GraphQL Query Structure

This work introduces a two-level structure:

1. Outer field (`duoCodeSuggestions`) - accepts filter arguments
2. Inner field (`aggregated`) - accepts ordering and pagination arguments

### Full Example Query

```graphql
query CodeSuggestionsAnalytics {
  project(fullPath: "gitlab-org/gitlab") {
    analytics {
      duoCodeSuggestions(
        # Filter arguments at outer level
        userId: [123, 456]
        language: ["ruby", "javascript"]
        ideName: ["vscode", "jetbrains"]
        timestampFrom: "2026-01-01T00:00:00Z"
        timestampTo: "2026-03-31T23:59:59Z"
      ) {
        aggregated(
          # Ordering and pagination at inner level
          orderBy: [
            { identifier: "acceptanceRate", direction: DESC }
            { identifier: "totalCount", direction: DESC }
          ]
          first: 20
        ) {
          count # Total number of aggregated rows
          nodes {
            # Dimensions (group by fields)
            dimensions {
              user {
                id
                username
                name
              }
              language
              ideName
              timestamp(granularity: "monthly") # Can be "daily", "weekly", "monthly"
            }
            # Metrics (aggregated values)
            totalCount        # Total number of suggestions
            shownCount        # Number of shown suggestions
            acceptedCount     # Number of accepted suggestions
            rejectedCount     # Number of rejected suggestions
            acceptanceRate    # Acceptance rate (accepted / shown)
            suggestionSizeSum # Total suggestions volume
            usersCount        # Number of unique users
          }
          pageInfo {
            hasNextPage
            hasPreviousPage
            startCursor
            endCursor
          }
        }
      }
    }
  }
}
```

### Key Schema Details

**Filter Arguments** (outer `duoCodeSuggestions` field):

- `userId: [Int!]` - Filter by one or many user IDs
- `language: [String!]` - Filter by suggestion language
- `ideName: [String!]` - Filter by IDE name
- `timestampFrom: Time` - Start of time range
- `timestampTo: Time` - End of time range

**Available Dimensions:**

- `user` - Returns full UserCore object
- `language` - Programming language string
- `ideName` - IDE name string
- `timestamp(granularity: String)` - Date with granularity (daily/weekly/monthly)

**Available Metrics:**

- `totalCount` - Total suggestions
- `shownCount` - Shown suggestions
- `acceptedCount` - Accepted suggestions
- `rejectedCount` - Rejected suggestions
- `acceptanceRate` - Calculated rate (float)
- `suggestionSizeSum` - Total volume
- `usersCount` - Unique user count

### Simplified Example (Group by Language)

```graphql
query SuggestionsByLanguage {
  project(fullPath: "gitlab-org/gitlab") {
    analytics {
      duoCodeSuggestions(
        timestampFrom: "2026-02-01T00:00:00Z"
        timestampTo: "2026-02-28T23:59:59Z"
      ) {
        aggregated(
          orderBy: [{ identifier: "acceptanceRate", direction: DESC }]
          first: 10
        ) {
          nodes {
            dimensions {
              language
            }
            totalCount
            acceptanceRate
            usersCount
          }
        }
      }
    }
  }
}
```

## Goals

1. Add GLQL parser support for `type = CodeSuggestion` + `mode: analytics`
2. Enable querying CodeSuggestion metrics (acceptance rate, usage by language/IDE, etc.)
3. Make CodeSuggestion analytics accessible to the Data Analyst agent
4. Document query patterns and examples

## Scope

**✅ In Scope:**

- GLQL parser support for `type = CodeSuggestion` + `mode: analytics`
- GLQL UI integration for CodeSuggestion queries
- Documentation: GLQL syntax examples, query patterns
- Data Analyst agent integration (prompts, example questions)
- Feature flag: `glql_code_suggestion_analytics_aggregation`, type: `gitlab_com_derisk`, remove flag in same milestone

**❌ Out of Scope (Already Done or Separate Work):**

- Backend aggregation engine (✅ Done in !224204)
- ClickHouse table and materialized views (✅ Done in !224204)
- GraphQL field mounting (🔄 In progress in #589608)
- Additional metrics beyond existing engine capabilities
- Migrating existing Duo dashboards

**✅ Prerequisites (In Progress):**

- #589608 GraphQL mounting must be completed first

## Related Work

- Backend Engine: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/224204
- GraphQL Mounting: https://gitlab.com/gitlab-org/gitlab/-/work_items/589608
- Research: https://gitlab.com/gitlab-org/gitlab/-/work_items/588589
- Parent Epic: https://gitlab.com/groups/gitlab-org/-/work_items/21207
- Aggregation Framework Docs: https://docs.gitlab.com/development/database/aggregation_framework/
- Visualization Research: https://gitlab.com/gitlab-org/gitlab/-/work_items/589575

## Visualization Work Breakdown

CodeSuggestion analytics requires 7+1 visualization features for full SDLC dashboard migration. These are **GLQL-level features** that work across all data types once implemented.

### 18.11 GA (Priority 1)

- #592262 - **Tables** (weight 2) ✅ Critical path for Data Analyst GA
  - Default display type
  - Feature flag: `glql_code_suggestion_analytics_aggregation` (gates data access)

### Post-18.11 Visualization Features (SDLC Dashboard Migration)

Each visualization feature is **GLQL-level** with its own feature flag, reusable across all 7 data types:

1. **#592780 - Stats** (weight 2)
   - Single metric display (e.g., "Total Suggestions: 1,234")
   - Feature flag: `glql_stat_display_type`
   - Use case: Usage metrics, acceptance rate display
2. **#592781 - Sparklines** (weight 2)
   - Inline trend charts in table cells
   - Feature flag: `glql_sparkline_display_type`
   - Use case: 7-day usage trends, acceptance rate trends
   - Syntax: `displayConfig.showSparklines: true`
3. **#592782 - Bar Charts (Horizontal)** (weight 2)
   - Horizontal bar visualization
   - Feature flag: `glql_barchart_display_type`
   - Use case: Acceptance by language, usage by IDE
4. **#592784 - Column Charts (Vertical)** (weight 2)
   - Vertical bar/column visualization
   - Feature flag: `glql_columnchart_display_type`
   - Use case: Time-series with vertical bars, categorical data
5. **#592783 - Area Charts** (weight 2)
   - Time-series area visualization
   - Feature flag: `glql_areachart_display_type`
   - Use case: Volume trends over time, stacked metrics
6. **#592912 - showTrends** (weight 2)
   - Percentage change indicators (works with stats AND tables)
   - Feature flag: `glql_trends_display_type`
   - Use case: Period-over-period comparisons
   - Syntax: `displayConfig.showTrends: true`
   - Shows "▲ 12.5%" or "▼ 8.3%" comparing the last two time periods
7. **#592915 - Table Pagination** (weight 2)
   - "Load more" button for tables
   - Feature flag: `glql_table_pagination`
   - Use case: Loading additional rows beyond the initial limit
   - Uses GraphQL cursor-based pagination

**Total visualization weight:** 16 points (2 in 18.11 GA, 14 post-18.11)

### Why This Matters for Scaling

Once these 7+1 visualization features are complete for CodeSuggestion, scaling to the other 6 data types becomes mostly verification work:

**For CodeSuggestion (first data type):**

- Backend engine + GraphQL (Optimize team)
- GLQL parser + UI (weight 3)
- 7+1 visualization features (weight 16) ← **One-time cost**
- **Total:** ~19 points

**For each additional data type (Pipeline, MergeRequest, etc.):**

- Backend engine + GraphQL (Optimize team)
- GLQL parser + UI (weight 3)
- Visualization verification (weight 1) ← **Just verify existing features work**
- **Total:** ~4 points

**Savings:** By building reusable GLQL-level visualizations, we save 15 points per data type (19 − 4). For 6 additional data types, that's **90 points saved** (15 × 6).

### Feature Flag Strategy

**Data Access Flags** (per data type, gates backend queries):

- `glql_code_suggestion_analytics_aggregation`
- `glql_pipeline_analytics_aggregation`
- `glql_merge_request_analytics_aggregation`
- etc.

**Display Feature Flags** (GLQL-level, reusable across all data types):

- `glql_stat_display_type`
- `glql_sparkline_display_type`
- `glql_barchart_display_type`
- `glql_columnchart_display_type`
- `glql_areachart_display_type`
- `glql_trends_display_type`
- `glql_table_pagination`

All flags use `gitlab_com_derisk` type.

## Success Criteria

- [ ] GLQL queries with `mode: analytics` successfully return CodeSuggestion aggregation data
- [ ] Data Analyst agent can answer questions like "What's the acceptance rate for Python suggestions?"
- [ ] Documentation published with working examples
- [ ] Pattern reusable for other aggregation types

## Implementation Flow

```mermaid
graph TB
    subgraph "Per data type process"
        subgraph "Backend (Optimize)"
            ENGINE["1. Backend Engine<br/>Implement aggregation<br/>Weight: 3"]
            GRAPHQL["2. GraphQL Field<br/>Mount engine<br/>Weight: 2"]
        end
        subgraph "GLQL Integration (Platform Insights)"
            PARSER["3. Parser Support<br/>Add analytics mode<br/>Weight: 3"]
            subgraph "Parallel Work"
                UI["4. UI Integration<br/>Render results<br/>Weight: 3"]
                DOCS["5. Documentation<br/>Examples & guides<br/>Weight: 2"]
                AGENT["6. Agent Integration<br/>Prompts & patterns<br/>Weight: 2"]
            end
        end
        ENGINE --> GRAPHQL
        GRAPHQL -.Schema Handoff.-> PARSER
        PARSER --> UI
        PARSER --> DOCS
        PARSER --> AGENT
    end
```

**Implementation flow for this data type:** The Optimize team does the backend work sequentially, then the Platform Insights team does the GLQL integration, with UI, documentation, and agent work running in parallel once parser support lands. The dotted "Schema Handoff" line shows where early schema sharing can unblock Platform Insights to start parser work before the GraphQL field is fully complete.
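
The ordering in the implementation flow diagram can be sketched as a small dependency graph. This is an illustrative Python sketch and not part of the epic's deliverables: the step names mirror the diagram's node IDs, and `execution_waves` is a hypothetical helper. It uses the standard library `graphlib.TopologicalSorter` to group steps into waves, where every step in a wave can start as soon as the previous wave is finished. It does not model the dotted early "Schema Handoff", which would let parser work begin before the GraphQL step is complete.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, following the solid
# and dotted arrows in the diagram (names are the diagram's node IDs).
DEPENDS_ON = {
    "ENGINE": set(),
    "GRAPHQL": {"ENGINE"},
    "PARSER": {"GRAPHQL"},
    "UI": {"PARSER"},
    "DOCS": {"PARSER"},
    "AGENT": {"PARSER"},
}


def execution_waves(depends_on):
    """Group steps into waves; steps in the same wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(execution_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

The first three waves contain one step each (engine, GraphQL field, parser), and the last wave contains the three steps the diagram groups under "Parallel Work".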