Add analytics aggregation support (!347) · Merge requests · GitLab.org / GLQL

Summary

This MR adds generic analytics mode support for aggregation-based queries, enabling any source to support analytics with dimensions (groupable fields) and metrics (aggregatable fields). The implementation has been refined from the original !347 (merged) to focus purely on core infrastructure without feature-specific code.

What Changed

1. Core Analytics Infrastructure

Added the foundational analytics query mode:

QueryMode enum: Differentiates between Standard and Analytics query types
SourceAnalyzer trait extensions:
- field_category() - categorizes fields as Dimension/Metric/Standard
- parent_scope() - supports nested GraphQL paths (e.g., aiUsage { codeSuggestions })
- supported_modes() - declares which modes a source supports
Analytics code generation: Generates aggregated(dimensions: [...]) { nodes { ... metrics } } queries
Mode validation: New error types for analytics-specific validation:
- UnsupportedModeForSource - when a source doesn't support analytics mode
- InvalidFieldForAnalytics - when standard fields are used in dimensions/metrics
- InvalidFunctionArgument - for function argument validation
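
For reviewers who want a mental model of the pieces listed above, here is a minimal sketch of how they might fit together. The names QueryMode, field_category(), parent_scope(), supported_modes() and UnsupportedModeForSource come from this MR. Everything else is an assumption made for illustration and may differ from the actual code: the FieldCategory enum name, the default method bodies, the error payload, the name() method, the validate_mode() helper and both demo sources.

```rust
/// Query modes a source can declare support for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Standard,
    Analytics,
}

/// How a field may be used (enum name is an assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Standard,
    Dimension,
    Metric,
}

/// Error variant named in the MR; the payload shape is assumed.
#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    UnsupportedModeForSource { source_name: String, mode: QueryMode },
}

/// Sketch of the trait extensions; the defaults keep existing sources standard-only.
pub trait SourceAnalyzer {
    /// Hypothetical accessor, used here only for error reporting.
    fn name(&self) -> &str;

    fn field_category(&self, _field: &str) -> FieldCategory {
        FieldCategory::Standard
    }

    fn parent_scope(&self) -> Option<&str> {
        None
    }

    fn supported_modes(&self) -> Vec<QueryMode> {
        vec![QueryMode::Standard]
    }
}

/// Hypothetical source that relies on the defaults.
pub struct StandardOnly;

impl SourceAnalyzer for StandardOnly {
    fn name(&self) -> &str {
        "standardOnly"
    }
}

/// Hypothetical source that opts in to analytics mode.
pub struct DemoAnalyticsSource;

impl SourceAnalyzer for DemoAnalyticsSource {
    fn name(&self) -> &str {
        "demoAnalytics"
    }

    fn field_category(&self, field: &str) -> FieldCategory {
        match field {
            "language" | "ideName" => FieldCategory::Dimension,
            "totalCount" | "acceptanceRate" | "acceptedCount" => FieldCategory::Metric,
            _ => FieldCategory::Standard,
        }
    }

    fn parent_scope(&self) -> Option<&str> {
        Some("analytics")
    }

    fn supported_modes(&self) -> Vec<QueryMode> {
        vec![QueryMode::Analytics]
    }
}

/// Hypothetical helper: rejects a mode the source does not declare.
pub fn validate_mode(source: &dyn SourceAnalyzer, mode: QueryMode) -> Result<(), ValidationError> {
    if source.supported_modes().contains(&mode) {
        Ok(())
    } else {
        Err(ValidationError::UnsupportedModeForSource {
            source_name: source.name().to_string(),
            mode,
        })
    }
}
```

With defaults like these, an existing source would not need any changes, and asking StandardOnly for QueryMode::Analytics would surface UnsupportedModeForSource.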

2. GraphQL Code Refactoring

Modularized the GraphQL code generation for better separation of concerns:

Before: Single 478-line graphql.rs mixing standard and analytics logic

After: Clean modular structure

src/codegen/graphql.rs (80 lines) - Entry point + shared helpers
src/codegen/graphql/analytics.rs (252 lines) - Analytics-specific generation
src/codegen/graphql/standard.rs (96 lines) - Standard query generation

Key improvements:

Extracted render_display_field() as a shared helper used by both modes
Client-side filter functions (labels(), assignees(), author(), milestone()) are now rejected when used in analytics dimensions/metrics
Analytics mode validates at least one metric is selected
Better error messages explaining correct usage

3. Response Transformation for Analytics

Added automatic flattening of analytics response structures:

New flatten_analytics_dimensions() function in src/analyzer/sources/shared.rs:

Flattens nested dimension structures to top-level fields
Transforms { dimensions: { language: "ruby" }, metric: 10 } into { language: "ruby", metric: 10 }
Makes analytics responses consistent with standard query output format
Comprehensive test coverage including edge cases
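
The flattening step can be illustrated with a small std-only sketch. The Value enum below is a stand-in so the example has no external dependencies; the real function may work on a different JSON type. The collision rule (an existing top-level key wins) and the handling of a non-object dimensions value are assumptions, not behavior confirmed by this MR.

```rust
use std::collections::BTreeMap;

/// Minimal stand-in for a JSON value (assumption: illustrative only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Number(f64),
    String(String),
    Object(BTreeMap<String, Value>),
}

/// Moves every key of a nested `dimensions` object to the top level of `node`.
/// Values that are not objects, or objects without `dimensions`, come back unchanged.
pub fn flatten_analytics_dimensions(node: Value) -> Value {
    let mut fields = match node {
        Value::Object(fields) => fields,
        other => return other,
    };

    match fields.remove("dimensions") {
        Some(Value::Object(dimensions)) => {
            for (key, value) in dimensions {
                // Assumed policy: an existing top-level key is kept on collision.
                fields.entry(key).or_insert(value);
            }
        }
        // A non-object `dimensions` value (e.g. null) is put back untouched.
        Some(other) => {
            fields.insert("dimensions".to_string(), other);
        }
        None => {}
    }

    Value::Object(fields)
}
```

Applied to the example above, a node holding dimensions { language: "ruby" } and metric: 10 would come back as a single object with language and metric at the top level.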

Enhanced transform_response() in SourceAnalyzer trait:

Detects analytics mode via parent_scope() == Some("analytics")
Automatically applies dimension flattening for analytics queries
Preserves backward compatibility with standard mode responses

Updated transform_for_data_source() in src/transformer/data.rs:

Checks for analytics scope before standard mode scope
Routes analytics data through source analyzer's transform_response()
Added tests for analytics transformations across different sources

4. GraphQL String Escaping

Added security improvements for GraphQL query generation:

New utils::graphql::escape_string() function with single-pass optimization
Properly escapes quotes, backslashes, and control characters (\n, \t, \r, \b, \f)
Applied to all user-provided strings in GraphQL queries (namespace, project, group paths)
Comprehensive test coverage including edge cases
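
As a rough illustration of what a single-pass escaper could look like, here is a sketch that walks the input once and writes into one output buffer. It is not the code from utils::graphql. In particular, emitting \uXXXX for the remaining control characters is an assumption based on the escape forms the GraphQL spec allows in string literals.

```rust
/// Escapes `input` for use inside a double-quoted GraphQL string literal.
/// Single pass: each character is inspected once and appended to one buffer.
pub fn escape_string(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\u{0008}' => out.push_str("\\b"),
            '\u{000C}' => out.push_str("\\f"),
            // Assumption: any other control character becomes a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

The result is meant to be placed between double quotes by the caller, for example when interpolating a namespace, project or group path into a query.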

Architecture

Two Query Modes

Standard Mode (existing):

workItems(labelName: "bug") {
  nodes { id title labels { nodes { title } } }
}

Analytics Mode (new):

analytics {
  duoCodeSuggestions(language: "ruby") {
    aggregated(
      before: $before
      after: $after
      first: $limit
    ) {
      nodes {
        dimensions {
          language
          ideName
        }
        totalCount
        acceptanceRate
        acceptedCount
      }
    }
  }
}

Field Categories

Standard: Regular fields like title, description - only for display/filtering
Dimension: Groupable fields like assignees, labels, milestone - can aggregate by these
Metric: Aggregatable fields like count, weight - calculated values in aggregations

Code Generation Flow

Parser accepts dimensions and metrics parameters with mode: analytics
Analyzer validates mode support and categorizes fields via field_category()
Code Generator routes to analytics or standard generator based on mode
Analytics Generator:
- Partitions fields into dimensions vs metrics
- Validates at least one metric is selected
- Generates aggregated(dimensions: [...]) wrapper
- Builds proper GraphQL structure with parent scopes
Transformer flattens dimension structures in the response data
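
The analytics generator steps above can be sketched roughly as follows. This is a simplified illustration, not the code in analytics.rs: the function name, its signature and the NoMetricSelected error name are hypothetical. The output follows the nodes { dimensions { ... } metrics } shape from the Analytics Mode example, and all arguments (filters, pagination, the dimensions: [...] argument) are left out for brevity.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Standard,
    Dimension,
    Metric,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CodegenError {
    /// Named in the MR; the payload is assumed to be the field name.
    InvalidFieldForAnalytics(String),
    /// Hypothetical name for the "at least one metric" check.
    NoMetricSelected,
}

/// Hypothetical entry point: builds an analytics selection for already-categorized fields.
pub fn generate_analytics_query(
    parent_scope: Option<&str>,
    source: &str,
    fields: &[(&str, FieldCategory)],
) -> Result<String, CodegenError> {
    // Step 1: partition fields into dimensions vs metrics.
    let mut dimensions = Vec::new();
    let mut metrics = Vec::new();
    for &(name, category) in fields {
        match category {
            FieldCategory::Dimension => dimensions.push(name),
            FieldCategory::Metric => metrics.push(name),
            FieldCategory::Standard => {
                return Err(CodegenError::InvalidFieldForAnalytics(name.to_string()))
            }
        }
    }

    // Step 2: at least one metric must be selected.
    if metrics.is_empty() {
        return Err(CodegenError::NoMetricSelected);
    }

    // Step 3: build the aggregated { nodes { ... } } selection.
    let mut selection = String::new();
    if !dimensions.is_empty() {
        selection.push_str(&format!("dimensions {{ {} }} ", dimensions.join(" ")));
    }
    selection.push_str(&metrics.join(" "));
    let query = format!("{source} {{ aggregated {{ nodes {{ {selection} }} }} }}");

    // Step 4: wrap in the parent scope, if the source declares one.
    Ok(match parent_scope {
        Some(scope) => format!("{scope} {{ {query} }}"),
        None => query,
    })
}
```

Returning a Result lets a caller surface the validation errors before any query text is produced.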

Testing

Pipelines should pass. To test an analytics mode data type end to end, I would recommend testing with Implement CodeSuggestions analytics source (!348, merged), which builds on this infrastructure.

What This Enables

This infrastructure is generic and reusable - any source can now support analytics by:

Implementing field_category() to mark dimensions/metrics
Adding QueryMode::Analytics to supported_modes()
Optionally implementing parent_scope() for nested paths

Next Steps

MR !348 (merged): CodeSuggestions source implementation (uses this infrastructure)
MR !349 (closed): Add timeSegment() function support (builds on this)
MR gitlab!228129 (closed): Add CodeSuggestions to GitLab UI

Related to #95 (closed)

Edited Mar 20, 2026 by Robert Hunt
