Add analytics aggregation support

Summary

This MR adds generic analytics mode support for aggregation-based queries, so that any source can offer analytics over dimensions (groupable fields) and metrics (aggregatable fields). It refines the original !347 (merged) to focus purely on core infrastructure, with no feature-specific code.

What Changed

1. Core Analytics Infrastructure

Added the foundational analytics query mode:

  • QueryMode enum: Differentiates between Standard and Analytics query types
  • SourceAnalyzer trait extensions:
    • field_category() - categorizes fields as Dimension/Metric/Standard
    • parent_scope() - supports nested GraphQL paths (e.g., aiUsage { codeSuggestions })
    • supported_modes() - declares which modes a source supports
  • Analytics code generation: Generates aggregated(dimensions: [...]) { nodes { ... metrics } } queries
  • Mode validation: New error types for analytics-specific validation:
    • UnsupportedModeForSource - when a source doesn't support analytics mode
    • InvalidFieldForAnalytics - when standard fields are used in dimensions/metrics
    • InvalidFunctionArgument - for function argument validation
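To make the mode and field checks concrete, here is a minimal sketch of how these types might fit together. Only `QueryMode` and the three error variant names come from this MR; `FieldCategory` as a standalone enum, the variant payloads, and the helpers `check_mode` and `check_field` are assumptions for illustration.

```rust
/// Query types a source can declare support for.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Standard,
    Analytics,
}

/// How a field may be used (the value returned by field_category()).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Dimension,
    Metric,
    Standard,
}

/// Error variants named in this MR; the payload fields are assumptions.
#[derive(Debug, PartialEq, Eq)]
pub enum AnalyticsError {
    UnsupportedModeForSource { source: String, mode: QueryMode },
    InvalidFieldForAnalytics { field: String },
    InvalidFunctionArgument { function: String, reason: String },
}

/// Hypothetical helper: reject a mode the source has not listed in supported_modes().
pub fn check_mode(
    source: &str,
    supported: &[QueryMode],
    requested: QueryMode,
) -> Result<(), AnalyticsError> {
    if supported.contains(&requested) {
        Ok(())
    } else {
        Err(AnalyticsError::UnsupportedModeForSource {
            source: source.to_string(),
            mode: requested,
        })
    }
}

/// Hypothetical helper: standard fields cannot be used as dimensions or metrics.
pub fn check_field(field: &str, category: FieldCategory) -> Result<(), AnalyticsError> {
    match category {
        FieldCategory::Dimension | FieldCategory::Metric => Ok(()),
        FieldCategory::Standard => Err(AnalyticsError::InvalidFieldForAnalytics {
            field: field.to_string(),
        }),
    }
}
```

Keeping the mode list on the source (rather than a global flag) means a source that never opts in is rejected with a source-specific error.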

2. GraphQL Code Refactoring

Modularized the GraphQL code generation for better separation of concerns:

Before: a single 478-line graphql.rs mixing standard and analytics logic.
After: a clean modular structure:

  • src/codegen/graphql.rs (80 lines) - Entry point + shared helpers
  • src/codegen/graphql/analytics.rs (252 lines) - Analytics-specific generation
  • src/codegen/graphql/standard.rs (96 lines) - Standard query generation
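The split can be pictured as a thin entry point that only routes by mode. This is a sketch under assumptions: inline modules stand in for graphql/analytics.rs and graphql/standard.rs, and the `Query` struct, the `generate` functions, and their simplified output are hypothetical placeholders rather than the actual code.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Standard,
    Analytics,
}

/// Hypothetical, heavily simplified parsed query.
pub struct Query {
    pub mode: QueryMode,
    pub source: String,
    pub fields: Vec<String>,
}

/// Shared helper used by both generators (stand-in for render_display_field()).
fn render_display_field(field: &str) -> String {
    field.to_string()
}

/// Stand-in for graphql/standard.rs.
mod standard {
    use super::{render_display_field, Query};

    pub fn generate(query: &Query) -> String {
        let fields: Vec<String> = query.fields.iter().map(|f| render_display_field(f)).collect();
        format!("{} {{ nodes {{ {} }} }}", query.source, fields.join(" "))
    }
}

/// Stand-in for graphql/analytics.rs.
mod analytics {
    use super::{render_display_field, Query};

    pub fn generate(query: &Query) -> String {
        let fields: Vec<String> = query.fields.iter().map(|f| render_display_field(f)).collect();
        format!("{} {{ aggregated {{ nodes {{ {} }} }} }}", query.source, fields.join(" "))
    }
}

/// Entry point (graphql.rs): routing only, no mode-specific logic.
pub fn generate(query: &Query) -> String {
    match query.mode {
        QueryMode::Standard => standard::generate(query),
        QueryMode::Analytics => analytics::generate(query),
    }
}
```

Because the entry point only dispatches, each generator can evolve independently while still sharing the display-field helper.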

Key improvements:

  • Separated render_display_field() shared helper for both modes
  • Client-side filter functions (labels(), assignees(), author(), milestone()) properly rejected in analytics dimensions/metrics
  • Analytics mode validates at least one metric is selected
  • Better error messages explaining correct usage
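The validation rules above might look roughly like the following. This is a sketch, not the generator's real code: the `SelectedField` type, the `validate_analytics_selection` name, and the error wording are hypothetical; the four client-side filter names come from the list above.

```rust
/// How a field may be used in analytics mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Dimension,
    Metric,
    Standard,
}

/// Hypothetical selected item: a plain field or a function call such as labels().
#[derive(Debug)]
pub enum SelectedField {
    Field { name: String, category: FieldCategory },
    Function { name: String },
}

/// Client-side filter functions listed in this MR.
const CLIENT_SIDE_FILTERS: [&str; 4] = ["labels", "assignees", "author", "milestone"];

/// Validate an analytics selection, returning a readable error on failure.
pub fn validate_analytics_selection(selection: &[SelectedField]) -> Result<(), String> {
    let mut metric_count = 0;
    for item in selection {
        match item {
            SelectedField::Function { name } if CLIENT_SIDE_FILTERS.contains(&name.as_str()) => {
                return Err(format!(
                    "{name}() is a client-side filter and cannot be used as an analytics dimension or metric"
                ));
            }
            // Assumption: any other function is also rejected in this sketch.
            SelectedField::Function { name } => {
                return Err(format!("{name}() is not supported in analytics mode"));
            }
            SelectedField::Field { name, category: FieldCategory::Standard } => {
                return Err(format!("{name} is a standard field; use a dimension or metric instead"));
            }
            SelectedField::Field { category: FieldCategory::Metric, .. } => metric_count += 1,
            SelectedField::Field { category: FieldCategory::Dimension, .. } => {}
        }
    }
    // An aggregation with only dimensions has nothing to compute.
    if metric_count == 0 {
        return Err("analytics mode requires at least one metric".to_string());
    }
    Ok(())
}
```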

3. Response Transformation for Analytics

Added automatic flattening of analytics response structures:

New flatten_analytics_dimensions() function in src/analyzer/sources/shared.rs:

  • Flattens nested dimension structures to top-level fields
  • Transforms { dimensions: { language: "ruby" }, metric: 10 } into { language: "ruby", metric: 10 }
  • Makes analytics responses consistent with standard query output format
  • Comprehensive test coverage including edge cases
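The flattening can be sketched as below. To stay dependency-free it uses a tiny hand-rolled `Value` enum in place of whatever JSON type the codebase actually uses; that type, the `flatten_node` helper, and the collision rule (an existing top-level key is kept) are assumptions, not documented behaviour.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value standing in for the real JSON type.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Number(f64),
    Str(String),
    Object(BTreeMap<String, Value>),
}

/// Turn `{ dimensions: { .. }, metric: .. }` into `{ .., metric: .. }` for each node.
pub fn flatten_analytics_dimensions(nodes: Vec<Value>) -> Vec<Value> {
    nodes.into_iter().map(flatten_node).collect()
}

fn flatten_node(node: Value) -> Value {
    match node {
        Value::Object(mut fields) => {
            match fields.remove("dimensions") {
                Some(Value::Object(dims)) => {
                    for (key, value) in dims {
                        // Assumption: an existing top-level key (e.g. a metric) is not overwritten.
                        fields.entry(key).or_insert(value);
                    }
                }
                // A non-object `dimensions` value is put back unchanged.
                Some(other) => {
                    fields.insert("dimensions".to_string(), other);
                }
                None => {}
            }
            Value::Object(fields)
        }
        // Non-object nodes pass through untouched.
        other => other,
    }
}
```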

Enhanced transform_response() in SourceAnalyzer trait:

  • Detects analytics mode via parent_scope() == Some("analytics")
  • Automatically applies dimension flattening for analytics queries
  • Preserves backward compatibility with standard mode responses
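A minimal sketch of the detection as a default trait method. The `Row` type (nested dimensions plus top-level fields, all strings for brevity), the `flatten_rows` helper, and the two example sources are hypothetical; only the `parent_scope() == Some("analytics")` check mirrors the description above.

```rust
use std::collections::BTreeMap;

/// Hypothetical response row: optional nested dimensions plus top-level fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub dimensions: Option<BTreeMap<String, String>>,
    pub fields: BTreeMap<String, String>,
}

/// Move nested dimension values up to the top level of each row.
fn flatten_rows(rows: Vec<Row>) -> Vec<Row> {
    rows.into_iter()
        .map(|mut row| {
            if let Some(dims) = row.dimensions.take() {
                for (key, value) in dims {
                    row.fields.entry(key).or_insert(value);
                }
            }
            row
        })
        .collect()
}

/// Pared-down stand-in for SourceAnalyzer (assumed signatures).
pub trait SourceAnalyzer {
    fn parent_scope(&self) -> Option<&str> {
        None
    }

    /// Flatten only for sources under the `analytics` scope; standard responses pass through.
    fn transform_response(&self, rows: Vec<Row>) -> Vec<Row> {
        if self.parent_scope() == Some("analytics") {
            flatten_rows(rows)
        } else {
            rows
        }
    }
}

pub struct CodeSuggestions;
impl SourceAnalyzer for CodeSuggestions {
    fn parent_scope(&self) -> Option<&str> {
        Some("analytics")
    }
}

pub struct WorkItems;
impl SourceAnalyzer for WorkItems {}
```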

Updated transform_for_data_source() in src/transformer/data.rs:

  • Checks for the analytics scope before the standard-mode scope
  • Routes analytics data through source analyzer's transform_response()
  • Added tests for analytics transformations across different sources
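The routing order can be sketched as a small decision function. Everything here is hypothetical (the `Route` enum, the `route_for_data_source` name, and inspecting top-level response keys); the MR only states that the analytics scope is checked first and that analytics data goes through transform_response().

```rust
/// Hypothetical routing decision for a response payload.
#[derive(Debug, PartialEq, Eq)]
pub enum Route {
    /// Send the data through the source analyzer's transform_response().
    Analytics,
    /// Existing standard-mode transformation.
    Standard,
    /// No recognised scope: leave the data as-is.
    Passthrough,
}

/// Pick a transformation, checking the analytics scope before the standard-mode scope.
/// `response_keys` are the top-level keys present in the payload.
pub fn route_for_data_source(
    parent_scope: Option<&str>,
    standard_scope: &str,
    response_keys: &[&str],
) -> Route {
    let has_key = |wanted: &str| response_keys.iter().any(|key| *key == wanted);
    if parent_scope == Some("analytics") && has_key("analytics") {
        Route::Analytics
    } else if has_key(standard_scope) {
        Route::Standard
    } else {
        Route::Passthrough
    }
}
```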

4. GraphQL String Escaping

Hardened GraphQL query generation against injection through user-provided strings:

  • New utils::graphql::escape_string() function with single-pass optimization
  • Properly escapes quotes, backslashes, and control characters (\n, \t, \r, \b, \f)
  • Applied to all user-provided strings in GraphQL queries (namespace, project, group paths)
  • Comprehensive test coverage including edge cases
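A single-pass escaper along these lines is sketched below. The mapping for quotes, backslashes, and \n, \t, \r, \b, \f follows the list above; the \uXXXX fallback for other control characters is an assumption (it is valid GraphQL string syntax, but the MR does not say the real function does this).

```rust
/// Escape a user-provided value for use inside a double-quoted GraphQL string literal.
/// Single pass: each char is inspected once and written to a pre-sized buffer.
pub fn escape_string(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\u{0008}' => out.push_str("\\b"),
            '\u{000C}' => out.push_str("\\f"),
            // Assumption: remaining control characters use GraphQL's \uXXXX escape.
            c if c.is_control() => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out
}
```

Usage: a namespace path such as gitlab-org/gitlab passes through unchanged, while a value like `my "group"` becomes `my \"group\"` and can no longer terminate the surrounding string literal.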

Architecture

Two Query Modes

Standard Mode (existing):

workItems(labelName: "bug") {
  nodes { id title labels { nodes { title } } }
}

Analytics Mode (new):

analytics {
  duoCodeSuggestions(language: "ruby") {
    aggregated(
      before: $before
      after: $after
      first: $limit
    ) {
      nodes {
        dimensions {
          language
          ideName
        }
        totalCount
        acceptanceRate
        acceptedCount
      }
    }
  }
}

Field Categories

  • Standard: Regular fields like title, description - only for display/filtering
  • Dimension: Groupable fields like assignees, labels, milestone - results can be grouped by these
  • Metric: Aggregatable fields like count, weight - values computed by the aggregation

Code Generation Flow

  1. Parser accepts dimensions and metrics parameters with mode: analytics
  2. Analyzer validates mode support and categorizes fields via field_category()
  3. Code Generator routes to analytics or standard generator based on mode
  4. Analytics Generator:
    • Partitions fields into dimensions vs metrics
    • Validates at least one metric is selected
    • Generates aggregated(dimensions: [...]) wrapper
    • Builds proper GraphQL structure with parent scopes
  5. Transformer flattens dimension structures in the response data
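Step 4 of the flow can be sketched as a single function. This is an illustration under assumptions: the `generate_analytics` name and signature are hypothetical, pagination arguments (before/after/first) are omitted, and the output shape combines the aggregated(dimensions: [...]) wrapper with the dimensions { ... } selection from the Analytics Mode example, which may not match the generator exactly.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Dimension,
    Metric,
    Standard,
}

/// Build an analytics query from (field, category) pairs, wrapped in parent scopes.
pub fn generate_analytics(
    parent_scopes: &[&str],
    source: &str,
    fields: &[(&str, FieldCategory)],
) -> Result<String, String> {
    if let Some((f, _)) = fields.iter().find(|(_, c)| *c == FieldCategory::Standard) {
        return Err(format!("{f} cannot be used in analytics mode"));
    }
    // Partition the selection into dimensions and metrics.
    let dimensions: Vec<&str> = fields
        .iter()
        .filter(|(_, c)| *c == FieldCategory::Dimension)
        .map(|(f, _)| *f)
        .collect();
    let metrics: Vec<&str> = fields
        .iter()
        .filter(|(_, c)| *c == FieldCategory::Metric)
        .map(|(f, _)| *f)
        .collect();
    // Validate at least one metric is selected.
    if metrics.is_empty() {
        return Err("analytics mode requires at least one metric".to_string());
    }
    // Node selection: nested dimensions block, then metrics.
    let mut selection = String::new();
    if !dimensions.is_empty() {
        selection.push_str(&format!("dimensions {{ {} }} ", dimensions.join(" ")));
    }
    selection.push_str(&metrics.join(" "));
    // aggregated(dimensions: [...]) wrapper.
    let mut query = format!(
        "{} {{ aggregated(dimensions: [{}]) {{ nodes {{ {} }} }} }}",
        source,
        dimensions.join(", "),
        selection
    );
    // Wrap in parent scopes, innermost first.
    for scope in parent_scopes.iter().rev() {
        query = format!("{} {{ {} }}", scope, query);
    }
    Ok(query)
}
```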

Testing

Pipelines should pass. To exercise an analytics-mode data type end to end, I recommend testing with "Implement CodeSuggestions analytics source" (!348, merged).

What This Enables

This infrastructure is generic and reusable - any source can now support analytics by:

  1. Implementing field_category() to mark dimensions/metrics
  2. Adding QueryMode::Analytics to supported_modes()
  3. Optionally implementing parent_scope() for nested paths
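For example, a hypothetical PipelineRuns source could opt in as follows. The trait is a pared-down stand-in for SourceAnalyzer (same three method names as this MR, assumed signatures and defaults), and the source and its field names are invented for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Standard,
    Analytics,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Dimension,
    Metric,
    Standard,
}

/// Pared-down stand-in for SourceAnalyzer; defaults keep a source in standard mode.
pub trait SourceAnalyzer {
    fn field_category(&self, _field: &str) -> FieldCategory {
        FieldCategory::Standard
    }
    fn supported_modes(&self) -> &[QueryMode] {
        &[QueryMode::Standard]
    }
    fn parent_scope(&self) -> Option<&str> {
        None
    }
}

/// Hypothetical analytics-capable source.
pub struct PipelineRuns;

impl SourceAnalyzer for PipelineRuns {
    // Step 1: mark groupable and aggregatable fields.
    fn field_category(&self, field: &str) -> FieldCategory {
        match field {
            "status" | "ref" => FieldCategory::Dimension,
            "count" | "meanDuration" => FieldCategory::Metric,
            _ => FieldCategory::Standard,
        }
    }

    // Step 2: opt in to analytics alongside standard queries.
    fn supported_modes(&self) -> &[QueryMode] {
        &[QueryMode::Standard, QueryMode::Analytics]
    }

    // Step 3 (optional): nest the generated query under a parent field.
    fn parent_scope(&self) -> Option<&str> {
        Some("analytics")
    }
}
```

Sources that implement none of these keep the defaults, so existing standard-mode behaviour is unchanged.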

Next Steps

Related to #95 (closed)

Edited by Robert Hunt
