Add analytics aggregation support
Summary
This MR adds generic analytics mode support for aggregation-based queries, enabling any source to support analytics with dimensions (groupable fields) and metrics (aggregatable fields). The implementation has been refined from the original !347 (merged) to focus purely on core infrastructure without feature-specific code.
What Changed
1. Core Analytics Infrastructure
Added the foundational analytics query mode:
- QueryMode enum: Differentiates between Standard and Analytics query types
- SourceAnalyzer trait extensions:
  - field_category() - categorizes fields as Dimension/Metric/Standard
  - parent_scope() - supports nested GraphQL paths (e.g., aiUsage { codeSuggestions })
  - supported_modes() - declares which modes a source supports
- Analytics code generation: Generates aggregated(dimensions: [...]) { nodes { ... metrics } } queries
- Mode validation: New error types for analytics-specific validation:
  - UnsupportedModeForSource - when a source doesn't support analytics mode
  - InvalidFieldForAnalytics - when standard fields are used in dimensions/metrics
  - InvalidFunctionArgument - for function argument validation
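To make the pieces above concrete, here is a minimal sketch of how the enum, the trait extensions, and the mode check might fit together. It is not the code from this MR: the name() method, the ModeError wrapper, the default method bodies, and both example sources (including their field names) are assumptions made for illustration.

```rust
/// Query modes a source may support (variant names from the list above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum QueryMode {
    Standard,
    Analytics,
}

/// How a field may be used in a query.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldCategory {
    Standard,
    Dimension,
    Metric,
}

/// Hypothetical error wrapper; only the variant name comes from the MR.
#[derive(Debug, PartialEq, Eq)]
pub enum ModeError {
    UnsupportedModeForSource { source: String, mode: QueryMode },
}

pub trait SourceAnalyzer {
    /// Hypothetical accessor, used here only to build error values.
    fn name(&self) -> &str;

    /// Assumed default: every field is a plain display/filter field.
    fn field_category(&self, _field: &str) -> FieldCategory {
        FieldCategory::Standard
    }

    /// Assumed default: the source sits at the top level of the query.
    fn parent_scope(&self) -> Option<&str> {
        None
    }

    /// Assumed default: existing sources stay standard-only.
    fn supported_modes(&self) -> Vec<QueryMode> {
        vec![QueryMode::Standard]
    }
}

/// Rejects a query whose mode the source does not declare.
pub fn validate_mode(source: &dyn SourceAnalyzer, mode: QueryMode) -> Result<(), ModeError> {
    if source.supported_modes().contains(&mode) {
        Ok(())
    } else {
        Err(ModeError::UnsupportedModeForSource {
            source: source.name().to_string(),
            mode,
        })
    }
}

/// Illustrative analytics-only source (field names are assumptions).
pub struct CodeSuggestions;

impl SourceAnalyzer for CodeSuggestions {
    fn name(&self) -> &str {
        "code_suggestions"
    }
    fn field_category(&self, field: &str) -> FieldCategory {
        match field {
            "language" | "ideName" => FieldCategory::Dimension,
            "totalCount" | "acceptanceRate" | "acceptedCount" => FieldCategory::Metric,
            _ => FieldCategory::Standard,
        }
    }
    fn parent_scope(&self) -> Option<&str> {
        Some("analytics")
    }
    fn supported_modes(&self) -> Vec<QueryMode> {
        vec![QueryMode::Analytics]
    }
}

/// Illustrative source that relies on the defaults (standard mode only).
pub struct WorkItems;

impl SourceAnalyzer for WorkItems {
    fn name(&self) -> &str {
        "work_items"
    }
}
```

With defaults like these, an existing source would keep compiling unchanged and would only opt in to analytics by overriding the three methods.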
2. GraphQL Code Refactoring
Modularized the GraphQL code generation for better separation of concerns:
Before: Single 478-line graphql.rs mixing standard and analytics logic
After: Clean modular structure
- src/codegen/graphql.rs (80 lines) - Entry point + shared helpers
- src/codegen/graphql/analytics.rs (252 lines) - Analytics-specific generation
- src/codegen/graphql/standard.rs (96 lines) - Standard query generation
Key improvements:
- Separated render_display_field() shared helper for both modes
- Client-side filter functions (labels(), assignees(), author(), milestone()) properly rejected in analytics dimensions/metrics
- Analytics mode validates at least one metric is selected
- Better error messages explaining correct usage
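The validation rules listed above could look roughly like the following sketch. The function name validate_selection, the NoMetricSelected variant, the reason strings, and the way a function call is detected (a name followed by an opening parenthesis) are all assumptions for illustration; only InvalidFieldForAnalytics and the four filter function names come from this MR.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldCategory {
    Standard,
    Dimension,
    Metric,
}

#[derive(Debug, PartialEq, Eq)]
enum ValidationError {
    /// A field that cannot be used as a dimension or metric.
    InvalidFieldForAnalytics { field: String, reason: &'static str },
    /// Hypothetical variant: analytics queries need at least one metric.
    NoMetricSelected,
}

/// Filter functions that are evaluated client-side in standard mode.
const CLIENT_SIDE_FILTERS: [&str; 4] = ["labels", "assignees", "author", "milestone"];

/// Returns true for calls such as `labels("bug")` (assumed syntax).
fn is_client_side_filter_call(field: &str) -> bool {
    match field.split_once('(') {
        Some((name, _)) => CLIENT_SIDE_FILTERS.contains(&name.trim()),
        None => false,
    }
}

fn validate_selection(
    dimensions: &[&str],
    metrics: &[&str],
    categorize: impl Fn(&str) -> FieldCategory,
) -> Result<(), ValidationError> {
    for &field in dimensions.iter().chain(metrics.iter()) {
        // Client-side filters cannot be pushed into a server-side aggregation.
        if is_client_side_filter_call(field) {
            return Err(ValidationError::InvalidFieldForAnalytics {
                field: field.to_string(),
                reason: "client-side filter functions cannot be aggregated",
            });
        }
        // Standard fields are display/filter only.
        if categorize(field) == FieldCategory::Standard {
            return Err(ValidationError::InvalidFieldForAnalytics {
                field: field.to_string(),
                reason: "standard fields cannot be used as dimensions or metrics",
            });
        }
    }
    if metrics.is_empty() {
        return Err(ValidationError::NoMetricSelected);
    }
    Ok(())
}
```

Carrying a reason alongside the offending field is one way to produce error messages that explain correct usage rather than only naming the failure.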
3. Response Transformation for Analytics
Added automatic flattening of analytics response structures:
New flatten_analytics_dimensions() function in src/analyzer/sources/shared.rs:
- Flattens nested dimension structures to top-level fields
- Transforms { dimensions: { language: "ruby" }, metric: 10 } into { language: "ruby", metric: 10 }
- Makes analytics responses consistent with standard query output format
- Comprehensive test coverage including edge cases
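As an illustration of the flattening step, the sketch below applies the same transformation to a small stand-in Value type. The real function presumably operates on the project's JSON value type; the Value enum here, the by-value signature, and the rule that an existing top-level key wins over a dimension with the same name are assumptions.

```rust
use std::collections::BTreeMap;

/// Stand-in for a JSON value, kept minimal so the sketch has no dependencies.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Num(f64),
    Object(BTreeMap<String, Value>),
}

/// Lifts the entries of a nested `dimensions` object to the top level of a node.
fn flatten_analytics_dimensions(node: Value) -> Value {
    // Anything that is not an object is returned unchanged.
    let mut fields = match node {
        Value::Object(fields) => fields,
        other => return other,
    };

    match fields.remove("dimensions") {
        Some(Value::Object(dimensions)) => {
            for (name, value) in dimensions {
                // Assumption: on a name clash the existing top-level field is kept.
                fields.entry(name).or_insert(value);
            }
        }
        // A non-object `dimensions` value is put back untouched.
        Some(other) => {
            fields.insert("dimensions".to_string(), other);
        }
        None => {}
    }

    Value::Object(fields)
}
```

Nodes without a dimensions key pass through unchanged, which is what lets the same helper run safely on standard-mode data.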
Enhanced transform_response() in SourceAnalyzer trait:
- Detects analytics mode via parent_scope() == Some("analytics")
- Automatically applies dimension flattening for analytics queries
- Preserves backward compatibility with standard mode responses
Updated transform_for_data_source() in src/transformer/data.rs:
- Checks for analytics scope before standard mode scope
- Routes analytics data through source analyzer's transform_response()
- Added tests for analytics transformations across different sources
4. GraphQL String Escaping
Added security improvements for GraphQL query generation:
- New utils::graphql::escape_string() function with single-pass optimization
- Properly escapes quotes, backslashes, and control characters (\n, \t, \r, \b, \f)
- Applied to all user-provided strings in GraphQL queries (namespace, project, group paths)
- Comprehensive test coverage including edge cases
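A single-pass escaper along these lines might look as follows. This is a sketch rather than the function from this MR: the \uXXXX fallback for other control characters and the project_argument helper are assumptions added to show one way the escaped value could be interpolated.

```rust
/// Escapes a string for use inside a double-quoted GraphQL string literal.
/// Walks the input once and writes into a pre-sized buffer.
fn escape_string(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\t' => out.push_str("\\t"),
            '\r' => out.push_str("\\r"),
            '\u{0008}' => out.push_str("\\b"),
            '\u{000C}' => out.push_str("\\f"),
            // Assumption: remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper showing the escaper applied to a user-provided path.
fn project_argument(full_path: &str) -> String {
    format!("project(fullPath: \"{}\")", escape_string(full_path))
}
```

Escaping at the point of interpolation means a path containing a quote cannot close the string literal early and inject extra query text.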
Architecture
Two Query Modes
Standard Mode (existing):
workItems(labelName: "bug") {
  nodes { id title labels { nodes { title } } }
}
Analytics Mode (new):
analytics {
  duoCodeSuggestions(language: "ruby") {
    aggregated(
      before: $before
      after: $after
      first: $limit
    ) {
      nodes {
        dimensions {
          language
          ideName
        }
        totalCount
        acceptanceRate
        acceptedCount
      }
    }
  }
}
Field Categories
- Standard: Regular fields like title, description - only for display/filtering
- Dimension: Groupable fields like assignees, labels, milestone - can aggregate by these
- Metric: Aggregatable fields like count, weight - calculated values in aggregations
Code Generation Flow
- Parser accepts dimensions and metrics parameters with mode: analytics
- Analyzer validates mode support and categorizes fields via field_category()
- Code Generator routes to analytics or standard generator based on mode
- Analytics Generator:
  - Partitions fields into dimensions vs metrics
  - Validates at least one metric is selected
  - Generates aggregated(dimensions: [...]) wrapper
  - Builds proper GraphQL structure with parent scopes
- Transformer flattens dimension structures in the response data
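The Analytics Generator step in the flow above can be sketched as a single function. The function name, the string error, and the exact output shape are assumptions: the sketch omits the source's own field and filter arguments, emits dimension names verbatim in the dimensions argument, and silently skips standard fields on the assumption that the analyzer has already rejected them.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FieldCategory {
    Standard,
    Dimension,
    Metric,
}

/// Builds an analytics selection from already-categorized fields.
fn generate_analytics_query(
    fields: &[(&str, FieldCategory)],
    parent_scope: Option<&str>,
) -> Result<String, String> {
    // Partition the selected fields by category.
    let dimensions: Vec<&str> = fields
        .iter()
        .filter(|(_, category)| *category == FieldCategory::Dimension)
        .map(|(name, _)| *name)
        .collect();
    let metrics: Vec<&str> = fields
        .iter()
        .filter(|(_, category)| *category == FieldCategory::Metric)
        .map(|(name, _)| *name)
        .collect();

    // An aggregation without a metric has nothing to compute.
    if metrics.is_empty() {
        return Err("analytics mode requires at least one metric".to_string());
    }

    // Each node carries a nested `dimensions` block followed by the metrics.
    let mut node_fields: Vec<String> = Vec::new();
    if !dimensions.is_empty() {
        node_fields.push(format!("dimensions {{ {} }}", dimensions.join(" ")));
    }
    node_fields.extend(metrics.iter().map(|metric| metric.to_string()));

    let aggregated = format!(
        "aggregated(dimensions: [{}]) {{ nodes {{ {} }} }}",
        dimensions.join(", "),
        node_fields.join(" ")
    );

    // Wrap in the parent scope when the source declares one.
    Ok(match parent_scope {
        Some(scope) => format!("{} {{ {} }}", scope, aggregated),
        None => aggregated,
    })
}
```

Under these assumptions, selecting the dimensions language and ideName with the metric totalCount under the analytics scope yields `analytics { aggregated(dimensions: [language, ideName]) { nodes { dimensions { language ideName } totalCount } } }`.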
Testing
Pipelines should pass. To exercise analytics mode against a real data type, I would recommend testing with "Implement CodeSuggestions analytics source" (!348, merged), which builds on this infrastructure.
What This Enables
This infrastructure is generic and reusable - any source can now support analytics by:
- Implementing field_category() to mark dimensions/metrics
- Adding QueryMode::Analytics to supported_modes()
- Optionally implementing parent_scope() for nested paths
Next Steps
- MR !348 (merged): CodeSuggestions source implementation (uses this infrastructure)
- MR !349 (closed): Add timeSegment() function support (builds on this)
- MR gitlab!228129 (closed): Add CodeSuggestions to GitLab UI
Related to #95 (closed)