Implement CodeSuggestions analytics source

Summary

This MR implements the CodeSuggestions analytics source for GLQL, enabling users to query GitLab Duo Code Suggestions usage data using natural language syntax. This builds on the analytics infrastructure from !347 (merged) by adding the first production analytics source.

What Changed

1. CodeSuggestions Source Implementation

Added a new analytics-only source for Code Suggestions data:

  • New source: CodeSuggestionsSourceAnalyzer in src/analyzer/sources/code_suggestions.rs
    • Analytics-only mode (no standard queries)
    • GraphQL path: analytics.duoCodeSuggestions
    • Requires glql_code_suggestion_analytics_aggregation feature flag
  • SourceAnalyzer trait implementation:
    • field_category() - categorizes 4 dimensions and 7 metrics
    • analytics_dimension_key() - provides GraphQL field names for dimensions
    • supported_modes() - declares Analytics mode only
    • parent_scope() - returns "analytics" for nested GraphQL path

2. Field Definitions

4 Dimensions (groupable fields):

  • language - Programming language
  • ideName - IDE/editor name
  • timestamp - Timestamp of the suggestion event
  • user - User who received the suggestion

7 Metrics (aggregatable fields):

  • totalCount - Total number of suggestions
  • usersCount - Number of unique users
  • acceptanceRate - Percentage of accepted suggestions
  • suggestionSizeSum - Total size of all suggestions
  • acceptedCount - Number of accepted suggestions
  • rejectedCount - Number of rejected suggestions
  • shownCount - Number of suggestions shown
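
The two field lists above can be modeled in a few lines of Ruby, which may help when sanity-checking the dimensions and metrics strings passed to the test scripts. This is only an illustrative sketch: the real categorization lives in the Rust analyzer's field_category(), and the constant names and the unknown_fields helper here are hypothetical.

```ruby
# Hypothetical Ruby model of the field lists above; the real categorization
# is implemented in Rust (field_category() in the source analyzer).
DIMENSIONS = %w[language ideName timestamp user].freeze
METRICS = %w[
  totalCount usersCount acceptanceRate suggestionSizeSum
  acceptedCount rejectedCount shownCount
].freeze

# Returns :dimension, :metric, or nil for an unknown field name.
def field_category(name)
  return :dimension if DIMENSIONS.include?(name)
  return :metric if METRICS.include?(name)

  nil
end

# Splits a comma-separated config string (the format used for the
# dimensions/metrics options in the test scripts) and returns the names
# that do not belong to the expected category.
def unknown_fields(list, expected)
  list.split(',').map(&:strip).reject { |name| field_category(name) == expected }
end
```

For example, unknown_fields('language, ideName', :dimension) returns an empty array, while unknown_fields('totalCount, acceptRate', :metric) returns the misspelled 'acceptRate'.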

3. Filter Mappings

Custom GraphQL parameter mappings for backend compatibility:

  • user → userId (with array wrapping: user = 123 → userId: [123])
  • timestamp >= → timestampFrom
  • timestamp <= → timestampTo
  • Other filters use field name directly (language, ideName)
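
As a rough illustration of the mappings above, the sketch below models each filter as a [field, operator, value] triple and returns the GraphQL argument name and value. The triple representation and both method names are assumptions made for this example; the actual analyzer works on its own parsed types, and this sketch does not cover the conversion of user IDs to global IDs seen in the generated query.

```ruby
# Hypothetical sketch of the filter-to-argument mapping described above.
# Filters are modeled as [field, operator, value] triples.
def graphql_argument(field, operator, value)
  case [field, operator]
  when ['user', '=']
    ['userId', Array(value)] # a single value is wrapped in an array
  when ['timestamp', '>=']
    ['timestampFrom', value]
  when ['timestamp', '<=']
    ['timestampTo', value]
  else
    [field, value] # language, ideName, etc. keep their field name
  end
end

# Maps a list of filter triples to a hash of GraphQL arguments.
def graphql_arguments(filters)
  filters.map { |field, op, value| graphql_argument(field, op, value) }.to_h
end
```

Passing the triples for user = 123, timestamp >= "2024-01-01", and language = "ruby" yields a hash with the keys userId (holding [123]), timestampFrom, and language.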

Architecture

Query Flow

  1. Parser accepts CodeSuggestion type with analytics mode
  2. Analyzer validates feature flag and categorizes fields
  3. Validator ensures dimensions are valid for this source
  4. Code Generator produces nested GraphQL query with orderBy
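
Steps 1 to 3 amount to a series of checks that must pass before code generation runs. A minimal sketch of that idea follows, assuming a hypothetical preflight_errors helper; the error messages and the order of the checks are illustrative and are not taken from the compiler.

```ruby
# Hypothetical pre-flight check mirroring steps 1-3 of the flow above.
# Returns a list of error strings; an empty list means the query could
# proceed to code generation. Messages are illustrative only.
CODE_SUGGESTION_DIMENSIONS = %w[language ideName timestamp user].freeze

def preflight_errors(mode:, flag_enabled:, dimensions:)
  errors = []
  errors << 'CodeSuggestion supports analytics mode only' unless mode == 'analytics'
  errors << 'CodeSuggestion feature flag is disabled' unless flag_enabled
  invalid = dimensions - CODE_SUGGESTION_DIMENSIONS
  errors << "invalid dimensions: #{invalid.join(', ')}" unless invalid.empty?
  errors
end
```

Collecting all errors instead of stopping at the first one is a choice made for this sketch; the real pipeline may report only the first failure.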

Generated GraphQL Structure

query GLQL($before: String, $after: String, $limit: Int) {
  project(fullPath: "gitlab-org/gitlab") {
    analytics {
      duoCodeSuggestions(
        language: "ruby",
        userId: ["gid://gitlab/User/123"],
        timestampFrom: "2024-01-01 00:00"
      ) {
        aggregated(
          before: $before,
          after: $after,
          first: $limit,
          orderBy: [{ identifier: "acceptanceRate", direction: DESC }]
        ) {
          nodes {
            dimensions {
              language
              ideName
            }
            totalCount
            acceptanceRate
            acceptedCount
          }
          pageInfo {
            hasNextPage
            hasPreviousPage
            startCursor
            endCursor
          }
        }
      }
    }
  }
}
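
To make the shape of the generated query easier to follow, here is a small Ruby sketch that renders the orderBy argument and the nodes selection from a sort string and two field lists. Both helpers are hypothetical and only mimic the structure shown above; the real code generator is written in Rust, and defaulting to ASC when no direction is given is an assumption.

```ruby
# Hypothetical renderer for the orderBy argument shown above.
# Accepts a sort string such as 'acceptanceRate desc'.
def order_by_argument(sort)
  field, direction = sort.split
  direction = (direction || 'asc').upcase # assumption: ascending by default
  %(orderBy: [{ identifier: "#{field}", direction: #{direction} }])
end

# Hypothetical renderer for the nodes selection: dimensions are nested
# under a dimensions block, metrics are selected directly on the node.
def nodes_selection(dimensions, metrics)
  lines = ['nodes {', '  dimensions {']
  lines += dimensions.map { |d| "    #{d}" }
  lines << '  }'
  lines += metrics.map { |m| "  #{m}" }
  lines << '}'
  lines.join("\n")
end

puts order_by_argument('acceptanceRate desc')
puts nodes_selection(%w[language ideName], %w[totalCount acceptanceRate acceptedCount])
```

The first puts prints the same orderBy argument that appears in the query above.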

Example Usage

Basic aggregation by language

mode: analytics
query: type = CodeSuggestion and timestamp >= -30d
dimensions: language
metrics: totalCount, acceptanceRate
sort: totalCount desc

User-specific analysis

mode: analytics
query: type = CodeSuggestion and user = @rob.hunt and timestamp >= -7d
dimensions: language, ideName
metrics: totalCount, acceptedCount, rejectedCount, acceptanceRate
sort: acceptanceRate desc
limit: 10

Multi-language comparison

mode: analytics
query: type = CodeSuggestion and language in ("ruby", "javascript", "python") and timestamp >= "2024-01-01"
dimensions: language, ideName
metrics: totalCount, usersCount, acceptanceRate
sort: acceptanceRate desc

Testing

This MR can be tested in two ways:

  1. Standalone GLQL Testing - Verify query compilation without backend integration
  2. End-to-End Testing - Full integration testing with MR gitlab!228129 (closed)

Most reviewers should use Approach 1 for code review. Approach 2 provides comprehensive validation but requires additional setup.

Standalone GLQL testing

Note: For my own testing I have been using a script, test_analytics.rb, to monitor compilation and transformation across the different modes. Feel free to use it too: copy it to the ./glql_rb directory and run it with bundle exec ruby test_analytics.rb.

  1. Build the Ruby extension:

    cd glql_rb
    bundle install
    bundle exec rake compile
  2. Compile a query and inspect the output:

    ruby -I lib -r gitlab_query_language -e "
    query = 'type = CodeSuggestion and language = \"ruby\"'
    config = { 
      project: 'gitlab-org/gitlab',
      mode: 'analytics',
      dimensions: 'language, ideName',
      metrics: 'totalCount, acceptanceRate',
      featureFlags: { glqlCodeSuggestions: true }
    }
    result = Glql.compile(query, config)
    puts '=== Compilation Success: ' + result['success'].to_s
    puts '=== Field Count: ' + result['fields'].length.to_s
    puts ''
    puts '=== Generated GraphQL (excerpt):'
    puts result['output'].lines[0..15].join
    "
  3. Test different query patterns:

    Multi-language comparison:

    ruby -I lib -r gitlab_query_language -e "
    result = Glql.compile(
      'type = CodeSuggestion and language in (\"ruby\", \"javascript\", \"python\")',
      { 
        project: 'gitlab-org/gitlab',
        mode: 'analytics',
        dimensions: 'language, ideName',
        metrics: 'totalCount, acceptanceRate, acceptedCount',
        featureFlags: { glqlCodeSuggestions: true }
      }
    )
    puts 'Success: ' + result['success'].to_s
    puts 'Contains language filter: ' + result['output'].include?('language:').to_s
    "

    User-specific analysis:

    ruby -I lib -r gitlab_query_language -e "
    result = Glql.compile(
      'type = CodeSuggestion and user = 123',
      { 
        project: 'gitlab-org/gitlab',
        mode: 'analytics',
        dimensions: 'language',
        metrics: 'totalCount, usersCount',
        featureFlags: { glqlCodeSuggestions: true }
      }
    )
    puts 'Success: ' + result['success'].to_s
    puts 'Contains userId: ' + result['output'].include?('userId:').to_s
    "

    Time range filter:

    ruby -I lib -r gitlab_query_language -e "
    result = Glql.compile(
      'type = CodeSuggestion and timestamp >= \"2024-01-01\" and timestamp <= \"2024-12-31\"',
      { 
        project: 'gitlab-org/gitlab',
        mode: 'analytics',
        dimensions: 'language',
        metrics: 'totalCount',
        featureFlags: { glqlCodeSuggestions: true }
      }
    )
    puts 'Success: ' + result['success'].to_s
    puts 'Contains timestampFrom: ' + result['output'].include?('timestampFrom:').to_s
    puts 'Contains timestampTo: ' + result['output'].include?('timestampTo:').to_s
    "
  4. Test that CodeSuggestion rejects standard mode:

    ruby -I lib -r gitlab_query_language -e "
    result = Glql.compile(
      'type = CodeSuggestion and language = \"ruby\"',
      { 
        project: 'gitlab-org/gitlab',
        fields: 'language, totalCount',
        featureFlags: { glqlCodeSuggestions: true }
      }
    )
    if result['success']
      puts 'ERROR: Should have failed but succeeded'
    else
      puts 'Correctly rejected standard mode'
      puts 'Error message: ' + result['output']
    end
    "
  5. Test that disabling the feature flag rejects CodeSuggestion queries:

    ruby -I lib -r gitlab_query_language -e "
    result = Glql.compile(
      'type = CodeSuggestion and language in (\"ruby\", \"javascript\", \"python\")',
      { 
        project: 'gitlab-org/gitlab',
        mode: 'analytics',
        dimensions: 'language, ideName',
        metrics: 'totalCount, acceptanceRate, acceptedCount',
        featureFlags: { glqlCodeSuggestions: false }
      }
    )
    if result['success']
      puts 'ERROR: Should have failed but succeeded'
    else
      puts 'Correctly rejected due to feature flag'
      puts 'Error message: ' + result['output']
    end
    "
  6. Test transformation step:

    ruby -I lib -r gitlab_query_language -e "
    # Mock GraphQL response from CodeSuggestions analytics API
    response = {
      project: {
        analytics: {
          duoCodeSuggestions: {
            aggregated: {
              nodes: [
                {
                  dimensions: {
                    language: 'ruby',
                    ideName: 'vscode'
                  },
                  totalCount: 150,
                  acceptanceRate: 0.75
                }
              ],
              pageInfo: {
                hasNextPage: true,
                hasPreviousPage: false
              }
            }
          }
        }
      }
    }
    result = Glql.transform(
      response,
      {
        mode: 'analytics',
        fields: [
          { name: 'language',       type: 'dimension' },
          { name: 'ideName',        type: 'dimension' },
          { name: 'totalCount',     type: 'metric' },
          { name: 'acceptanceRate', type: 'metric' }
        ]
      }
    )
    node = result['data']['nodes'][0]
    puts 'Transform Success: ' + result['success'].to_s
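    # Compare with false instead of using a leading bang, which an interactive
    # bash or zsh session treats as history expansion inside double quotes.
    puts 'Dimensions Flattened: ' + (node.key?('dimensions') == false).to_s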
    puts 'Language: ' + node['language']
    puts 'IDE: ' + node['ideName']
    puts 'Total Count: ' + node['totalCount'].to_s
    puts 'Acceptance Rate: ' + node['acceptanceRate'].to_s
    "

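The transformation check in step 6 expects each node's nested dimensions to be lifted to the top level of the node. A minimal pure-Ruby sketch of that flattening is shown below, for reference when reading the expected output. The flatten_node helper is hypothetical and is not how Glql.transform is implemented.

```ruby
# Hypothetical flattening step: lift the nested dimensions hash into the
# node itself so that dimensions and metrics sit side by side.
def flatten_node(node)
  dimensions = node.fetch('dimensions', {})
  node.reject { |key, _| key == 'dimensions' }.merge(dimensions)
end

node = {
  'dimensions' => { 'language' => 'ruby', 'ideName' => 'vscode' },
  'totalCount' => 150,
  'acceptanceRate' => 0.75
}

flat = flatten_node(node)
puts flat.key?('dimensions') # false
puts flat['language']        # ruby
puts flat['totalCount']      # 150
```

In this sketch a dimension would overwrite a metric of the same name; the field lists for this source have no such overlap.
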
End-to-end GLQL testing

Follow the testing instructions in Add CodeSuggestions support to GLQL in GitLab UI (gitlab!228129 - closed).

  • Depends on: !347 (merged) (analytics infrastructure)
  • Backend GraphQL API: gitlab!226274 (merged) (expose-code-suggestions-ae-to-graphql)
  • Feature flag: glql_code_suggestion_analytics_aggregation (backend-controlled)
  • Next steps: !349 (closed) (timeSegment function support)

Related to #95 (closed)

Edited by Robert Hunt
