Fixing Gitlab/JsonSafeParse violations

What does this MR do and why?

Fixes the Gitlab/JsonSafeParse violation in ee/app/services/ai/duo_workflows/start_workflow_service.rb.

Related to #586119
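For context, the fix itself is a one-call swap in the service. A minimal standalone sketch follows; the Gitlab::Json stub below exists only so the snippet runs outside the Rails app and is not the real implementation, which applies its own size and depth limits:

```ruby
require 'json'

# Stand-in stub so this sketch runs outside the GitLab codebase.
# The real Gitlab::Json.safe_parse enforces additional size/depth limits.
module Gitlab
  module Json
    def self.parse(data)
      ::JSON.parse(data)
    end

    def self.safe_parse(data)
      # Assumed illustrative limit; not the actual configuration.
      ::JSON.parse(data, max_nesting: 25)
    end
  end
end

# Before: metadata = Gitlab::Json.parse(workflow_metadata)
# After:  metadata = Gitlab::Json.safe_parse(workflow_metadata)
metadata = Gitlab::Json.safe_parse('{"extended_logging": true}')
puts metadata["extended_logging"]
```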

Data source identification

  • workflow_metadata comes from Gitlab::DuoWorkflow::Client.metadata(), and modelMetadata comes from Gitlab::DuoWorkflow::DuoWorkflowService::Client.agent_platform_model_metadata()
  • All data comes from internal services with no user input
Example of the data structure:
{
  "extended_logging": true,
  "is_team_member": false,
  "modelMetadata": {
    "provider": "ai-provider",
    "name": "model-name",
    "endpoint": "http://localhost:11434/v1",
    "api_key": "service_token_xyz",
    "identifier": "provider/some-model"
  }
}

Given that it is internal data, can safe_parse be used?

Given that it is internal data and safe_parse is not strictly needed, what would be the impact of using it, and should parse be used instead?

  1. The data doesn't exceed the safe_parse limits:
    • Typical data size: 150-300 bytes, nesting depth ~3
  2. Profiling results are in the next section
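To double-check the first point, the size and nesting depth of a typical payload can be measured directly. A quick sketch using the example structure above (stdlib JSON only; the depth helper here is just for illustration):

```ruby
require 'json'

# A typical workflow_metadata payload, matching the example structure.
payload = {
  extended_logging: true,
  is_team_member: false,
  modelMetadata: {
    provider: 'ai-provider',
    name: 'model-name',
    endpoint: 'http://localhost:11434/v1',
    api_key: 'service_token_xyz',
    identifier: 'provider/some-model'
  }
}.to_json

# Recursively compute the nesting depth of a parsed JSON value,
# counting scalars as depth 1.
def depth(node)
  case node
  when Hash  then 1 + (node.values.map { |v| depth(v) }.max || 0)
  when Array then 1 + (node.map { |v| depth(v) }.max || 0)
  else 1
  end
end

puts "size:  #{payload.bytesize} bytes"
puts "depth: #{depth(JSON.parse(payload))}"
```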

Performance Profiling

Comprehensive benchmark comparing parse() vs safe_parse() with 10,000 iterations:

| Test Case | Data Size | Parse Time | Safe Parse Time | Overhead |
|---|---|---|---|---|
| Small (basic metadata) | 48 bytes | 0.0025ms | 0.0081ms | 224% |
| Medium (typical) | 720 bytes | 0.0025ms | 0.0084ms | 236% |
| Large (full context) | 15,971 bytes | 0.0235ms | 0.0605ms | 157% |
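The overhead column can be recomputed from the per-call timings with the same formula the benchmark script uses, (safe_parse − parse) / parse × 100:

```ruby
# Recompute the overhead percentages from the per-call timings (ms).
timings = {
  small:  { parse: 0.0025, safe_parse: 0.0081 },
  medium: { parse: 0.0025, safe_parse: 0.0084 },
  large:  { parse: 0.0235, safe_parse: 0.0605 }
}

timings.each do |name, t|
  overhead_pct = ((t[:safe_parse] - t[:parse]) / t[:parse]) * 100
  puts format("%-7s +%.0f%%", name, overhead_pct)
end
```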

Production Load Scenarios

At 10,000 requests/second (high-traffic scenario):

| Scenario | Parse Total | Safe Parse Total | Difference |
|---|---|---|---|
| Small (48B) | 0.025ms | 0.081ms | +0.056ms ✓ |
| Medium (720B) | 0.025ms | 0.084ms | +0.059ms ✓ |
| Large (15.9KB) | 0.235ms | 0.605ms | +0.370ms ✓ |

Source: benchmark.rb

#!/usr/bin/env ruby
# frozen_string_literal: true

# Condensed Performance Benchmark: parse vs safe_parse for StartWorkflowService
#
# This script compares the performance impact of switching from Gitlab::Json.parse()
# to Gitlab::Json.safe_parse() in StartWorkflowService#workflow_metadata
#
# Usage: bundle exec ruby benchmark.rb

require 'benchmark'
require 'json'
require_relative 'config/environment'

# Test data representing typical workflow_metadata sizes
TEST_DATA = {
  small: {
    extended_logging: true,
    is_team_member: false
  }.to_json,

  medium: {
    extended_logging: true,
    is_team_member: false,
    modelMetadata: {
      provider: 'ai-provider',
      name: 'model-name',
      endpoint: 'http://localhost:11434/v1',
      api_key: 'token' * 100,
      identifier: 'provider/some-model'
    }.to_json
  }.to_json,

  large: {
    extended_logging: true,
    is_team_member: false,
    modelMetadata: {
      provider: 'ai-provider',
      name: 'model-name',
      endpoint: 'http://localhost:11434/v1',
      api_key: 'token' * 100,
      identifier: 'provider/some-model',
      additional_context: { data: Array.new(50) { 'context_value' } }
    }.to_json,
    additional_context: Array.new(100) { |i| { category: "cat_#{i}", content: 'x' * 100 } }.to_json
  }.to_json
}

ITERATIONS = 10_000

puts "=" * 90
puts "Performance Benchmark: Gitlab::Json.parse vs Gitlab::Json.safe_parse"
puts "For: StartWorkflowService#workflow_metadata"
puts "=" * 90
puts

results = {}

TEST_DATA.each do |name, data|
  puts "\n#{name.to_s.capitalize} (#{data.bytesize} bytes, #{ITERATIONS} iterations)"
  puts "-" * 90

  Benchmark.bm(35) do |x|
    parse_time = x.report("parse()") do
      ITERATIONS.times { Gitlab::Json.parse(data) }
    end

    safe_parse_time = x.report("safe_parse()") do
      ITERATIONS.times { Gitlab::Json.safe_parse(data) }
    end

    overhead_ms = (safe_parse_time.real - parse_time.real) / ITERATIONS * 1000
    overhead_pct = ((safe_parse_time.real - parse_time.real) / parse_time.real) * 100

    results[name] = {
      parse_ms: parse_time.real / ITERATIONS * 1000,
      safe_parse_ms: safe_parse_time.real / ITERATIONS * 1000,
      overhead_ms: overhead_ms,
      overhead_pct: overhead_pct
    }
  end
end

# Consolidated per-call summary
puts "\n" + ("=" * 90)
puts format("%-8s %12s %17s %14s %11s", "case", "parse (ms)", "safe_parse (ms)", "overhead (ms)", "overhead %")
results.each do |name, r|
  puts format("%-8s %12.4f %17.4f %14.4f %10.0f%%", name, r[:parse_ms], r[:safe_parse_ms], r[:overhead_ms], r[:overhead_pct])
end

Conclusion

Using safe_parse adds some overhead (~0.059ms per call for a typical payload), but since this operation runs only once per workflow, I believe the cost is acceptable in this case, unless the responsible reviewer disagrees.

References

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Amanda Lins
