Draft: Refactor Builder methods by collecting all variables first

What does this MR do and why?

This MR optimises variable collection performance in CI builds by introducing a new Collection.from_collections method and refactoring variable builders to use batch construction instead of multiple concatenation operations.

Performance Problem:

Variable builders currently assemble collections by chaining multiple concat calls, which incurs repeated method dispatch, repeated array growth, and multiple passes over the data. Batch construction addresses this through:

  1. Reduced method call overhead - eliminates multiple concat calls.
  2. More efficient memory allocation - a single array allocation instead of multiple appends.
  3. Better cache locality - a single pass through the data instead of multiple iterations.

Solution:

  • Added a Collection.from_collections(*collections) class method that gathers variables from all collections first and instantiates the backing hash only once, after everything is collected.
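
To make the idea concrete, here is a minimal, illustrative sketch of the pattern. The real Gitlab::Ci::Variables::Collection is more involved (fabricated variables, masking, etc.); the class and method bodies below are simplified stand-ins, not the MR's actual implementation.

```ruby
# Simplified stand-in for Gitlab::Ci::Variables::Collection,
# showing the shape of the new batch constructor.
class Collection
  include Enumerable

  def initialize(variables = [])
    @variables = variables.to_a
  end

  def each(&block)
    @variables.each(&block)
  end

  # Old pattern: grow the internal array repeatedly, one collection at a time.
  def concat(other)
    other.each { |variable| @variables << variable }
    self
  end

  # New pattern: flatten every collection in a single pass and allocate once.
  def self.from_collections(*collections)
    new(collections.flat_map(&:to_a))
  end
end
```

The key point is that from_collections builds the final array in one flat_map pass instead of mutating an empty collection N times.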

Impact:

  • Replaces multiple concat() calls with a single from_collections() method for more efficient variable collection building.
  • Delivers 10-13% performance improvements across different CI variable scenarios.
  • Reduces method call overhead and improves memory allocation patterns while maintaining identical functionality and variable precedence.
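
The before/after call-site change, and the claim that precedence is preserved, can be sketched as follows. Collection here is a tiny hypothetical stub (not the real GitLab class), just enough to show that both forms produce the same ordered result, with later collections keeping higher precedence on lookup.

```ruby
# Minimal stub: later items win on lookup, mirroring variable precedence.
Collection = Struct.new(:items) do
  def self.from_collections(*collections)
    new(collections.flat_map(&:items))
  end

  def concat(other)
    items.concat(other.items)
    self
  end

  def to_hash
    items.each_with_object({}) { |v, h| h[v[:key]] = v[:value] }
  end
end

predefined = Collection.new([{ key: 'CI', value: 'true' }])
project    = Collection.new([{ key: 'CI', value: 'override' }, { key: 'FOO', value: 'bar' }])

# Before: accumulate via repeated concat calls
before = Collection.new([]).concat(predefined).concat(project)

# After: single batch construction
after = Collection.from_collections(predefined, project)

# Both resolve CI to "override" - order, and therefore precedence, is identical
before.to_hash == after.to_hash
```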

References

While working on #565862, I identified performance bottlenecks in CI variable collection methods due to inefficient hash operations. Based on profiling data (internal link), the following methods showed significant execution time:

  • Gitlab::Ci::Variables::Builder#scoped_variables -> This MR refactors the scoped_variables, unprotected_scoped_variables, and scoped_variables_for_pipeline_seed methods. scoped_variables is the most problematic according to our logs; the other two use the same old pattern, so I have refactored them as well.
  • Gitlab::Ci::Variables::Builder::Pipeline#predefined_variables -> This method, along with the other variable methods that use the same pattern, should be refactored in follow-up MRs. I will create those once the solution proposed here is accepted and this MR is merged.

Most methods that process variables use the concat pattern; I believe the from_collections pattern could also be applied in other classes/methods.

In Gitlab::Ci::Variables::Builder, multiple methods follow a similar pattern and could benefit from this optimisation. I've focused on selected methods for now and would like to get feedback on this approach before refactoring additional variable assembly methods.

Screenshots or screen recordings

Before / After
-> Screenshot 2025-09-22 at 11.27.33.png

How to set up and validate locally

  1. Create a project with multiple variable levels (instance, group, project, pipeline).
  2. Add 20+ variables at each level.
  3. Run a CI job and verify variables are resolved correctly.

Performance Benchmark Analysis

To run the benchmark locally:

  1. Add the script below as scripts/ci_variables_performance_benchmark.rb.
  2. Run RAILS_ENV=development rails runner scripts/ci_variables_performance_benchmark.rb
# CI Variables Collection Performance Benchmark

require 'benchmark'

ActiveRecord::Base.logger = nil

def create_variable_collections(source_count, vars_per_source)
  prefixes = %w[CI_ PROJECT_ GITLAB_ RUNNER_ ENV_]

  source_count.times.map do |source_index|
    variables = vars_per_source.times.map do |var_index|
      prefix = prefixes[source_index % prefixes.size]
      { key: "#{prefix}VAR_#{var_index + 1}", value: "value_#{source_index + 1}_#{var_index + 1}" }
    end
    Gitlab::Ci::Variables::Collection.new(variables)
  end
end

def run_original_approach(collections)
  Gitlab::Ci::Variables::Collection.new.tap do |variables|
    collections.each { |collection| variables.concat(collection) }
  end
end

def run_new_approach(collections)
  Gitlab::Ci::Variables::Collection.from_collections(*collections)
end

def run_benchmark(iterations, &block)
  # Warmup runs to stabilise caches before measuring
  3.times(&block)

  # Measure 5 runs
  times = []
  5.times do
    # Disable GC during the timed region to reduce noise,
    # then collect between runs so each run starts clean
    GC.disable
    time = Benchmark.realtime { iterations.times(&block) }
    GC.enable
    GC.start
    times << time
    print "."
  end
  puts " #{(times.sum / times.size * 1000).round(2)}ms avg"

  times.sum / times.size
end

def show_results(original_time, new_time)
  percentage_change = ((new_time - original_time) / original_time * 100).round(1)

  if percentage_change < 0
    puts "--> #{percentage_change.abs}% faster (#{(original_time / new_time).round(2)}x speedup)"
  else
    puts "--> #{percentage_change}% slower"
  end
end

scenarios = [
  { name: "Small Project", sources: 8, vars_per_source: 10 },
  { name: "Medium Project", sources: 9, vars_per_source: 15 },
  { name: "Large Project", sources: 10, vars_per_source: 25 },
  { name: "Enterprise Project", sources: 10, vars_per_source: 40 }
]

# Fixed iteration count for all scenarios
ITERATIONS = 50

scenarios.each do |scenario|
  puts "\n#{scenario[:name]} (#{scenario[:sources]} × #{scenario[:vars_per_source]} = #{scenario[:sources] * scenario[:vars_per_source]} vars)"

  collections = create_variable_collections(scenario[:sources], scenario[:vars_per_source])

  print "Original approach: "
  original_time = run_benchmark(ITERATIONS) do
    run_original_approach(collections)
  end

  print "New approach: "
  new_time = run_benchmark(ITERATIONS) do
    run_new_approach(collections)
  end

  show_results(original_time, new_time)
end

Results:

Screenshot 2025-09-22 at 11.27.33.png

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Kasia Misirli
