Replace Rugged with custom validation in GitRefValidator

What does this MR do and why?

Contributes to #525489 (closed), #430709

Problem

Gitlab::GitRefValidator relies on Rugged::Reference.valid_name?, which has a bug: it accepts the DEL character (\x7F), which Git itself rejects. As a result, a tag name containing this character passes validation, the Git operation fails later, and the user sees a confusing error message.

Solution

Replace the Rugged dependency with a custom pure-Ruby implementation that validates ref names according to Git's git-check-ref-format rules. The implementation now rejects:

  • Control characters (bytes < 0x20 and DEL 0x7F)
  • Components ending with .lock
  • All other invalid patterns per Git specification
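
To make the rejected patterns concrete, here is a minimal sketch of a byte-level check that follows the git-check-ref-format rules. It is not the code from this MR: the RefNameSketch module and its valid? method are hypothetical names, and the leading-dash check is an assumption based on the '-branch' case in the benchmark below.

```ruby
# frozen_string_literal: true

# Hypothetical sketch of git-check-ref-format style validation.
# This is NOT the implementation shipped in this MR.
module RefNameSketch
  # Bytes forbidden anywhere in a ref name: space ~ ^ : ? * [ and backslash
  FORBIDDEN_BYTES = " ~^:?*[\\".bytes.freeze
  DEL = 0x7F

  module_function

  def valid?(name)
    return false if name.nil? || name.empty?
    # Leading dash is an assumption; leading/trailing slash and trailing dot are Git rules.
    return false if name.start_with?('-', '/') || name.end_with?('/', '.')
    return false if name.include?('..') || name.include?('//') || name.include?('@{')
    # Control characters (bytes below 0x20), DEL, and the forbidden punctuation.
    return false if name.each_byte.any? { |b| b < 0x20 || b == DEL || FORBIDDEN_BYTES.include?(b) }

    # No slash-separated component may start with a dot or end with .lock.
    name.split('/').none? { |part| part.start_with?('.') || part.end_with?('.lock') }
  end
end
```

With this sketch, RefNameSketch.valid?("test\x7fbranch") and RefNameSketch.valid?('branch.lock') both return false, while RefNameSketch.valid?('feature/new') returns true.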

Note: The custom validation operates on unqualified refs (e.g., branch-name), which are expanded to qualified refs (e.g., refs/heads/branch-name) before Git operations. Rule 9 of git-check-ref-format (a ref cannot be the single character @) therefore doesn't apply at this level, because refs/heads/@ is not a single @.
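
The effect of the expansion can be illustrated with a small sketch. The QualifySketch module and its helpers are hypothetical names used only for this example; they are not part of the MR.

```ruby
# frozen_string_literal: true

# Hypothetical helpers for illustration; the names are not from this MR.
module QualifySketch
  module_function

  # Expand an unqualified branch name into a qualified ref.
  def qualify(name, prefix = 'refs/heads/')
    "#{prefix}#{name}"
  end

  # The single-@ rule: a ref cannot be exactly the character "@".
  def single_at?(ref)
    ref == '@'
  end
end

puts QualifySketch.single_at?('@')                        # the unqualified input
puts QualifySketch.single_at?(QualifySketch.qualify('@')) # the ref Git receives
```

The first call prints true and the second prints false, which is why the single-@ rule is not enforced on unqualified input.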

The change is gated behind the git_ref_validator_custom_validation feature flag (disabled by default) to allow for safe rollout.

Feature Flag

This MR introduces a feature flag git_ref_validator_custom_validation:

  • Disabled (default): Uses legacy Rugged validation (current behavior)
  • Enabled: Uses new custom validation that fixes the Rugged bug
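
A rough sketch of how such gating might look is shown here. The Feature module is a stand-in stub so the example runs outside Rails (do not paste it into a Rails console, where the real Feature class exists), and GatedValidatorSketch with its two lambdas is hypothetical. The LEGACY lambda only imitates the DEL bug; it does not call Rugged.

```ruby
# frozen_string_literal: true

# Stand-in stub for GitLab's Feature API so the sketch runs standalone.
module Feature
  @enabled = {}

  class << self
    def enable(name)
      @enabled[name] = true
    end

    def disable(name)
      @enabled[name] = false
    end

    # Flags are disabled unless explicitly enabled.
    def enabled?(name, *)
      @enabled.fetch(name, false)
    end
  end
end

# Hypothetical dispatcher; not the code from this MR.
module GatedValidatorSketch
  FLAG = :git_ref_validator_custom_validation

  # Placeholder for the legacy path: misses DEL, like the bug described above.
  LEGACY = ->(name) { name.each_byte.none? { |b| b < 0x20 } }
  # Placeholder for the custom path: also rejects DEL.
  CUSTOM = ->(name) { name.each_byte.none? { |b| b < 0x20 || b == 0x7F } }

  module_function

  def validate(name)
    validator = Feature.enabled?(FLAG) ? CUSTOM : LEGACY
    validator.call(name)
  end
end
```

With the flag disabled, GatedValidatorSketch.validate("test\x7fbranch") returns true; after Feature.enable(:git_ref_validator_custom_validation) it returns false.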

References

Benchmarks

ruby -I lib -r bundler/setup git_ref_validator_benchmark.rb
============================================================
GitRefValidator Benchmark: Rugged vs Custom Implementation
============================================================
Test cases: 16 refs (mix of valid/invalid)
Iterations: 10000
Total validations: 160000
------------------------------------------------------------
Warming up... done

Benchmark Results:
------------------------------------------------------------
                             user     system      total        real
Rugged (legacy):         0.044997   0.000188   0.045185 (  0.045185)
Custom (byte-level):     0.081115   0.000204   0.081319 (  0.081331)

------------------------------------------------------------
Per-call timing:
------------------------------------------------------------
  Rugged (C extension): 0.282 µs/call
  Custom (pure Ruby):   0.508 µs/call
  Ratio:                1.80x

Absolute performance:
  - 1,000 validations:  0.51 ms
  - 10,000 validations: 5.08 ms
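
The per-call figures can be re-derived from the "real" column above. The snippet below is only an arithmetic check; the BenchmarkMath module name is made up for this example.

```ruby
# frozen_string_literal: true

# Re-derives the summary figures from the "real" column of the output above.
module BenchmarkMath
  TOTAL_CALLS = 16 * 10_000 # test cases x iterations

  module_function

  # Convert total wall-clock seconds into microseconds per validation.
  def per_call_us(real_seconds)
    real_seconds / TOTAL_CALLS * 1_000_000
  end
end

rugged_us = BenchmarkMath.per_call_us(0.045185)
custom_us = BenchmarkMath.per_call_us(0.081331)

puts format('Rugged: %.3f µs/call', rugged_us)
puts format('Custom: %.3f µs/call', custom_us)
puts format('Ratio:  %.2fx', custom_us / rugged_us)
puts format('10,000 validations: %.2f ms', custom_us * 10_000 / 1000)
```

This prints 0.282 and 0.508 µs/call, a ratio of 1.80x, and 5.08 ms for 10,000 validations, matching the reported summary.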

How to reproduce

  1. Switch to this branch
  2. Create a file named git_ref_validator_benchmark.rb in the GitLab project root and paste in the benchmark source code below
  3. Run the benchmark: ruby -I lib -r bundler/setup git_ref_validator_benchmark.rb

Benchmark source code

# frozen_string_literal: true

# GitRefValidator Benchmark: Rugged vs Custom Implementation
#
# Compares performance of the legacy Rugged-based validation against
# the new pure-Ruby byte-level implementation.
#
# Run from GitLab root directory:
#   ruby -I lib -r bundler/setup git_ref_validator_benchmark.rb

require 'benchmark'
require 'rugged'
require 'gitlab/git_ref_validator'

# Mock Feature module for standalone execution
module Feature
  class << self
    attr_accessor :_enabled

    def enabled?(*)
      _enabled
    end

    def current_request
      nil
    end
  end
end

# Mix of valid and invalid refs representing real-world usage
TEST_CASES = [
  # Valid refs
  'feature/new',
  'my-branch',
  'implement_@all',
  'feature/refs/heads/foo',
  'v1.0.0',
  'release/2.0',
  'hotfix/bug-123',
  'master{yesterday}',
  '@', # valid at this level: expands to refs/heads/@
  # Invalid refs
  'feature//new',
  'feature..branch',
  '-branch',
  'branch.lock',
  "test\x7fbranch",
  "test\x00branch",
  'feature new'
].freeze

ITERATIONS = 10_000

puts '=' * 60
puts 'GitRefValidator Benchmark: Rugged vs Custom Implementation'
puts '=' * 60
puts "Test cases: #{TEST_CASES.length} refs (mix of valid/invalid)"
puts "Iterations: #{ITERATIONS}"
puts "Total validations: #{ITERATIONS * TEST_CASES.length}"
puts '-' * 60

# Warmup
print 'Warming up...'
100.times do
  TEST_CASES.each do |ref|
    Feature._enabled = false
    Gitlab::GitRefValidator.validate(ref)
    Feature._enabled = true
    Gitlab::GitRefValidator.validate(ref)
  end
end
puts ' done'

puts "\nBenchmark Results:"
puts '-' * 60

results = {}

Benchmark.bm(22) do |x|
  results[:rugged] = x.report('Rugged (legacy):') do
    Feature._enabled = false
    ITERATIONS.times do
      TEST_CASES.each { |ref| Gitlab::GitRefValidator.validate(ref) }
    end
  end

  results[:custom] = x.report('Custom (byte-level):') do
    Feature._enabled = true
    ITERATIONS.times do
      TEST_CASES.each { |ref| Gitlab::GitRefValidator.validate(ref) }
    end
  end
end

# Summary
total_calls = ITERATIONS * TEST_CASES.length
rugged_us = (results[:rugged].real / total_calls) * 1_000_000
custom_us = (results[:custom].real / total_calls) * 1_000_000
ratio = custom_us / rugged_us

puts "\n" + ('-' * 60)
puts 'Per-call timing:'
puts '-' * 60
puts format('  Rugged (C extension): %<time>.3f µs/call', time: rugged_us)
puts format('  Custom (pure Ruby):   %<time>.3f µs/call', time: custom_us)
puts format('  Ratio:                %<ratio>.2fx', ratio: ratio)

puts "\nAbsolute performance:"
puts format('  - 1,000 validations:  %<time>.2f ms', time: custom_us * 1000 / 1000)
puts format('  - 10,000 validations: %<time>.2f ms', time: custom_us * 10_000 / 1000)

How to set up and validate locally

  1. Enable the feature flag:

    Feature.enable(:git_ref_validator_custom_validation)

  2. Test the validation in Rails console:

    # DEL character - now correctly rejected (previously allowed by Rugged)
    Gitlab::GitRefValidator.validate("\x7f")           # => false
    Gitlab::GitRefValidator.validate("test\x7fbranch") # => false

    # Single @ is valid (becomes refs/heads/@ which is allowed)
    Gitlab::GitRefValidator.validate("@")              # => true

  3. Run the tests:

    bundle exec rspec spec/lib/gitlab/git_ref_validator_spec.rb

MR acceptance checklist

Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Vasilii Iakliushin
