Strip control characters from scanner output

Igor Frenkel requested to merge 434468-sanitize-scanner-output into master

What does this MR do?

This MR strips the common control characters \t \r \n from scanner output.
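As a minimal sketch of the behavior described above (the helper name `strip_control_characters` and the sample payload are hypothetical and not taken from this MR's diff), stripping these characters could look like this:

```ruby
require 'json'

# Hypothetical helper; the name is illustrative and not from the MR itself.
def strip_control_characters(raw)
  # String#tr with an empty replacement string deletes every listed character.
  raw.tr("\t\r\n", '')
end

# Sample scanner output with raw control characters inside a JSON string value,
# which the JSON specification does not allow unescaped.
raw = "{\"name\":\"scan\tner\r\n\"}"

clean = strip_control_characters(raw)
parsed = JSON.parse(clean)
```

Only raw control characters are removed; escaped sequences that are already part of valid JSON (a literal backslash followed by `n`, for example) are left untouched.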

String#tr was chosen over String#gsub for performance reasons. In rough testing, tr was commonly about 50x faster:

require 'json'
require 'benchmark'

obj = {'foo' => []}
(0..100000).each do |i|
  obj['foo'] << "bar-#{i}"
end

json = obj.to_json.gsub('bar-', "bar-\t\r\n")

json_gsub = nil
json_tr = nil
Benchmark.bm do |x|
  # first report: regex-based removal with gsub
  x.report { json_gsub = json.gsub(/\t|\r|\n/, "") }
  # second report: character-set removal with tr
  x.report { json_tr = json.tr("\t\r\n", '') }
end

# ensure both strings are valid JSON
JSON.parse(json_gsub)
JSON.parse(json_tr)

Resulting output (first row is gsub, second row is tr; roughly 52x faster in real time for this run):

       user     system      total        real
   0.069201   0.002277   0.071478 (  0.073168)
   0.001240   0.000174   0.001414 (  0.001413)

What are the relevant issue numbers?

gitlab-org/gitlab#434468 (closed)

Does this MR meet the acceptance criteria?

Edited by Igor Frenkel