Strip control characters from scanner output
What does this MR do?
This MR strips the common control characters \t, \r, and \n from scanner output.
String#tr is chosen over String#gsub for performance reasons. In rough testing, tr was commonly about 50x faster:
require 'json'
require 'benchmark'

# Build a large JSON document, then inject control characters into every string.
obj = {'foo' => []}
(0..100000).each do |i|
  obj['foo'] << "bar-#{i}"
end
json = obj.to_json.gsub('bar-', "bar-\t\r\n")

json_gsub = nil
json_dump = nil
Benchmark.bm do |x|
  # First row: regex-based removal with gsub
  x.report { json_gsub = json.gsub(/\t|\r|\n/,"") }
  # Second row: character-based removal with tr
  x.report { json_dump = json.tr("\t\r\n",'') }
end

# ensure both strings are valid json
JSON.parse(json_gsub)
JSON.parse(json_dump)
Resulting output:
       user     system      total        real
   0.069201   0.002277   0.071478 (  0.073168)
   0.001240   0.000174   0.001414 (  0.001413)
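As a quick sanity check on the approach, the following sketch shows that String#tr with an empty replacement string removes the listed characters and yields the same result as the gsub variant. The helper name strip_control_chars and the constant CONTROL_CHARS are illustrative assumptions, not names taken from this MR.

```ruby
# Hypothetical helper; the names below are illustrative, not from the MR.
CONTROL_CHARS = "\t\r\n"

def strip_control_chars(raw)
  # With an empty replacement string, String#tr deletes the listed characters.
  raw.tr(CONTROL_CHARS, '')
end

# Small sample with control characters embedded in a JSON string value.
sample = "{\"foo\":[\"bar-\t\r\n1\"]}"
cleaned = strip_control_chars(sample)

# Compare against the regex-based and String#delete variants.
puts cleaned == sample.gsub(/\t|\r|\n/, '')
puts cleaned == sample.delete(CONTROL_CHARS)
```

Both comparisons should print true, which suggests the faster call can replace the regex-based one without changing the output.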
What are the relevant issue numbers?
gitlab-org/gitlab#434468 (closed)
Does this MR meet the acceptance criteria?
- Changelog trailer added
- Documentation created/updated for GitLab EE, if necessary
- Documentation created/updated for this project, if necessary
- Documentation reviewed by technical writer or follow-up review issue created
- Tests added for this feature/bug
- Job definition updated, if necessary
- Conforms to the code review guidelines
- Conforms to the Go guidelines
- Security reports checked/validated by reviewer
Edited by Igor Frenkel