Draft: Speed up encoding detector
What does this MR do and why?
This MR speeds up the encoding detector by reducing the amount of data sent to the detector (the `charlock_holmes` gem).
The initial scan limit is set to 1MB: `1024*1024`.
1MB is an incredibly generous search window for encoding detection. In most situations 300 bytes is enough (in ASCII). This MR reduces the limit to 8000 bytes for the blob helper and to 19000 bytes for the default limit, which is enough for most situations.
This saves a lot of time on the serialization step (sending binary data to the `charlock_holmes` gem) when working with large text files that approach 1MB in size.
The change significantly improves streaming performance in Rapid Diffs and on other pages that check if a particular file is binary.
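
The idea of capping the detection input can be sketched as follows. This is a minimal illustration, not the code from this MR: the constant names, `detection_sample`, and `detect_encoding` are hypothetical, and the detector is passed in as a callable so the sketch runs without the gem installed. Only the byte limits come from the description above.

```ruby
# Hypothetical constant names; the byte values are the limits quoted in this MR.
BLOB_HELPER_SCAN_LIMIT = 8_000
DEFAULT_SCAN_LIMIT = 19_000

# Returns at most `limit` bytes from the start of `data`.
# Note: byteslice may cut a multi-byte character in half at the boundary,
# which is acceptable here because the sample is only used for detection.
def detection_sample(data, limit: DEFAULT_SCAN_LIMIT)
  data.byteslice(0, limit)
end

# `detector` is any callable that accepts a String and returns a detection
# result. Only the truncated sample crosses the boundary to the detector,
# so the cost no longer grows with the full file size.
def detect_encoding(data, detector:, limit: DEFAULT_SCAN_LIMIT)
  return nil if data.nil? || data.empty?

  detector.call(detection_sample(data, limit: limit))
end

# Stand-in detector that only reports how many bytes it received.
byte_counter = ->(sample) { { bytes_received: sample.bytesize } }

large_text = "a" * (1024 * 1024)
puts detect_encoding(large_text, detector: byte_counter).inspect
puts detect_encoding(large_text, detector: byte_counter, limit: BLOB_HELPER_SCAN_LIMIT).inspect
```

With the gem available, the stand-in could presumably be swapped for `CharlockHolmes::EncodingDetector.method(:detect)`; that wiring is an assumption and is not taken from this MR.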
Screenshots or screen recordings
Before | After |
---|---|
![]() | ![]() |
![]() | ![]() |
How to set up and validate locally
- Enable the `rapid_diffs` and `rapid_diffs_on_mr_show` feature flags
- Go to any big merge request (more than 100 files/1000 lines changed)
- Select the 'Changes' tab
- Add `?rapid_diffs=true` to the URL and follow it
- Measure the `diffs_stream` request timings
- Observe better timings with the change applied
Edited by Stanislav Lashmanov