Draft: Speed up encoding detector

What does this MR do and why?

This MR speeds up the encoding detector by reducing the amount of data sent to the detector (the charlock_holmes gem).

The scan limit is currently set to 1 MB (1024*1024 bytes).

1 MB is an incredibly generous search window for encoding detection; in most situations 300 bytes (of ASCII) is enough. This MR reduces the limit to 8000 bytes for the blob helper and to 19000 bytes for the default, which is sufficient in most situations.

This avoids most of the cost of the serialization step (sending binary data to the charlock_holmes gem) when working with large text files that approach 1 MB in size.
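
The idea can be illustrated with a minimal Ruby sketch. It is not the code from this MR: the constant names, `detection_sample`, and `detect_encoding` are hypothetical, and the detector is passed in as a plain callable so the sketch runs without charlock_holmes installed. Only the byte limits come from the description above.

```ruby
# Limits from the description above; the constant names are hypothetical.
FULL_SCAN_LIMIT   = 1024 * 1024 # previous limit: 1 MB
BLOB_HELPER_LIMIT = 8000        # new limit for the blob helper
DEFAULT_LIMIT     = 19_000      # new default limit

# Returns at most `limit` bytes from the start of `data`.
# Short inputs are returned unchanged, so no copy is made for them.
def detection_sample(data, limit = DEFAULT_LIMIT)
  return data if data.bytesize <= limit

  data.byteslice(0, limit)
end

# Hypothetical wrapper: `detector` is any callable standing in for the
# real encoding detector. Only the truncated sample is handed to it.
def detect_encoding(data, detector:, limit: DEFAULT_LIMIT)
  detector.call(detection_sample(data, limit))
end

# Usage: a stub detector that records how many bytes it received.
received_bytes = nil
stub_detector = lambda do |sample|
  received_bytes = sample.bytesize
  { encoding: "ASCII" } # placeholder result, not real detector output
end

large_text = "a" * FULL_SCAN_LIMIT
detect_encoding(large_text, detector: stub_detector, limit: BLOB_HELPER_LIMIT)
puts "bytes sent to detector: #{received_bytes}"
```

One caveat with this approach: `byteslice` cuts at a byte offset, so the sample can end in the middle of a multibyte character. Whether that matters depends on how the real detector treats a truncated trailing sequence.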

The change significantly improves streaming performance in Rapid Diffs and on other pages that check if a particular file is binary.

Screenshots or screen recordings

[Two pairs of Before/After screenshots; images not available.]

How to set up and validate locally

  1. Enable the rapid_diffs and rapid_diffs_on_mr_show feature flags
  2. Go to any big merge request (more than 100 files/1000 lines changed)
  3. Select the 'Changes' tab
  4. Add ?rapid_diffs=true to the URL and open it
  5. Measure the diffs_stream request timings
  6. Observe better timings with the change applied
Edited by Stanislav Lashmanov
