Externally stored merge request diffs take a very long time to export

When a user initiates a Project Export, it exports all merge requests along with their diff files so the project can later be imported elsewhere.

If merge request diffs are stored in object storage, each individual diff is downloaded from object storage, which introduces latency and significantly increases the time the export takes.

A test project on staging (https://staging.gitlab.com/gk-import-tests/gitlabhq/) with ~4k MRs takes ~60 minutes to export versus 80 seconds on the local file system.

385 diffs example

Staging

Benchmark.realtime do
  MergeRequestDiff.find(...).merge_request_diff_files.last(385).each do |f|
    f.diff
  end
end

=> 18.50129810348153 

Local file system

Benchmark.realtime do
  MergeRequestDiff.find(...).merge_request_diff_files.last(385).each do |f|
    f.diff
  end
end

=> 0.014228000305593014

A 0.4 MB diff file, which takes 0.03 seconds to read on its own, is read 385 times with a different content range each time, ending up taking 385 * 0.03 seconds (~11 seconds), not counting serialization, writing JSON to file, and other operations. Each diff retrieval takes 0.03 seconds on average using external storage (based on my testing).
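The cost arithmetic above can be sketched as a tiny simulation; the latency figure and names here are illustrative assumptions from this issue, not measurements of GitLab's actual code paths:

```ruby
# Why per-range reads from object storage dominate export time.
# PER_REQUEST_LATENCY is the ~0.03 s round trip observed in testing (assumed
# constant here for simplicity); RANGES is the 385-file example above.

PER_REQUEST_LATENCY = 0.03 # seconds per object-storage round trip
RANGES = 385               # number of merge_request_diff_files

# Reading each file's diff range as a separate request pays the round trip
# every time:
per_range_cost = RANGES * PER_REQUEST_LATENCY

# Downloading the whole 0.4 MB blob once and slicing it in memory pays it
# only once:
single_download_cost = PER_REQUEST_LATENCY

puts format('per-range: %.2fs, single download: %.2fs',
            per_range_cost, single_download_cost)
# per-range: 11.55s, single download: 0.03s
```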

In this particular example we would ideally want to download the full diff, since it is quite small, and cache it so that individual file diffs can be created from it. I imagine the current way of diff retrieval was implemented this way to prevent downloading large diff files in full.
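A minimal sketch of that download-once-and-cache idea, with a stub standing in for the object-storage download; the class, attribute names (`diff_offset`, `diff_size`), and fetcher callable are hypothetical, not GitLab's real API:

```ruby
# Hypothetical record with the byte range of one file's diff inside the
# stored blob (assumed column names, for illustration only).
DiffFileRecord = Struct.new(:diff_offset, :diff_size)

class CachedDiffReader
  def initialize(fetcher)
    @fetcher = fetcher # callable that downloads the whole stored diff blob
  end

  # One object-storage download, memoized for all subsequent file reads.
  def full_diff
    @full_diff ||= @fetcher.call
  end

  # Slice a single file's diff out of the cached blob by byte range,
  # instead of issuing a ranged GET per file.
  def diff_for(file)
    full_diff.byteslice(file.diff_offset, file.diff_size)
  end
end

# Usage with a stub fetcher standing in for the object-storage download:
blob = "diff-a\ndiff-b\n"
reader = CachedDiffReader.new(-> { blob })
file = DiffFileRecord.new(0, 7)
reader.diff_for(file) # => "diff-a\n"
```

A real implementation would also need a size cutoff so very large diffs are still streamed by range rather than loaded whole into memory.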

We should investigate ways of speeding this process up, as current export times are extremely long. I tried implementing a persistent HTTP connection, but it did not shave off much time.
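For reference, the persistent-connection attempt would look roughly like the sketch below: one TCP/TLS session reused for all ranged GETs instead of reconnecting per diff. The host, path, and ranges are placeholders; this only removes connection setup, so per-request round trips still dominate:

```ruby
require 'net/http'

# Reuse a single connection for many Range requests against one object
# (placeholder host/path; real storage URLs are usually presigned).
def fetch_ranges(host, path, ranges)
  Net::HTTP.start(host, 443, use_ssl: true) do |http|
    http.keep_alive_timeout = 30 # keep the socket open between requests
    ranges.map do |first, last|
      req = Net::HTTP::Get.new(path)
      req['Range'] = "bytes=#{first}-#{last}"
      http.request(req).body
    end
  end
end
```

Each iteration is still a full request/response round trip, which is consistent with the observation that this change did not shave off much time.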

Edited by George Koltsov