Skip to content

Clear SafeRequestStore often when exporting

What does this MR do and why?

This MR attempts to resolve memory consumption issues that have been reported in #420623 (closed)

It:

  • Updates StreamingSerializer to have exported records to be appended to file on disk in batches, instead of having them all exported into 1 enumerator and then written
  • Because we now write to file in batches, we can clear SafeRequestStore often, which lowers the overall memory footprint during export of big projects/groups

This is most evident when exporting projects with a lot of issues/mrs that have a lot of system notes in them. Because we recently added note export authorization checks (only export system notes that the exporting user has access to) it spiked up memory consumption of the export, resulting in sidekiq worker & node in running out of RAM when performing export. This is due to expensive authorization checks that each system note has to go through & us caching a lot of data in memory in SafeRequestStore, resulting in memory bloat. Clearing the store after each batch of 100 records keeps memory consumption to what it was before, as seen below:

Exporting a project with 5000 issues, each issue having 10 regular and 2 system notes:

> /usr/bin/time -l rails runner script.rb

# script.rb
::Gitlab::SafeRequestStore.ensure_request_store do
  ProjectExportWorker.new.perform(1, 20)
end

Clearing request store, 900Mb MAX RSS (peak memory usage of the process):

       87.72 real        72.30 user         7.29 sys
           939163648  maximum resident set size

Not-clearing request store, 1.8Gb MAX RSS:

       93.32 real        70.62 user        14.79 sys
          1894924288  maximum resident set size

This change brings down MAX RSS to the figures when we didn't peform auth checks:

       31.20 real        23.79 user         6.18 sys
           902938624  maximum resident set size

Clear SafeRequestStore often when exporting

  • Add SafeRequestStore.clear! when exporting projects/groups in order to keep memory usage at bay
  • Update StreamingSerializer to append records to file in batches instead of having all records in a single enumerator

Changelog: fixed

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before After

How to set up and validate locally

  1. Create a project with a lot of issues (e.g. 5k like I did in the example above)
  2. Create a system note in each of the issues (e.g. each issue mentioned on a commit)
  3. Perform export of the project and measure peak memory (e.g. like I did in the example above)
# script.rb

::Gitlab::SafeRequestStore.ensure_request_store do # we need to wrap in this block because that's how sidekiq jobs are executed (see lib/gitlab/sidekiq_middleware/request_store_middleware.rb:7)
  ProjectExportWorker.new.perform(1, 20)
end

/usr/bin/time -l rails runner script.rb

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by George Koltsov

Merge request reports