Improve performance and memory usage of project export

Stan Hu requested to merge sh-improve-memory-project-export into master

ActiveModel::Serialization takes a simple approach: it recursively calls as_json on each object to serialize everything. However, for a model like Project, this can generate a query for every single association, which can add up to tens of thousands of queries and lead to memory bloat.

To improve this, we can do several things:

  1. Use the option tree in http://api.rubyonrails.org/classes/ActiveModel/Serializers/JSON.html to generate the necessary preload clauses.

  2. We observe that a single project has many issues, merge requests, etc. Instead of serializing everything at once, which could lead to database timeouts and high memory usage, we take each top-level association and serialize the data in batches.
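To illustrate the first item, here is a minimal sketch of turning an as_json-style option tree into a nested structure of the kind accepted by ActiveRecord's `preload`. The helper name `preload_clause` and the sample tree are hypothetical and are not taken from this merge request; the sketch only assumes that nested associations are listed under the `:include` key.

```ruby
# Hypothetical helper (not the code in this merge request): walk the
# `include:` entries of an as_json-style option tree and build a nested
# list of association names suitable for passing to `preload`.
def preload_clause(options)
  includes = options[:include]

  # `include:` may be a single name, an array of names and hashes, or a
  # hash of name => nested options. Normalize to [name, options] pairs.
  pairs =
    case includes
    when nil then []
    when Hash then includes.to_a
    else
      Array(includes).flat_map { |entry| entry.is_a?(Hash) ? entry.to_a : [[entry, {}]] }
    end

  pairs.map do |name, nested_options|
    nested = preload_clause(nested_options || {})
    # Leaf associations stay as plain names; others become { name => [...] }.
    nested.empty? ? name : { name => nested }
  end
end

# Sample option tree (association names chosen for illustration only).
tree = {
  include: [
    :labels,
    { issues: { include: [:notes, { events: { only: [:id] } }] } }
  ]
}

p preload_clause(tree)
```

Options such as `only:` affect which attributes are serialized, not which associations are loaded, so the sketch ignores them when building the preload structure.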

For example, we serialize the first 1,000 issues and preload all of their associated events, notes, etc. before moving on to the next batch. When we're done, we serialize merge requests in the same way. We repeat this pattern for the remaining associations specified in import_export.yml.
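The batching pattern can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the implementation in this merge request: `Issue` is a stand-in struct, `each_serialized_batch` is a hypothetical helper, and `each_slice` over an in-memory array takes the place of a batched database query with preloading.

```ruby
# Stand-in for a model; a real export would work on database records.
Issue = Struct.new(:id, :title)

# Hypothetical helper: serialize records one batch at a time so that only
# one batch of serialized data is held in memory at once.
def each_serialized_batch(records, batch_size: 1_000)
  records.each_slice(batch_size) do |batch|
    # In a Rails application, this is the point where the batch's
    # associations (events, notes, etc.) would be preloaded.
    yield batch.map(&:to_h)
  end
end

issues = (1..2_500).map { |i| Issue.new(i, "Issue #{i}") }

batch_sizes = []
each_serialized_batch(issues) { |rows| batch_sizes << rows.size }
p batch_sizes
```

The same helper would then be called for merge requests and for each remaining top-level association in turn.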

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/35389

Edited by Stan Hu