WIP: Implement `ndjson` support for `import/export`
What does this MR do?
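This MR implements ndjson support for import/export, together with streaming support, to handle two cases:
- the legacy `.json` format, where the whole tree is stored in a single file
- the `.ndjson` format, where each relation receives a separate file and each item is stored on its own line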
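A minimal Ruby sketch of the per-relation `.ndjson` layout, assuming a hypothetical `NdjsonRelationWriter` class and `<relation>.ndjson` file naming (neither is taken from this MR). Each item is serialized with `JSON.generate`, which escapes embedded newlines, so one item always occupies exactly one line.

```ruby
require "json"
require "fileutils"

# Hypothetical writer: one <relation>.ndjson file per relation, one JSON item per line.
class NdjsonRelationWriter
  def initialize(dir)
    @dir = dir
    @files = {} # relation name => open File handle
    FileUtils.mkdir_p(dir)
  end

  # Serializes a single item and appends it as one line; the writer does not
  # keep earlier items around, so its memory does not grow with item count.
  def append(relation, item)
    @files[relation] ||= File.open(File.join(@dir, "#{relation}.ndjson"), "a")
    @files[relation].puts(JSON.generate(item))
  end

  # Flushes and closes every relation file.
  def close
    @files.each_value(&:close)
    @files.clear
  end
end
```

Keeping one open handle per relation avoids reopening the file for every item, at the cost of one file descriptor per relation.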
It correctly detects whether a file uses the old or the new format without requiring any changes to the files, so backward compatibility is maintained.
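One way such detection could work, as a hedged sketch: the `detect_export_format` helper and the layout it probes (`project.json` at the export root for legacy, any `*.ndjson` file for the new format) are assumptions for illustration, not necessarily the layout this MR uses.

```ruby
# Hypothetical detection based on which files exist in an extracted export.
# Layout is an assumption: legacy = <root>/project.json, ndjson = any *.ndjson below root.
def detect_export_format(root)
  # "**" also matches zero directories, so files directly in root are found too.
  if Dir.glob(File.join(root, "**", "*.ndjson")).any?
    :ndjson
  elsif File.file?(File.join(root, "project.json"))
    :legacy
  else
    :unknown
  end
end
```

Checking for ndjson files first means an export that contains both kinds of files is treated as ndjson.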
It also implements a trick that lets a streaming JSON writer append data incrementally.
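The trick is not spelled out here, so the following is only a sketch of one common approach, using a hypothetical `StreamingJsonObjectWriter`: the top-level object is written key by key, and array items are appended one at a time, with commas and closing brackets emitted lazily. The output is a single valid JSON document, yet only one item is serialized at a time.

```ruby
require "json"
require "stringio"

# Hypothetical append-only writer for one top-level JSON object.
# Items for a given array key must be appended contiguously.
class StreamingJsonObjectWriter
  def initialize(io)
    @io = io
    @io.write("{")
    @first_key = true
    @open_array = nil  # key of the array currently being appended to
    @first_item = true
  end

  # Writes a complete attribute (scalar or small hash) in one go.
  def set(key, value)
    close_array
    write_key(key)
    @io.write(JSON.generate(value))
  end

  # Appends one item to the array under `key`, opening the array on first use.
  def append(key, item)
    if @open_array != key
      close_array
      write_key(key)
      @io.write("[")
      @open_array = key
      @first_item = true
    end
    @io.write(",") unless @first_item
    @io.write(JSON.generate(item))
    @first_item = false
  end

  # Closes any open array and the top-level object.
  def close
    close_array
    @io.write("}")
  end

  private

  def write_key(key)
    @io.write(",") unless @first_key
    @io.write(JSON.generate(key.to_s))
    @io.write(":")
    @first_key = false
  end

  def close_array
    return unless @open_array
    @io.write("]")
    @open_array = nil
  end
end

io = StringIO.new
writer = StreamingJsonObjectWriter.new(io)
writer.set("name", "gitlabhq")
writer.append("issues", { "iid" => 1 })
writer.append("issues", { "iid" => 2 })
writer.close
puts io.string # prints {"name":"gitlabhq","issues":[{"iid":1},{"iid":2}]}
```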
Overall, when exporting legacy/ndjson or importing ndjson, this allows the process to run with constant memory,
and it also significantly reduces the latency of data processing,
since it avoids escaping to native code.
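A sketch of why reading ndjson keeps memory bounded, using a hypothetical `each_ndjson_batch` helper: `File.foreach` reads one line at a time, so at most `batch_size` parsed items are held in memory regardless of file size. The default batch size of 100 is an arbitrary assumption.

```ruby
require "json"

# Hypothetical helper: yields parsed items from an .ndjson file in batches.
def each_ndjson_batch(path, batch_size: 100)
  batch = []
  File.foreach(path) do |line|   # streams the file line by line
    next if line.strip.empty?    # tolerate blank lines
    batch << JSON.parse(line)
    if batch.size >= batch_size
      yield batch
      batch = []                 # drop references so the batch can be GC'd
    end
  end
  yield batch unless batch.empty? # final partial batch
end
```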
This removes the usage of
`RelationFactory` on the exporting side.
I believe this is an acceptable trade-off.
Keep in mind that the idle memory usage of GitLab is around 500MB.
1. git checkout b213471f
1.1. Import on
IMPORT_DEBUG=1 bin/rake gitlab:import_export:import[root,root,gitlabhq-with-issues-4,tmp/exports/gitlabhq_with_issues_export_ndjson_v2.tar.gz]
Time to finish: 1260.872610532002 Number of SQL calls: 147407 Memory usage: 890.62109375 MiB GC calls: 2718 GC major calls: 55 Label: process_345
1.2. Export on
IMPORT_DEBUG=1 bin/rake gitlab:import_export:export[root,root,gitlabhq-with-issues-3,tmp/exports/gitlabhq_with_issues_export_legacy_v2.tar.gz]
Time to finish: 97.66875144900041 Number of SQL calls: 4006 Memory usage: 761.77734375 MiB GC calls: 199 GC major calls: 26 Label: process_309
2. git checkout dbcec49a
2.1. Import on
IMPORT_DEBUG=1 bin/rake gitlab:import_export:import[root,root,gitlabhq-with-issues-5,tmp/exports/gitlabhq_with_issues_export_ndjson_v2.tar.gz]
Time to finish: 1207.3776693220025 Number of SQL calls: 147418 Memory usage: 671.0703125 MiB GC calls: 2737 GC major calls: 42 Label: process_378
2.2. Export on
IMPORT_DEBUG=1 bin/rake gitlab:import_export:export[root,root,gitlabhq-with-issues-3,tmp/exports/gitlabhq_with_issues_export_ndjson_v2.tar.gz]
Time to finish: 102.0661853370002 Number of SQL calls: 4006 Memory usage: 564.35546875 MiB GC calls: 199 GC major calls: 23 Label: process_280
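To make the memory numbers easier to compare, this Ruby snippet parses the `IMPORT_DEBUG` summary lines and computes the savings, also relative to the ~500MB idle baseline mentioned earlier (treating MB as MiB, which is only an approximation). The helper names are hypothetical; the input lines are copied from the import runs above.

```ruby
# Approximate idle memory from the MR text (~500MB); using it as MiB is an
# assumption made only for a rough comparison.
IDLE_MIB = 500.0

# Extracts the numeric fields of one IMPORT_DEBUG summary line.
def parse_debug_line(line)
  {
    time_s: line[/Time to finish: ([\d.]+)/, 1].to_f,
    sql_calls: line[/Number of SQL calls: (\d+)/, 1].to_i,
    memory_mib: line[/Memory usage: ([\d.]+) MiB/, 1].to_f,
    gc_major: line[/GC major calls: (\d+)/, 1].to_i
  }
end

# Compares the reported memory usage of two runs, absolute and above idle.
def memory_report(before_line, after_line)
  before = parse_debug_line(before_line)[:memory_mib]
  after = parse_debug_line(after_line)[:memory_mib]
  {
    saved_mib: (before - after).round(2),
    saved_pct: ((before - after) / before * 100).round(1),
    above_idle_before: (before - IDLE_MIB).round(1),
    above_idle_after: (after - IDLE_MIB).round(1)
  }
end

import_before = "Time to finish: 1260.872610532002 Number of SQL calls: 147407 Memory usage: 890.62109375 MiB GC calls: 2718 GC major calls: 55 Label: process_345"
import_after = "Time to finish: 1207.3776693220025 Number of SQL calls: 147418 Memory usage: 671.0703125 MiB GC calls: 2737 GC major calls: 42 Label: process_378"
p memory_report(import_before, import_after)
```

For the import runs this gives 219.55 MiB saved (about 24.7%), or roughly 390.6 MiB versus 171.1 MiB above idle. Applied to the two export runs it gives 197.42 MiB saved (about 25.9%), keeping in mind that run 1.2 writes the legacy archive and run 2.2 the ndjson archive.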
Does this MR meet the acceptance criteria?
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Availability and Testing
- Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
- Tested in all supported browsers
If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:
Label as security and @ mention
- The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
- Security reports checked/validated by a reviewer from the AppSec team