Closed
Opened Jan 28, 2020 by Kamil Trzciński (@ayufan, Maintainer). 0 of 12 tasks completed.

WIP: Implement `ndjson` support for `import/export`

  • Overview 42
  • Commits 2
  • Pipelines 8
  • Changes 14

What does this MR do?

Implement ndjson support for import/export

This implements ndjson and streaming json support to handle two cases:

  • big project.json (legacy way)
  • new .ndjson format, where each relation receives a separate file, and each item is stored per-line
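To illustrate the per-line layout described above, here is a minimal sketch of writing one relation to its own `.ndjson` file and reading it back one item at a time. The helper names `write_ndjson` and `each_ndjson_item` and the file name `issues.ndjson` are hypothetical and are not taken from this MR.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical helper: write one relation's items as newline-delimited JSON.
# JSON.generate escapes newlines inside strings, so each item stays on one line.
def write_ndjson(path, items)
  File.open(path, 'w') do |file|
    items.each { |item| file.puts(JSON.generate(item)) }
  end
end

# Hypothetical helper: yield one parsed item per line, so only a single
# item has to be held in memory at any time.
def each_ndjson_item(path)
  return enum_for(:each_ndjson_item, path) unless block_given?

  File.foreach(path) do |line|
    line = line.strip
    next if line.empty?

    yield JSON.parse(line)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'issues.ndjson')
  write_ndjson(path, [{ 'iid' => 1, 'title' => 'First' },
                      { 'iid' => 2, 'title' => "Multi\nline" }])
  each_ndjson_item(path) { |issue| puts issue['iid'] }
end
```

Because every line is a complete JSON document, the reader never needs to parse the whole file at once.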

This can properly detect whether an archive contains the old or the new file format without any changes to the files, which maintains backward compatibility.
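One way such detection could work is sketched below. It assumes the legacy export has a `project.json` at the root of the extracted archive and the new format has at least one `.ndjson` file somewhere underneath it. The function `detect_export_format` and this directory layout are assumptions for illustration, not the MR's actual logic.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: decide which reader to use for an extracted archive.
# Assumes .ndjson files mark the new format and project.json the legacy one.
def detect_export_format(dir)
  return :ndjson if Dir.glob(File.join(dir, '**', '*.ndjson')).any?
  return :legacy if File.exist?(File.join(dir, 'project.json'))

  raise ArgumentError, "no project.json or .ndjson files found in #{dir}"
end

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'project.json'), '{}')
  puts detect_export_format(dir)
end
```

Checking for the new format first means an archive that happens to carry both layouts is read with the new reader; that ordering is a design choice of this sketch.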

This also implements a trick that lets the streaming JSON writer append data incrementally.
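The MR does not spell the trick out here, so the following is only a sketch of one possible approach: emit the opening brace up front, append each key and each array element as it becomes available, and close the object at the end. The class `StreamingJsonWriter` and its method names are hypothetical.

```ruby
require 'json'
require 'stringio'

# Hypothetical writer that builds one large JSON object piece by piece,
# without ever holding the whole document in memory.
class StreamingJsonWriter
  def initialize(io)
    @io = io
    @first_key = true
    @io.write('{')
  end

  # Write a small value (string, number, hash) in one go.
  def write_attribute(key, value)
    write_key(key)
    @io.write(JSON.generate(value))
  end

  # Stream an array: each element is serialized and written as it is produced.
  def write_relation(key, items)
    write_key(key)
    @io.write('[')
    first = true
    items.each do |item|
      @io.write(',') unless first
      first = false
      @io.write(JSON.generate(item))
    end
    @io.write(']')
  end

  # Close the top-level object; the output is valid JSON only after this.
  def close
    @io.write('}')
  end

  private

  def write_key(key)
    @io.write(',') unless @first_key
    @first_key = false
    @io.write(JSON.generate(key.to_s))
    @io.write(':')
  end
end

io = StringIO.new
writer = StreamingJsonWriter.new(io)
writer.write_attribute('description', 'demo project')
writer.write_relation('issues', [{ 'iid' => 1 }, { 'iid' => 2 }].each)
writer.close
puts io.string
```

Passing an enumerator to `write_relation` means the caller can feed records in batches, so memory use depends on the batch size rather than on the size of the relation.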

Overall, when exporting legacy/ndjson or importing ndjson, this lets the process run with constant memory. It also significantly reduces the latency of data processing, because we no longer escape to native code.

This does remove the usage of RelationFactory on the exporting side. I believe it is an OK trade-off to make.

Performance

Keep in mind that the idle memory usage of GitLab is around 500MB.

1. The master branch

git checkout b213471f

1.1. Import on master

IMPORT_DEBUG=1 bin/rake gitlab:import_export:import[root,root,gitlabhq-with-issues-4,tmp/exports/gitlabhq_with_issues_export_ndjson_v2.tar.gz]
Time to finish: 1260.872610532002
Number of SQL calls: 147407
Memory usage: 890.62109375 MiB
GC calls: 2718
GC major calls: 55
Label: process_345

1.2. Export on master

IMPORT_DEBUG=1 bin/rake gitlab:import_export:export[root,root,gitlabhq-with-issues-3,tmp/exports/gitlabhq_with_issues_export_legacy_v2.tar.gz]
Time to finish: 97.66875144900041
Number of SQL calls: 4006
Memory usage: 761.77734375 MiB
GC calls: 199
GC major calls: 26
Label: process_309

pid="process_110"

2. The implement-ndjson branch

git checkout dbcec49a

2.1. Import on implement-ndjson

IMPORT_DEBUG=1 bin/rake gitlab:import_export:import[root,root,gitlabhq-with-issues-5,tmp/exports/gitlabhq_with_issues_export_ndjson_v2.tar.gz]
Time to finish: 1207.3776693220025
Number of SQL calls: 147418
Memory usage: 671.0703125 MiB
GC calls: 2737
GC major calls: 42
Label: process_378

2.2. Export on implement-ndjson

IMPORT_DEBUG=1 bin/rake gitlab:import_export:export[root,root,gitlabhq-with-issues-3,tmp/exports/gitlabhq_with_issues_export_ndjson_v2.tar.gz]
Time to finish: 102.0661853370002
Number of SQL calls: 4006
Memory usage: 564.35546875 MiB
GC calls: 199
GC major calls: 23
Label: process_280
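
To make the comparison easier to read, the memory figures above can be turned into absolute and relative savings. This is a small sketch that only uses the numbers quoted in the four runs; the names `RUNS` and `memory_saving` are hypothetical.

```ruby
# Memory usage in MiB, copied from the four runs above.
RUNS = {
  import: { master: 890.62109375, branch: 671.0703125 },
  export: { master: 761.77734375, branch: 564.35546875 }
}.freeze

# Hypothetical helper: absolute and relative memory saved by the branch.
def memory_saving(run)
  saved = run[:master] - run[:branch]
  { saved_mib: saved, saved_percent: 100.0 * saved / run[:master] }
end

RUNS.each do |name, run|
  result = memory_saving(run)
  puts format('%-6s saved %.1f MiB (%.1f%%)',
              name, result[:saved_mib], result[:saved_percent])
end
```

These are single runs on one machine, so the differences should be read as indicative rather than as a benchmark.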

Does this MR meet the acceptance criteria?

Conformity

  • Changelog entry
  • Documentation (if required)
  • Code review guidelines
  • Merge request performance guidelines
  • Style guides
  • Database guides
  • Separation of EE specific content

Availability and Testing

  • Review and add/update tests for this feature/bug. Consider all test levels. See the Test Planning Process.
  • Tested in all supported browsers

Security

If this MR contains changes to processing or storing of credentials or tokens, authorization and authentication methods and other items described in the security review guidelines:

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited Feb 05, 2020 by Kamil Trzciński
Reference: gitlab-org/gitlab!23920
Source branch: implement-ndjson