Add BulkImports NdjsonExtractor & update labels pipeline to use it

George Koltsov requested to merge georgekoltsov/bulk_import_ndjson_labels into master

What does this MR do?

This MR adds NdjsonExtractor and updates the labels pipeline to use it in Bulk Imports.

More information on the Bulk Imports group migration tool: https://docs.gitlab.com/ee/user/group/import/

The majority of Bulk Import ETL pipelines (extract -> transform -> load) use the GraphQL API to import data. However, due to the challenges described in #326757 (closed), not all group relations can be transferred while preserving all their associations. For example, if an epic has notes, which in turn have award emojis, such nested relations are difficult to preserve using the GraphQL API because of nested pagination.

Instead, this MR downloads the exported relation ndjson.gz file from the 'Group relations export API' (https://docs.gitlab.com/ee/api/group_relations_export.html), which was recently added as part of #329864 (closed), and imports it. This way we can easily preserve all nested associations, as we're reusing a lot of the behaviour from the existing Import/Export codebase.

NdjsonExtractor does the following:

  1. Downloads labels.ndjson.gz from the source GitLab instance using the 'Group relations export API'
  2. Decompresses it
  3. Reads data from the file and returns it for processing (one line at a time, using ImportExport NdjsonReader)
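
The decompress-and-read steps above can be illustrated with a minimal, standalone Ruby sketch. This is not the NdjsonExtractor code from this MR: the `NdjsonSketch` module, its method names, and the sample label attributes are all hypothetical, and the download step is replaced by an in-memory gzipped payload so the example runs without a GitLab instance. It only uses the Ruby standard library (`zlib`, `json`, `stringio`).

```ruby
require 'json'
require 'stringio'
require 'zlib'

# Hypothetical illustration only; not the extractor added in this MR.
module NdjsonSketch
  # Stand-in for step 1 (download): builds a gzipped NDJSON payload in memory.
  # The label attributes below are made-up sample data.
  def self.sample_labels_ndjson_gz
    labels = [
      { 'title' => 'bug', 'color' => '#FF0000' },
      { 'title' => 'feature', 'color' => '#00FF00' }
    ]
    ndjson = labels.map { |label| JSON.generate(label) }.join("\n") + "\n"
    Zlib.gzip(ndjson)
  end

  # Steps 2 and 3: decompress the payload and yield one parsed record
  # per line, together with its zero-based position.
  def self.each_record(gzipped_bytes)
    return enum_for(:each_record, gzipped_bytes) unless block_given?

    reader = Zlib::GzipReader.new(StringIO.new(gzipped_bytes))
    index = 0
    begin
      reader.each_line do |line|
        next if line.strip.empty? # skip blank lines

        yield JSON.parse(line), index
        index += 1
      end
    ensure
      reader.close
    end
  end
end

# Usage: process records one at a time instead of loading the whole file.
NdjsonSketch.each_record(NdjsonSketch.sample_labels_ndjson_gz) do |label, index|
  puts "#{index}: #{label['title']}"
end
```

Reading line by line keeps memory usage bounded by the size of a single relation rather than the whole export, which is the main reason NDJSON suits this kind of import.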

LabelsPipeline is switched from the GraphQL extractor to NdjsonExtractor in order to preserve the epic-label association. The Epics pipeline is going to be updated to use NdjsonExtractor in a future MR. This MR is split out from my draft MR (!61044 (closed)) in an attempt to have a smaller MR that is easier to review.

The updated LabelsPipeline utilises the existing Import/Export RelationFactory, which brings a lot of benefits: all nested relations are transformed into objects, all attributes are sanitized, appropriate attributes are added, etc.

Mentions #329864 (closed)

Screenshots (strongly suggested)

[Screenshot: ndjsonlabels]

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Security

Does this MR contain changes to processing or storing of credentials or tokens, authorization and authentication methods or other items described in the security review guidelines? If not, then delete this Security section.

  • Label as security and @ mention @gitlab-com/gl-security/appsec
  • The MR includes necessary changes to maintain consistency between UI, API, email, or other methods
  • Security reports checked/validated by a reviewer from the AppSec team
Edited by George Koltsov
