Import subgroups when using Group Import via GraphQL (!46248)

George Koltsov requested to merge georgekoltsov/group-import-sub-groups into master Oct 27, 2020

⚠ The entry point is behind the bulk_import feature flag (see Import::BulkImportsController), which is currently turned off.

What does this MR do?

This MR adds support for migrating all subgroups of the specified group when performing a Group Import through the GraphQL API.

When a user initiates a Group Import, they specify the top-level group only: its source full path, source name, destination name (so the group can be imported under a new name), and destination namespace (a group or subgroup on the destination GitLab instance). From this information, a new BulkImports::Entity with source_type: 'group_entity' is created, which is later used to perform the actual group import.

A few important points:

All descendants of the top-level group are fetched at a later stage, inside BulkImportWorker, and additional BulkImports::Entity objects are created for them (with the appropriate parent_ids to restore the ancestry tree from the source). Once all entities are created, a Group Import is performed for each entity. Creating all entity objects in advance, before any import starts, removes the need for a recursive approach and gives better visibility into how many groups have to be imported, so we can react accordingly (e.g. stop the import if the subgroup structure exceeds a limit). Additionally, we can run validations on the group name and namespace full path early and give feedback to the user straight away.
The GraphQL Group type does not expose subgroup/descendant information, which is why descendants are fetched via the HTTP client. We can later explore adding this information to GraphQL.
Subgroup entity creation consists of 4 main steps:

Fetch all of the group's descendants at once (100 per page, iterating over each page) instead of per subgroup. This should reduce the number of network requests to the source GitLab instance from the number of subgroups to the number of pages.
Create all entities with a blank parent id and a placeholder destination namespace. Both are updated in separate later steps, for the reasons listed below.
Recreate the ancestry tree, once all objects are persisted, based on the fetched descendant groups and their source parent_ids. It was easier to restore the ancestry tree this way (once the objects are persisted and have ids assigned) than to recreate the object hierarchy manually from the source descendants hash and persist objects in a specific order (higher-level entities would have to be created before their child entities, since we need to populate parent_id). Done this way, the order of persistence does not matter.
Update the destination namespace of each entity. Until this step, every entity's destination namespace is set to a placeholder value ("-"). The update happens this late because the user can set a custom top-level group name and destination namespace, so the fully restored subgroup hierarchy must be present before the proper destination namespace can be set. A typical subgroup destination namespace has the form <top level entity destination namespace>/<top level destination path>/<subgroup 1 destination path>/.../<subgroup N destination path>, e.g. existingnamespace/top_level_group_under_new_name/subgroup1/subgroup2/.../subgroupN
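
The four steps above can be sketched in plain Ruby, outside Rails. This is only an illustration under stated assumptions, not the code in this MR: Entity is a stand-in Struct rather than the real BulkImports::Entity model, fetch_page is a hypothetical callable that replaces the HTTP client, ids are assigned by hand instead of by the database, and stopping on the first empty page is an assumed pagination rule.

```ruby
# Stand-in for BulkImports::Entity (hypothetical, not the real ActiveRecord model).
Entity = Struct.new(:id, :source_id, :source_parent_id, :destination_path,
                    :parent_id, :destination_namespace, keyword_init: true)

PLACEHOLDER_NAMESPACE = '-'

# Step 1: walk the pages until an empty one is returned (assumed stop condition).
# fetch_page is a hypothetical callable: page number -> array of group hashes.
def fetch_descendants(fetch_page)
  descendants = []
  page = 1
  loop do
    batch = fetch_page.call(page)
    break if batch.empty?

    descendants.concat(batch)
    page += 1
  end
  descendants
end

# Step 2: build entities with a blank parent_id and a placeholder namespace.
# Ids are assigned by hand here; in the real flow the database would assign them.
def build_entities(descendants, first_id:)
  descendants.each_with_index.map do |group, index|
    Entity.new(id: first_id + index, source_id: group[:id],
               source_parent_id: group[:parent_id], destination_path: group[:path],
               parent_id: nil, destination_namespace: PLACEHOLDER_NAMESPACE)
  end
end

# Step 3: map each source parent id to the id of the matching entity.
def restore_ancestry(top_level, entities)
  by_source_id = ([top_level] + entities).to_h { |e| [e.source_id, e] }
  entities.each { |e| e.parent_id = by_source_id.fetch(e.source_parent_id).id }
end

# Namespace of an entity = namespace of its parent + path of its parent.
def destination_namespace_for(entity, by_id)
  return entity.destination_namespace if entity.parent_id.nil?

  parent = by_id.fetch(entity.parent_id)
  "#{destination_namespace_for(parent, by_id)}/#{parent.destination_path}"
end

# Step 4: replace the placeholder with the computed namespace.
def assign_destination_namespaces(top_level, entities)
  by_id = ([top_level] + entities).to_h { |e| [e.id, e] }
  entities.each { |e| e.destination_namespace = destination_namespace_for(e, by_id) }
end

# Usage: the child arrives on an earlier page than its parent on purpose.
top_level = Entity.new(id: 1, source_id: 10, source_parent_id: nil,
                       destination_path: 'top_level_group_under_new_name',
                       parent_id: nil, destination_namespace: 'existingnamespace')
pages = {
  1 => [{ id: 12, parent_id: 11, path: 'subgroup2' }],
  2 => [{ id: 11, parent_id: 10, path: 'subgroup1' }]
}
fetch_page = ->(page) { pages.fetch(page, []) }

entities = build_entities(fetch_descendants(fetch_page), first_id: 2)
restore_ancestry(top_level, entities)
assign_destination_namespaces(top_level, entities)
entities.each { |e| puts "#{e.destination_namespace}/#{e.destination_path}" }
```

Because parent ids are resolved only after every entity exists, the child-before-parent page order in the usage example causes no problem, which is the point made in step 3.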

Once all subgroup entities are created, they are passed one by one to BulkImports::Importers::GroupImporter in a specific order, since the order of restoration matters: a subgroup cannot be restored before its parent group is created. The order is the following:

Any top level entity with parent_id: nil
Any entity whose parent has status: finished

This ensures groups are imported in the right order.
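
The ordering rule can be illustrated with a small plain-Ruby sketch. It is an assumption-laden illustration rather than the worker code from this MR: ImportEntity is a stand-in Struct, the :created status and the helper names next_importable and import_all are hypothetical, and the actual call to BulkImports::Importers::GroupImporter is replaced by an optional block.

```ruby
# Stand-in for an import entity (hypothetical, not the real model).
ImportEntity = Struct.new(:id, :parent_id, :name, :status, keyword_init: true)

# Pick the next entity that may be imported: not yet imported, and either
# top level (parent_id is nil) or with a parent that is already finished.
def next_importable(entities)
  by_id = entities.to_h { |e| [e.id, e] }
  entities.find do |entity|
    next false unless entity.status == :created

    entity.parent_id.nil? || by_id.fetch(entity.parent_id).status == :finished
  end
end

# Import entities one by one until none is eligible; returns the import order.
def import_all(entities)
  order = []
  while (entity = next_importable(entities))
    yield entity if block_given? # the real importer would be invoked here
    entity.status = :finished
    order << entity.name
  end
  order
end

# Usage: entities are listed deepest first to show that list order is irrelevant.
import_entities = [
  ImportEntity.new(id: 3, parent_id: 2, name: 'subgroup2', status: :created),
  ImportEntity.new(id: 2, parent_id: 1, name: 'subgroup1', status: :created),
  ImportEntity.new(id: 1, parent_id: nil, name: 'top', status: :created)
]
import_order = import_all(import_entities)
puts import_order.join(' -> ')
```

An entity whose parent never reaches finished is simply never picked, so in this sketch a failed parent leaves its whole subtree unimported; error handling is out of scope here, as it is for the MR.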

To test it:

Seed a group structure using the rake task bundle exec rake "gitlab:seed:group_seed[4, root]" (this step can take some time to finish)
Create a destination group you want the source groups to be imported into (it can be a top-level group or a subgroup)
Make sure Sidekiq is running, then open the Rails console and run:

user = User.first
credentials = { url: 'http://127.0.0.1:3000', access_token: 'token' }
params = [{ source_type: 'group_entity', source_name: '3z5n7', source_full_path: '3z5n7', destination_name: 'my imported group', destination_namespace: 'my-group/my-imported-group/sub' }]

BulkImportService.new(user, params, credentials).execute

Verify that the entire group structure from the source is carried over to the destination group under the new name

Screenshots (strongly suggested)

Source:

Destination:

Out of scope of this MR

Distributed import execution (discussed and will be done as part of #270098 (closed))
Error handling

Conformity

Mentions #270074 (closed)

Edited Oct 29, 2020 by George Koltsov
