Import subgroups when using Group Import via GraphQL
This feature is behind the `bulk_import` feature flag (see `Import::BulkImportsController`), which is currently turned off.
What does this MR do?
This MR adds support for migrating all subgroups of the specified group when performing a Group Import through the GraphQL API.
When a user initiates a Group Import, they specify only the TOP LEVEL group: its source full path, source name, destination name (so the group can be imported under a new name), and destination namespace (a group or subgroup on the destination GitLab instance). From this information, a new `BulkImports::Entity` with `source_type: 'group_entity'` is created and later used to perform the actual group import.
A few important points:
- All descendants of the top level group are fetched at a later stage, inside `BulkImportWorker`, and additional `BulkImports::Entity` objects are created (with appropriate `parent_id`s to restore the ancestry tree from the source). Once all entities are created, Group Import is performed for each entity. Creating all entity objects in advance, before any import starts, removes the need for a recursive approach and gives better visibility into how many groups have to be imported, so we can react accordingly (e.g. stop the import if the subgroup structure exceeds a limit). Additionally, we can validate group names and namespace full paths early and give feedback to the user straight away.
- The GraphQL `Group` type does not expose subgroup/descendant information, which is why descendants are fetched via the HTTP client. We can later explore adding this information to GraphQL.
- Subgroup entity creation consists of 4 main steps:
  - Fetching all of the group's descendants at once (100 per page, iterating over each page) instead of per subgroup. This should reduce the number of network requests to the source GitLab instance from the number of subgroups to the number of pages.
  - Creating all entities with a blank parent id and a placeholder destination namespace. Both are updated in separate later steps, for the reasons listed below.
  - Recreating the ancestry tree from the fetched descendant groups and their source `parent_id`s once all objects are persisted. It was easier to restore the ancestry tree this way (once the objects are persisted with `id`s assigned to them) than to recreate the object hierarchy manually from the source descendants hash and persist objects in a specific order (higher level entities would have to be created and present before child entities, since we need to populate `parent_id`). Done this way, the order of persistence does not matter.
  - Updating destination namespaces. Each entity's destination namespace is first set to the placeholder `-` value. It is updated at this later step because the top level group name and destination namespace can be customized, so setting the proper destination namespace requires the fully restored subgroup object hierarchy. A typical subgroup destination namespace consists of `<top level entity destination namespace>/<top level destination path>/<subgroup 1 destination path>/.../<subgroup N destination path>`, e.g. `existingnamespace/top_level_group_under_new_name/subgroup1/subgroup2/.../subgroupN`.
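The four steps above can be illustrated with a minimal plain-Ruby sketch. It is not the code from this MR: `Entity` is a hypothetical `Struct` standing in for `BulkImports::Entity`, `fetch_page` is a hypothetical callable standing in for the HTTP client, and the field names (`source_id`, `source_parent_id`, `destination_path`) are assumptions made for illustration only.

```ruby
# Hypothetical stand-in for BulkImports::Entity (the real model is ActiveRecord-backed).
Entity = Struct.new(:id, :source_id, :source_parent_id, :destination_path,
                    :destination_namespace, :parent_id, keyword_init: true)

PLACEHOLDER = '-'
PER_PAGE = 100

# Step 1: fetch all descendants page by page (one request per page, not per subgroup).
# `fetch_page` is a hypothetical callable: (page, per_page) -> array of group hashes.
def fetch_descendants(fetch_page)
  descendants = []
  page = 1
  loop do
    batch = fetch_page.call(page, PER_PAGE)
    descendants.concat(batch)
    break if batch.size < PER_PAGE # a short (or empty) page is the last one

    page += 1
  end
  descendants
end

# Step 2: create entities with a blank parent_id and a placeholder namespace.
# Ids are assigned sequentially here to mimic persisted records.
def build_entities(descendants, first_id:)
  descendants.each_with_index.map do |group, index|
    Entity.new(id: first_id + index, source_id: group[:id],
               source_parent_id: group[:parent_id], destination_path: group[:path],
               destination_namespace: PLACEHOLDER, parent_id: nil)
  end
end

# Step 3: restore the ancestry tree by mapping source parent ids to entity ids.
def restore_ancestry(top_level, entities)
  by_source_id = ([top_level] + entities).to_h { |e| [e.source_id, e] }
  entities.each { |e| e.parent_id = by_source_id.fetch(e.source_parent_id).id }
end

# Step 4: replace the placeholder by walking up the restored hierarchy.
def assign_destination_namespaces(top_level, entities)
  by_id = ([top_level] + entities).to_h { |e| [e.id, e] }
  entities.each do |entity|
    segments = []
    parent = by_id[entity.parent_id]
    while parent
      segments.unshift(parent.destination_path)
      parent = by_id[parent.parent_id]
    end
    entity.destination_namespace = [top_level.destination_namespace, *segments].join('/')
  end
end

# Usage: the child is listed before its parent to show that order does not matter.
top = Entity.new(id: 1, source_id: 10, source_parent_id: nil,
                 destination_path: 'top_level_group_under_new_name',
                 destination_namespace: 'existingnamespace', parent_id: nil)
pages = { 1 => [{ id: 12, parent_id: 11, path: 'subgroup2' },
                { id: 11, parent_id: 10, path: 'subgroup1' }] }
entities = build_entities(fetch_descendants(->(page, _per_page) { pages.fetch(page, []) }),
                          first_id: 2)
restore_ancestry(top, entities)
assign_destination_namespaces(top, entities)
puts entities.first.destination_namespace
# prints "existingnamespace/top_level_group_under_new_name/subgroup1"
```

Because parent links are resolved only after every entity has an id, the sketch never needs to sort or recurse over the source hierarchy.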
Once all subgroup entities are created, they are passed one by one to `BulkImports::Importers::GroupImporter` in a specific order, since the order of restoration matters: a subgroup cannot be restored before its parent group is created. The order is as follows:
- Any top level entity with `parent_id: nil`
- Any entity whose `parent` has `status: finished`
This ensures groups are imported in the right order.
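The ordering rule can be sketched in plain Ruby as follows. This is an illustration under assumptions, not the MR's implementation: `ImportEntity` is a hypothetical `Struct`, the `:created` status name is assumed (only `finished` appears in the description), and the block passed to `import_all` stands in for handing the entity to the importer.

```ruby
# Hypothetical stand-in for an entity record; only the fields the ordering needs.
ImportEntity = Struct.new(:id, :parent_id, :status, keyword_init: true)

# Entities that may be imported right now: not yet imported, and either
# top level (parent_id: nil) or with a parent whose status is :finished.
def ready_for_import(entities)
  by_id = entities.to_h { |e| [e.id, e] }
  entities.select do |entity|
    next false unless entity.status == :created # :created is an assumed status name

    entity.parent_id.nil? || by_id.fetch(entity.parent_id).status == :finished
  end
end

# Repeatedly import whatever is ready until nothing is left.
# Returns the ids in the order they were imported.
def import_all(entities)
  order = []
  until (batch = ready_for_import(entities)).empty?
    batch.each do |entity|
      yield entity if block_given? # the real import step would run here
      entity.status = :finished
      order << entity.id
    end
  end
  order
end

# Usage: entities are listed deepest first, yet parents are still imported first.
entities = [
  ImportEntity.new(id: 3, parent_id: 2, status: :created),
  ImportEntity.new(id: 2, parent_id: 1, status: :created),
  ImportEntity.new(id: 1, parent_id: nil, status: :created)
]
p import_all(entities)
# prints [1, 2, 3]
```

Each pass only picks entities whose parent was finished in an earlier pass, so a subgroup is never handed to the importer before its parent exists.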
To test it:
- Seed a group structure using the rake task `bundle exec rake "gitlab:seed:group_seed[4, root]"` (this step can take some time to finish)
- Create a destination group you want the source groups to be imported into (can be top level or a subgroup)
- Make sure Sidekiq is running. Open the `rails console` and run:
user = User.first
credentials = { url: 'http://127.0.0.1:3000', access_token: 'token' }
params = [{ source_type: 'group_entity', source_name: '3z5n7', source_full_path: '3z5n7', destination_name: 'my imported group', destination_namespace: 'my-group/my-imported-group/sub' }]
BulkImportService.new(user, params, credentials).execute
- Verify that the entire group structure from the source is carried over to the destination group under the new name
Screenshots (strongly suggested)
Source:
Destination:
Out of scope of this MR
- Distributed import execution (discussed and will be done as part of #270098 (closed))
- Error handling
Conformity
- Changelog entry
- Documentation (if required)
- Code review guidelines
- Merge request performance guidelines
- Style guides
- Database guides
- Separation of EE specific content
Mentions #270074 (closed)