Group Migration - MVC (One Group)
## The Problem

Migrating groups and projects is available in the GitLab UI and through the API. It involves exporting and importing individual data files for each group and project. There is currently no way to migrate members. To migrate an entire group with its projects, the user has to coordinate multiple API calls and do a lot of manual data manipulation. This creates a lot of friction when migrating data between two GitLab instances. Additionally, the current solution is neither scalable nor resilient, and it provides little feedback on failures.

See the full discussion of the problem and the proposed solution in [this epic](https://gitlab.com/groups/gitlab-org/-/epics/2771#the-problem).

## Goal

The goal of this issue is to take the first step toward the [GitLab Group Migration](https://gitlab.com/groups/gitlab-org/-/epics/2771) feature:

> A one-stop-shop for migrating all or parts of your GitLab instance. Select a group or just a project and start a one-click migration. This solution would allow migration of group, project, and user data. The user would not have to download or upload files.

## Proposed MVC

![image](/uploads/6dd4d5bc6d6e0322278051b6af047a5e/image.png)

Create a very simple (basic epic fields only) group migration tool in the UI that is based on the API-only approach. This solution would be initiated on the destination server, using the [current UX metaphor](https://gitlab.com/gitlab-org/gitlab/-/issues/237925) to select this new type of import.

The following functionality would be in scope for the MVC:

* Initiate Group migration
* Provide credentials to the source instance of GitLab (URL and token)
* Show list of available top-level groups
* Allow selecting one top-level group to be migrated (no check is done to prevent re-import)
* Create the group (and all the associated subgroups) in the destination instance
* Migrate only the epics for that group tree (basic epic fields only; at a minimum, epic titles)

## Out of scope for MVC

These features were discussed for the MVC but are currently deemed out of scope (issues to be created):

* Ability to authenticate with the other instance of GitLab - Not planned
* Ability to select multiple groups for migration - https://gitlab.com/gitlab-org/gitlab/-/issues/267953
* Filter (search) for the group list - https://gitlab.com/gitlab-org/gitlab/-/issues/241662
* Status column in the group list - https://gitlab.com/gitlab-org/gitlab/-/issues/241657

## Future iterations

Future iterations would add more objects, more depth to each object, and more relationships.

## UX Designs

MVC designs and prototype are posted in the issue **UX for GitLab Group Migration - MVC**: https://gitlab.com/gitlab-org/gitlab/-/issues/237925

## :gear: Architecture Overview

Architecture diagram of Group Migration via API (simplified to focus on the main components). It is subject to change, since this is still an evolving feature that the Import group is iterating on.

![image](/uploads/b734e8a9ad6ab1c219477eae5f6c7201/image.png)

### :thinking: Design decisions

1. To support importing complex compositions of groups and projects (for example, a user wants to import 20 groups with their subgroups and a number of projects in them with a single click), several new models are introduced under the `BulkImports` namespace:
   - `BulkImport` - overarching model that keeps track of the overall import process; has many import entities
   - `BulkImports::Configuration` - credentials to access the source GitLab instance
   - `BulkImports::Entity` - individual import entities that need to be processed (imported). Usually one of 2 types: project or group entity. Allows us to keep track of each entity's import process. Additionally, preserves the parent-child association from the source

   ![image](/uploads/02ef293844cfa92ae82e0348e7b5d426/image.png)

   Creating the bulk import objects is the first step in a migration. Sequence diagram (this process might be outdated, as the feature is constantly being worked on):

   ![image](/uploads/5c143b27078b612b7a40e7101189f3e5/image.png)

2. The import process uses GitLab's own GraphQL (and REST) APIs to fetch and import data instead of dealing with archive files (similar to the GitHub, Bitbucket, and other importers). The main advantages of this are:
   - An export step is no longer needed, since all data is exposed via the API
   - We are a GraphQL-first company (https://about.gitlab.com/handbook/engineering/#graphql-first), so Import will contribute to expanding GraphQL API coverage, which benefits both GitLab and its users
   - A lot of GraphQL resources have already been implemented and are available for us to use
   - It comes with flexible pagination out of the box, so there is no need to implement pagination ourselves

3. A new ETL (Extract, Transform, Load) Pipelines concept is introduced as a way to handle data. It splits the import process into 3 main parts with a clear separation of responsibility:
   - Extractors: retrieve data from the source GitLab instance. E.g. GraphQL/REST API call, file download
   - Transformers: small classes that change data and pass it down to the next transformer in order to 'prepare' data for loading (saving). E.g. update user references, update parent group, etc.
   - Loaders: save data to the destination. Most of the time this means persisting data in the database, but we can potentially pipe data to another ETL pipeline for further modifications

   ![image](/uploads/efafa404bc84bc5d987c8d5b2bbb933f/image.png)

4. The current Project/Group Import/Export runs as a single background process, no matter how big or small the export file is. Because of that, a big project import can take a long time to complete and occupy a background worker for the whole duration. With the new approach, we plan to introduce distributed import execution so that big group/project bundles can be imported within a reasonable amount of time. Additionally, we want to explore distributing the import within a single project/group (e.g. a project's merge requests are imported in one worker while its issues are imported in another). The approach is currently being discussed in https://gitlab.com/gitlab-org/gitlab/-/issues/270098. Currently, group import is performed within a single worker.

### Release notes

A faster and easier way to migrate your GitLab Groups is on the way. Group Migration is a new feature that lets you copy a GitLab Group from one instance to another directly, without having to export and import any files. In this release, we migrate only the Group object with basic fields. We plan to follow up with more and more fields and related objects until all relevant data in a Group is migrated in this easy-to-use fashion.

Documentation: https://docs.gitlab.com/ee/development/bulk_import.html
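
As a rough illustration of the ETL split described in design decision 3, here is a minimal Ruby sketch. It is not the actual implementation: all class names (`StaticExtractor`, `ParentGroupTransformer`, `BasicFieldsTransformer`, `MemoryLoader`, `Pipeline`) and the record fields are hypothetical, the extractor returns hard-coded records instead of calling the GraphQL/REST API, and the loader writes to memory instead of the database.

```ruby
# Hypothetical sketch of the Extract/Transform/Load split.
# None of these class names are taken from the GitLab codebase.

# Extractor: provides raw records from the source. A real extractor would
# call the source instance's API; here the records are hard-coded.
class StaticExtractor
  def initialize(records)
    @records = records
  end

  def extract
    @records.each
  end
end

# Transformer: points the record at the destination parent group.
class ParentGroupTransformer
  def initialize(destination_parent)
    @destination_parent = destination_parent
  end

  def transform(record)
    record.merge('parent' => @destination_parent)
  end
end

# Transformer: keeps only the basic fields (assumed here to be title and parent).
class BasicFieldsTransformer
  FIELDS = %w[title parent].freeze

  def transform(record)
    record.slice(*FIELDS)
  end
end

# Loader: stores records in memory; a real loader would persist them.
class MemoryLoader
  attr_reader :saved

  def initialize
    @saved = []
  end

  def load(record)
    @saved << record
  end
end

# Pipeline: passes each extracted record through the transformers in order,
# then hands the result to the loader.
class Pipeline
  def initialize(extractor:, transformers:, loader:)
    @extractor = extractor
    @transformers = transformers
    @loader = loader
  end

  def run
    @extractor.extract.each do |record|
      transformed = @transformers.reduce(record) { |data, t| t.transform(data) }
      @loader.load(transformed)
    end
  end
end

loader = MemoryLoader.new
Pipeline.new(
  extractor: StaticExtractor.new(
    [{ 'title' => 'Epic A', 'parent' => 'source-group', 'internal_id' => 7 }]
  ),
  transformers: [
    ParentGroupTransformer.new('destination-group'),
    BasicFieldsTransformer.new
  ],
  loader: loader
).run
```

Because each transformer only receives and returns a record, transformers can be reordered or reused across pipelines, and a loader could in principle hand its output to another pipeline's extractor.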