User contribution mapping (#12378) · Epics · GitLab.org

User contribution mapping

### Problem

In the process of mapping users' contributions during direct transfer, users are required to set a public email on the source instance and create a corresponding user in the destination instance with the same email address. If a user from the source instance cannot be matched to a user in the destination instance (for example, because the user was not created beforehand, the email is not configured correctly, or the user is inactive), the import process assigns the importing user as the author or assignee of objects such as issues, epics, merge requests, and comments.

The current preparation step for user mapping presents several challenges. It demands substantial coordination from the importing user to ensure that both source and destination users are configured correctly. Additionally, if a user mapping fails, it cannot be fixed without performing another migration.

### Proposed Solution

To enhance the user experience, we can use a different user mapping approach. Instead of immediately assigning contributions to real destination users, we initially attribute these contributions to placeholder users. Namespace owners can then reassign these contributions to real destination user profiles at their convenience.

To achieve this, the Direct Transfer importer will generate a placeholder user account for each source user who has a contribution in the import. These [placeholder users](https://docs.gitlab.com/ee/development/internal_users.html) will be linked to the top-level namespace.
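A minimal sketch of the "one placeholder per source user" lookup, in Python. It assumes placeholders are keyed by source hostname, source user ID, and destination top-level namespace (consistent with the reuse rules stated later in this epic), and it uses the name and username patterns proposed in this epic. `SourceUser`, `PlaceholderUser`, `PlaceholderRegistry`, and the `"direct_transfer"` import type value are hypothetical names for illustration, not GitLab's actual classes or schema.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Tuple


@dataclass(frozen=True)
class SourceUser:
    """Identity of a contributor on the source instance (hypothetical shape)."""
    source_hostname: str
    source_user_id: int
    source_username: str
    source_name: str


@dataclass
class PlaceholderUser:
    """Placeholder created on the destination instance (hypothetical shape)."""
    username: str
    name: str
    top_level_namespace_id: int
    source: SourceUser
    import_type: str


class PlaceholderRegistry:
    """In-memory stand-in for the lookup the importer would do against its database."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, int, int], PlaceholderUser] = {}
        self._next_number = count(1)  # the %{incremental_number} part of the username

    def find_or_create(self, source: SourceUser, namespace_id: int,
                       import_type: str) -> PlaceholderUser:
        # One placeholder per source user per top-level namespace: reuse if it exists.
        key = (source.source_hostname, source.source_user_id, namespace_id)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing
        placeholder = PlaceholderUser(
            username=f"{source.source_username}_placeholder_user_{next(self._next_number)}",
            name=f"Placeholder {source.source_name}",
            top_level_namespace_id=namespace_id,
            source=source,
            import_type=import_type,
        )
        self._by_key[key] = placeholder
        return placeholder


registry = PlaceholderRegistry()
alice = SourceUser("gitlab.example.com", 42, "alice", "Alice Doe")
first = registry.find_or_create(alice, namespace_id=1, import_type="direct_transfer")
again = registry.find_or_create(alice, namespace_id=1, import_type="direct_transfer")
other = registry.find_or_create(alice, namespace_id=2, import_type="direct_transfer")
print(first.username, again is first, other.username)
# alice_placeholder_user_1 True alice_placeholder_user_2
```

Re-importing into the same top-level namespace returns the existing placeholder, while a different namespace gets its own. A real implementation would enforce the key with a database uniqueness constraint rather than an in-memory dict.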

Several attributes will be assigned to the placeholder user to maintain the connection with the source user, including:

* `source_user_id`: The unique identifier the import process uses to determine whether a new placeholder user needs to be created.
* `source_hostname`: The source hostname or domain.
* ~~`source_email`: The source user's public email address, to help namespace owners during the reassignment of the contribution.~~
* `source_name`: The source user's name, to help namespace owners during the reassignment of the contribution.
* `source_username`: The source user's username, to help namespace owners during the reassignment of the contribution.
* `import_type`: Identifies which importer created the placeholder.

To preserve historical context, the placeholder's `name` and `username` will resemble the source `name` and `username`. For example, the placeholder's name can follow the structure "Placeholder Source Name", and the placeholder's username can follow the structure `%{source_username}_placeholder_user_%{incremental_number}`.

To reassign a placeholder user's contributions to a real destination user, namespace owners can use the UI to choose the real destination user who will receive them. They should also be able to reassign in bulk, for example by supplying a CSV file. The reassignment is finalized only after the selected user accepts (or rejects) the request. Once the user accepts, all contributions previously attributed to the placeholder user are attributed to them. In the same UI, namespace owners can also cancel the request. Once the user accepts the request, the operation cannot be reversed or rolled back.
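The request, accept/reject, and cancel flow above, together with the "keep as placeholder" and "no second reassignment" rules listed in the requirements, can be read as a small state machine. The Python sketch below illustrates that reading under stated assumptions; it is not GitLab's implementation. The status names, the `Reassignment` class, and the choice to let an Owner send a new request after a rejection are all assumptions.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING_REASSIGNMENT = "pending_reassignment"  # contributions still on the placeholder
    AWAITING_APPROVAL = "awaiting_approval"        # Owner picked a user; waiting for their answer
    REJECTED = "rejected"                          # selected user declined
    COMPLETED = "completed"                        # accepted; irreversible
    KEEP_AS_PLACEHOLDER = "keep_as_placeholder"    # Owner opted out; irreversible


# (current status, action) -> next status. Any pair not listed is refused.
# request / cancel / keep_as_placeholder are Owner actions; accept / reject belong
# to the selected user. Allowing a new request after REJECTED is an assumption.
TRANSITIONS = {
    (Status.PENDING_REASSIGNMENT, "request"): Status.AWAITING_APPROVAL,
    (Status.REJECTED, "request"): Status.AWAITING_APPROVAL,
    (Status.AWAITING_APPROVAL, "cancel"): Status.PENDING_REASSIGNMENT,
    (Status.AWAITING_APPROVAL, "accept"): Status.COMPLETED,
    (Status.AWAITING_APPROVAL, "reject"): Status.REJECTED,
    (Status.PENDING_REASSIGNMENT, "keep_as_placeholder"): Status.KEEP_AS_PLACEHOLDER,
    (Status.REJECTED, "keep_as_placeholder"): Status.KEEP_AS_PLACEHOLDER,
}


class InvalidTransition(Exception):
    pass


class Reassignment:
    """Tracks the reassignment of one placeholder user (hypothetical model)."""

    def __init__(self) -> None:
        self.status = Status.PENDING_REASSIGNMENT
        self.target_user: Optional[str] = None

    def apply(self, action: str, target_user: Optional[str] = None) -> Status:
        next_status = TRANSITIONS.get((self.status, action))
        if next_status is None:
            raise InvalidTransition(f"cannot {action!r} from {self.status.value!r}")
        if action == "request":
            if target_user is None:
                raise ValueError("a reassignment request needs a target user")
            self.target_user = target_user
        elif action in ("cancel", "reject"):
            self.target_user = None
        self.status = next_status
        return next_status


r = Reassignment()
r.apply("request", target_user="bob")
r.apply("accept")
try:
    r.apply("request", target_user="carol")  # COMPLETED is terminal
except InvalidTransition as err:
    print(err)  # cannot 'request' from 'completed'
```

Because COMPLETED and KEEP_AS_PLACEHOLDER have no outgoing transitions, both irreversibility rules fall out of the table rather than needing separate checks.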

~~To prevent the namespace owner from selecting an incorrect user, only members of the namespace will be able to be selected.~~ (Note: there is an ongoing [discussion](https://gitlab.com/gitlab-org/gitlab/-/issues/466118) about this.) Owners will be able to assign the contribution to any active human user of the GitLab instance. Once this issue is fixed, we can remove this comment from the docs: https://gitlab.com/gitlab-org/gitlab/-/merge_requests/136246

### User flow and designs

See the proposed user flow: https://gitlab.com/gitlab-org/gitlab/-/issues/451028/designs/Direct_transfer_-_Placeholder_users_mapping_flow.png and the designs in https://gitlab.com/gitlab-org/gitlab/-/issues/451028.

#### Requirements

- There should be a visible clue that a particular contribution was imported from another place rather than authored directly in place, including after the reassignment is done (see [issue](https://gitlab.com/gitlab-org/gitlab/-/issues/424454)).
- Even before the reassignment is performed, usability is as good as it can be: when looking at a contribution that is not yet reassigned to a real user on the destination instance, it is already possible to tell that some contributions were made by one user and others by another user and, as far as security allows, to understand which user on the source instance authored them.
- It should be possible to map contributions of users who have different email addresses on the source and destination instances.
- No preparation of users on the destination instance is needed to run the import successfully. Before the reassignment, users who should have contributions reassigned to them must exist on the destination instance, be confirmed, and belong to the namespace the contributions were copied to.
- It should be possible to reassign contributions to an actual user as long as the contributions belong to a placeholder user.
  Also, if some contributions were not reassigned to actual users (for example, because of errors during reassignment), it should be possible to run the reassignment process again.
- Reassignment can be done for chosen users only, those the importing user cares about; for example, the importing user might not care about contributions of former employees. In that case, placeholder users with contributions are not deleted.
- Reassignment can only be done by Owners.
- Reassigning users (Owners) can cancel the reassignment request.
- Owners can mark placeholder users as "keep as placeholder", which means the placeholder user will never be reassigned to a real user. This operation is irreversible. We may review this in the future.
- The contributor must accept that imported contributions will be assigned to them before the reassignment process starts. Once the user accepts the reassignment, any contributions imported later should be assigned automatically to that user rather than to the placeholder. With this approach, owners don't need to reassign users again if they perform another migration after a reassignment. It also allows us to delete placeholders after reassignment, because they will no longer be used: contributions will be assigned directly to the user who previously accepted the reassignment (see [flow](https://gitlab.com/groups/gitlab-org/-/epics/12378#note_1773460819)).
- It's impossible to reassign contributions to one user and then reassign them to another user.
- The placeholder users that contributions are mapped to during the import, before contributions are reassigned to actual users, need to:
  - be unable to do anything at all: they cannot be logged into, they cannot run pipelines, they cannot be added as members of groups or projects, and they should not appear in any suggestions (in issue or MR comments, assignees, reviewers).
  - have usernames that real users cannot choose, and that make it clear that placeholder users are placeholders and not real users.
- When reassignment is completed for a placeholder user and that placeholder user no longer has any contributions, the placeholder user must be deleted.
- Placeholder users will be limited per top-level namespace on the destination instance. The limits will differ by tier, and the limit for Free users will be low. Limits need to be adjustable, including for a chosen top-level namespace.
- Before the migration starts, the migrating user will be notified if continuing would reach the placeholder user limit for the namespace they are importing into. They can still continue with the migration as is, but once the limit is reached, the contributions of users for whom placeholder users could not be created will be assigned to Importer User ~~Migration Bot~~ ([see idea](https://gitlab.com/gitlab-org/gitlab/-/issues/429299#note_1650755279)). This still provides a clue that a particular contribution was imported from another place rather than authored directly in place. However, before the reassignment, it will not be possible to tell that some contributions were made by one user and others by another user. This will be documented and included in the warning shown to users before they decide to migrate knowing that the limit of placeholder users will be reached.
- Placeholder users will be created only when the migration targets a group, not a personal namespace. For imports to personal namespaces, we will use Importer User ~~Migration Bot~~: all contributions will be assigned to Importer User ~~Migration Bot~~, not to actual users and not to the importing user. This will be documented. (Can we show a warning when users migrate to a personal namespace?)

### Technical requirements

- **This solution should be easily extendable to other importers, both new and existing. It could become the first component of the Importer framework.**

#### Limits for placeholder user

- **This will be reworked**, based on https://gitlab.com/gitlab-data/product-analytics/-/issues/1743+

We limit the number of placeholder users available per top-level namespace. Namespaces on higher tiers get higher limits, factoring in the number of seats of the root namespace being imported to (on the destination instance):

| Plan                   | Seats      | Placeholder user limit per top-level namespace |
|------------------------|------------|------------------------------------------------|
| Free and Trial         | Any amount | 200                                            |
| Premium                | <100       | 500                                            |
| Premium                | 100-399    | 2000                                           |
| Premium                | 400+       | 5000                                           |
| Ultimate + Open source | <100       | 1000                                           |
| Ultimate + Open source | 100-399    | 4000                                           |
| Ultimate + Open source | 400+       | 10000                                          |

If necessary, users can request an increase to the limit by contacting support. Free users cannot contact support, so we can open a feedback issue for them to comment on, and monitor the number of "the limit of placeholder users has been reached" errors. The assumption is that customers set up their paid namespace before the import. Placeholder users will not count towards license limits.

Because limiting placeholder users is important for GitLab.com, we will default the limit for an instance to "unlimited" and set the limits for GitLab.com. The limit for each top-level group on GitLab.com should be adjustable.
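The limit rules and the pre-migration warning described in the requirements can be sketched as two small functions. This Python sketch is illustrative only: the plan keys, the `on_gitlab_com` flag, and the decision to let a per-namespace override take precedence on any instance are assumptions, and `existing` is taken to mean placeholders already present in the top-level namespace. Source users beyond the limit fall back to Importer User, as the requirements describe.

```python
from typing import Optional, Tuple

# Default limits from the table above, as (limit below 100 seats,
# limit for 100-399 seats, limit for 400+ seats). Plan keys are hypothetical.
TIERED_LIMITS = {
    "premium": (500, 2000, 5000),
    "ultimate": (1000, 4000, 10000),
    "open_source": (1000, 4000, 10000),
}
FLAT_LIMITS = {"free": 200, "trial": 200}  # seat count does not matter


def default_limit(plan: str, seats: int) -> int:
    if plan in FLAT_LIMITS:
        return FLAT_LIMITS[plan]
    small, medium, large = TIERED_LIMITS[plan]  # KeyError for unknown plans
    if seats < 100:
        return small
    if seats < 400:
        return medium
    return large


def placeholder_limit(plan: str, seats: int, *, on_gitlab_com: bool,
                      namespace_override: Optional[int] = None) -> Optional[int]:
    """Return the limit for a top-level namespace; None means unlimited."""
    if namespace_override is not None:
        return namespace_override  # per-namespace adjustment wins (assumption)
    if not on_gitlab_com:
        return None  # instance default: unlimited
    return default_limit(plan, seats)


def pre_migration_check(existing: int, new_source_users: int,
                        limit: Optional[int]) -> Tuple[int, int]:
    """Return (placeholders that can be created, users that fall back to Importer User)."""
    if limit is None:
        return new_source_users, 0
    room = max(limit - existing, 0)
    created = min(room, new_source_users)
    return created, new_source_users - created


limit = placeholder_limit("premium", 120, on_gitlab_com=True)
print(limit, pre_migration_check(existing=1990, new_source_users=25, limit=limit))
# 2000 (10, 15)
```

A non-zero second value is the case where the migrating user would be warned before the migration starts.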

:question: We're still [discussing](https://gitlab.com/groups/gitlab-org/-/epics/12378#note_1761163288) whether the placeholder limit should be applied for a certain amount of time.

We create one placeholder user per source user and per top-level group:

- If I import the same project twice to the same top-level group, the second import uses the same placeholder users as the first import.
- Importing the same project to different top-level namespaces doesn't reuse the placeholder users; they are created again for each namespace ([reasoning](https://gitlab.com/gitlab-org/gitlab/-/issues/429299#note_1652832429)).

#### Additional points regarding reassignment

- The reassigning user (Owner) should be able to request reassignment at a later time, after the import is completed. Reassignment will be possible only at the level of the top-level group ([reasoning](https://gitlab.com/groups/gitlab-org/-/epics/12378#note_1773460819)).
- ~~For the re-assignment we will use `usernames`, not emails.~~
- For the reassignment, we will use a component that allows the group owner to search for the user by username or email.
- We will build an endpoint that raises an error or warning early, before the actual migration starts, that the limit of placeholder users has been reached. This will work for direct transfer only. For the GitHub and Bitbucket importers, we would need to wait for the API response, which could take a few hours. This is acceptable if we treat it as part of the preparation and pre-migration checks. The warning will link to documentation and/or explain directly in the UI or API response how to avoid hitting the limit (request that GitLab raise the limit, or break the import down into smaller groups, because the limits are per group per import).
- Reassignment of contributions assigned to Importer User ~~Migration Bot~~ is possible. To enable it, a new database table should be created to store the association between contributions and their original source user.
  This will allow reassignment to work in much the same way for both strategies (placeholder users and Importer User), so most of the components and code can be reused.
- ~~When some contributions are not re-assigned correctly, it should be indicated to the re-assigning user and they should be able to click a `run again` button. This process can run without additional agreement from contributing user.~~ Note: there might not be a "Run again" button.

*This page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features, or functionality remain at the sole discretion of GitLab Inc.*
