User user_contributions relation

Context

In issue User mapping - Create relation contributors in ... (#454522 - closed), we introduced the user_contributions relation. It was created so that Direct Transfer could generate placeholder users for non-members using their name and username, because Direct Transfer can't access that user information in the exported relation files.

Without this relation, the alternative would have been to make multiple API requests to the source instance to retrieve user details during record migration. However, this approach could cause problems such as migration failures due to network errors.

While implementing user contribution mapping in Direct Transfer, however, I ended up not using the user_contributions relation, for the reasons outlined below. Instead, I opted for a different approach: user details are retrieved in batches of 100, and placeholder names and usernames are updated in a parallel process. This way, network errors don't affect the main migration, and the source instance isn't overloaded with numerous concurrent API requests.
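The batching and failure isolation described above can be sketched as follows. This is a minimal illustration, not the actual Direct Transfer code. `fetch_users` and `update_placeholder` are hypothetical callables standing in for the API request to the source instance and the placeholder update. The sketch processes batches sequentially and leaves out scheduling it as a separate background job.

```python
from typing import Callable, Dict, Iterator, List

BATCH_SIZE = 100  # batch size mentioned in this issue


def batched(ids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` source user IDs."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def refresh_placeholder_details(
    source_user_ids: List[int],
    fetch_users: Callable[[List[int]], Dict[int, Dict[str, str]]],  # hypothetical
    update_placeholder: Callable[[int, str, str], None],  # hypothetical
) -> List[List[int]]:
    """Fetch user details batch by batch and update the matching placeholders.

    Intended to run outside the main migration: a network error only skips
    the affected batch, which is returned so it can be retried later,
    instead of failing the import.
    """
    failed: List[List[int]] = []
    for batch in batched(source_user_ids):
        try:
            # One request per batch instead of one request per user.
            users = fetch_users(batch)
        except (ConnectionError, TimeoutError):
            failed.append(batch)
            continue
        for user_id, details in users.items():
            update_placeholder(user_id, details["name"], details["username"])
    return failed
```

Returning the failed batches rather than raising keeps the update job independent of the migration. A caller could retry them later or leave those placeholders with their generic names.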

One downside of this approach is that placeholder users for non-members are initially created with a generic name and username (for example, Placeholder gitlab_migration source user 1 and gitlab_migration_placeholder_user_1) and are only updated later by the job that runs in parallel.
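The two-phase placeholder lifecycle (a generic identity first, real details later) can be sketched like this. The naming pattern is inferred only from the two examples above. `PlaceholderUser`, `create_generic_placeholder`, and `apply_source_details` are hypothetical names and do not reflect GitLab's actual models.

```python
from dataclasses import dataclass


@dataclass
class PlaceholderUser:
    source_user_id: int
    name: str
    username: str


def create_generic_placeholder(source_user_id: int, sequence: int) -> PlaceholderUser:
    """Create a placeholder before the source user's details are known.

    Hypothetical naming pattern, inferred from the examples in this issue.
    """
    return PlaceholderUser(
        source_user_id=source_user_id,
        name=f"Placeholder gitlab_migration source user {sequence}",
        username=f"gitlab_migration_placeholder_user_{sequence}",
    )


def apply_source_details(placeholder: PlaceholderUser, name: str, username: str) -> None:
    """Replace the generic identity once the real details have been fetched."""
    placeholder.name = name
    placeholder.username = username
```

Until `apply_source_details` runs, users see the generic identity. This is the downside described above.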

Reasons for not using the user_contributions relation:

  1. The new relation wouldn't be available on older GitLab instances, so an alternative solution like the one I implemented would be needed anyway. The method I implemented works with both old and new GitLab versions, which reduces the maintenance burden.
  2. The user_contributions relation was released behind the importer_user_mapping feature flag, which is still disabled. Therefore, it won't be available until importer_user_mapping is enabled globally.
  3. The user_contributions relation is only exported after all other relations have been exported. Because of this, placeholder users with generic names would still have to be created first, with their details updated later once the user_contributions data is exported. Therefore, the downside mentioned above would remain even if the user_contributions relation were used.

More considerations

With the approach I implemented, Direct Transfer doesn't need the user_contributions relation. If we decided to use it anyway, the only advantage would be saving a few network requests, at the cost of maintaining two different approaches for retrieving user details.

The user_contributions relation will still be necessary when we decide to implement air-gapped migrations, since we won't be able to make HTTP requests to the source instance. So perhaps we should roll out the export of the new relation even if it isn't used?

Question

What should we do with the user_contributions relation?