Prevent simultaneous requests on UserFinder - GitHub Import
As noted in https://gitlab.com/gitlab-org/manage/import-and-integrate/discussions/-/issues/61#note_1693672589, UserFinder makes duplicate requests to GitHub's users API during user mapping. Multiple Import workers sometimes try to map the same user simultaneously, before that user's details have been added to the UserFinder cache.
Also, because several Import workers run simultaneously and each may make an API request to GitHub, GitHub may apply a secondary rate limit.
Context
For context, most requests to GitHub's API are performed by the Stage workers, as they are responsible for going through GitHub's API and fetching the data. Import workers call the API to fetch a user's public email if the user details can't be found in the cache.
Problem
- GitHub Import is making unnecessary requests, which causes the rate limit to be reached faster.
- GitHub may apply a secondary rate limit due to the simultaneous requests.
Proposed solution
Solution 1 - Prevent UserFinder from making simultaneous requests - CHOSEN OPTION
In this solution, we would allow UserFinder to make only one request at a time across all workers of a migration. This means we would never make concurrent calls to GitHub's users endpoint within a single migration.
This solution should solve problems #1 and #2.
This solution may cause migrations to take longer, but I don't think that will be the case because the user's details are cached.
This solution caused problems on GitLab.com.
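To make Solution 1 concrete, here is a minimal sketch, not the actual UserFinder code. It assumes an in-process `Mutex` per migration, a plain `Hash` as the cache, and a hypothetical `FakeGithubClient` in place of the real API client. In practice the workers run in separate processes, so the lock would need to be a shared one (for example, Redis-backed). All class and method names below are made up for illustration.

```ruby
# Hypothetical stand-in for the GitHub users API client. It counts calls and
# records the highest number of requests that were in flight at the same time.
class FakeGithubClient
  attr_reader :calls, :max_in_flight

  def initialize
    @stats_lock = Mutex.new
    @calls = 0
    @in_flight = 0
    @max_in_flight = 0
  end

  def user(username)
    @stats_lock.synchronize do
      @calls += 1
      @in_flight += 1
      @max_in_flight = [@max_in_flight, @in_flight].max
    end
    sleep 0.01 # simulate network latency
    { login: username, email: "#{username}@example.com" }
  ensure
    @stats_lock.synchronize { @in_flight -= 1 }
  end
end

# Sketch of Solution 1: a single lock per migration guards every API lookup,
# so at most one users request is in flight for that migration.
class SerializedUserFinder
  # One Mutex per migration (keyed by a hypothetical project id), so that
  # different migrations do not block each other.
  LOCKS = Hash.new { |hash, key| hash[key] = Mutex.new }
  LOCKS_GUARD = Mutex.new

  def initialize(project_id, client, cache)
    @client = client
    @cache = cache # assumption: a plain Hash stands in for the real cache
    @lock = LOCKS_GUARD.synchronize { LOCKS[project_id] }
  end

  def email_for(username)
    # Fast path: cache hits never wait for the lock.
    cached = @cache[username]
    return cached if cached

    @lock.synchronize do
      # Re-check inside the lock: another worker may have filled the cache
      # while this one was waiting.
      @cache[username] ||= @client.user(username)[:email]
    end
  end
end
```

If several threads share one cache and look up the same two users through this finder, each user should be fetched once and the requests should never overlap. The cost is that lookups for different users in the same migration also wait on each other.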
Solution 2 - Prevent UserFinder from fetching the information for the same user
This option would only prevent multiple simultaneous requests for the same GitHub user.
This solution would solve problem #1 but not problem #2.
This solution won't have the performance impact that Solution 1 may have. However, the migration could take much longer if GitHub applies the secondary limit.
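Solution 2 could be sketched roughly as follows, again as an illustration rather than the real implementation. It assumes an in-process `Mutex` per (migration, username) pair, a plain `Hash` as the cache, and a caller-supplied block in place of the GitHub API call. `PerUserLockingFinder` and its parameters are hypothetical names.

```ruby
# Sketch of Solution 2: one lock per (migration, GitHub username) pair.
# Workers asking for the same user wait on each other; workers asking for
# different users proceed in parallel.
class PerUserLockingFinder
  LOCKS = Hash.new { |hash, key| hash[key] = Mutex.new }
  LOCKS_GUARD = Mutex.new

  # fetch_email: a block that takes a username and returns the email.
  # It stands in for the request to GitHub's users API.
  def initialize(project_id, cache, &fetch_email)
    @project_id = project_id
    @cache = cache # assumption: a plain Hash stands in for the real cache
    @fetch_email = fetch_email
  end

  def email_for(username)
    # Fast path: cache hits never wait for a lock.
    cached = @cache[username]
    return cached if cached

    lock_for(username).synchronize do
      # Re-check inside the lock: the first worker for this user may already
      # have stored the result.
      @cache[username] ||= @fetch_email.call(username)
    end
  end

  private

  def lock_for(username)
    LOCKS_GUARD.synchronize { LOCKS[[@project_id, username]] }
  end
end
```

With this shape, duplicate requests for one user collapse into a single call, but nothing limits how many different users are fetched at once, which is why problem #2 remains.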
Note for both solutions: We should still allow UserFinder to be used simultaneously across different migrations.