Draft: Use ETAG in UserFinder in GitHub Import

What does this MR do and why?

Update the UserFinder class to use ETags to reduce the number of requests to the GitHub API.

This change updates the class to store the ETag response header when the user's public email is not configured, to avoid reaching the API rate limit more often. When the user does not have a public email, we fetch the user details every 15 minutes instead of every 24 hours. GitHub recommends using ETags because a request for a resource that has not been modified does not count against the rate limit.
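
The conditional-request flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: `EtagUserLookup`, `Response`, and `fake_github` are hypothetical names, and the HTTP layer is replaced by a lambda that mimics a server honoring the standard `If-None-Match` / `ETag` headers.

```ruby
# Illustrative sketch only: EtagUserLookup and fake_github are hypothetical
# names, not the classes used by the GitHub importer.
Response = Struct.new(:status, :etag, :body)

class EtagUserLookup
  attr_reader :statuses

  # http: any callable taking (username, if_none_match) and returning a Response.
  def initialize(http)
    @http = http
    @etags = {}     # username => ETag from the last 200 response
    @emails = {}    # username => email from the last 200 response (may be nil)
    @statuses = []  # status codes seen, kept only to make the demo observable
  end

  def email_for(username)
    response = @http.call(username, @etags[username])
    @statuses << response.status

    # 304 Not Modified: the cached value is still current.
    return @emails[username] if response.status == 304

    @etags[username] = response.etag
    @emails[username] = response.body['email']
  end
end

# Stand-in for the API: answers 304 when If-None-Match matches the current ETag.
fake_github = lambda do |_username, if_none_match|
  current_etag = '"abc123"'
  if if_none_match == current_etag
    Response.new(304, current_etag, nil)
  else
    Response.new(200, current_etag, { 'email' => nil })
  end
end

lookup = EtagUserLookup.new(fake_github)
first = lookup.email_for('octocat')   # full request, ETag stored
second = lookup.email_for('octocat')  # conditional request, answered with 304
```

In this sketch the second lookup sends the stored ETag and gets a 304, which is the case the description above says does not count against the rate limit.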

Related to: #416308 (closed)

How to set up and validate locally

One way to check that the cache works is to analyze all requests made to GitHub during an import. To do that, log the requests to GitHub by adding a log call in the Octokit middleware, as in the example below:

diff --git a/lib/gitlab/octokit/middleware.rb b/lib/gitlab/octokit/middleware.rb
index f944f9827a32..e59b27ba42a8 100644
--- a/lib/gitlab/octokit/middleware.rb
+++ b/lib/gitlab/octokit/middleware.rb
@@ -8,6 +8,8 @@ def initialize(app)
       end

       def call(env)
+        Gitlab::Import::Logger.info(message: 'GitHub API request', url: env[:url])
+
         Gitlab::UrlBlocker.validate!(env[:url],
           schemes: %w[http https],
           allow_localhost: allow_local_requests?,

Then import a large GitHub project, for example using the command below, which imports rspec-core:

curl --location 'http://gdk.test:3000/api/v4/import/github' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer GDK_ACCESS_TOKEN' \
--data '{
    "personal_access_token": "GITHUB_ACCESS_TOKEN",
    "repo_id": "238983",
    "target_namespace": "root",
    "new_name": "rspec-core"
}'

Then check the log for duplicated requests, for example using the command below:

grep "https://api.github.com/users/" log/importer.log | jq .url | sort | uniq -c | sort -h

Note: a few requests may be duplicated, because multiple workers can request the user details before the cache is saved. However, we shouldn't see many duplicated requests.
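
As an alternative to the shell pipeline, the same check can be sketched in Ruby so that only URLs requested more than once are reported. This assumes each log line is a JSON object whose `url` field holds the request URL as a string, as the logging patch above would suggest; `duplicated_user_requests` is a hypothetical helper, not part of GitLab.

```ruby
require 'json'

# Hypothetical helper: counts requests per GitHub user URL and keeps only
# the URLs that appear more than once. Assumes JSON log lines with a "url" key.
def duplicated_user_requests(lines)
  counts = Hash.new(0)

  lines.each do |line|
    next unless line.include?('https://api.github.com/users/')

    begin
      entry = JSON.parse(line)
    rescue JSON::ParserError
      next # skip lines that are not valid JSON
    end

    next unless entry.is_a?(Hash) && entry['url']

    counts[entry['url']] += 1
  end

  counts.select { |_url, count| count > 1 }
end

# Made-up sample lines standing in for log/importer.log.
sample = [
  '{"message":"GitHub API request","url":"https://api.github.com/users/alice"}',
  '{"message":"GitHub API request","url":"https://api.github.com/users/bob"}',
  '{"message":"GitHub API request","url":"https://api.github.com/users/alice"}',
  '{"message":"GitHub API request","url":"https://api.github.com/repos/rspec/rspec-core"}',
  'not json but mentions https://api.github.com/users/carol'
]

duplicates = duplicated_user_requests(sample)
```

Against a real import, the helper could be fed `File.foreach('log/importer.log')` instead of the sample array.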

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Rodrigo Tomonari
