Add page counter to resume Bitbucket Server PR importer

What does this MR do and why?

This MR adds a page counter, Gitlab::Import::PageCounter, to keep track of the last page processed (each page is fetched with one REST API call). If an error occurs while PullRequestsImporter is running, the retried job resumes from the last recorded page instead of starting over from page 1.
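The resume behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the actual Gitlab::Import::PageCounter: MemoryStore, SketchPageCounter, import_pages, and the 'pr-page' key are hypothetical names, and an in-memory hash stands in for Redis.

```ruby
# Hypothetical in-memory stand-in for a Redis-backed cache (assumption).
class MemoryStore
  def initialize
    @data = {}
  end

  def read(key)
    @data[key]
  end

  def write(key, value)
    @data[key] = value
  end

  def delete(key)
    @data.delete(key)
  end
end

# Sketch of a resumable page counter (not GitLab's real class).
class SketchPageCounter
  attr_reader :current

  def initialize(store, key)
    @store = store
    @key = key
    # Start from the persisted page if there is one, otherwise page 1.
    @current = (store.read(key) || 1).to_i
  end

  # Persist forward progress only; never move the counter backwards.
  def set(page)
    return false if page < current

    @store.write(@key, page)
    @current = page
    true
  end

  # Clear the persisted page once the import has finished.
  def expire!
    @store.delete(@key)
  end
end

# Hypothetical import loop: `pages` is an array of batches, page 1 first.
def import_pages(pages, counter, fail_on: nil)
  processed = []
  page = counter.current

  while page <= pages.size
    raise 'purposely interrupt' if page == fail_on

    processed.concat(pages[page - 1])
    page += 1
    counter.set(page)
  end

  counter.expire!
  processed
end

store = MemoryStore.new
pages = [[1, 2], [3, 4], [5, 6]]

begin
  import_pages(pages, SketchPageCounter.new(store, 'pr-page'), fail_on: 3)
rescue RuntimeError => e
  puts "interrupted: #{e.message}"
end

# A fresh counter picks up the persisted page, so only page 3 is processed.
resumed = import_pages(pages, SketchPageCounter.new(store, 'pr-page'))
puts resumed.inspect
```

The counter is written only after a page has been fully processed, so an interruption in the middle of a page causes that page to be fetched again on retry rather than skipped.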

This MR also persists job_waiter.jobs_remaining in Redis so that the value is not reset to 0.
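As a rough illustration of why the count needs to live outside the process, the sketch below keeps a "jobs remaining" tally in a key-value store so that a retried job continues from the stored value. RemainingJobsTally and the key name are hypothetical, and a plain Hash stands in for Redis; the MR's actual storage code may look quite different.

```ruby
# Hypothetical helper: keeps a "jobs remaining" tally in a key-value store.
# A plain Hash stands in for Redis here (assumption).
class RemainingJobsTally
  def initialize(store, key)
    @store = store
    @key = key
  end

  # Read the persisted value, defaulting to 0 when nothing is stored yet.
  def value
    @store.fetch(@key, 0).to_i
  end

  # Add newly scheduled jobs to the persisted tally.
  def increment(by = 1)
    @store[@key] = value + by
  end
end

store = {}
key = 'sketch/jobs-remaining' # hypothetical key name

first_attempt = RemainingJobsTally.new(store, key)
2.times { first_attempt.increment } # two jobs scheduled before the interruption

# The retried job builds a new object but reads the same persisted value.
retried = RemainingJobsTally.new(store, key)
retried.increment
puts retried.value
```

Without the shared store, the retried object would start counting from 0 again and under-report the jobs it still has to wait for.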

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Before: When interrupted, the job was retried and started from page 1. (Screenshot_2024-04-10_at_12.17.16)
After: When interrupted, the job was retried and started from the last page (page 4; my batch size was configured as 2). (Screenshot_2024-04-10_at_12.20.59)

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

  1. Initial environment setup can follow this
  2. Prepare some data in Bitbucket Server
  3. Patch the code to inject an interruption:
diff --git a/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb b/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb
index 99876c779539..06bd0c6b92cd 100644
--- a/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb
+++ b/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb
@@ -64,6 +64,9 @@ def validate_url_with_proxy!(
         )
           # rubocop:enable Metrics/ParameterLists
 
+          allow_localhost = true
+          allow_local_network = true
+
           return Result.new(nil, nil, true) if url.nil?
 
           raise ArgumentError, 'The schemes is a required argument' if schemes.blank?
diff --git a/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb b/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb
index 14c38a326c99..965a848ecc5d 100644
--- a/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb
+++ b/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb
@@ -12,9 +12,16 @@ def execute
           loop do
             log_info(
               import_stage: 'import_pull_requests',
-              message: "importing page #{page} using batch-size #{concurrent_import_jobs_limit}"
+              message: "importing page #{page} using batch-size #{concurrent_import_jobs_limit}, remaining: #{job_waiter.jobs_remaining}"
             )
 
+            Gitlab::Redis::SharedState.with do |redis|
+              temp_key = 'test-bitbucket-pr'
+              temp_counter = redis.incr(temp_key)
+              redis.expire(temp_key, 5.minutes)
+              raise "purposely interrupt" if temp_counter == 4
+            end
+
             pull_requests = client.pull_requests(
               project_key, repository_slug, page_offset: page, limit: concurrent_import_jobs_limit
             ).to_a
@@ -112,7 +119,7 @@ def target_branch_commit(target_branch_sha)
         # application settings.
         def concurrent_import_jobs_limit
           # Reduce fetch limit (from 100) to avoid Gitlab::Git::ResourceExhaustedError
-          50
+          2
         end
       end
     end
  4. Tail the log file: tail -f log/importer.log
  5. Go to http://127.0.0.1:3000/import/bitbucket_server/status
  6. Click Import

Notes

  • These small changes took a surprisingly long time: setting up, understanding the flow (Bitbucket Server & GitHub), and testing 😆. It was fun, though.
  • I noticed that this code also calls multiple APIs, but that part is probably related to this issue.
  • I'm not sure why github_importer was not calling page_counter.expire!, but I made it call expire! in this MR.

Related to #450831

Edited by Ivan Sebastian
