
Adjust Bitbucket Cloud PR importer to be resumable

  • Please check this box if this contribution uses AI-generated content (including content generated by GitLab Duo features) as outlined in the GitLab DCO & CLA

What does this MR do and why?

Adjust the Bitbucket Cloud importer to be resumable. When a "stage worker" is interrupted, we should ideally resume from the page that was being processed rather than start over. Usually, we record the last "page number" and restart from that page. The Bitbucket Cloud documentation advises against this approach:

However, clients are not expected to construct URLs themselves by manipulating the page number query parameter. Instead, the response contains a link to the next page. This link should be treated as an opaque location that is not to be constructed by clients or even assumed to be predictable. The only contract around the next link is that it will return the next chunk of results.

It is important to realize that Bitbucket support both list-based pagination and iterator-based pagination. List-based pagination assumes that the collection is a discrete, immutable, consistently ordered, finite array of objects with a fixed size. Clients navigate a list-based collection by requesting offset-based chunks. In Bitbucket Cloud, list-based responses include the optional size, page, and previous element. The next and previous links typically resemble something like /foo/bar?page=4.

Our pagination is therefore supposed to depend on the "next URL" instead of the traditional "page number".
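To illustrate the idea, here is a minimal sketch of resuming from a stored URL instead of a page number. It is not the code in this MR: FAKE_API, Checkpoint, and this each_page are hypothetical stand-ins for the Bitbucket client and the page keyset, and the checkpoint lives in memory instead of a shared store.

```ruby
# Hypothetical in-memory stand-in for the Bitbucket API: each URL maps to a
# page of values plus an opaque "next" link (nil on the last page).
FAKE_API = {
  '/pullrequests' => { values: [1, 2], next: '/pullrequests?ctx=abc' },
  '/pullrequests?ctx=abc' => { values: [3, 4], next: '/pullrequests?ctx=def' },
  '/pullrequests?ctx=def' => { values: [5], next: nil }
}.freeze

# Hypothetical checkpoint; a real importer would persist this outside the process.
class Checkpoint
  attr_reader :current

  def initialize
    @current = nil
  end

  def set(url)
    @current = url
  end
end

# Walks pages by following the "next" link. The URL of the page about to be
# processed is recorded first, so an interrupted page is retried on resume.
def each_page(first_url, checkpoint)
  url = checkpoint.current || first_url

  while url
    page = FAKE_API.fetch(url)
    checkpoint.set(url)
    yield page[:values]
    url = page[:next]
  end
end

checkpoint = Checkpoint.new
seen = []
calls = 0

begin
  each_page('/pullrequests', checkpoint) do |values|
    calls += 1
    raise 'purposely interrupt' if calls == 2 # fail while handling the second page

    seen.concat(values)
  end
rescue RuntimeError
  # The stage worker would be retried here.
end

# Resume: starts from the stored URL, not from the first page.
each_page('/pullrequests', checkpoint) { |values| seen.concat(values) }
```

Because the URL is stored before the page is processed, the interrupted page is fetched again on resume, which matches the retry of the interrupted page shown in the screenshots.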

Changelog: performance

Technical Decisions

  • This MR only handles the Bitbucket PR importer; other importers will be handled in separate MRs.
  • ParallelScheduling does not define def execute yet, as that might cause too many changes. Note that many "importer workers" include ParallelScheduling where it shouldn't be needed.
  • ParallelScheduling behaves slightly differently from the GitHub one. It defines a string for representation_type instead of a class. The reason is to minimize changes, as this MR reuses the current Bitbucket::Page, which "automatically" converts the items into representation objects.
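As a rough illustration of the last point, the sketch below shows how a page object could turn a representation_type string into representation objects. The Representation module, the Page class, and the constant lookup are assumptions made for this example and are not copied from Bitbucket::Page.

```ruby
# Hypothetical representation classes; each one only wraps the raw hash.
module Representation
  class PullRequest
    attr_reader :raw

    def initialize(raw)
      @raw = raw
    end
  end
end

# Hypothetical page: converts raw items into representation objects based on
# a string such as 'pull_request'.
class Page
  attr_reader :items

  def initialize(raw_items, representation_type)
    klass = representation_class(representation_type)
    @items = raw_items.map { |raw| klass.new(raw) }
  end

  private

  # 'pull_request' -> Representation::PullRequest
  def representation_class(type)
    name = type.to_s.split('_').map(&:capitalize).join
    Representation.const_get(name)
  end
end

page = Page.new([{ 'title' => 'dummy 20' }], 'pull_request')
first_item = page.items.first
```

With this shape, the scheduler only needs to pass a string along, and the page stays responsible for building the objects.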

MR acceptance checklist

Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.

Screenshots or screen recordings

Screenshots are required for UI changes, and strongly recommended for all other merge requests.

Successfully imported MR:
Screenshot_2024-06-25_at_23.21.08
The interruption happened after the MR titled "dummy 20":
Screenshot_2024-06-25_at_23.13.28
Upon resume, the MR titled "dummy 20" is retried, then the import continues with the MR titled "dummy 21":
Screenshot_2024-06-25_at_23.15.13

How to set up and validate locally

  1. Set up Bitbucket Cloud following this guide
  2. In the Rails console, enable the feature flag:
    Feature.enable(:bitbucket_import_resumable_worker)
  3. Patch the code to add an interruption:
diff --git a/lib/gitlab/bitbucket_import/parallel_scheduling.rb b/lib/gitlab/bitbucket_import/parallel_scheduling.rb
index 8f03bf1db1cd..927a7c62e034 100644
--- a/lib/gitlab/bitbucket_import/parallel_scheduling.rb
+++ b/lib/gitlab/bitbucket_import/parallel_scheduling.rb
@@ -42,6 +42,14 @@ def each_object_to_import
         options = collection_options.merge(representation_type: representation_type, next_url: page_keyset.current)
 
         client.each_page(collection_method, repo, options) do |page|
+          log_info(message: page.inspect)
+          Gitlab::Redis::SharedState.with do |redis|
+            temp_key = 'test-bitbucket-pr'
+            temp_counter = redis.incr(temp_key)
+            redis.expire(temp_key, 5.minutes)
+            raise "purposely interrupt" if temp_counter == 2
+          end
+
           page.items.each do |object|
             job_waiter.jobs_remaining = Gitlab::Cache::Import::Caching.increment(job_waiter_remaining_cache_key)
  4. Visit the Bitbucket Cloud import page: http://127.0.0.1:3000/import/bitbucket/status
  5. Click the Import button.
  6. Tail the log file: tail -f log/importer.log

Related to #466231 (closed)

Edited by Rodrigo Tomonari
