Add page counter to resume Bitbucket Server PR importer
What does this MR do and why?
This MR adds a page counter, Gitlab::Import::PageCounter, to keep track of the last page processed (by a REST API call). When an error happens in the middle of PullRequestsImporter processing, the retried job can resume from the latest page instead of starting again from page 1.
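The resume behaviour can be illustrated with a minimal, self-contained sketch. It is not the code from this MR: SketchPageCounter and import_all are hypothetical names, a plain Hash stands in for Redis, and the method names current, set and expire! are assumed to mirror the real page counter.

```ruby
# Hypothetical stand-in for a Redis-backed page counter (a Hash replaces Redis).
class SketchPageCounter
  def initialize(store, key)
    @store = store
    @key = key
  end

  # Page to start from; defaults to 1 when nothing has been stored yet.
  def current
    @store.fetch(@key, 1)
  end

  # Only ever move forward, so a stale retry cannot rewind the counter.
  def set(page)
    return false if page <= current

    @store[@key] = page
    true
  end

  # Drop the stored page once the whole import has finished.
  def expire!
    @store.delete(@key)
  end
end

# Hypothetical importer loop: processes pages starting at counter.current and
# raises on page `fail_on` to simulate an interruption.
def import_all(pages, counter, fail_on: nil)
  processed = []
  page = counter.current

  while page <= pages.size
    raise "purposely interrupt" if page == fail_on

    processed.concat(pages[page - 1])
    page += 1
    counter.set(page)
  end

  counter.expire!
  processed
end

store = {}
pages = [%w[pr1 pr2], %w[pr3 pr4], %w[pr5 pr6]]
counter = SketchPageCounter.new(store, "page-counter/pull_requests")

begin
  import_all(pages, counter, fail_on: 3)
rescue RuntimeError
  # The job was interrupted; the counter still remembers where to resume.
end

resumed_from = counter.current
second_run = import_all(pages, counter)
```

In this sketch the retry starts at the page that failed rather than at page 1, and the final expire! call clears the stored page so a later import starts fresh.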
This MR also persists job_waiter.jobs_remaining in Redis so that the value is not reset to 0.
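A rough sketch of why persisting the count matters, under stated assumptions: SketchJobWaiter and schedule are hypothetical names, the Hash stands in for Redis, and the key names are invented for illustration. It only shows the idea of rebuilding the waiter from a stored value instead of from 0.

```ruby
# Hypothetical waiter: only the two fields needed for the illustration.
SketchJobWaiter = Struct.new(:key, :jobs_remaining)

# Hypothetical scheduling step: rebuild the waiter from the stored count,
# enqueue `count` jobs, and write the new total back after each one.
def schedule(store, count)
  waiter = SketchJobWaiter.new("waiter-key", store.fetch("jobs-remaining", 0))

  count.times do
    waiter.jobs_remaining += 1
    store["jobs-remaining"] = waiter.jobs_remaining
  end

  waiter
end

store = {}
first_attempt = schedule(store, 4) # interrupted after scheduling 4 jobs
after_retry = schedule(store, 2)   # retry schedules the remaining 2 jobs
total_remaining = after_retry.jobs_remaining
```

Without the stored value, the retried run would build its waiter with jobs_remaining at 0 and lose track of the jobs scheduled before the interruption.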
MR acceptance checklist
Please evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
Screenshots or screen recordings
Screenshots are required for UI changes, and strongly recommended for all other merge requests.
| Before | After |
| --- | --- |
| When interrupted, the job was retried and started from page 1 | When interrupted, the job was retried and started from the last page (page 4); my batch size was configured as 2 |
How to set up and validate locally
Numbered steps to set up and validate the change are strongly suggested.
- Initial environment setup can follow this
- Prepare some data in Bitbucket Server
- Patch the code to inject interruption:
diff --git a/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb b/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb
index 99876c779539..06bd0c6b92cd 100644
--- a/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb
+++ b/gems/gitlab-http/lib/gitlab/http_v2/url_blocker.rb
@@ -64,6 +64,9 @@ def validate_url_with_proxy!(
)
# rubocop:enable Metrics/ParameterLists
+ allow_localhost = true
+ allow_local_network = true
+
return Result.new(nil, nil, true) if url.nil?
raise ArgumentError, 'The schemes is a required argument' if schemes.blank?
diff --git a/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb b/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb
index 14c38a326c99..965a848ecc5d 100644
--- a/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb
+++ b/lib/gitlab/bitbucket_server_import/importers/pull_requests_importer.rb
@@ -12,9 +12,16 @@ def execute
loop do
log_info(
import_stage: 'import_pull_requests',
- message: "importing page #{page} using batch-size #{concurrent_import_jobs_limit}"
+ message: "importing page #{page} using batch-size #{concurrent_import_jobs_limit}, remaining: #{job_waiter.jobs_remaining}"
)
+ Gitlab::Redis::SharedState.with do |redis|
+ temp_key = 'test-bitbucket-pr'
+ temp_counter = redis.incr(temp_key)
+ redis.expire(temp_key, 5.minutes)
+ raise "purposely interrupt" if temp_counter == 4
+ end
+
pull_requests = client.pull_requests(
project_key, repository_slug, page_offset: page, limit: concurrent_import_jobs_limit
).to_a
@@ -112,7 +119,7 @@ def target_branch_commit(target_branch_sha)
# application settings.
def concurrent_import_jobs_limit
# Reduce fetch limit (from 100) to avoid Gitlab::Git::ResourceExhaustedError
- 50
+ 2
end
end
end
- Tail the log file:
tail -f log/importer.log
- Go to http://127.0.0.1:3000/import/bitbucket_server/status
- Click Import
Notes
- These small changes surprisingly took a lot of time to set up, understand the flow (Bitbucket Server & GitHub), and test 😆. It's fun though.
- I noticed that this code also calls multiple APIs, but that part is probably related to this issue
- I'm not sure why github_importer was not calling page_counter.expire!, but I let it call expire! in this MR
Related to #450831
Edited by Ivan Sebastian