Skip to content

Increase interruption retries for GitHub workers

What does this MR do and why?

This MR increases the max_retries_after_interruption for some GitHub "stage" workers.

The setting is used by the sidekiq-reliable-fetcher gem to limit the number of times a job will be terminated and requeued. By default, this is 3.

When the Sidekiq process is asked to stop it will wait for a period of time for a running job to finish, before forcefully terminating and requeuing the job if it does not finish within this time. Normally Sidekiq will do this over and over for a job indefinitely, but sidekiq-reliable-fetcher limits this. This offers protection from jobs that can never be completed in time between Sidekiq restarts (for example, the time between deploys).

Many of our GitHub importer "stage" workers can work for a very long time in large GitHub imports, but can also pick up from where they left off when restarted. We see these workers sometimes being terminated more than 3 times, which prevents large GitHub imports from finishing. Because they resume their work exactly from where they left off, we can raise the max_retries_after_interruption limit to allow them more time to complete when processing very large imports.

In this MR, only those "stage" workers which resume their work exactly from where they left off are given the increased limit. These are (all workers are in Gitlab::GithubImport::Stage namespace):

  • ImportIssueEventsWorker
  • ImportLfsObjectsWorker
  • ImportPullRequestsReviewRequestsWorker
  • ImportPullRequestsWorker
  • ImportAttachmentsWorker
  • ImportIssuesAndDiffNotesWorker
  • ImportNotesWorker
  • ImportPullRequestsMergedByWorker
  • ImportPullRequestsReviewsWorker

These workers do not have the limit applied, as they do not fully resume their work when restarted (some skip already imported objects, but start their loop at the beginning again).

  • ImportBaseDataWorker
  • FinishImportWorker
  • ImportCollaboratorsWorker
  • ImportRepositoryWorker
  • ImportProtectedBranchesWorker

#416777 (closed)

How to set up and validate locally

  1. Apply the patch below to have Gitlab::GitHubImport::Stage::ImportPullRequestsWorker take 60 seconds to do work:
     diff --git a/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb b/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb
     index 50527dfa2d84..7f1ad8da589d 100644
     --- a/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb
     +++ b/app/workers/gitlab/github_import/stage/import_pull_requests_worker.rb
     @@ -13,6 +13,12 @@ class ImportPullRequestsWorker # rubocop:disable Scalability/IdempotentWorker
    
             resumes_work_when_interrupted!
    
     +        def perform
     +          sleep 60
     +        end
     +
             # client - An instance of Gitlab::GithubImport::Client.
             # project - An instance of Project.
             def import(client, project)
  2. Restart Sidekiq: gdk restart rails-background-jobs
  3. In a separate terminal window tail logs for the job: gdk tail rails-background-jobs | grep ImportPullRequestsWorker
  4. In a rails console, enable the flag:
    Feature.enable(:github_importer_raise_max_interruptions)
  5. In a rails console, queue the worker:
    Gitlab::GithubImport::Stage::ImportPullRequestsWorker.perform_async
  6. Wait until you see a new log come through for the worker and then stop Sidekiq: gdk stop rails-background-jobs
  7. Start Sidekiq again: gdk start rails-background-jobs
  8. Wait about 10 seconds until you see a new log come through, note the "interrupted_count" will be incremented
  9. Stop Sidekiq again.
  10. Repeat steps 6-8 3 more times, until "interrupted_count" is 4

On master, following the same steps with the same patch, you will not see a log where interrupted_count goes above 2.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Related to #416777 (closed)

Edited by Luke Duncalfe

Merge request reports