Increase interruption retries for GitHub::Stage workers
The GitHubImport stage workers that read resource lists from the GitHub API (for example, the list of all pull requests) can take several minutes to complete. Because they run for so long, they can be interrupted by Sidekiq restarts.
This is problematic because if a worker is interrupted 3 times, the worker is put in the dead queue, which causes the import process to get stuck.
Report on the jobs with the slowest execution times
Kibana dashboard - internal only
Problem
Currently, it's rare for a worker to be interrupted 3 times. The GitHub API rate limit is usually reached while the worker is executing, which causes the worker to be re-enqueued and, as a consequence, resets the Sidekiq interruption counter to zero.
However, if multiple user tokens are used, the GitHub API rate limit won't be reached as often. The interruption counter therefore won't be reset, which could cause the job to be placed in the dead queue. Removed; see #416777 (comment 1599046625).
Click to read original possible solutions
Possible Solution
1. Update import stages workers to import one resource page at a time
Instead of looping through all of a resource's pages in a single worker execution, we could update the workers to import one page per execution. For example, to import pull requests, we would request the first 100 pull requests, then enqueue another stage worker to process the next 100 pull requests, and so on.
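The one-page-per-execution idea can be sketched in plain Ruby as follows. This is a minimal illustration, not the importer's actual code: `FakeClient`, `PageAtATimeWorker`, and the array used as a queue are hypothetical stand-ins for the GitHub client, the stage worker, and Sidekiq's enqueueing.

```ruby
# Hypothetical stand-in for the GitHub API client: returns one page of
# pull request numbers, or an empty array past the last page.
class FakeClient
  def initialize(total, per_page: 100)
    @total = total
    @per_page = per_page
  end

  def pull_requests(page:)
    first = (page - 1) * @per_page + 1
    return [] if first > @total

    last = [first + @per_page - 1, @total].min
    (first..last).to_a
  end
end

# Hypothetical stage worker: each execution imports exactly one page and
# then enqueues a new job for the next page.
class PageAtATimeWorker
  attr_reader :imported

  def initialize(client, queue)
    @client = client
    @queue = queue # plain array standing in for the Sidekiq queue
    @imported = []
  end

  def perform(page)
    items = @client.pull_requests(page: page)
    return if items.empty? # no more pages, so the stage is finished

    @imported.concat(items)
    @queue << page + 1 # the next page is handled by a separate execution
  end
end

# Drive the simulated queue until it is empty.
client = FakeClient.new(250)
queue = [1]
worker = PageAtATimeWorker.new(client, queue)
executions = 0
while (page = queue.shift)
  worker.perform(page)
  executions += 1
end
```

With 250 pull requests and 100 per page, the simulated import takes four short executions (three pages plus one empty page that ends the stage), so a restart can only interrupt the work on a single page.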
2. Periodically re-enqueue stage workers
Update the stage workers so that they are re-enqueued after a certain execution time. For example, re-enqueue the worker every 10 minutes.
We would record the time the worker started and then, before fetching each page, check how long the worker has been running. If the elapsed time exceeds the limit, we would stop the worker and enqueue a new one to continue from that page. Otherwise, we would fetch the next page and continue the execution.
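A minimal sketch of the time-boxed approach, under stated assumptions: `TimeBoxedWorker`, the in-memory `pages` array, and the array used as a queue are hypothetical, and the clock is injectable only so that the example runs deterministically. A real worker would read a monotonic clock and re-enqueue itself through Sidekiq.

```ruby
# Hypothetical stage worker that stops and re-enqueues itself once it has
# been running longer than MAX_RUNTIME.
class TimeBoxedWorker
  MAX_RUNTIME = 600 # seconds, matching the 10-minute example

  attr_reader :imported

  def initialize(pages, queue,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @pages = pages # array of pages standing in for GitHub API responses
    @queue = queue # plain array standing in for the Sidekiq queue
    @clock = clock
    @imported = []
  end

  def perform(start_page = 0)
    started_at = @clock.call
    page = start_page

    while page < @pages.size
      # Check the elapsed time before fetching the next page.
      if @clock.call - started_at > MAX_RUNTIME
        @queue << page # a new execution resumes from this page
        return
      end

      @imported.concat(@pages[page])
      page += 1
    end
  end
end

# Simulated clock: every reading advances time by 250 seconds.
now = 0
pages = Array.new(5) { |i| ["pr-#{i}"] }
queue = [0]
worker = TimeBoxedWorker.new(pages, queue, clock: -> { now += 250 })
runs = 0
while (page = queue.shift)
  worker.perform(page)
  runs += 1
end
```

Because the check happens before each fetch, a worker can overrun the limit by at most the time it takes to import one page. The resume position is passed as the job argument here; how the real workers would persist it is left open.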