Importing large projects via the runbook script is killed by StuckImportJobsWorker after 1 hour
Summary
Importing a large project (6 GB in size, with 10,000+ merge requests) fails when done via the runbook import script (https://gitlab.com/gitlab-com/runbooks/blob/master/scripts/project_import.rb) if the StuckImportJobsWorker cron job runs while the import is in progress. StuckImportJobsWorker is only supposed to kill imports that have been running for longer than 15 hours, but in this case the import had been running for only about 1 hour before the worker killed it.
A customer reported this issue in ZD (internal use only). We recommended that they use the runbook import script because they had issues importing via the GitLab UI.
I suspect that StuckImportJobsWorker does not handle runbook-based imports correctly, since the runbook script does not run the import as a regular Sidekiq job. Note also that the JID in the logs below is `custom-import-@project.id-...`, i.e. it contains the literal string `@project.id` rather than the project ID, which suggests the script builds the JID inside a single-quoted string.
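For context, the check presumably keys off whether the import's JID is still tracked in Redis rather than off actual wall-clock runtime. A minimal sketch of that logic, assuming this reading of the worker (the class and query shape are illustrative, not GitLab's exact code):

```ruby
# Sketch of a stuck-import check (illustrative, not GitLab's exact code).
# The "15 hours" guarantee only holds if the import's JID was registered
# in Redis with a 15-hour TTL when the import started; a JID that was
# never registered, or registered with the short default TTL, looks dead
# on the very next cron run regardless of how long the import has run.
class StuckImportCheckSketch
  def perform
    ProjectImportState.where(status: 'started').find_each do |import_state|
      # Gitlab::SidekiqStatus.running? returns false once the Redis key
      # for this JID expires -- or if it never existed, as may be the
      # case for the runbook's custom-import-... JID.
      next if Gitlab::SidekiqStatus.running?(import_state.jid)

      import_state.mark_as_failed('Import timed out or the Sidekiq job died')
    end
  end
end
```

If that is what is happening here, the worker is behaving "correctly" from its own point of view; the runbook import simply never looks alive to it.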
Steps to reproduce
- Use GitLab 12.4.1
- Use a project export that is 6 GB in size and contains 10,000+ merge requests
- Import the project via the import runbook
- The import fails; the logs show that StuckImportJobsWorker has aborted the import
Example Project
An example project.json can be found in the Zendesk ticket.
What is the current bug behavior?
StuckImportJobsWorker kicks in after only 1 hour and kills the import.
What is the expected correct behavior?
The import should succeed.
Relevant logs and/or screenshots
- Import initiated at approximately 2019-11-14T12:23:47.247389Z
- Import killed at approximately 2019-11-14T13:15:04.066869Z

Note that with the default `15 * * * *` cron schedule (see the workaround below), 13:15 was the first StuckImportJobsWorker run after the import began, roughly 52 minutes into the import.
```
==> /var/log/gitlab/gitlab-rails/production.log <==
Import job scheduled for <SANITIZED/SANITIZED> with job ID custom-import-@project.id-KJOTPL9ej2y94zUWhtdbLg==.

==> /var/log/gitlab/gitlab-rails/sidekiq.log <==
{"severity":"INFO","time":"2019-11-14T13:15:04.069Z","queue":"cronjob:stuck_import_jobs","class":"StuckImportJobsWorker","retry":false,"queue_namespace":"cronjob","jid":"3defac092dd5e3f6d6a8a04a","created_at":"2019-11-14T13:15:04.066869Z","correlation_id":"ee42d649-803b-4403-b11d-6508b9826868","enqueued_at":"2019-11-14T13:15:04.067439Z","pid":2418,"message":"StuckImportJobsWorker JID-3defac092dd5e3f6d6a8a04a: start","job_status":"start","scheduling_latency_s":0.001991}

==> /var/log/gitlab/gitlab-rails/production.log <==
Marked stuck import jobs as failed. JIDs: custom-import-@project.id-KJOTPL9ej2y94zUWhtdbLg==

==> /var/log/gitlab/gitlab-rails/sidekiq.log <==
{"severity":"INFO","time":"2019-11-14T13:15:04.449Z","queue":"cronjob:stuck_import_jobs","class":"StuckImportJobsWorker","retry":false,"queue_namespace":"cronjob","jid":"3defac092dd5e3f6d6a8a04a","created_at":"2019-11-14T13:15:04.066869Z","correlation_id":"ee42d649-803b-4403-b11d-6508b9826868","enqueued_at":"2019-11-14T13:15:04.067439Z","pid":2418,"message":"StuckImportJobsWorker JID-3defac092dd5e3f6d6a8a04a: done: 0.380073 sec","job_status":"done","scheduling_latency_s":0.001991,"duration":0.380073,"cpu_s":0.026241,"completed_at":"2019-11-14T13:15:04.449399Z"}
```
Workarounds
- Temporarily disable both the `import_export_project_cleanup_worker` and `stuck_import_jobs_worker` via `https://<GITLAB_DOMAIN>/admin/background_jobs`: navigate to the Cron tab, locate `import_export_project_cleanup_worker` and `stuck_import_jobs_worker`, and click Disable on both. (Found another bug as a result here - see #37135 (closed).)
- Edit `/opt/gitlab/embedded/service/gitlab-rails/config/initializers/1_settings.rb`, comment out these workers as shown below, and run `sudo gitlab-ctl restart`:
```ruby
#Settings.cron_jobs['import_export_project_cleanup_worker'] ||= Settingslogic.new({})
#Settings.cron_jobs['import_export_project_cleanup_worker']['cron'] ||= '0 * * * *'
#Settings.cron_jobs['import_export_project_cleanup_worker']['job_class'] = 'ImportExportProjectCleanupWorker'
#Settings.cron_jobs['stuck_import_jobs_worker'] ||= Settingslogic.new({})
#Settings.cron_jobs['stuck_import_jobs_worker']['cron'] ||= '15 * * * *'
#Settings.cron_jobs['stuck_import_jobs_worker']['job_class'] = 'StuckImportJobsWorker'
```
Once the import is complete, revert these changes.
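Alternatively, since GitLab schedules these cron jobs with the sidekiq-cron gem, they can likely be toggled from a Rails console without editing files or restarting (a sketch using the standard sidekiq-cron API; untested against this exact version):

```ruby
# gitlab-rails console
workers = %w[import_export_project_cleanup_worker stuck_import_jobs_worker]

# Disable both cron jobs for the duration of the import...
workers.each do |name|
  job = Sidekiq::Cron::Job.find(name)
  job.disable! if job
end

# ...then re-enable them once the import has finished.
workers.each do |name|
  job = Sidekiq::Cron::Job.find(name)
  job.enable! if job
end
```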
Possible fixes
Only workarounds have been identified so far; no fix yet.
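One possible direction, contingent on the hypothesis above being correct (a sketch under that assumption, not a confirmed fix): have the runbook script register its JID in Redis the same way the regular import pipeline presumably does, so that the 15-hour expiration actually applies. The constant name below is assumed from my reading of the worker, and `project` is the runbook script's own variable:

```ruby
# Hypothetical change to the runbook script, assuming the stuck-job check is
# driven by Gitlab::SidekiqStatus keys in Redis (assumption, not verified).
require 'securerandom'

# Use double quotes so #{project.id} is actually interpolated -- the logs
# above show the literal string @project.id in the JID.
jid = "custom-import-#{project.id}-#{SecureRandom.urlsafe_base64}"
project.import_state.update_column(:jid, jid)

# Register the JID with the same 15-hour TTL the import workers use, so
# StuckImportJobsWorker does not treat the import as dead once the default,
# much shorter status-key TTL expires. Constant name assumed.
Gitlab::SidekiqStatus.set(jid, StuckImportJobsWorker::IMPORT_JOBS_EXPIRATION)
```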