Large project imports via the runbook script are killed by StuckImportJobsWorker after 1 hour

Summary

Importing a large project (6GB in size, with 10,000+ merge requests) via the runbook import script https://gitlab.com/gitlab-com/runbooks/blob/master/scripts/project_import.rb fails if the StuckImportJobsWorker cron job runs while the import is in progress. StuckImportJobsWorker is only supposed to kill imports that have been running for longer than 15 hours, but in this case the import had been running for roughly 1 hour when the worker killed it.

A customer reported this issue in ZD (internal use only). We recommended that they use the runbook import script because they had issues importing via the GitLab UI.

I suspect that StuckImportJobsWorker does not handle imports correctly when they are started via the runbook script rather than through the UI.
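
For context, below is a simplified, hypothetical sketch of how a stuck-import sweep of this kind can behave. The class and method names are illustrative assumptions, not the actual GitLab 12.4 implementation; the point is that such a sweep trusts the JID recorded on the import state to still be tracked as a live Sidekiq job, and a custom JID assigned by an external script may never be tracked at all.

# Hypothetical, self-contained Ruby sketch of a stuck-import sweep (illustration only).
# ImportState is a plain struct standing in for the real import state record.
ImportState = Struct.new(:project, :jid, :status) do
  def mark_as_failed!(reason)
    self.status = :failed
    puts "Marked stuck import jobs as failed. JIDs: #{jid} (#{reason})"
  end
end

class StuckImportSweep
  # live_jids: the set of JIDs Sidekiq still knows about.
  def initialize(live_jids)
    @live_jids = live_jids
  end

  def perform(import_states)
    import_states.each do |state|
      next if @live_jids.include?(state.jid)

      # A JID that was never registered with Sidekiq (for example, one assigned
      # by an external import script) always looks "dead" here and is failed on
      # the very first sweep, long before any 15-hour threshold is reached.
      state.mark_as_failed!("JID is no longer tracked by Sidekiq")
    end
  end
end

# Example: an import whose JID Sidekiq never tracked is failed immediately.
stuck = ImportState.new("group/project", "custom-import-123-abc", :started)
StuckImportSweep.new([]).perform([stuck])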

Steps to reproduce

  • Use GitLab 12.4.1
  • Use an export archive that is 6GB in size and contains 10,000+ merge requests
  • Import the project via the import runbook
  • The import fails; the logs show that StuckImportJobsWorker aborted the import (a Rails console sketch for inspecting the failed state follows this list)
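
After the failure, the state the worker left behind can be inspected from a Rails console (sudo gitlab-rails console). This is a sketch, not an official procedure: the project path is a placeholder, and the attribute names are assumptions about what GitLab 12.x exposes on the project's import state (status, jid, last_error).

# Run in `sudo gitlab-rails console` on the instance performing the import.
project = Project.find_by_full_path('group/imported-project')  # placeholder path

state = project.import_state
puts state.status      # expected to show "failed" after StuckImportJobsWorker runs
puts state.jid         # the custom JID recorded for the runbook import
puts state.last_error  # the reason recorded when the import was marked as failed, if any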

Example Project

An example project.json can be found in the Zendesk ticket.

What is the current bug behavior?

StuckImportJobsWorker kicks in after only about 1 hour and kills the import, even though its timeout is supposed to be 15 hours.

What is the expected correct behavior?

The import should succeed.

Relevant logs and/or screenshots

  • Import initiated at approximately 2019-11-14T12:23:47.247389Z
  • Import killed at approximately 2019-11-14T13:15:04.066869Z (roughly 51 minutes after it was initiated)
==> /var/log/gitlab/gitlab-rails/production.log <==
Import job scheduled for <SANITIZED/SANITIZED> with job ID custom-import-@project.id-KJOTPL9ej2y94zUWhtdbLg==.

==> /var/log/gitlab/gitlab-rails/sidekiq.log <==
{"severity":"INFO","time":"2019-11-14T13:15:04.069Z","queue":"cronjob:stuck_import_jobs","class":"StuckImportJobsWorker","retry":false,"queue_namespace":"cronjob","jid":"3defac092dd5e3f6d6a8a04a","created_at":"2019-11-14T13:15:04.066869Z","correlation_id":"ee42d649-803b-4403-b11d-6508b9826868","enqueued_at":"2019-11-14T13:15:04.067439Z","pid":2418,"message":"StuckImportJobsWorker JID-3defac092dd5e3f6d6a8a04a: start","job_status":"start","scheduling_latency_s":0.001991}

==> /var/log/gitlab/gitlab-rails/production.log <==
Marked stuck import jobs as failed. JIDs: custom-import-@project.id-KJOTPL9ej2y94zUWhtdbLg==

==> /var/log/gitlab/gitlab-rails/sidekiq.log <==
{"severity":"INFO","time":"2019-11-14T13:15:04.069Z","queue":"cronjob:stuck_import_jobs","class":"StuckImportJobsWorker","retry":false,"queue_namespace":"cronjob","jid":"3defac092dd5e3f6d6a8a04a","created_at":"2019-11-14T13:15:04.066869Z","correlation_id":"ee42d649-803b-4403-b11d-6508b9826868","enqueued_at":"2019-11-14T13:15:04.067439Z","pid":2418,"message":"StuckImportJobsWorker JID-3defac092dd5e3f6d6a8a04a: start","job_status":"start","scheduling_latency_s":0.001991}
{"severity":"INFO","time":"2019-11-14T13:15:04.449Z","queue":"cronjob:stuck_import_jobs","class":"StuckImportJobsWorker","retry":false,"queue_namespace":"cronjob","jid":"3defac092dd5e3f6d6a8a04a","created_at":"2019-11-14T13:15:04.066869Z","correlation_id":"ee42d649-803b-4403-b11d-6508b9826868","enqueued_at":"2019-11-14T13:15:04.067439Z","pid":2418,"message":"StuckImportJobsWorker JID-3defac092dd5e3f6d6a8a04a: done: 0.380073 sec","job_status":"done","scheduling_latency_s":0.001991,"duration":0.380073,"cpu_s":0.026241,"completed_at":"2019-11-14T13:15:04.449399Z"}
{"severity":"INFO","time":"2019-11-14T13:15:04.449Z","queue":"cronjob:stuck_import_jobs","class":"StuckImportJobsWorker","retry":false,"queue_namespace":"cronjob","jid":"3defac092dd5e3f6d6a8a04a","created_at":"2019-11-14T13:15:04.066869Z","correlation_id":"ee42d649-803b-4403-b11d-6508b9826868","enqueued_at":"2019-11-14T13:15:04.067439Z","pid":2418,"message":"StuckImportJobsWorker JID-3defac092dd5e3f6d6a8a04a: done: 0.380073 sec","job_status":"done","scheduling_latency_s":0.001991,"duration":0.380073,"cpu_s":0.026241,"completed_at":"2019-11-14T13:15:04.449399Z"}

Workarounds

  • Temporarily disable both the import_export_project_cleanup_worker and stuck_import_jobs_worker cron jobs via https://<GITLAB_DOMAIN>/admin/background_jobs: navigate to the Cron tab, locate import_export_project_cleanup_worker and stuck_import_jobs_worker, and click Disable on each (another bug was found as a result of this - see #37135 (closed))
  • Edit /opt/gitlab/embedded/service/gitlab-rails/config/initializers/1_settings.rb, comment out the lines below for these two workers, and then run sudo gitlab-ctl restart:
#Settings.cron_jobs['import_export_project_cleanup_worker'] ||= Settingslogic.new({})
#Settings.cron_jobs['import_export_project_cleanup_worker']['cron'] ||= '0 * * * *'
#Settings.cron_jobs['import_export_project_cleanup_worker']['job_class'] = 'ImportExportProjectCleanupWorker'
#Settings.cron_jobs['stuck_import_jobs_worker'] ||= Settingslogic.new({})
#Settings.cron_jobs['stuck_import_jobs_worker']['cron'] ||= '15 * * * *'
#Settings.cron_jobs['stuck_import_jobs_worker']['job_class'] = 'StuckImportJobsWorker'

Once the import has completed, revert whichever workaround was applied.
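
In addition to the workarounds above, a hypothetical runtime alternative (not from the original report) is to disable the two cron jobs from a Rails console, assuming they are registered with sidekiq-cron under the same names used in 1_settings.rb, and to re-enable them once the import has finished:

# Run in `sudo gitlab-rails console` before starting the import.
%w[import_export_project_cleanup_worker stuck_import_jobs_worker].each do |name|
  Sidekiq::Cron::Job.find(name)&.disable!
end

# After the import has completed, re-enable both cron jobs.
%w[import_export_project_cleanup_worker stuck_import_jobs_worker].each do |name|
  Sidekiq::Cron::Job.find(name)&.enable!
end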

Possible fixes

Only workarounds have been identified so far; no fix has been proposed.
