Various Sidekiq jobs do not report failures correctly when their subprocesses die

Per https://gitlab.com/gitlab-org/gitlab-ce/merge_requests/15292#note_47064568

Various Sidekiq jobs spawn subprocesses to do work but do not check the return value of those subprocesses. If a subprocess fails, the job does not raise, so Sidekiq never retries it and the work is lost.
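
To illustrate the kind of fix being asked for, here is a minimal sketch of checking a subprocess's exit status and raising when it fails. The names `SubprocessFailed`, `run_or_raise!`, and `ExampleShellWorker` are hypothetical and do not exist in GitLab; the sketch uses only Ruby's standard `Open3` and leaves out the Sidekiq worker module so that it runs on its own. The assumption is that an exception escaping `perform` is what lets Sidekiq mark the job as failed and schedule a retry.

```ruby
require 'open3'

# Hypothetical error class for this sketch; not part of GitLab.
SubprocessFailed = Class.new(StandardError)

# Runs a command given as separate argv strings (no shell involved) and
# returns its combined stdout/stderr. Raises if the process exits with a
# non-zero status or is killed by a signal (success? is nil in that case).
def run_or_raise!(*cmd)
  output, status = Open3.capture2e(*cmd)
  return output if status.success?

  raise SubprocessFailed, "#{cmd.join(' ')} failed (#{status.inspect}): #{output}"
end

# Hypothetical worker. A real worker would also include Sidekiq's worker
# module; it is omitted here to keep the example self-contained.
class ExampleShellWorker
  def perform(*cmd)
    # Letting the exception propagate is the point: swallowing it (or
    # ignoring a false return value) is what hides the failure.
    run_or_raise!(*cmd)
  end
end
```

With this shape, a subprocess that is killed part-way through surfaces as a failed job instead of a silent success.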

A partial list:

  • RepositoryImportWorker raises
  • RepositoryForkWorker raises
  • ElasticCommitIndexerWorker raises
  • GitlabShellWorker will not raise an error for:
    • mv_repository
    • fork_repository
    • remove_repository
    • add_key
    • batch_add_keys (maybe)
    • remove_key
    • remove_all_keys
  • RepositoryCheck::SingleRepositoryWorker raises
  • GitGarbageCollectWorker doesn't raise, but this seems to be OK

This is only a partial list; we should do a comprehensive audit. If we fix https://gitlab.com/gitlab-org/gitlab-ce/issues/40396 without fixing this issue, then these unreported failures will happen more often (since the subprocesses will be killed more often).

/cc @bikebilly @DouweM
