Make it easier to find and kill a JID from a Sidekiq worker

In https://gitlab.com/gitlab-com/infrastructure/issues/2746 we found that for long-running Sidekiq jobs, it's difficult to figure out which machine and PID are processing a given job. To kill the job, we had to:

  1. Look in the Kibana logs to find the last Sidekiq "start" entry
  2. Log into the machine and send a TSTP signal to the Sidekiq worker process so that it stops accepting new work
  3. Wait for a while for other jobs to finish
  4. Forcibly kill the Sidekiq process via `kill -9`

What we may want to consider:

  1. Add a status page in the Sidekiq admin panel that allows us to see all JIDs and which nodes/PIDs are processing which jobs (perhaps using https://github.com/mperham/sidekiq/wiki/API#workers)
  2. Add some sort of "ban" button to make that JID a NOP for the next hour
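
To make the two ideas above more concrete, here is a rough Ruby sketch, not a proposed implementation. `locate_jid`, `JidBanList`, and `BannedJidMiddleware` are hypothetical names. The lookup assumes the `(process_id, thread_id, work)` triples yielded by `Sidekiq::Workers#each`, with a process identity of the form `hostname:pid:nonce` and a `work["payload"]` that is either a Hash or a JSON string depending on the Sidekiq version. The ban list is kept in memory here purely for illustration; a real one would presumably live in Redis with a one-hour expiry.

```ruby
require "json"

# Hypothetical helper: find which host/PID is running a given JID.
# `workers` is anything that yields (process_id, thread_id, work) triples.
def locate_jid(workers, jid)
  workers.each do |process_id, thread_id, work|
    payload = work["payload"]
    # Assumption: some Sidekiq versions store the payload as a JSON string.
    payload = JSON.parse(payload) if payload.is_a?(String)
    next unless payload["jid"] == jid

    # Assumption: identity looks like "hostname:pid:nonce".
    parts = process_id.split(":")
    parts.pop # discard the nonce
    pid = Integer(parts.pop)
    return { host: parts.join(":"), pid: pid, tid: thread_id,
             queue: work["queue"], class: payload["class"] }
  end
  nil
end

# Hypothetical in-memory ban list with a per-JID expiry (default: one hour).
class JidBanList
  def initialize(clock: -> { Time.now })
    @expires_at = {}
    @clock = clock
  end

  def ban(jid, ttl: 3600)
    @expires_at[jid] = @clock.call + ttl
  end

  def banned?(jid)
    expiry = @expires_at[jid]
    !expiry.nil? && @clock.call < expiry
  end
end

# Hypothetical server middleware: a banned JID returns without yielding,
# so the job body never runs and the job is treated as done.
class BannedJidMiddleware
  def initialize(bans)
    @bans = bans
  end

  def call(_worker, job, _queue)
    return if @bans.banned?(job["jid"])

    yield
  end
end
```

With Sidekiq loaded (`require "sidekiq/api"`), the lookup could be called as `locate_jid(Sidekiq::Workers.new, jid)`, and the middleware registered through `config.server_middleware { |chain| chain.add BannedJidMiddleware, bans }`. Note that a middleware like this only takes effect the next time the JID is picked up (for example after a retry or requeue); it does not interrupt a job that is already running.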

/cc: @ayufan
