
Be more vigorous killing workers when shutting down sidekiq-cluster

Craig Miskell requested to merge sidekiq-cluster-terminate-hung-workers into master

What does this MR do?

In https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/8511 and gitlab-com/gl-infra/production#1309 we have seen cases where sidekiq-cluster was asked to restart but left one or more sidekiq worker processes in a stuck/hung state, burning CPU.

With this change, sidekiq-cluster waits for the worker processes to terminate themselves cleanly (and push jobs back onto the queue), plus a few seconds beyond that. If any processes remain after the wait, it kills them hard (:KILL signal) before exiting as usual, allowing whatever process monitor is in place to restart sidekiq-cluster.
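The wait-then-kill behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this MR: the method names (`signal`, `still_running`, `terminate_workers`) and the `timeout:` and `poll_interval:` parameters are hypothetical, and it assumes the workers are direct child processes of the caller and that the caller sends the initial :TERM itself.

```ruby
# Hypothetical sketch of "ask nicely, wait, then kill hard".
# None of these method names come from sidekiq-cluster itself.

# Send a signal, ignoring processes that have already gone away.
def signal(sig, pid)
  Process.kill(sig, pid)
rescue Errno::ESRCH
  nil
end

# Return the subset of child pids that have not exited yet.
# waitpid with WNOHANG returns nil while the child is still running,
# and reaps it (returning its pid) once it has exited.
def still_running(pids)
  pids.reject do |pid|
    begin
      Process.waitpid(pid, Process::WNOHANG)
    rescue Errno::ECHILD
      true # no such child: treat it as already gone
    end
  end
end

# Ask every worker to stop, wait up to `timeout` seconds (assumed to
# cover the clean-shutdown period plus a few seconds of slack), then
# send :KILL to whatever is left. Returns the pids that needed :KILL.
def terminate_workers(pids, timeout:, poll_interval: 0.1)
  pids.each { |pid| signal(:TERM, pid) }

  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  remaining = pids

  loop do
    remaining = still_running(remaining)
    break if remaining.empty?
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline

    sleep(poll_interval)
  end

  remaining.each do |pid|
    signal(:KILL, pid)
    begin
      Process.waitpid(pid) # reap the killed child so no zombie is left
    rescue Errno::ECHILD
      nil
    end
  end

  remaining
end
```

A caller would invoke something like `terminate_workers(worker_pids, timeout: 30)` and then exit normally, leaving the restart to the process monitor. Polling with `WNOHANG` is used here so that exited children are reaped as they are detected; a hung worker that ignores :TERM stays in the list until the deadline passes.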

Does this MR meet the acceptance criteria?

Conformity

Availability and Testing

Edited by Craig Miskell
