
Throttle Sidekiq workers using the batched background migration health checks framework

Issue

Sidekiq workers, especially low-priority workers that process queues, have caused incidents on GitLab.com by overloading database resources such as vacuum capacity and WAL apply capacity on follower databases.

High-level approach

Adapt the health check functionality from the batched background migrations framework so that it can throttle these Sidekiq jobs and maintain database health when the database is under stress.

Specifics

To avoid impacting all Sidekiq jobs, we expect each Sidekiq worker to opt in to checking the database health status. When the database is unhealthy, an opted-in worker's job is deferred (re-queued with a delay). This feature will be behind a feature flag (#412990 (closed)).
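
The opt-in and deferral behaviour described above could look roughly like the following. This is a minimal plain-Ruby sketch, not the actual implementation: `HealthCheck`, `DeferOnDatabaseHealth`, `perform_with_health_check`, the `:normal`/`:stop` signals, the 5-second delay, and the `scheduler` callable are all hypothetical names chosen for illustration, and the feature flag is modelled as a simple boolean.

```ruby
# Hypothetical sketch: none of these names are real GitLab or Sidekiq APIs.

# Stand-in for the database health check. Each indicator is a callable
# that returns :normal or :stop (assumed signal names).
class HealthCheck
  def initialize(indicators)
    @indicators = indicators
  end

  # The database is treated as unhealthy if any indicator asks to stop.
  def stop?
    @indicators.any? { |indicator| indicator.call == :stop }
  end
end

# Workers include this module to opt in; other workers are unaffected.
module DeferOnDatabaseHealth
  DEFER_DELAY_SECONDS = 5 # assumed delay, for illustration only

  # Runs the job, or re-queues it with a delay when the feature flag is
  # enabled and the health check reports a stop signal.
  def perform_with_health_check(health_check, feature_enabled:, scheduler:, args: [])
    if feature_enabled && health_check.stop?
      scheduler.call(self.class, DEFER_DELAY_SECONDS, args)
      return :deferred
    end

    perform(*args)
    :performed
  end
end

# Example low-priority worker that has opted in.
class LowPriorityWorker
  include DeferOnDatabaseHealth

  attr_reader :processed

  def initialize
    @processed = []
  end

  def perform(id)
    @processed << id
  end
end

# Usage: a scheduler that only records what would be re-queued.
requeued = []
scheduler = ->(worker_class, delay, args) { requeued << [worker_class, delay, args] }
unhealthy = HealthCheck.new([-> { :normal }, -> { :stop }])

worker = LowPriorityWorker.new
worker.perform_with_health_check(unhealthy, feature_enabled: true, scheduler: scheduler, args: [42])
```

In this sketch the job is only deferred when both the flag is on and an indicator signals a stop, so turning the flag off restores the original behaviour for every worker, including those that opted in.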


Recent incident: gitlab-com/gl-infra/production#8621 (closed)

Edited by Prabakaran Murugesan