
Prevent amplification of ReactiveCachingWorker jobs upon failures

Stan Hu requested to merge sh-disable-reactive-caching-automatic-retries into master

When ReactiveCachingWorker hits an SSL error or another exception that occurs quickly and reliably, automatically rescheduling a new worker can lead to an excessive number of jobs being scheduled. Each run of ReactiveCachingWorker reschedules itself, but a failure also causes Sidekiq to schedule up to 3 retries in the retry set. These retries, in turn, schedule more jobs of their own.
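To make the amplification concrete, here is a rough simulation in plain Ruby. It is only a sketch under simplifying assumptions: every run fails, timing (backoff delays, cache refresh intervals and lifetimes) is ignored, and `simulate`, `MAX_RETRIES`, and the round-based model are hypothetical names for illustration, not GitLab or Sidekiq code.

```ruby
# Number of Sidekiq retries per job, as stated in the description above.
MAX_RETRIES = 3

# Returns how many jobs execute in each round for a worker that always fails.
# Each queue entry is the number of Sidekiq retries that job still has left.
def simulate(rounds:, reschedule_on_failure:)
  queue = [MAX_RETRIES]
  counts = []

  rounds.times do
    counts << queue.size
    next_queue = []

    queue.each do |retries_left|
      # The failed run reschedules itself as a fresh job with a full retry budget.
      next_queue << MAX_RETRIES if reschedule_on_failure
      # Sidekiq also retries the failed job until its budget is exhausted.
      next_queue << (retries_left - 1) if retries_left > 0
    end

    queue = next_queue
  end

  counts
end

with_reschedule = simulate(rounds: 5, reschedule_on_failure: true)
retries_only    = simulate(rounds: 5, reschedule_on_failure: false)

puts "reschedule on failure: #{with_reschedule.inspect}"
puts "Sidekiq retries only:  #{retries_only.inspect}"
```

In this toy model the job count roughly doubles each round when failed runs also reschedule themselves, while with Sidekiq retries alone a single job runs at most four times (one attempt plus 3 retries) and then stops.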

On busy instances, this can become a problem: a large number of running ReactiveCachingWorker jobs can cause a high rate of ExclusiveLease reads and possibly saturate the Redis server with queries.

We now disable this automatic rescheduling when a job fails and rely on Sidekiq to perform its 3 retries with a backoff period.
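One way to picture the difference is the following plain-Ruby sketch. It is an illustration only and not the actual diff: `FakeScheduler`, `update_always_rescheduling`, and `update_rescheduling_on_success` are hypothetical names, and the assumption is that the earlier behaviour resembled scheduling the next run from an `ensure` clause.

```ruby
# Hypothetical stand-in for enqueuing the next worker run; it only counts calls.
class FakeScheduler
  attr_reader :enqueued

  def initialize
    @enqueued = 0
  end

  def schedule_next
    @enqueued += 1
  end
end

# Assumed earlier behaviour: the next run is enqueued even if the update raises.
def update_always_rescheduling(scheduler)
  yield
ensure
  scheduler.schedule_next
end

# Behaviour described above: the next run is enqueued only after a successful
# update. On failure the exception propagates and Sidekiq's retries take over.
def update_rescheduling_on_success(scheduler)
  result = yield
  scheduler.schedule_next
  result
end

old_style = FakeScheduler.new
new_style = FakeScheduler.new

begin
  update_always_rescheduling(old_style) { raise IOError, "simulated SSL failure" }
rescue IOError
  # In a real worker, Sidekiq would catch this and place the job in the retry set.
end

begin
  update_rescheduling_on_success(new_style) { raise IOError, "simulated SSL failure" }
rescue IOError
  # Same failure, but nothing extra was enqueued before it propagated.
end

puts "always rescheduling: #{old_style.enqueued} extra job(s) after a failure"
puts "success only:        #{new_style.enqueued} extra job(s) after a failure"
```

With the success-only variant, a failing run leaves just the Sidekiq retry behind, so the number of jobs stays bounded by the retry limit instead of growing with each failure.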

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176

Edited by Stan Hu
