Prevent amplification of ReactiveCachingWorker jobs upon failures (!30432)

Stan Hu requested to merge sh-disable-reactive-caching-automatic-retries into master Jul 06, 2019

When ReactiveCachingWorker hits an SSL error or another exception that occurs quickly and reliably, automatically rescheduling a new worker can lead to an excessive number of jobs being scheduled. Each run of ReactiveCachingWorker reschedules itself, but a failure also causes Sidekiq to schedule up to 3 retries in the retry set. These retries, in turn, schedule more jobs.

On busy instances, this can become a problem: a large number of running ReactiveCachingWorker jobs can generate a high rate of ExclusiveLease reads and possibly saturate the Redis server with queries.

We now disable this automatic rescheduling when a run fails and rely on Sidekiq to perform its 3 retries with a backoff period.
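
To illustrate the amplification, here is a rough standalone model in plain Ruby. It is not the code from this merge request and does not use Sidekiq. The method `count_failing_runs` and the `reschedules_left` bound (standing in for whatever eventually stops a worker from rescheduling itself) are hypothetical, and the model assumes every run fails.

```ruby
# Hypothetical model, not GitLab's implementation: counts how many job
# executions result from one initial job when every run fails.
MAX_RETRIES = 3 # the Sidekiq retry count mentioned in the description

# retries_left:          Sidekiq retries remaining for this job
# reschedules_left:      assumed bound on further self-reschedules
# reschedule_on_failure: true models the old behaviour, false the new one
def count_failing_runs(retries_left:, reschedules_left:, reschedule_on_failure:)
  runs = 1 # this execution, which fails

  # Sidekiq retries the failed job while it still has retries left.
  if retries_left > 0
    runs += count_failing_runs(
      retries_left: retries_left - 1,
      reschedules_left: reschedules_left,
      reschedule_on_failure: reschedule_on_failure
    )
  end

  # Old behaviour: the failing run also enqueues a fresh job, which
  # starts with a full set of retries of its own.
  if reschedule_on_failure && reschedules_left > 0
    runs += count_failing_runs(
      retries_left: MAX_RETRIES,
      reschedules_left: reschedules_left - 1,
      reschedule_on_failure: reschedule_on_failure
    )
  end

  runs
end

before = count_failing_runs(retries_left: MAX_RETRIES, reschedules_left: 2, reschedule_on_failure: true)
after  = count_failing_runs(retries_left: MAX_RETRIES, reschedules_left: 2, reschedule_on_failure: false)

puts "reschedule on failure: #{before} runs" # 84
puts "Sidekiq retries only:  #{after} runs"  # 4
```

In this model, each extra level of rescheduling multiplies the number of runs by roughly four. With rescheduling on failure disabled, the count stays at the initial run plus 3 retries.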

Closes https://gitlab.com/gitlab-org/gitlab-ce/issues/64176

Edited Jul 06, 2019 by Stan Hu
