Increase Sidekiq retries from 3 back to 25
Way back in gitlab-org/gitlab-foss!7294 (merged), we lowered our default number of Sidekiq retries from 25 to 3 (individual workers can still choose a different number). Per https://github.com/mperham/sidekiq/wiki/Error-Handling#automatic-job-retry:
- 25 retries happen over a period of around three weeks.
- 3 retries happen over a period of a couple of minutes.
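
As a rough check on those two figures, here is a minimal sketch of the backoff arithmetic. It assumes the delay formula described on the Sidekiq wiki page linked above, `(retry_count ** 4) + 15` seconds plus a random jitter term, and leaves the jitter out so the result is deterministic. The method names `base_retry_delay` and `retry_window` are made up for illustration and are not part of Sidekiq.

```ruby
# Approximate delay in seconds before retry number `count` (0-based).
# Assumption: Sidekiq's documented backoff without its random jitter term.
def base_retry_delay(count)
  (count**4) + 15
end

# Seconds between the first failure and the final retry attempt.
def retry_window(max_retries)
  (0...max_retries).sum { |count| base_retry_delay(count) }
end

puts retry_window(3)                          # 62 (seconds)
puts((retry_window(25) / 86_400.0).round(1))  # 20.4 (days)
```

The jitter only ever adds delay, so the real windows are somewhat longer than these numbers, which fits "a couple of minutes" and "around three weeks".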
This means that we're used to jobs failing completely, which in a way is a good thing: it makes us build a more resilient system. But it's also a bad thing, as jobs can very easily be lost entirely. For instance, if our database goes down for half an hour, every job with 3 retries will exhaust its retries within that window and will not be retried automatically once the database comes back up.
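
To put a number on the half-hour outage example, the sketch below estimates how many retries a job would need for its last attempt to land after the outage ends. It again assumes the `(retry_count ** 4) + 15` backoff from the Sidekiq wiki with jitter omitted, and `retries_to_outlast` is a hypothetical helper, not a Sidekiq API.

```ruby
# Smallest retry count whose final attempt happens after an outage of
# `outage_seconds`, assuming the job first fails right as the outage starts.
# Assumption: delay before retry `count` is (count**4) + 15 seconds, no jitter.
def retries_to_outlast(outage_seconds)
  elapsed = 0
  count = 0
  while elapsed <= outage_seconds
    elapsed += (count**4) + 15
    count += 1
  end
  count
end

puts retries_to_outlast(30 * 60)  # 7
```

Under these assumptions a job needs about 7 retries to survive a 30-minute outage, so a default of 3 is well short, while 25 covers far longer outages than that.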
It would be good to eventually bump the number of retries back up to 25, but I think we need to tread cautiously as otherwise we will end up re-processing a lot of jobs that will never succeed, as in the original MR.
This issue is a follow-up to this comment.