Increase fault tolerance for Redis/Sidekiq outages
Summary
Inspired by !109967 (merged)
Several of our background jobs are critical to the app functioning. If Redis/Sidekiq goes down, those jobs cannot be scheduled and the DB state can become invalid for specific records. As a result, we introduce follow-up cleanups and fixes, which increase code complexity and do not help with overall fault tolerance.
Proposal
Possible options for evaluation:
- Execute critical jobs synchronously if scheduling fails.
- Hold on to jobs that could not be scheduled and enqueue them once Redis is back?
- .....
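
To make the first two options easier to compare, here is a minimal, self-contained Ruby sketch. Everything in it is hypothetical and for illustration only: `FakeQueue` stands in for the Redis-backed queue, `QueueUnavailable` stands in for whatever connection error the real client raises, and `ResilientScheduler` and `CleanupWorker` are made-up names, not existing GitLab or Sidekiq classes.

```ruby
# Hypothetical error standing in for a Redis connection failure.
class QueueUnavailable < StandardError; end

# In-memory stand-in for the Redis-backed job queue (assumption, not Sidekiq).
class FakeQueue
  attr_accessor :available
  attr_reader :jobs

  def initialize
    @available = true
    @jobs = []
  end

  def push(worker, args)
    raise QueueUnavailable, "queue backend is down" unless @available

    @jobs << [worker, args]
  end
end

# Hypothetical wrapper showing both fallback strategies.
class ResilientScheduler
  attr_reader :pending

  def initialize(queue)
    @queue = queue
    @pending = [] # jobs that could not be scheduled yet
  end

  # Option 1: run the job in the current process if scheduling fails.
  def schedule_or_run_inline(worker, *args)
    @queue.push(worker, args)
    :enqueued
  rescue QueueUnavailable
    worker.new.perform(*args)
    :performed_inline
  end

  # Option 2: remember the job and retry once the queue is back.
  def schedule_or_buffer(worker, *args)
    @queue.push(worker, args)
    :enqueued
  rescue QueueUnavailable
    @pending << [worker, args]
    :buffered
  end

  # Re-enqueue buffered jobs in order; stops at the first failure.
  # Returns the number of jobs that were flushed.
  def flush_pending
    flushed = 0
    until @pending.empty?
      worker, args = @pending.first
      @queue.push(worker, args) # raises if the queue is still down
      @pending.shift
      flushed += 1
    end
    flushed
  rescue QueueUnavailable
    flushed
  end
end

# Hypothetical critical job; records which IDs it processed.
class CleanupWorker
  PERFORMED = []

  def perform(record_id)
    PERFORMED << record_id
  end
end
```

Two trade-offs are visible even in this toy version. Running inline keeps the DB state consistent but moves the job's latency and failure modes into the request. The buffer here lives in process memory, so it is lost on restart; a real version of the second option would presumably need durable storage.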
@gitlab-org/maintainers/rails-backend @gitlab-org/maintainers/database WDYT?