RepositoryUpdateMirrorWorker is run too frequently and floods Sidekiq queues
This is what is flooding our Sidekiq queues:
- UpdateAllMirrorsWorker is run on the hour. There are about 13,000 mirrors.
- In batches of 200, it spawns RepositoryUpdateMirrorWorker to start launching jobs at random times between now and 30 minutes from now.
- This then adds RepositoryUpdateMirrorDispatchWorker to the queue with another 13,000 new entries.
- If there are retries, then more jobs are added.
In the graph below, you can see the effect of RepositoryUpdateMirrorWorker running followed by RepositoryUpdateMirrorDispatchWorker:
What's happening:
- RepositoryUpdateMirrorWorker: First, there are an additional 13,000 jobs to update the DB.
- RepositoryUpdateMirrorDispatchWorker: Once those jobs are finished, another 13,000 jobs are used to launch git fetch.
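To put rough numbers on this, here is a back-of-the-envelope calculation in plain Ruby. It uses only the figures quoted above (about 13,000 mirrors, batches of 200, two jobs per mirror) and ignores retries. The helper names are made up for illustration and are not part of the codebase.

```ruby
# Rough job-count arithmetic for one hourly run, using the figures from this issue.
MIRROR_COUNT = 13_000 # approximate number of mirrors
BATCH_SIZE   = 200    # batch size used when spawning RepositoryUpdateMirrorWorker

# Number of batches needed to cover every mirror (ceiling division).
def batch_count(mirrors, batch_size)
  (mirrors + batch_size - 1) / batch_size
end

# Each mirror yields one RepositoryUpdateMirrorWorker job followed by one
# RepositoryUpdateMirrorDispatchWorker job. Retries are not counted.
def jobs_per_run(mirrors)
  mirrors * 2
end

puts "batches per run: #{batch_count(MIRROR_COUNT, BATCH_SIZE)}"
puts "jobs per run (no retries): #{jobs_per_run(MIRROR_COUNT)}"
puts "jobs per day at hourly runs: #{jobs_per_run(MIRROR_COUNT) * 24}"
```

So each hourly run is about 65 batches and at least 26,000 Sidekiq jobs before any retries.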
Ideas for improvement:
- Back off the mirror interval from 1 hour to something longer (e.g. once a day; then we could spread the 13,000 jobs across 24 hours)
- Only schedule an update if the project has not been updated in some time (e.g. if we just updated at 10:50 am, there is no sense in running again at 11 am)
- Update the DB in batches instead of one worker per project
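A minimal sketch of how the second and third ideas might fit together, in plain Ruby with no Rails or Sidekiq dependency. Everything here is hypothetical: the `Mirror` struct stands in for the real model, `last_update_at` is an assumed attribute name, and the one-hour minimum interval is only a placeholder value.

```ruby
# Hypothetical stand-in for a mirrored project; the real model is not shown in this issue.
Mirror = Struct.new(:id, :last_update_at)

MIN_INTERVAL = 60 * 60 # assumed threshold in seconds: skip mirrors updated within the last hour
BATCH_SIZE   = 200     # same batch size mentioned in this issue

# Keep only mirrors that were never updated or whose last update is at least
# min_interval seconds old.
def stale_mirrors(mirrors, now:, min_interval: MIN_INTERVAL)
  mirrors.select do |m|
    m.last_update_at.nil? || (now - m.last_update_at) >= min_interval
  end
end

# Group mirror IDs so that one job can handle a whole batch of DB updates
# instead of enqueueing one job per project.
def batched_ids(mirrors, batch_size: BATCH_SIZE)
  mirrors.map(&:id).each_slice(batch_size).to_a
end

now = Time.now
mirrors = [
  Mirror.new(1, now - 10 * 60),     # updated 10 minutes ago: skipped
  Mirror.new(2, now - 3 * 60 * 60), # updated 3 hours ago: scheduled
  Mirror.new(3, nil)                # never updated: scheduled
]

batches = batched_ids(stale_mirrors(mirrors, now: now))
p batches
```

With a filter like this, a mirror updated at 10:50 am would be left out of the 11 am run, and the remaining IDs would go to the queue as a handful of batch jobs.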
Other ideas?
