Stop using sidekiq-cron for push mirrors
Description
In the spirit of fixing our mirroring issues, we should stop driving push mirrors from the cron job that hammers the system.
This is a much easier problem to solve because a push event is generated by a human, which means we know exactly when a repo has to be updated. The only thing to be wary of is hammering the filesystem when a user pushes too aggressively.
Admittedly the impact is small today, but the number of mirrors will keep growing, and in the long run it will become a problem too:
irb(main):021:0> Project.joins(:remote_mirrors).count # mirrors we push to
=> 3217
irb(main):022:0> Project.where(mirror: true).count # mirrors we pull from
=> 17189
Proposal
Instead of using a cron job for this, we should just execute on push, with a bit of back pressure to avoid hammering the filesystem. It's a really simple solution that will make this problem go away for good.
The idea goes like this:
- On git push from a client
  - We schedule a job to perform the push with a backoff delay of, for example, 5 minutes.
  - We add the `now()` timestamp to the job payload.
- On job execution
  - We fetch the remote mirror from the DB and compare the `last_update_at` with the provided timestamp.
    - If the mirror was updated after our scheduling timestamp, we just drop the job.
    - Else, we perform the push as we did before.
With this simple mechanism we will execute at most once per backoff period, greatly reducing pressure on the filesystem and the sidekiq queues by not pushing 3k jobs every time the clock ticks.
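To make the drop-or-push decision concrete, here is a minimal in-memory sketch of the mechanism. It is not the actual implementation: `RemoteMirror`, `DelayedQueue`, and `BACKOFF_DELAY` are hypothetical stand-ins, and only `last_update_at` and the 5 minute delay come from the proposal. In the real code the scheduling step would presumably be a Sidekiq delayed job (for instance via `perform_in`), which is an assumption here.

```ruby
# Hypothetical stand-in for the remote mirror record. Only last_update_at
# comes from the proposal; the rest is illustrative.
RemoteMirror = Struct.new(:last_update_at)

BACKOFF_DELAY = 5 * 60 # seconds, the example delay from the proposal

# In-memory stand-in for a delayed job queue (assumption: the real thing
# would be a Sidekiq queue with delayed jobs).
class DelayedQueue
  Job = Struct.new(:run_at, :mirror, :scheduled_at)

  def initialize
    @jobs = []
  end

  # On git push: schedule the mirror update after the backoff delay and
  # carry the scheduling timestamp in the job payload.
  def schedule(mirror, now)
    @jobs << Job.new(now + BACKOFF_DELAY, mirror, now)
  end

  # Run every job that is due at `now`, oldest first, and return the
  # outcome of each one.
  def run_due(now)
    due, @jobs = @jobs.partition { |job| job.run_at <= now }
    due.sort_by(&:run_at).map { |job| perform(job) }
  end

  private

  def perform(job)
    mirror = job.mirror
    # Drop the job if the mirror was updated after this job was scheduled:
    # that update already carried the changes this job was queued for.
    if mirror.last_update_at && mirror.last_update_at > job.scheduled_at
      return :dropped
    end

    # The real push to the remote would happen here. The simulation treats
    # the job's due time as the moment the push completes.
    mirror.last_update_at = job.run_at
    :executed
  end
end

# Three pushes within the same backoff window result in a single mirror update.
queue = DelayedQueue.new
mirror = RemoteMirror.new(nil)
t0 = Time.at(0)
queue.schedule(mirror, t0)
queue.schedule(mirror, t0 + 60)
queue.schedule(mirror, t0 + 120)
outcomes = queue.run_due(t0 + 600)
# outcomes == [:executed, :dropped, :dropped]
```

The first job performs the push and moves `last_update_at` past the scheduling timestamps of the two later jobs, so both are dropped. A push that arrives after that update is scheduled with a newer timestamp and executes normally.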
We should also use this opportunity to add metrics and record whenever a push is scheduled, executed, or dropped, and whether it succeeds or fails.
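As a rough illustration of the events worth counting, the sketch below uses a hypothetical in-memory `PushMirrorMetrics` class. The class, its method names, and the event symbols are assumptions for illustration; a real version would report to whatever metrics backend is already in use.

```ruby
# Hypothetical in-memory counters for the push mirror lifecycle.
class PushMirrorMetrics
  EVENTS = %i[scheduled executed dropped succeeded failed].freeze

  def initialize
    @counts = Hash.new(0)
  end

  # Increment the counter for one lifecycle event.
  def record(event)
    raise ArgumentError, "unknown event: #{event}" unless EVENTS.include?(event)

    @counts[event] += 1
  end

  def count(event)
    @counts[event]
  end

  # Wrap the actual push: count the execution, then count success or
  # failure. Errors are re-raised so the job's retry handling still sees them.
  def track_push
    record(:executed)
    result = yield
    record(:succeeded)
    result
  rescue StandardError
    record(:failed)
    raise
  end
end
```

A job would call `record(:scheduled)` on push, `record(:dropped)` when it skips a stale job, and wrap the push itself in `track_push { ... }`.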