
Stop using sidekiq-cron for push mirrors

Description

In the spirit of fixing our mirroring issues, we should stop updating push mirrors from a cron job that hammers them on every run.

This is a far easier problem to solve, because pushes are human-generated events: we know exactly when a repository needs to be updated. The only case we need to be wary of is hammering the filesystem when a user pushes too aggressively.

Admittedly the impact is small today, but the number of push mirrors will keep growing, and in the long run it will become a problem too.

irb(main):021:0> Project.joins(:remote_mirrors).count # mirrors we push to
=> 3217
irb(main):022:0> Project.where(mirror: true).count # mirrors we pull from
=> 17189

Proposal

Instead of using a cron job, we should update the mirror whenever the project receives a push, with a bit of back pressure to avoid hammering the filesystem. It's a simple solution that will make this problem go away for good.

The idea goes like this:

  • On git push from a client:
    • We schedule a job to perform the push with a backoff delay of, for example, 5 minutes.
    • We add the now() timestamp to the job payload.
  • On job execution:
    • We fetch the remote mirror from the DB and compare its last_update_at with the timestamp in the payload.
      • If the mirror was updated after our scheduling timestamp, we drop the job.
      • Otherwise, we perform the push as before.
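A minimal Ruby sketch of the steps above, assuming a hypothetical `RemoteMirror` struct and a hypothetical in-memory `FakeScheduler` standing in for Sidekiq's scheduled jobs (neither is GitLab's actual code). The timestamp is an Integer epoch because Sidekiq job arguments need to be JSON-serializable. Recording the time the push *started* in `last_update_at` is an assumption of this sketch, meant to avoid dropping a job for commits that arrived while a push was already in progress.

```ruby
# Hypothetical stand-in for the remote mirror record; only the field the
# staleness check needs is modelled.
RemoteMirror = Struct.new(:id, :last_update_at)

# Delay between a git push and the mirror update (5 minutes, as suggested).
BACKOFF_SECONDS = 5 * 60

# Hypothetical in-memory stand-in for Sidekiq's scheduled set.
# Each job is stored as [run_at, mirror_id, scheduled_at].
class FakeScheduler
  attr_reader :jobs

  def initialize
    @jobs = []
  end

  def perform_in(delay, mirror_id, scheduled_at)
    @jobs << [scheduled_at + delay, mirror_id, scheduled_at]
  end
end

# On git push: schedule the update after the backoff, stamped with "now"
# (an Integer epoch, so it survives JSON serialization).
def on_push(scheduler, mirror, now)
  scheduler.perform_in(BACKOFF_SECONDS, mirror.id, now)
end

# On job execution: drop the job if the mirror was updated after it was
# scheduled; otherwise push and record when this push started.
def perform(mirror, scheduled_at, now)
  return :dropped if mirror.last_update_at && mirror.last_update_at > scheduled_at

  # ... push the repository to the remote here ...
  mirror.last_update_at = now
  :pushed
end

# Two pushes one minute apart produce two jobs, but only the first one pushes.
mirror = RemoteMirror.new(1, nil)
scheduler = FakeScheduler.new
on_push(scheduler, mirror, 0)
on_push(scheduler, mirror, 60)
results = scheduler.jobs.sort.map do |run_at, _mirror_id, scheduled_at|
  perform(mirror, scheduled_at, run_at)
end
p results # => [:pushed, :dropped]
```

The second job is dropped because the first job's push (started at t=300) already includes the commits pushed at t=60.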

With this simple mechanism, each mirror is pushed at most once per backoff period, greatly reducing pressure on the filesystem and on the Sidekiq queues, since we no longer enqueue 3k jobs every time the clock ticks.
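To sanity-check the "at most once per backoff period" claim, here is a small single-mirror simulation under the same assumptions as the scheme described above (the `simulate` helper is hypothetical). A burst of pushes collapses into one mirror update, and consecutive updates end up more than one backoff apart.

```ruby
# Simulate the scheme for one mirror. Every git push at time t schedules a job
# that runs at t + backoff; a job is dropped if the mirror was updated after
# the job was scheduled. Returns the times at which a real push happened.
def simulate(push_times, backoff)
  last_update_at = nil
  executed = []
  jobs = push_times.map { |t| [t + backoff, t] }.sort
  jobs.each do |run_at, scheduled_at|
    next if last_update_at && last_update_at > scheduled_at

    last_update_at = run_at
    executed << run_at
  end
  executed
end

# Six pushes in two bursts, with a 5-minute backoff: only two mirror updates.
pushes = [0, 10, 20, 30, 400, 410]
updates = simulate(pushes, 300)
p updates # => [300, 700]
```

The trade-off is latency: a push is reflected on the mirror within roughly one backoff period (plus the push itself), rather than immediately.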

We should also use this opportunity to add metrics and record whenever a push is scheduled, executed, dropped, succeeds, or fails.
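One way those metrics could be wired in, sketched with a hypothetical in-memory `PushMirrorMetrics` counter rather than GitLab's real metrics API (this sketch does not assume any particular one). Here "executed" is taken to mean the job ran at all; each executed job then ends as exactly one of dropped, succeeded, or failed. That split is an interpretation, not something the proposal specifies.

```ruby
# Hypothetical in-memory counters for the push mirror lifecycle.
class PushMirrorMetrics
  EVENTS = %i[scheduled executed dropped succeeded failed].freeze

  def initialize
    @counts = Hash.new(0)
  end

  def record(event)
    raise ArgumentError, "unknown event: #{event}" unless EVENTS.include?(event)

    @counts[event] += 1
  end

  def [](event)
    @counts[event]
  end
end

# Wraps one job execution: counts it as executed, then as exactly one of
# dropped (stale job), succeeded (block returned), or failed (block raised).
def run_push_job(metrics, stale:)
  metrics.record(:executed)
  if stale
    metrics.record(:dropped)
    return :dropped
  end

  yield
  metrics.record(:succeeded)
  :succeeded
rescue StandardError
  metrics.record(:failed)
  :failed
end

metrics = PushMirrorMetrics.new
2.times { metrics.record(:scheduled) }
run_push_job(metrics, stale: true) {}
run_push_job(metrics, stale: false) { :pushed }
run_push_job(metrics, stale: false) { raise "remote rejected the push" }
puts PushMirrorMetrics::EVENTS.map { |e| "#{e}=#{metrics[e]}" }.join(" ")
```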

cc/ @DouweM @tiagonbotelho @mydigitalself