
Re-run stuck merge request cleanup schedules

Problem to solve

A MergeRequest::CleanupSchedule record can get stuck. If the record is started (its status is set to running) and the Sidekiq job working on it is then killed (by a Sidekiq deploy, the OOM killer, or an unhandled exception), the record stays in the running state and is never worked on again.
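The failure mode can be illustrated with a small, self-contained Python model. This is a sketch under stated assumptions, not GitLab's implementation: the status names, the `CleanupSchedule` dataclass, and the rule that the worker only selects unstarted records are all hypothetical stand-ins. Raising an exception stands in for the job dying before it can finish or reset the status.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    # Hypothetical status values modelled on the issue text; the real
    # MergeRequest::CleanupSchedule may use different names or more states.
    UNSTARTED = "unstarted"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class CleanupSchedule:
    id: int
    status: Status = Status.UNSTARTED


def pick_next(schedules):
    """Return the first unstarted schedule, as a worker that only looks
    at unstarted records would (an assumption for this sketch)."""
    for schedule in schedules:
        if schedule.status is Status.UNSTARTED:
            return schedule
    return None


def run_worker(schedules, crash=False):
    schedule = pick_next(schedules)
    if schedule is None:
        return None
    schedule.status = Status.RUNNING
    if crash:
        # Stands in for the Sidekiq job being killed mid-run: nothing
        # gets a chance to move the status out of RUNNING.
        raise RuntimeError("worker killed")
    schedule.status = Status.COMPLETED
    return schedule.id


schedules = [CleanupSchedule(id=1)]
try:
    run_worker(schedules, crash=True)
except RuntimeError:
    pass

# The record is left in RUNNING and no later run will select it.
print(schedules[0].status)    # Status.RUNNING
print(run_worker(schedules))  # None
```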

Proposal

This was discussed in !65647 (diffs, comment 622356631).

The following ideas were suggested:

  1. Enqueue a delayed job that sets the record back to unstarted after 6 hours (or after the longest execution time observed once this is enabled on production).
  2. Add a separate cron job that resets stuck running cleanup schedules to unstarted.
  3. Use the same mechanism as repo mirroring with jid...
  4. Do the cleanup in ScheduleMergeRequestCleanupRefsWorker.
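Idea 2 can be sketched in Python as a periodic sweep that resets schedules that have been running for longer than a timeout. This is a minimal model, not GitLab code: the `updated_at` field, the `reset_stuck_schedules` function, and the status names are hypothetical, and the 6-hour timeout simply reuses the figure from idea 1.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class Status(Enum):
    # Hypothetical status values; the real model may differ.
    UNSTARTED = "unstarted"
    RUNNING = "running"
    COMPLETED = "completed"


@dataclass
class CleanupSchedule:
    id: int
    status: Status
    # Hypothetical timestamp of the last status change.
    updated_at: datetime


# Reuses the 6-hour figure from idea 1; it would need tuning to the
# longest execution time observed in production.
STUCK_TIMEOUT = timedelta(hours=6)


def reset_stuck_schedules(schedules, now=None, timeout=STUCK_TIMEOUT):
    """Set schedules that have been RUNNING longer than `timeout` back to
    UNSTARTED so the regular worker picks them up again.

    Returns the ids of the schedules that were reset."""
    now = now or datetime.now(timezone.utc)
    reset_ids = []
    for schedule in schedules:
        if schedule.status is Status.RUNNING and now - schedule.updated_at > timeout:
            schedule.status = Status.UNSTARTED
            schedule.updated_at = now
            reset_ids.append(schedule.id)
    return reset_ids


# Usage: one stuck record, one still in progress, one already completed.
now = datetime(2021, 7, 20, 12, 0, tzinfo=timezone.utc)
schedules = [
    CleanupSchedule(1, Status.RUNNING, now - timedelta(hours=7)),
    CleanupSchedule(2, Status.RUNNING, now - timedelta(minutes=5)),
    CleanupSchedule(3, Status.COMPLETED, now - timedelta(hours=9)),
]
print(reset_stuck_schedules(schedules, now=now))  # [1]
```

In a real cron worker this would more likely be a single bulk update filtered on status and timestamp rather than a loop. The trade-off is that a job that is genuinely slower than the timeout could be reset and picked up a second time, so the cleanup would need to be safe to run twice.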

Each idea has its own pros and cons, so they need to be weighed first before choosing the most appropriate one.