Skip to content

Draft: Geo: Use job deduplication until executed

What does this MR do and why?

Describe in detail what your merge request does and why.

  • Use job deduplication with the "until_executed" strategy on Geo scheduler workers. Instead of relying solely on exclusive lease. The effect is the same, but job deduplication is defined with few lines, and is more efficient since the affected jobs don't get enqueued or executed only to exit as soon as they start running.
  • Use idempotent! in abstract classes of Geo scheduler workers instead of # rubocop:disable Scalability/IdempotentWorker

This fixes a portion of cases of #212756 (closed). We should immediately see fewer Cannot obtain an exclusive lease. There must be another instance already in execution. log messages. This is even better than the proposal in the issue, since the noisy logs won't be generated at all, rather than made less scary.

Note: I thought we would be able to remove usages of exclusive lease within workers guarded by job deduplication with the "until_executed" strategy. But on further thought, it's not necessary if they're already implemented, and the lease logic protects against at least one edge case: For example, if queues are not getting executed for a long time, past the deduplication TTL which defaults to 6 hours. This can happen during infrastructure issues, or unexpected emergency maintenance, or during downtime if queues weren't drained first. This is also a reason why we can't set a short deduplication TTL on Sidekiq jobs, like we can with leases. So the risk of removing exclusive lease logic seems not worth the work/risk.

How to set up and validate locally

Numbered steps to set up and validate the change are strongly suggested.

MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by Michael Kozono

Merge request reports