Delete the duplicate job even in case of errors

🎉 MR 87700 is mine, woohoo! 🎉

🔍 Context

During the investigation of !87649 (merged), we noticed that with the deduplicate :until_executed configuration, the deduplication key doesn't seem to be removed when there is an error in the worker execution.

See:

Screenshot_2022-05-16_at_11.17.59

This was surprising and counterintuitive. A job that fails has still finished: the deduplication key should be freed, and subsequent executions should be allowed to run.

🔬 What does this MR do and why?

  • Update the until_executed strategy so that the duplicate job is deleted even when the worker raises an error.
  • Update the related specs.
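
The idea can be pictured with a minimal, self-contained sketch. It does not reproduce GitLab's actual middleware: FakeDuplicateJob, UntilExecutedSketch and run_failing_job are hypothetical names, and an in-memory Set stands in for the real deduplication key store. The only point is that releasing the key in an ensure block runs whether or not the job raises.

```ruby
require 'set'

# Hypothetical stand-in for the deduplication key store
# (an in-memory Set, not GitLab's real implementation).
class FakeDuplicateJob
  KEYS = Set.new

  def initialize(key)
    @key = key
  end

  # Returns true when the key was free and is now taken,
  # false when a duplicate is already registered.
  def check!
    !KEYS.add?(@key).nil?
  end

  # Frees the deduplication key.
  def delete!
    KEYS.delete(@key)
  end
end

# Hypothetical sketch of an "until executed" strategy.
module UntilExecutedSketch
  # Registers the job unless a duplicate is already pending.
  def self.schedule(key)
    FakeDuplicateJob.new(key).check! ? :scheduled : :deduplicated
  end

  # Runs the job body and frees the key afterwards,
  # even when the body raises.
  def self.perform(key)
    yield
  ensure
    FakeDuplicateJob.new(key).delete!
  end
end

# Simulates one cron tick for a worker that always fails.
def run_failing_job(key)
  return :deduplicated if UntilExecutedSketch.schedule(key) == :deduplicated

  UntilExecutedSketch.perform(key) { raise ArgumentError, 'Boom' }
  :done
rescue ArgumentError
  :fail
end
```

In this sketch, calling run_failing_job('guard') twice returns :fail both times. If the delete! call only ran after a successful yield (no ensure), the second call would return :deduplicated instead.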

📺 Screenshots or screen recordings

n / a

How to set up and validate locally

This is not easy to test locally, but we can try.

We're going to use the ContainerRegistry::Migration::GuardWorker for this verification.

  1. Update ContainerRegistry::Migration::GuardWorker#perform so that it always raises:
    def perform
      raise ArgumentError, 'Boom'
    end
  2. Update config/initializers/1_settings.rb so that the worker is executed every 2 minutes:
    Settings.cron_jobs['container_registry_migration_guard_worker']['cron'] ||= '*/2 * * * *'
  3. Start the background jobs and observe the logs:
    $ tail -f log/sidekiq.log | grep "GuardWorker"

On master

We get:

{"time":"2022-05-17T11:54:21.704Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"start"}
{"time":"2022-05-17T11:54:24.199Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"fail"}


{"time":"2022-05-17T11:56:32.540Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"deduplicated"}

Execution 1 failed and that failure left the deduplication key behind. As a consequence, execution 2 is deduplicated. 😿

With this MR

We get this log:

{"time":"2022-05-17T11:46:25.180Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"start"}
{"time":"2022-05-17T11:46:27.650Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"fail"}


{"time":"2022-05-17T11:48:26.756Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"start"}
{"time":"2022-05-17T11:48:29.651Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"fail"}

Execution 1 fails but execution 2 is allowed to run because the deduplication key is freed when execution 1 fails.
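
To compare the two runs without reading the raw output, the job_status values can be extracted from the log lines. This is only a sketch: job_statuses is a hypothetical helper, and it assumes every non-empty line is a standalone JSON object with the class and job_status fields shown in the excerpts above.

```ruby
require 'json'

# Hypothetical helper: returns the job_status values, in order,
# for the log lines that belong to the given worker class.
def job_statuses(lines, worker_class)
  lines.map(&:strip).reject(&:empty?).filter_map do |line|
    entry = JSON.parse(line)
    entry['job_status'] if entry['class'] == worker_class
  end
end

# Log lines copied from the "On master" excerpt.
master_log = <<~LOG
  {"time":"2022-05-17T11:54:21.704Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"start"}
  {"time":"2022-05-17T11:54:24.199Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"fail"}

  {"time":"2022-05-17T11:56:32.540Z","class":"ContainerRegistry::Migration::GuardWorker","job_status":"deduplicated"}
LOG

statuses = job_statuses(master_log.lines, 'ContainerRegistry::Migration::GuardWorker')
puts statuses.inspect
```

A deduplicated entry right after a fail is the symptom described above. With this MR, the same helper applied to the second excerpt should show only start and fail pairs.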

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

Edited by David Fernandez
