Enqueuer job: set deduplication until executed (!83091) · Merge requests · GitLab.org / GitLab

David Fernandez requested to merge 356130-update-enqueuer-deduplication into master Mar 17, 2022

🍭 Context

We're currently implementing a data migration on the Container Registry. This migration is mainly driven by the rails backend.

The core driver of this migration is the Enqueuer job. Its responsibility is to find the next eligible image repository and call the Container Registry to start the migration.

I will not go into details but this job can be enqueued by multiple sources:

by itself.
by a cron schedule.
by a migration end event.

As we don't want to flood the queue with many Enqueuer jobs, we set up deduplication.

The migration is currently in staging verification. As such, it is enabled on neither gitlab.com nor self-managed setups.

During staging testing, we noticed that two Enqueuer jobs could be executing at the same time. We don't really need this parallelism, and it could even be a source of race conditions. Picking the next eligible image repository doesn't use any lock = 2 parallel jobs could be working on the same image repository = 💥 .
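To make the race concrete, here is a minimal sketch in plain Ruby. It is not the actual Enqueuer code: `Repository`, `next_eligible`, and the `:pending` / `:importing` states are hypothetical stand-ins, and the "parallel" jobs are simulated by a fixed interleaving where both lookups happen before either job records its choice.

```ruby
# Hypothetical stand-in for an image repository row; not GitLab's model.
Repository = Struct.new(:name, :state)

# Pick the first repository that has not started migrating.
# Note that nothing here takes a lock.
def next_eligible(repositories)
  repositories.find { |repo| repo.state == :pending }
end

repositories = [
  Repository.new("group/app", :pending),
  Repository.new("group/api", :pending)
]

# Simulated interleaving: both jobs run their lookup before either one
# has marked its pick as started.
picked_by_job_a = next_eligible(repositories)
picked_by_job_b = next_eligible(repositories)

# Each job now starts a migration for whatever it picked.
started = []
[picked_by_job_a, picked_by_job_b].each do |repo|
  repo.state = :importing
  started << repo.name
end

puts started.inspect
```

In this interleaving both jobs end up starting a migration for the same repository, while the second repository is left untouched.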

A simple solution to this problem is to extend deduplication until jobs are executed.

That's issue #356130 (closed).

🔬 What does this MR do and why?

Update the deduplication strategy of the container registry migration Enqueuer job to :until_executed.
No updates to specs as we don't really test deduplication parameters.
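For readers who don't have the worker open, the shape of the change is roughly the following. This is only a sketch: `DeduplicationDsl` and `EnqueuerWorkerSketch` are hypothetical stand-ins so the snippet runs outside of GitLab, and the real worker declaration may carry additional options that are not shown here.

```ruby
# Hypothetical stand-in for the worker DSL, so the snippet runs on its own.
module DeduplicationDsl
  # Record the strategy declared on the worker class.
  def deduplicate(strategy, options = {})
    @deduplication_strategy = strategy
    @deduplication_options = options
  end

  def deduplication_strategy
    @deduplication_strategy
  end
end

class EnqueuerWorkerSketch
  extend DeduplicationDsl

  # Before this MR (per the description above): deduplicate :until_executing
  deduplicate :until_executed

  def perform
    # Find the next eligible image repository and start its migration.
  end
end

puts EnqueuerWorkerSketch.deduplication_strategy.inspect
```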

Changelog not added because the whole migration logic is gated behind a feature flag that is not enabled on gitlab.com. This MR is to fix an issue found during staging testing.

📸 Screenshots or screen recordings

n / a

🚩 How to set up and validate locally

I'm not sure there is a reliable way to test this, but here is how to produce some evidence.

Update the #perform method of the job to:
```
def perform
  sleep 60 * 5
end
```
Now, in a rails console (with background jobs enabled and running), execute ContainerRegistry::Migration::EnqueuerWorker.perform_async twice.

With until_executing, we have:

```
[1] pry(main)> ContainerRegistry::Migration::EnqueuerWorker.perform_async
=> "8dd37cf065c2514b21f898c3"
[2] pry(main)> ContainerRegistry::Migration::EnqueuerWorker.perform_async
=> "f4fb7f6024f8b406dfd74ad7"
```

The second job is accepted and enqueued.

With until_executed, we have:

```
[1] pry(main)> ContainerRegistry::Migration::EnqueuerWorker.perform_async
=> "26dc987a0c3ce48acb91bb80"
[2] pry(main)> ContainerRegistry::Migration::EnqueuerWorker.perform_async
=> nil
```

The second job is not accepted because the first is still running.
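The difference between the two runs can be reproduced with a toy model. This is a sketch under assumptions, not GitLab's deduplication implementation: `ToyQueue`, its `start` / `finish` hooks, and the `"enqueuer"` key are all hypothetical. The only thing it models is when the deduplication key is released: when the job starts (until_executing) or when it finishes (until_executed).

```ruby
require "set"

# Toy model of a deduplicating queue; hypothetical, not GitLab's code.
class ToyQueue
  def initialize(strategy)
    @strategy = strategy # :until_executing or :until_executed
    @dedup_keys = Set.new
    @next_id = 0
  end

  # Returns a job id, or nil when the job is dropped as a duplicate.
  def perform_async(key)
    return nil if @dedup_keys.include?(key)

    @dedup_keys << key
    "job-#{@next_id += 1}"
  end

  # Called when a job starts running.
  def start(key)
    @dedup_keys.delete(key) if @strategy == :until_executing
  end

  # Called when a job has finished running.
  def finish(key)
    @dedup_keys.delete(key) if @strategy == :until_executed
  end
end

# Enqueue once, let the job start (and stay in its long sleep),
# then try to enqueue again and return the result of that second call.
def second_enqueue_while_first_runs(strategy)
  queue = ToyQueue.new(strategy)
  queue.perform_async("enqueuer")
  queue.start("enqueuer")
  queue.perform_async("enqueuer")
end

p second_enqueue_while_first_runs(:until_executing)
p second_enqueue_while_first_runs(:until_executed)
```

With `:until_executing` the second call gets a job id back, and with `:until_executed` it gets `nil`, which mirrors the two console sessions above.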

🚥 MR acceptance checklist

This checklist encourages us to confirm any changes have been analyzed to reduce risks in quality, performance, reliability, security, and maintainability.

I have evaluated the MR acceptance checklist for this MR.

Edited Mar 18, 2022 by David Fernandez
