Enqueuer job
Context
The context for this issue is in the epic &7316 (closed), and the breakdown of work is in this comment: &7316 (comment 792633854).
Enqueuer
Goal
Start the migration process for the next eligible container repository.
How it is enqueued
Two ways:
- Cron worker. Suggested frequency: each hour.
- When a container repository leaves the migration process, that is, when it enters the done or aborted status.
Logic
This worker needs to be deduplicated: several sources can enqueue this (Sidekiq) job, and those multiple enqueues should be merged into a single one so that we don't flood the Sidekiq queues.
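To illustrate what deduplication means here, the following is a minimal, framework-free sketch. DedupQueue and its methods are hypothetical names invented for this example; they are not Sidekiq's or GitLab's actual API. The sketch only shows the intended effect: several enqueue requests for the same job collapse into a single pending job.

```ruby
require 'set'

# Hypothetical in-memory queue, for illustration only (not Sidekiq's API).
# A job is identified by its name plus its arguments; enqueuing a job that
# is already pending is a no-op.
class DedupQueue
  def initialize
    @pending_keys = Set.new
    @jobs = []
  end

  # Returns true if the job was added, false if an identical job is pending.
  def enqueue(job_name, *args)
    key = [job_name, args]
    # Set#add? returns nil when the key is already present.
    return false unless @pending_keys.add?(key)

    @jobs << key
    true
  end

  # Removes and returns the next job. Once popped, the same job can be
  # enqueued again.
  def pop
    key = @jobs.shift
    @pending_keys.delete(key) if key
    key
  end

  def size
    @jobs.size
  end
end

queue = DedupQueue.new
# The cron worker and two repositories leaving the migration all ask for the job.
results = 3.times.map { queue.enqueue('EnqueuerJob') }
```

Only the first of the three requests adds a job; the other two are merged into it. In the real worker, this behaviour would come from the job framework's own deduplication mechanism rather than from hand-written code.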
1. If the global feature flag is disabled, return.
2. Check the capacity using the capacity feature flag and the number of (pre_importing + pre_importing_done + importing) repositories.
   - If that number reaches (or is over) the capacity, return.
3. Check the most recent container repository that left the migration process. Use its timestamp to enforce container_registry_phase_2_enqueue_speed_X, i.e. the waiting time between migration enqueues.
4. Look for aborted container repositories.
   - If any, retry the failed migration step.
     - If the retry fails, skip this container repository.
   - If an aborted container repository was handled, return.
5. Select the next container repository using these filters:
   - Not a member of the deny list
   - Is in the container_registry_target_plan
   - Has been created before container_registry_created_before
   - Select randomly (this might not be possible)
6. Check the number of tags on the selected container repository.
   - If it has more than container_registry_max_tags_count:
     - Update the container repository with migration_state = skipped + migration_skipped_reason = :too_many_tags + migration_skipped_at = Time.zone.now
     - Re-execute step (5.)
7. Start the migration of the selected container repository with container_repository.start_pre_import
   - ⚠ Calling the registry on the /migration/pre_import/start endpoint might return 429 Too Many Requests
     - If that's the case, try again up to container_registry_start_max_retries times.
       - Implement an exponential back off so that we don't spike the number of network requests in a very short period of time.
     - If still unsuccessful, simply end the job.
8. (Re-enqueue the Enqueuer job if the capacity has not been reached.)
   - I'm not sure about this step.
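The retry behaviour of the "start the migration" step could look roughly like the sketch below. It is a standalone illustration, not the actual implementation: TooManyRequestsError, start_with_backoff, base_delay and sleeper are hypothetical names, and the sketch assumes that a 429 response from the registry is surfaced as an exception. The max_retries argument plays the role of container_registry_start_max_retries.

```ruby
# Hypothetical error class standing in for a 429 response from the registry.
class TooManyRequestsError < StandardError; end

# Calls the block, retrying on TooManyRequestsError up to max_retries times.
# The wait doubles after each failed attempt: base_delay, 2x, 4x, ...
# Returns true on success, false if every attempt was rate limited.
# The sleeper is injectable so that the waits can be observed without sleeping.
def start_with_backoff(max_retries:, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield
    true
  rescue TooManyRequestsError
    # Give up quietly: the job simply ends.
    return false if attempt >= max_retries

    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry
  end
end

# Usage: the first two calls are rate limited, the third succeeds.
calls = 0
delays = []
ok = start_with_backoff(max_retries: 3, sleeper: ->(s) { delays << s }) do
  calls += 1
  raise TooManyRequestsError if calls < 3
end
# ok is true, calls is 3 and delays is [1.0, 2.0]
```

In the real job, the block would wrap the call to container_repository.start_pre_import. Doubling the delay keeps retries from arriving in a burst; adding random jitter is a common refinement, but it is not required by the steps above.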
Notes
Using a cron job provides a way to kickstart the whole process. It also provides some self-healing capability: if the whole system is terminated for an external reason (such as an emergency reboot), this job will allow us to "resume" the migration.
There is a flaw in this job that I don't like: it can keep failing to find an eligible container repository. If the random pick keeps selecting container repositories with more than container_registry_max_tags_count tags, we can end up in a long loop, and that's not great.
This job needs to properly log everything that it does. I'm even thinking of logging the number of skipped repositories found, that is, how many times the "selection" loop has run.
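As a rough sketch of the selection loop with that counter, the standalone example below picks candidates at random, marks the ones with too many tags as skipped, and logs how many were skipped. Repo and select_next are hypothetical names, and Time.now stands in for Time.zone.now outside Rails. The max_attempts cap is an assumption added here as one possible way to bound the loop; it is not part of the proposal above.

```ruby
require 'logger'

# Hypothetical stand-in for a container repository record.
Repo = Struct.new(:path, :tags_count, :migration_state,
                  :migration_skipped_reason, :migration_skipped_at)

# Picks candidates at random until one has an acceptable number of tags.
# Repositories with too many tags are marked as skipped and counted.
# max_attempts (an assumption, see above) bounds the number of skips per run.
# Returns [selected_repo_or_nil, skipped_count].
def select_next(candidates, max_tags_count:, max_attempts:, logger:, rng: Random.new)
  pool = candidates.dup
  skipped = 0

  while skipped < max_attempts && !pool.empty?
    repo = pool.delete_at(rng.rand(pool.size))

    if repo.tags_count > max_tags_count
      repo.migration_state = :skipped
      repo.migration_skipped_reason = :too_many_tags
      repo.migration_skipped_at = Time.now
      skipped += 1
      next
    end

    logger.info("selected #{repo.path} after skipping #{skipped} repositories")
    return [repo, skipped]
  end

  logger.info("no eligible repository found, skipped #{skipped} repositories")
  [nil, skipped]
end

repos = [Repo.new('group/big', 5000), Repo.new('group/small', 10)]
selected, skipped = select_next(repos, max_tags_count: 100, max_attempts: 5,
                                       logger: Logger.new($stdout))
```

Here group/small is always the repository selected, while the skipped count is 0 or 1 depending on the random pick. With a cap in place, a run that only finds oversized repositories ends after max_attempts skips and leaves the rest to the next cron run.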