
Enqueuer job

Context

The context for this issue can be viewed in the epic &7316 (closed) and the breakdown of work can be viewed in this comment &7316 (comment 792633854).

Enqueuer

Goal

Start the migration process for the next eligible container repository.

How it is enqueued

Two ways:

  1. Cron worker. Suggested frequency: every hour.
  2. When a container repository exits the migration process, i.e. when it enters the done or aborted status.

Logic

This worker needs to be deduplicated. Many sources can enqueue this job in Sidekiq, and those multiple enqueues should be merged into a single one so that we don't flood the Sidekiq queues.
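The two triggers and the deduplication requirement can be modelled with a toy in-memory queue. This is a minimal Python sketch, assuming "deduplicated" means at most one pending copy of the job at a time. It is not Sidekiq's or GitLab's deduplication API, and the job name EnqueuerWorker, the DeduplicatingQueue class and both hook functions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Statuses that mean a repository has left the migration process.
TERMINAL_STATES = {"done", "aborted"}
JOB_NAME = "EnqueuerWorker"  # hypothetical job name


@dataclass
class DeduplicatingQueue:
    """Toy queue that keeps at most one pending copy of each job."""
    pending: List[str] = field(default_factory=list)

    def enqueue(self, job_name: str) -> bool:
        # A duplicate enqueue is merged into the copy that is already pending.
        if job_name in self.pending:
            return False
        self.pending.append(job_name)
        return True

    def pop(self) -> Optional[str]:
        # Once a job is taken off the queue, it can be enqueued again.
        return self.pending.pop(0) if self.pending else None


def on_cron_tick(queue: DeduplicatingQueue) -> None:
    # Trigger 1: the hourly cron worker.
    queue.enqueue(JOB_NAME)


def on_migration_state_change(queue: DeduplicatingQueue, new_state: str) -> None:
    # Trigger 2: a repository enters done or aborted.
    if new_state in TERMINAL_STATES:
        queue.enqueue(JOB_NAME)
```

A cron tick followed by two repositories finishing their migration leaves a single pending job, which is the merge behaviour the worker needs.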

  1. If the global feature flag is disabled, return.
  2. Check the capacity using the capacity feature flag and the number of repositories in the pre_importing, pre_importing_done, or importing state.
    • If that number is equal to or greater than the capacity, return.
  3. Find the container repository that most recently exited the migration process and check its timestamp to enforce container_registry_phase_2_enqueue_speed_X, i.e. the waiting time between migration enqueues.
  4. Look for aborted container repositories.
    • If any, retry the failed migration step.
      • If the retry fails, skip this container repository.
  5. If an aborted container repository was handled, return.
  6. Select the next container repository using these filters:
    • Not a member of the deny list
    • Is in the container_registry_target_plan
    • Has been created before container_registry_created_before
    • Select randomly (this might not be possible)
  7. Check the number of tags on the selected container repository.
    • If it has more than container_registry_max_tags_count tags:
      • Update the container repository with migration_state = skipped, migration_skipped_reason = :too_many_tags, and migration_skipped_at = Time.zone.now.
      • Re-execute step 6 to select another container repository.
  8. Start the migration of the selected container repository with container_repository.start_pre_import.
    • Calling the registry on the /migration/pre_import/start endpoint might return 429 Too Many Requests.
      • If so, retry up to container_registry_start_max_retries times.
        • Use exponential backoff so that we don't send a burst of network requests in a very short period of time.
      • If still unsuccessful, simply end the job.
    • (Re-enqueue the Enqueuer job if the capacity has not been reached.)
      • I'm not sure about this step.
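Steps 1 to 3 are early-return checks. A minimal Python sketch of them follows. It assumes the feature flag values, the migration states of all repositories, and the timestamp of the most recent exit from the migration process have already been loaded. The function name can_start_migration and its parameters are hypothetical, and enqueue_wait stands in for container_registry_phase_2_enqueue_speed_X.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

# States that count against the capacity (step 2).
ACTIVE_STATES = {"pre_importing", "pre_importing_done", "importing"}


def can_start_migration(
    enabled: bool,
    capacity: int,
    repository_states: Iterable[str],
    last_exit_at: Optional[datetime],
    enqueue_wait: timedelta,
    now: datetime,
) -> bool:
    """Gate checks for steps 1-3. False means the job should return early."""
    # Step 1: global feature flag.
    if not enabled:
        return False
    # Step 2: repositories currently in the migration pipeline vs. capacity.
    active = sum(1 for state in repository_states if state in ACTIVE_STATES)
    if active >= capacity:
        return False
    # Step 3: enforce the waiting time since the most recent repository
    # exited the migration process. None means no repository has exited yet.
    if last_exit_at is not None and now - last_exit_at < enqueue_wait:
        return False
    return True
```

With this comparison a capacity of 0 always returns early, so the capacity flag alone can pause new migrations.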
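Steps 4 to 7 can be sketched over a hypothetical in-memory Repo record. The sketch assumes that "handled" in step 5 means a retry succeeded, that repositories not yet migrated have migration_state "default", and that the retry callback stands in for whatever re-runs the failed migration step. Repo, retry_aborted and select_next are illustrative names only.

```python
import random
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, List, Optional, Tuple


@dataclass
class Repo:
    # Hypothetical stand-in for a container repository record.
    id: int
    migration_state: str = "default"
    plan: str = "free"
    created_at: datetime = datetime(2021, 1, 1)
    tags_count: int = 0
    denied: bool = False
    migration_skipped_reason: Optional[str] = None
    migration_skipped_at: Optional[datetime] = None


def retry_aborted(repos: List[Repo], retry: Callable[[Repo], bool]) -> bool:
    """Steps 4-5: True if an aborted repository was retried successfully."""
    for repo in (r for r in repos if r.migration_state == "aborted"):
        if retry(repo):
            return True  # handled: the job returns (step 5)
        # Retry failed: skip this repository and look at the next aborted one.
    return False


def select_next(
    repos: List[Repo],
    target_plan: str,
    created_before: datetime,
    max_tags_count: int,
    now: datetime,
    rng: random.Random,
) -> Tuple[Optional[Repo], int]:
    """Steps 6-7: pick a random eligible repository.

    Returns the selected repository (or None) and how many were skipped.
    """
    skipped = 0
    while True:
        candidates = [
            r for r in repos
            if r.migration_state == "default"
            and not r.denied
            and r.plan == target_plan
            and r.created_at < created_before
        ]
        if not candidates:
            return None, skipped
        repo = rng.choice(candidates)
        if repo.tags_count <= max_tags_count:
            return repo, skipped
        # Step 7: mark as skipped so it leaves the candidate list, then re-select.
        repo.migration_state = "skipped"
        repo.migration_skipped_reason = "too_many_tags"
        repo.migration_skipped_at = now
        skipped += 1
```

Because every skipped repository is marked as skipped, it drops out of the candidate list, so the selection loop runs at most once per repository with too many tags.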
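For step 8, here is a minimal sketch of retrying a 429 response with exponential backoff. TooManyRequests and start_with_backoff are hypothetical names, and start_pre_import is any callable that raises TooManyRequests on a 429. The sketch assumes container_registry_start_max_retries counts retries after the first attempt and that the delay doubles from a base of one second.

```python
import time
from typing import Callable


class TooManyRequests(Exception):
    """Hypothetical error raised when the registry answers 429 Too Many Requests."""


def start_with_backoff(
    start_pre_import: Callable[[], None],
    max_retries: int,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call start_pre_import, retrying on 429 up to max_retries times.

    Returns True on success, False if every attempt was rate limited.
    """
    for attempt in range(max_retries + 1):
        try:
            start_pre_import()
            return True
        except TooManyRequests:
            if attempt == max_retries:
                return False  # still unsuccessful: simply end the job
            # Exponential backoff: wait base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return False  # only reached when max_retries is negative
```

Injecting sleep keeps the sketch testable. Adding random jitter to each delay is a common variation that spreads out retries coming from concurrent jobs.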

Notes

Using a cron job provides a way to kickstart the whole process. It also provides some self-healing capability. For example, if the whole system is stopped for an external reason (such as an emergency reboot), this job will allow us to "resume" the migration.

There is a flaw in this job that I don't like: it can keep failing to find an eligible container repository. If the random pick keeps selecting container repositories with more than container_registry_max_tags_count tags, we can end up in a long loop, and that's not great.
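The length of that loop can be estimated under two assumptions: each pick is uniform over the remaining eligible repositories, and a repository marked as skipped in step 7 is never picked again. Let K be the number of eligible repositories over container_registry_max_tags_count, b_1, ..., b_K those repositories, and G the number of eligible repositories within the limit. Picking uniformly without replacement produces a uniformly random order, so each b_i comes before all G acceptable repositories with probability 1/(G + 1). The expected number S of skipped repositories before a successful pick is then:

```latex
\begin{aligned}
S &= \sum_{i=1}^{K} \mathbf{1}\!\left[\, b_i \text{ is picked before all } G \text{ acceptable repositories} \,\right] \\
\mathbb{E}[S] &= \sum_{i=1}^{K} \frac{1}{G+1} = \frac{K}{G+1}
\end{aligned}
```

For example, with K = 99 and G = 1 the job skips 49.5 repositories on average before starting one, and with G = 0 it skips all K and finds nothing.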

This job needs to log everything it does. I'm even thinking of logging the number of times a skipped repository has been found, i.e. how many times the "selection" loop has run.
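A minimal sketch of a structured log entry for one run of the job, using Python's standard logging and json modules. The logger name, the function log_enqueuer_result, and every field name are hypothetical. skipped_repositories_count is the count the note suggests logging, i.e. how many times the selection loop found a repository with too many tags.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("enqueuer")  # hypothetical logger name


def log_enqueuer_result(
    selected_repository_id: Optional[int],
    skipped_repositories_count: int,
    outcome: str,
) -> dict:
    """Emit one JSON log line summarising an Enqueuer run and return it."""
    payload = {
        "message": "enqueuer run finished",
        "selected_repository_id": selected_repository_id,
        # Number of repositories skipped for too many tags in this run.
        "skipped_repositories_count": skipped_repositories_count,
        "outcome": outcome,
    }
    logger.info(json.dumps(payload))
    return payload
```

One JSON object per run keeps the counter easy to aggregate in whatever log search tool is in use.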

Edited by David Fernandez