Abort long running registry migrations

Steve Abrams requested to merge 352968-guard-worker-long-running into master

What does this MR do and why?

We are preparing to import container repositories into the new container registry database. As part of this process, Rails makes API requests to the registry to start the imports and to monitor them.

We have a GuardWorker whose job is to check the status of imports that are taking longer than expected. As initially implemented, when the worker found an import that was beyond the set time limit, it checked the true status of that import by making a request to the registry. If the registry responded that it was still importing, we did nothing other than log that the import was taking a long time.

In container-registry#514 (closed) we decided that rather than let these long running migrations continue, we should actively cancel them.

This MR updates the worker to make a cancel request to the registry. If the cancellation is successful, we mark the import as skipped with the reason migration_canceled.

If the registry fails to cancel the migration, we either reconcile the state in Rails to match the true state supplied by the registry response, or we abort the migration.
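The decision flow described above can be sketched roughly as follows. This is only an illustration, not the code in this MR: `CancelResult`, `handle_long_running_import`, and the returned hash are hypothetical names, and the rule used to choose between reconciling and aborting (reconcile whenever the registry reports a state, abort otherwise) is an assumption.

```ruby
# Hypothetical sketch; none of these names come from the MR itself.
# status:         :ok when the registry accepted the cancel request (assumed shape)
# registry_state: the state the registry reports for the import, or nil
CancelResult = Struct.new(:status, :registry_state, keyword_init: true)

# Returns the attributes Rails would store for the repository.
def handle_long_running_import(cancel_result)
  if cancel_result.status == :ok
    # Cancellation succeeded: mark the import as skipped.
    { migration_state: 'import_skipped', skipped_reason: 'migration_canceled' }
  elsif cancel_result.registry_state
    # Cancellation failed, but the registry told us the true state (assumption:
    # any reported state is reconciled as-is).
    { migration_state: cancel_result.registry_state }
  else
    # Nothing to reconcile against: abort the migration.
    { migration_state: 'import_aborted' }
  end
end
```

For example, `handle_long_running_import(CancelResult.new(status: :error, registry_state: 'import_done'))` would reconcile the repository to `import_done` under these assumptions.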

To make the cancel request, we add client support for calling this DELETE endpoint on the registry.

Since one responsibility of this GuardWorker is to update the migration_state of container repositories to match the true state in the registry, we want to be sure there is nothing preventing any possible transitions from one state to another. This MR also updates the state_machine logic in ContainerRepository to allow for the possible transitions that may occur. We ensure the following transitions can take place:

  • From any of [import_aborted, importing, pre_importing] to any of [pre_import_done, import_done, importing, pre_importing, aborted]
  • From either of [importing, pre_importing] to import_skipped.
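As a rough illustration of the two rules listed above, the allowed transitions can be written as a lookup table. This is a plain-Ruby sketch, not the state_machine definition in ContainerRepository: `TRANSITION_RULES` and `transition_allowed?` are hypothetical names, and the state names are copied verbatim from the list.

```ruby
# Hypothetical sketch; state names are copied as written in the list above.
TRANSITION_RULES = [
  { from: %w[import_aborted importing pre_importing],
    to:   %w[pre_import_done import_done importing pre_importing aborted] },
  { from: %w[importing pre_importing],
    to:   %w[import_skipped] }
].freeze

# True when at least one rule permits moving from `from` to `to`.
def transition_allowed?(from, to)
  TRANSITION_RULES.any? do |rule|
    rule[:from].include?(from) && rule[:to].include?(to)
  end
end
```

Under these rules, `transition_allowed?('importing', 'import_skipped')` holds, while `transition_allowed?('import_aborted', 'import_skipped')` does not, because only importing and pre_importing may move to import_skipped.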

This worker and the entire import process are behind a feature flag. Once all of the pieces are in place, we will test in the staging environment over the next several weeks before beginning the production imports.

Related to #352968 (closed)

Edited by Steve Abrams
