Guard job
Context
The context for this issue can be viewed in the epic &7316 (closed) and the breakdown of work can be viewed in this comment &7316 (comment 792633854).
Guard / Watcher
Goal
Watch how long container repositories have been in the pre_import or import state. Detect stale migrations and abort them.
It also acts as the "self heal" component. Container Registry notifications are not guaranteed to be received by Rails, so we can miss them, and this is something we need to be prepared for. For example, if we miss the notification that the import step is done, the container repository stays in read-only mode.
How it is enqueued
This will be a cron worker. Suggested frequency: every 10 minutes.
The final frequency still needs to be set, but it must be lower than container_registry_max_step_duration.
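The cron frequency bounds how long a stuck repository can go unnoticed: a repository only counts as stale once it has spent more than container_registry_max_step_duration in a state, and the next Guard run can come up to one interval later. The minimal sketch below, assuming both values are expressed in seconds, computes that upper bound and enforces the "lower than" rule above. The method name and the 30-minute step duration are hypothetical; the issue does not set container_registry_max_step_duration.

```ruby
# Upper bound on how long a stuck repository can sit in a guarded state
# before a Guard run notices it. Assumes a repository is stale once it has
# been in a state for longer than max_step_duration, and that the cron job
# runs every `interval` seconds. Hypothetical helper, not GitLab code.
def worst_case_detection_delay(max_step_duration:, interval:)
  unless interval < max_step_duration
    raise ArgumentError, 'interval must be lower than container_registry_max_step_duration'
  end

  # The repository turns stale at max_step_duration; the next run
  # happens at most one interval later.
  max_step_duration + interval
end

# Hypothetical 30-minute step duration with the suggested 10-minute cron.
delay = worst_case_detection_delay(max_step_duration: 30 * 60, interval: 10 * 60)
```

With these example values, a repository left in importing (read-only) can stay there for up to 40 minutes before the Guard acts. A shorter interval tightens that bound at the cost of more no-op runs.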
Logic
- Loop through the states import, pre_import and pre_import_done (order is important here) and, in each state, loop over the container repositories that have been in that state for longer than container_registry_max_step_duration. For each container repository:
  - Ping the Container Registry on migration/status.
    - For erroneous responses, execute container_repository.abort.
      - If migration_retries_count = container_registry_max_retries, execute container_repository.skip.
    - For "migration step ongoing" responses, skip the container repository.
    - For "migration step successful" responses, transition the container repository to the next migration_state.
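The loop above can be sketched as a pure function that returns the actions one Guard run would take instead of performing them. This is a minimal sketch, not the actual worker. Repo, guard_actions, the registry_status callable (a stand-in for the migration/status call), and the :error / :ongoing / :successful symbols are all hypothetical. The nested bullet is read as "abort, and also skip once the retries are exhausted". Because the issue does not define the next migration_state for each state, the transition is recorded without computing the target.

```ruby
# Hypothetical stand-in for the ContainerRepository model (assumption).
Repo = Struct.new(:name, :migration_state, :state_changed_at, :migration_retries_count)

# States are processed in this order; the issue says the order is important.
GUARDED_STATES = %w[import pre_import pre_import_done].freeze

# Returns the actions one Guard run would take, without performing them.
# registry_status is a hypothetical callable standing in for the Container
# Registry migration/status call; it returns :error, :ongoing or :successful.
def guard_actions(repos, now:, max_step_duration:, max_retries:, registry_status:)
  actions = []
  GUARDED_STATES.each do |state|
    stale = repos.select do |repo|
      repo.migration_state == state && (now - repo.state_changed_at) > max_step_duration
    end
    stale.each do |repo|
      case registry_status.call(repo)
      when :error
        actions << [:abort, repo.name]
        # Assumed reading of the nested bullet: also skip once retries are exhausted.
        actions << [:skip, repo.name] if repo.migration_retries_count == max_retries
      when :ongoing
        # Step still running: leave the repository alone.
      when :successful
        # Move to the next migration_state (mapping not specified in the issue).
        actions << [:transition, repo.name]
      end
    end
  end
  actions
end

# Example run with made-up timestamps (seconds) and registry responses.
now = Time.at(10_000)
repos = [
  Repo.new('a', 'import', Time.at(0), 0),          # stale, registry errors
  Repo.new('b', 'pre_import', Time.at(0), 3),      # stale, errors, retries exhausted
  Repo.new('c', 'pre_import_done', Time.at(0), 0), # stale, step successful
  Repo.new('d', 'import', Time.at(9_900), 0),      # not stale yet
  Repo.new('e', 'import', Time.at(0), 0)           # stale, step still ongoing
]
statuses = { 'a' => :error, 'b' => :error, 'c' => :successful, 'd' => :error, 'e' => :ongoing }
actions = guard_actions(repos, now: now, max_step_duration: 1_800, max_retries: 3,
                        registry_status: ->(repo) { statuses.fetch(repo.name) })
```

Keeping the decision logic free of side effects makes it easy to test. A real worker would then call abort, skip or the state transition on each repository.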
Notes
I thought about enqueuing this job only when needed (for example, when a container repository is moved into pre_importing, enqueue the Watcher job to run after container_registry_max_step_duration). This is nice, but in my eyes the risk is too big: we could miss a container repository in the importing status, which is the status where we don't allow write operations.
Imagine a user saying: "I couldn't push to my container repository these last 3 days".
I would rather spend some backend resources (some cron job executions will be no-ops) than miss a container repository in a given migration_state.