Properly scale ContainerExpirationPolicyWorker
Summary
Follow-up of !40740 (comment 429425073).
ContainerExpirationPolicyWorker is mainly responsible for finding the cleanup policies that need to be executed and marking the associated container repositories as "needing a tags cleanup".
Right now, this worker queries all the executable policies and loops over them to mark the associated container repositories.
As we enable cleanup policies for more and more projects with container repositories, this loop will grow and could hit the 5min threshold for workers.
This issue is to discuss the possible solutions.
Suggested solution
Cron worker
- Nothing special to do, only fill the capacity of the limited worker (😍 no loop at all 🚀).
- Perhaps we can shorten the interval between runs to 30min instead of 50min.
Limited worker
Amount of work to do
- Number of container repositories linked to an enabled cleanup policy where next_run_at < Time.zone.now or cleanup status is cleanup_unfinished.
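The count above can be sketched in plain Ruby. This is only an illustration under stated assumptions: the `Policy` and `Repository` structs and the `amount_of_work` helper are hypothetical stand-ins for the real records, and the condition is read as "policy enabled AND (policy due OR repository left unfinished)".

```ruby
# Hypothetical in-memory stand-ins for the real records (assumption).
Policy = Struct.new(:enabled, :next_run_at, keyword_init: true)
Repository = Struct.new(:policy, :cleanup_status, keyword_init: true)

# A repository is pending work when its policy is enabled and either the
# policy is due (next_run_at in the past) or a previous cleanup of this
# repository was left unfinished.
def cleanup_needed?(repository, now)
  policy = repository.policy
  return false unless policy.enabled

  policy.next_run_at < now || repository.cleanup_status == :cleanup_unfinished
end

# Number of repositories the limited worker still has to process.
def amount_of_work(repositories, now: Time.now)
  repositories.count { |repository| cleanup_needed?(repository, now) }
end

now = Time.now
due = Policy.new(enabled: true, next_run_at: now - 60)
not_due = Policy.new(enabled: true, next_run_at: now + 3600)
disabled = Policy.new(enabled: false, next_run_at: now - 60)

repositories = [
  Repository.new(policy: due, cleanup_status: :cleanup_unscheduled),     # counted: policy is due
  Repository.new(policy: not_due, cleanup_status: :cleanup_unfinished),  # counted: unfinished
  Repository.new(policy: not_due, cleanup_status: :cleanup_unscheduled), # not counted
  Repository.new(policy: disabled, cleanup_status: :cleanup_unfinished)  # not counted: disabled
]

puts amount_of_work(repositories, now: now) # prints 2
```

In a database-backed version this would presumably be a single COUNT query rather than an in-memory scan.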
Container repository selection
- Linked to an enabled policy + next_run_at < Time.zone.now or cleanup status is cleanup_unfinished.
- Repository cleanup status is cleanup_unscheduled or cleanup_unfinished.
- Order by status and then expiration_policy_started_at (cleanup_unscheduled has a higher priority than cleanup_unfinished).
- Take the first one with a lock + update cleanup status to cleanup_ongoing.
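A minimal sketch of this selection, again in plain Ruby. The structs, `STATUS_PRIORITY`, and `take_next_repository` are hypothetical names. The `Mutex` only stands in for whatever database lock the real worker would use, and sorting expiration_policy_started_at oldest first (with never-started repositories treated as oldest) is an assumption, since the direction is not specified above.

```ruby
# Hypothetical in-memory stand-ins for the real records (assumption).
Policy = Struct.new(:enabled, :next_run_at, keyword_init: true)
Repository = Struct.new(:name, :policy, :cleanup_status,
                        :expiration_policy_started_at, keyword_init: true)

# Lower number = higher priority; cleanup_ongoing is deliberately absent,
# so repositories already being processed are never selected.
STATUS_PRIORITY = { cleanup_unscheduled: 0, cleanup_unfinished: 1 }.freeze

# Stand-in for a database row lock (assumption).
SELECTION_LOCK = Mutex.new

def eligible?(repository, now)
  policy = repository.policy
  return false unless policy.enabled
  return false unless STATUS_PRIORITY.key?(repository.cleanup_status)

  policy.next_run_at < now || repository.cleanup_status == :cleanup_unfinished
end

# Picks the highest-priority repository and marks it cleanup_ongoing while
# holding the lock, so two workers cannot take the same one.
def take_next_repository(repositories, now: Time.now)
  SELECTION_LOCK.synchronize do
    candidate = repositories
      .select { |repository| eligible?(repository, now) }
      .min_by do |repository|
        [STATUS_PRIORITY.fetch(repository.cleanup_status),
         repository.expiration_policy_started_at || Time.at(0)]
      end
    candidate.cleanup_status = :cleanup_ongoing if candidate
    candidate
  end
end

now = Time.now
policy = Policy.new(enabled: true, next_run_at: now - 60)
repositories = [
  Repository.new(name: "unfinished", policy: policy,
                 cleanup_status: :cleanup_unfinished,
                 expiration_policy_started_at: now - 7200),
  Repository.new(name: "unscheduled-recent", policy: policy,
                 cleanup_status: :cleanup_unscheduled,
                 expiration_policy_started_at: now - 600),
  Repository.new(name: "unscheduled-old", policy: policy,
                 cleanup_status: :cleanup_unscheduled,
                 expiration_policy_started_at: now - 3600)
]

first = take_next_repository(repositories, now: now)
puts first.name           # prints unscheduled-old
puts first.cleanup_status # prints cleanup_ongoing
```

Calling `take_next_repository` again returns the remaining repositories in priority order, then `nil` once nothing is eligible.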
Cleanup process
- Update last_cleanup_started_at to Time.zone.now and the cleanup status to cleanup_ongoing.
- Check all the last_cleanup_started_at for all the sibling container repositories.
  - If they are all after the policy next_run_at, update the policy next_run_at for the next execution.
- Set cleanup status to cleanup_unscheduled if policy.next_run_at < delete_timeout.seconds.from_now and return.
- Actual cleanup
  - Success: update the status to cleanup_unscheduled
  - Any error: update the status to cleanup_unfinished
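The cleanup steps can be sketched as follows, transcribed literally from the list above. Everything here is hypothetical: the structs, the `cadence_seconds` field used to compute the next run, the `run_cleanup` helper and its return values, and the choice to pass the repository itself as part of `siblings`. The actual tag deletion is represented by the block given to `run_cleanup`.

```ruby
# Hypothetical in-memory stand-ins for the real records (assumption).
Policy = Struct.new(:next_run_at, :cadence_seconds, keyword_init: true)
Repository = Struct.new(:name, :policy, :cleanup_status,
                        :last_cleanup_started_at, keyword_init: true)

# siblings: all repositories under the same policy, including `repository`
# (assumption). delete_timeout is expressed in seconds.
def run_cleanup(repository, siblings, delete_timeout:, now: Time.now)
  policy = repository.policy

  # Step 1: record the start time and mark the repository as ongoing.
  repository.last_cleanup_started_at = now
  repository.cleanup_status = :cleanup_ongoing

  # Step 2: once every sibling has started after the policy's next_run_at,
  # schedule the policy's next execution.
  all_started = siblings.all? do |sibling|
    sibling.last_cleanup_started_at &&
      sibling.last_cleanup_started_at > policy.next_run_at
  end
  policy.next_run_at = now + policy.cadence_seconds if all_started

  # Step 3: return early when the next run falls within the delete timeout.
  if policy.next_run_at < now + delete_timeout
    repository.cleanup_status = :cleanup_unscheduled
    return :skipped
  end

  # Step 4: actual cleanup, delegated to the caller's block.
  begin
    yield repository
    repository.cleanup_status = :cleanup_unscheduled
    :finished
  rescue StandardError
    repository.cleanup_status = :cleanup_unfinished
    :unfinished
  end
end

now = Time.now
policy = Policy.new(next_run_at: now - 60, cadence_seconds: 86_400)
repository = Repository.new(name: "app", policy: policy,
                            cleanup_status: :cleanup_unscheduled)

result = run_cleanup(repository, [repository], delete_timeout: 250, now: now) do |_repo|
  # Tag deletion would happen here.
end
puts result                    # prints finished
puts repository.cleanup_status # prints cleanup_unscheduled
```

If the block raises, the repository ends up in cleanup_unfinished, which makes it eligible again in the selection step.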
Edited by David Fernandez