Properly scale ContainerExpirationPolicyWorker

Summary

Follow-up to !40740 (comment 429425073).

ContainerExpirationPolicyWorker is mainly responsible for finding the cleanup policies that need to be executed and marking the associated container repositories as "needing a tags cleanup".

Right now, this worker queries all the executable policies and loops over them to mark the associated container repositories.

As we add more and more projects with container repositories to the cleanup policies, this loop will grow and could eventually hit the 5min threshold for workers.

This issue is to discuss the possible solutions.

Suggested solution

Cron worker

Nothing special to do: it only has to fill the capacity of the limited worker. (😍 no loop at all 🚀)
Perhaps we can shorten the interval between runs to 30min instead of 50min.
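
A minimal sketch of what "filling the capacity" could look like, in plain Ruby. The names `MAX_CAPACITY` and `jobs_to_enqueue`, and the idea that the cron worker knows how many limited-worker jobs are already running, are assumptions made for illustration; the issue does not say how capacity is tracked.

```ruby
# Hypothetical upper bound on concurrently running limited-worker jobs.
MAX_CAPACITY = 4

# Returns how many limited-worker jobs the cron worker would enqueue.
# pending_count: container repositories waiting for a cleanup
# running_count: limited-worker jobs already running
def jobs_to_enqueue(pending_count:, running_count:, max_capacity: MAX_CAPACITY)
  free_slots = [max_capacity - running_count, 0].max
  [pending_count, free_slots].min
end
```

With this shape the cron worker never iterates over policies or repositories; it only needs two counts.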

Limited worker

Amount of work to do

Number of container repositories linked to an enabled cleanup policy where next_run_at < Time.zone.now or cleanup status is cleanup_unfinished.
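
The count above could be sketched as follows in plain Ruby. The `Policy` and `Repository` structs and the method names are hypothetical stand-ins for the real models, `Time.now` replaces `Time.zone.now` so the sketch runs without Rails, and the condition is read as "enabled policy AND (due OR unfinished)", which is one possible interpretation.

```ruby
# Hypothetical in-memory stand-ins for the real models.
Policy = Struct.new(:enabled, :next_run_at, keyword_init: true)
Repository = Struct.new(:name, :policy, :cleanup_status, keyword_init: true)

# True when the repository still has cleanup work pending.
def needs_cleanup?(repository, now: Time.now)
  policy = repository.policy
  return false unless policy&.enabled

  policy.next_run_at < now || repository.cleanup_status == :cleanup_unfinished
end

# Amount of work left for the limited worker.
def remaining_work_count(repositories, now: Time.now)
  repositories.count { |repo| needs_cleanup?(repo, now: now) }
end
```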

Container repository selection

Linked to an enabled policy + next_run_at < Time.zone.now or cleanup status is cleanup_unfinished.
Repository cleanup status is cleanup_unscheduled or cleanup_unfinished
Order by status, then by expiration_policy_started_at (cleanup_unscheduled has a higher priority than cleanup_unfinished)
Take the first one with a lock + update cleanup status to cleanup_ongoing
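
The selection steps above can be sketched in plain Ruby. Everything here is illustrative: the structs, `STATUS_PRIORITY`, `next_repository!`, and the `Mutex` (a stand-in for whatever lock the real implementation would use, for example a database row lock) are assumptions, and treating a missing expiration_policy_started_at as "oldest" is a guess.

```ruby
# Hypothetical in-memory stand-ins for the real models.
Policy = Struct.new(:enabled, :next_run_at, keyword_init: true)
Repository = Struct.new(:name, :policy, :cleanup_status,
                        :expiration_policy_started_at, keyword_init: true)

# Lower number = higher priority.
STATUS_PRIORITY = { cleanup_unscheduled: 0, cleanup_unfinished: 1 }.freeze

# Stand-in for the lock mentioned in the last step.
SELECTION_LOCK = Mutex.new

# Picks the next repository to clean up and marks it as ongoing.
# Returns nil when there is nothing to do.
def next_repository!(repositories, now: Time.now)
  SELECTION_LOCK.synchronize do
    candidates = repositories.select do |repo|
      policy = repo.policy
      policy&.enabled &&
        STATUS_PRIORITY.key?(repo.cleanup_status) &&
        (policy.next_run_at < now || repo.cleanup_status == :cleanup_unfinished)
    end

    # Order by status priority, then by oldest start time.
    chosen = candidates.min_by do |repo|
      [STATUS_PRIORITY.fetch(repo.cleanup_status),
       repo.expiration_policy_started_at || Time.at(0)]
    end

    chosen.cleanup_status = :cleanup_ongoing if chosen
    chosen
  end
end
```

Because the status is flipped to cleanup_ongoing inside the lock, two workers calling `next_repository!` on the same list cannot pick the same repository.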

Cleanup process

Update last_cleanup_started_at to Time.zone.now and the cleanup status to cleanup_ongoing
Check the last_cleanup_started_at of all the sibling container repositories.
- If they are all after the policy next_run_at, update the policy next_run_at for the next execution.
Set cleanup status to cleanup_unscheduled if policy.next_run_at < delete_timeout.seconds.from_now and return.
Actual cleanup
- Success: update the status to cleanup_unscheduled
- Any error: update the status to cleanup_unfinished
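
A rough sketch of the status transitions in this process, in plain Ruby. The structs, `run_cleanup`, and `cadence_seconds` (how far next_run_at is pushed) are hypothetical; "sibling" is read as every repository under the same policy, including the current one; the actual tag deletion is passed in as a block. The early return based on delete_timeout is left out because the issue does not define that value.

```ruby
# Hypothetical in-memory stand-ins for the real models.
Policy = Struct.new(:next_run_at, :cadence_seconds, :repositories, keyword_init: true)
Repository = Struct.new(:name, :policy, :cleanup_status,
                        :last_cleanup_started_at, keyword_init: true)

# Runs one cleanup and returns the resulting cleanup status.
# The block performs the actual cleanup and raises on failure.
def run_cleanup(repository, now: Time.now)
  policy = repository.policy

  # Mark the repository as started.
  repository.last_cleanup_started_at = now
  repository.cleanup_status = :cleanup_ongoing

  # Once every repository of the policy has started in this round,
  # schedule the policy's next execution.
  all_started = policy.repositories.all? do |repo|
    started_at = repo.last_cleanup_started_at
    started_at && started_at > policy.next_run_at
  end
  policy.next_run_at = now + policy.cadence_seconds if all_started

  begin
    yield repository
    repository.cleanup_status = :cleanup_unscheduled
  rescue StandardError
    repository.cleanup_status = :cleanup_unfinished
  end
end
```

A repository left in cleanup_unfinished stays eligible for selection, so a later job can pick it up again.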

Edited Dec 21, 2020 by David Fernandez