Fix package file cleanup worker with duplicated package files
🔥 Problem
The package file cleanup worker is a background job kicked off by a cron job. It then executes in a loop (sketched below):
1. Pick the next package file marked as pending destruction.
2. Destroy it.
3. Re-enqueue the worker if there are more package files marked as pending destruction.
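For illustration, here is a minimal sketch of that loop as a Sidekiq-style worker. All class, scope, and method names (`PackageFileCleanupWorker`, `pending_destruction`, and so on) are assumptions made for the sketch, not the actual GitLab implementation.

```ruby
# Hypothetical sketch of the cleanup loop described above. Names are illustrative only.
class PackageFileCleanupWorker
  include Sidekiq::Worker

  def perform
    # (1.) pick the next package file marked as pending destruction
    package_file = Packages::PackageFile.pending_destruction.first
    return unless package_file

    # (2.) destroy it (the status is updated as part of this step)
    package_file.destroy!

    # (3.) re-enqueue if more package files are still pending destruction
    self.class.perform_async if Packages::PackageFile.pending_destruction.exists?
  end
end
```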
The problem is that during (2.) the worker tries to update the package file status. If anything goes wrong during (2.), the job fails and (3.) is skipped.
That's what is currently occurring:
A package file validation (which only applies to files from PyPI packages) rejects the update, so the worker simply ends.
This breaks the loop, and package files that need to be deleted have started to accumulate on GitLab.com.
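Roughly, the failure looks like this (the target status and exact call are assumptions; the key point is that the update runs model validations):

```ruby
# Inside step (2.): the status update runs validations. The PyPI file
# validation rejects the record, update! raises, the job fails, and
# step (3.) never re-enqueues the worker.
package_file.update!(status: :processing) # raises ActiveRecord::RecordInvalid
```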
Note that since the files are hidden from the UI and APIs, those "ghost" package files waiting to be deleted do not cause an issue for users.

However, package files marked as `pending_destruction` are still taken into account for storage quotas. Because the cleanup loop is broken, storage usage can grow because of those "ghost" package files.
🚒 Solution
- Fix the validation so that it does not run when the package file is marked as `pending_destruction`.
- Update (2.) so that the status update is sent with validations disabled (see the sketch after this list).
  - This is fine to do: we really want to delete the file, so we don't care about validations during the status update.
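A rough sketch of both changes follows; the validation, attribute names, and status values here are hypothetical, not the actual GitLab code:

```ruby
# 1) Skip the PyPI-specific file validation once the record is already
#    marked for destruction (validation and attribute names are illustrative).
class PackageFile < ApplicationRecord
  enum status: { default: 0, pending_destruction: 1, processing: 2 }

  validates :file_name, uniqueness: { scope: :package_id },
            if: -> { pypi? && !pending_destruction? }
end

# 2) In the worker, send the status update with validations disabled so a
#    failing validation can no longer break the cleanup loop.
package_file.status = :processing
package_file.save!(validate: false)
```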