Offline garbage collection fails on filesystems with no images pushed
Context
As concluded with @bprescott_, we believe gitlab-ctl registry-garbage-collect could get some love:
omnibus-gitlab#2811 (comment 1038932875)
When run with no images pushed, it fails after an unnecessary registry shutdown for offline garbage collection:
failed to garbage collect: marking blobs: : Path not found: /docker/registry/v2/repositories
Ideally, the command should exit successfully (exit code 0) after the check, even if no garbage collection is needed in such cases.
This would also help with the documented best practice of a weekly cron job that triggers offline garbage collection:
https://docs.gitlab.com/ee/administration/packages/container_registry.html#running-the-garbage-collection-on-schedule
Implementation Guide
In the offline garbage collector, before starting to enumerate repositories, check if the repositories root path exists. If the path does not exist, exit early with a log message saying that there are no repositories.
The storage drivers for object storage make use of path specs found in paths.go.
In the issue reported above, we see that the storage driver is not able to find the /docker/registry/v2/repositories path. This path corresponds to the repositoriesRootPathSpec.
To check for the existence of this path, we'll need to do the following at the beginning of MarkAndSweep:
- Convert the repositoriesRootPathSpec to a string that the storage driver can use via the pathFor function.
- Pass that string to the Stat method of the storage driver.
- Check if the error is not nil and, if so, whether that error is a PathNotFoundError.
- If the error is a PathNotFoundError, we can log a message indicating that garbage collection was skipped and exit the function early with no error.
- If the error is not nil, but is also not a PathNotFoundError, we should return the error with context.
- Finally, if the error is nil, we can continue MarkAndSweep as normal.
Testing
For testing, we need to create a new storage driver and a registry, as in the first few lines of TestGarbageCollectAfterLastTagRemoved.
Afterwards, we should run MarkAndSweep and ensure no error is returned from that function.
See the end of TestNoDeletionNoEffect, minus the last two lines, for an example of running this in a test.