storage/filesystem: Remove empty parent dirs in DeleteFiles
Problem
While working on !31 (merged) and !39 (merged), and more precisely while using the filesystem storage driver, we noticed that deleting blobs/manifests directly by their data/link file path will cause their parent folders to be left behind empty.
Solution
The solution is to delete a file's parent directory (if empty) after removing the file itself. Such can be done by customizing the DeleteFiles
method for this specific storage driver.
This change will make the filesystem driver behave like all other (blob) storage drivers, which automatically "delete" empty "directories".
Options
1. Delete empty parent directories
After deleting each file in the loop within DeleteFiles
, we need to check if its parent directory is empty and remove it as well if so.
This approach requires another delete operation but makes it possible to use the DeleteFiles
method for other use cases beyond the deletion of blobs and manifests.
It's also safer (no unexpected deletions of non-empty directories) and mimics the behaviour of the blob storage backends perfectly.
This approach has been implemented in !47 (e6d7b0b8).
2. Delete parent directories directly
The most straightforward option would be to pass the path of a file parent directory to the Delete
method (here), instead of passing the path of the file itself.
This approach would work just fine for deleting blobs and manifests, as their parent folders have no other files and the Delete
method removes everything at a given path, including the path itself, using a os.RemoveAll
call. This would replicate the current behaviour (before !24 (merged)) when deleting blobs and manifests.
However, this would render the DeleteFiles
method useless for other use cases beyond deleting blobs and manifests during garbage collection. We would have to rename the method, or at least make it very clear in the documentation that this method will blindly delete parent directories, regardless if they're empty or not.
This approach has been demonstrated in !47 (35d3e2f9).
Benchmarks
Some benchmarks have been performed to compare the options described above in terms of performance.
Option 2 resembles how blobs and manifests are currently deleted in release/2.7-gitlab
(c69bd341 at the time of writing this), so we can use its results as a reference.
Setup
The benchmark used, DriverSuite.BenchmarkDeleteFiles1File
, creates one small file inside a series of random directories and then invokes DeleteFiles
to remove it.
The benchmark executed 10000 times for each run, using the following command:
go test -v github.com/docker/distribution/registry/storage/driver/filesystem -args -check.b -check.f "DriverSuite.BenchmarkDeleteFiles1File" -gocheck.btime 5s
Results
Scenario | ns/op |
---|---|
Option 1 (!47 (e6d7b0b8)) | 1188316 |
Option 2 (!47 (35d3e2f9)) | 1277934 |
As seen above, Option 1 is slightly faster than Option 2 (reference), more precisely, 7% faster.
Although negligible, I believe that a difference can be justified by the additional overhead of performing an os.RemoveAll
call (recursive) on a directory and also on the additional os.Stat
call performed within Delete
.
Conclusion
As Option 1 doesn't incur in any noticeable overhead, in my opinion, this should be the preferred approach (as-is of this branch).
Related to #14 (closed)