Skip to content

storage/filesystem: Remove empty parent dirs in DeleteFiles

Problem

While working on !31 (merged) and !39 (merged), and more precisely while using the filesystem storage driver, we noticed that deleting blobs/manifests directly by their data/link file path will cause their parent folders to be left behind empty.

Solution

The solution is to delete a file's parent directory (if empty) after removing the file itself. Such can be done by customizing the DeleteFiles method for this specific storage driver.

This change will make the filesystem driver behave like all other (blob) storage drivers, which automatically "delete" empty "directories".

Options

1. Delete empty parent directories

After deleting each file in the loop within DeleteFiles, we need to check if its parent directory is empty and remove it as well if so.

This approach requires another delete operation but makes it possible to use the DeleteFiles method for other use cases beyond the deletion of blobs and manifests.

It's also safer (no unexpected deletions of non-empty directories) and mimics the behaviour of the blob storage backends perfectly.

This approach has been implemented in !47 (e6d7b0b8).

2. Delete parent directories directly

The most straightforward option would be to pass the path of a file parent directory to the Delete method (here), instead of passing the path of the file itself.

This approach would work just fine for deleting blobs and manifests, as their parent folders have no other files and the Delete method removes everything at a given path, including the path itself, using a os.RemoveAll call. This would replicate the current behaviour (before !24 (merged)) when deleting blobs and manifests.

However, this would render the DeleteFiles method useless for other use cases beyond deleting blobs and manifests during garbage collection. We would have to rename the method, or at least make it very clear in the documentation that this method will blindly delete parent directories, regardless if they're empty or not.

This approach has been demonstrated in !47 (35d3e2f9).

Benchmarks

Some benchmarks have been performed to compare the options described above in terms of performance.

Option 2 resembles how blobs and manifests are currently deleted in release/2.7-gitlab (c69bd341 at the time of writing this), so we can use its results as a reference.

Setup

The benchmark used, DriverSuite.BenchmarkDeleteFiles1File, creates one small file inside a series of random directories and then invokes DeleteFiles to remove it.

The benchmark executed 10000 times for each run, using the following command:

go test -v github.com/docker/distribution/registry/storage/driver/filesystem -args -check.b -check.f "DriverSuite.BenchmarkDeleteFiles1File" -gocheck.btime 5s

Results

Scenario ns/op
Option 1 (!47 (e6d7b0b8)) 1188316
Option 2 (!47 (35d3e2f9)) 1277934

As seen above, Option 1 is slightly faster than Option 2 (reference), more precisely, 7% faster.

Although negligible, I believe that a difference can be justified by the additional overhead of performing an os.RemoveAll call (recursive) on a directory and also on the additional os.Stat call performed within Delete.

Conclusion

As Option 1 doesn't incur in any noticeable overhead, in my opinion, this should be the preferred approach (as-is of this branch).

Related to #14 (closed)

Edited by João Pereira

Merge request reports