Bulk delete blobs during garbage collection for S3
The current blob deletion process for the S3 storage backend is very inefficient.
Blobs are stored in a single file named `data`, and its full path looks like `.../fff95...2d823/data` (hex digest truncated). You can read more about the path specification here.
In the current implementation, the `storage.driver.s3.Delete` method is invoked with the path to the parent "folder" of each blob `data` file (`.../fff95...2d823`). This leads to a `ListObjects` request on that path, which only contains a single object (the `data` file), followed by a `DeleteObjects` request to delete that file.

The `storage.driver.s3.Delete` method is a generic method designed to support both direct (single file) and recursive (multiple files) deletion, which is why it always lists the objects under a given path before actually deleting them.
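To make this concrete, here is a minimal sketch of that list-then-delete pattern using the AWS SDK for Go. It is not the driver's actual code; the bucket name is hypothetical and `ListObjectsV2` stands in for whichever listing call the driver uses:

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// deletePath mimics the generic behaviour described above: list every object
// under the given prefix, then delete whatever was found. For a blob "folder"
// the listing always returns exactly one object (the data file), so two
// sequential requests are spent to remove a single key.
func deletePath(svc *s3.S3, bucket, prefix string) error {
	list, err := svc.ListObjectsV2(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	})
	if err != nil {
		return err
	}

	objects := make([]*s3.ObjectIdentifier, 0, len(list.Contents))
	for _, obj := range list.Contents {
		objects = append(objects, &s3.ObjectIdentifier{Key: obj.Key})
	}
	if len(objects) == 0 {
		return nil
	}

	_, err = svc.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: &s3.Delete{Objects: objects},
	})
	return err
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	// Hypothetical bucket; the prefix is the blob's parent "folder"
	// (truncated example path from above, for illustration only).
	_ = deletePath(svc, "example-registry-bucket", ".../fff95...2d823")
}
```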
Considering the above, for each blob to delete, the garbage collector (GC) always sends two sequential HTTP requests to S3: one `ListObjects` followed by one `DeleteObjects`. The worst-case scenario can, therefore, be described as `2N` network requests, where `N` is the number of blobs to delete, i.e. two requests per blob, which is inefficient.
Considering that blobs are stored in a single file named `data`, there is no need to list the contents of its parent folder before deleting it. We can delete the `data` file directly, cutting the required network requests in half, from `2N` to `N`.
All blob storage backends (S3, GCS, Azure, OSS and Swift) behave in the same way: empty "folders" are automatically deleted. Therefore, it's enough to remove the `data` file; there is no need to worry about its parent folder. This is actually how the existing `Delete` method of each storage driver works (S3 example): they list and delete the files within a path and do nothing about the path itself. However, the same is not valid for the local filesystem storage backend (see Caveats).
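Purely as a hedged sketch of deleting the `data` file directly with a single request (AWS SDK for Go; the bucket name is hypothetical and the key is the truncated example path from above):

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	svc := s3.New(session.Must(session.NewSession()))

	// Delete the blob's data file directly: one DeleteObject request, no
	// prior ListObjects call. The now-empty parent "folder" disappears on
	// its own because S3 folders are only a key-prefix abstraction.
	_, err := svc.DeleteObject(&s3.DeleteObjectInput{
		Bucket: aws.String("example-registry-bucket"), // hypothetical bucket
		Key:    aws.String(".../fff95...2d823/data"),  // truncated example path
	})
	if err != nil {
		panic(err)
	}
}
```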
We're currently not making use of the bulk delete feature of the S3 `DeleteObjects` operation, which supports up to 1000 objects per request. By doing so, we would be able to go from `N` to `ceil(N/1000)` requests, which is a massive improvement.
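A minimal sketch of that batching, assuming the GC has already computed the list of `data` file keys to remove (AWS SDK for Go; function and bucket names are illustrative, not the driver's actual code):

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// maxKeysPerRequest is the S3 DeleteObjects limit of 1000 keys per request.
const maxKeysPerRequest = 1000

// bulkDelete removes the given keys in batches, issuing ceil(len(keys)/1000)
// DeleteObjects requests instead of one (or two) requests per key.
func bulkDelete(svc *s3.S3, bucket string, keys []string) error {
	for start := 0; start < len(keys); start += maxKeysPerRequest {
		end := start + maxKeysPerRequest
		if end > len(keys) {
			end = len(keys)
		}

		objects := make([]*s3.ObjectIdentifier, 0, end-start)
		for _, key := range keys[start:end] {
			objects = append(objects, &s3.ObjectIdentifier{Key: aws.String(key)})
		}

		if _, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
			Bucket: aws.String(bucket),
			Delete: &s3.Delete{Objects: objects},
		}); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	// Hypothetical keys of blob data files scheduled for deletion by the GC.
	keys := []string{".../aaa.../data", ".../bbb.../data"}
	_ = bulkDelete(svc, "example-registry-bucket", keys)
}
```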
In addition to the above, we can leverage goroutines and spawn `DeleteObjects` requests concurrently, one for every batch of 1000 files to delete.
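One possible way to sketch that concurrency on top of the batching, here using `errgroup` purely for illustration (the `deleteBatch` helper is hypothetical and simply wraps a single `DeleteObjects` call; the MR may schedule its goroutines differently):

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
	"golang.org/x/sync/errgroup"
)

const maxKeysPerRequest = 1000 // S3 DeleteObjects limit

// deleteBatch issues one DeleteObjects request for up to 1000 keys.
func deleteBatch(svc *s3.S3, bucket string, keys []string) error {
	objects := make([]*s3.ObjectIdentifier, 0, len(keys))
	for _, key := range keys {
		objects = append(objects, &s3.ObjectIdentifier{Key: aws.String(key)})
	}
	_, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: &s3.Delete{Objects: objects},
	})
	return err
}

// concurrentBulkDelete spawns one goroutine per batch of up to 1000 keys, so
// the ceil(N/1000) DeleteObjects requests are sent in parallel rather than
// sequentially.
func concurrentBulkDelete(svc *s3.S3, bucket string, keys []string) error {
	var g errgroup.Group
	for start := 0; start < len(keys); start += maxKeysPerRequest {
		end := start + maxKeysPerRequest
		if end > len(keys) {
			end = len(keys)
		}
		batch := keys[start:end] // capture this batch for the goroutine below
		g.Go(func() error {
			return deleteBatch(svc, bucket, batch)
		})
	}
	return g.Wait()
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	keys := []string{".../aaa.../data", ".../bbb.../data"} // hypothetical keys
	_ = concurrentBulkDelete(svc, "example-registry-bucket", keys)
}
```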
Number of Requests
The current solution requires up to `2N` requests to delete `N` blobs. Theoretically, with the changes proposed in this MR, we can achieve a worst-case scenario of `ceil(N/1000)` requests.
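For illustration (numbers chosen arbitrarily): deleting 2,000 blobs takes up to 2 × 2,000 = 4,000 requests with the current implementation, but only ceil(2,000 / 1,000) = 2 `DeleteObjects` requests with batching.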
It's important to note that this doesn't only provide better performance. With S3, `ListObjects` requests have a price per unit, so cutting requests is also beneficial in terms of monetary cost.
We cannot get a performance improvement as big as the reduction in network requests. There is an inherent cost to every bulk delete request. S3 needs to delete all objects before returning a response, so we're "just" (understatement) avoiding the cost of building/parsing, serializing/deserializing and sending/receiving all those individual requests.
However, S3 should be able to parallelize the deletion of the objects within a bulk delete request on the server-side, so a bulk request should always be faster than multiple single sequential requests, even without accounting for the network trip time.
Overall, when compared to the current solution, we expect the performance improvement to grow with the number of blobs to delete, benefiting from both the reduction of the required network requests and the added concurrency.
No benefit comes without a cost. For all supported blob storage backends (S3, GCS, Azure, OSS and Swift), empty "folders" are automatically "deleted" (in reality they don't exist; they're just an abstraction, a grouping key). Therefore we can delete the `data` files directly and not worry about leaving an empty folder behind.
However, the above is not valid for the local filesystem and the in-memory (for test purposes only) storage backends. For the local filesystem, when the `Delete` method is invoked with the path of the parent folder of a blob `data` file, the folder is removed along with the file.
This caveat was tackled separately in !47 (merged).
We ran a series of benchmarks to validate the expected improvements, both in terms of the number of network requests and the overall performance.
The following test matrix was used, allowing us to simulate the deletion of a varying number of blobs (`N`) with two registry versions:

- Current implementation, using the tip of the `release-2.7-gitlab` branch at the time of writing;
- This MR.
Additional notes:

- For each execution we recorded the number of requests and the elapsed time (ns/op);
- For each blob, the benchmark creates and uploads a random image layer (~20MB) to S3. Therefore we decided to keep the number of blobs small, with a maximum of 1000 (20GB in total);
- Each benchmark was executed exactly once, due to the constraint described above.
Benchmarks ran against a Standard S3 bucket in the `eu-central-1` region. The test registry was run on a local development machine. The measured latency between the local machine and the AWS region was 48ms.
Benchmarks were executed using the following command:

```shell
go test -v github.com/docker/distribution/registry/storage/driver/s3-aws -args -check.b -check.f "DriverSuite.BenchmarkRemoveBlob*" -gocheck.btime 1ns
```
Number of Requests
| # of blobs (`N`) | Current version (# of requests) | This MR (# of requests) | # of times fewer requests |
| --- | --- | --- | --- |
As observed in the table above, the reduction in the number of required network requests is outstanding. The results are aligned with the worst-case scenarios of `2N` requests for the current behaviour and `ceil(N/1000)` for the newly proposed behaviour.
The results get better as the number of blobs to delete increases, as each `DeleteObjects` request can include up to 1000 objects to delete.
We should keep in mind that every S3 request comes with a monetary cost, except `DELETE` requests, which we don't use, as the bulk `DeleteObjects` operation is a `POST`.
| # of blobs (`N`) | Current version (ns/op) | This MR (ns/op) | # of times faster |
| --- | --- | --- | --- |
These results show that, for our test scenario, the run time can be up to 42 times faster than the current implementation. The improvement ratio should continue to grow as `N` increases. The performance improvement should be even more dramatic when deleting substantially more than 1000 blobs, as with !32 (merged) we implemented support for concurrent requests for each batch of 1000 blobs to delete.
The benchmarks seem to confirm the expectations regarding the number of requests and the overall speedup. As expected, nothing comes without a cost. The foreseeable caveats have been discussed and can be addressed to mitigate concerns.
Note: Further testing against a real large repository is required to validate the feasibility and effectiveness of this change. None of the approaches and results highlighted here are guaranteed until then. The final details will be described later in !24 (merged).