Bulk delete blobs during garbage collection for S3
## Context

This change is part of !24 and related to #10.
The current blob deletion process for the S3 storage backend is very inefficient. Blobs are stored in a single file named `data`, and its full path looks like the following (hex digest truncated):

```
docker/registry/v2/blobs/sha256/ff/fff95...2d823/data
```

You can read more about the path specification here.
## Current Behaviour
In the current implementation, the `storage.driver.s3.Delete` method is invoked with the path to the parent "folder" of each blob `data` file (`.../fff95...2d823`). This leads to a `ListObjects` request on that path, which contains only a single object, the `data` file, followed by a `DeleteObjects` request to delete that file.

The `storage.driver.s3.Delete` method is a generic method designed to support both direct (single file) and recursive (multiple files) deletion, which is why it always lists the objects in a given path before actually deleting them.
Considering the above, for each blob to delete, the garbage collector (GC) always sends two sequential HTTP requests to S3: one `ListObjects` followed by one `DeleteObjects`. The worst case is therefore two requests per blob, or `2N` network requests in total, where `N` is the number of blobs to delete, which is inefficient.
## Rationale
- Considering that blobs are stored in a single file named `data`, there is no need to list the contents of its parent folder before deleting it. We can delete the `data` file directly, cutting the required network requests in half, from `2N` to `N`.
- All blob storage backends (S3, GCS, Azure, OSS and Swift) behave in the same way: empty "folders" are automatically deleted. Therefore, it's enough to remove the `data` file; there is no need to worry about its parent folder. This is actually how the existing `Delete` method of each storage driver works (S3 example); they list and delete the files within a path and do nothing about the path itself. However, the same is not true for the local filesystem storage backend (see Caveats).
- We're not making use of the bulk delete feature of the S3 `DeleteObjects` operation, which supports up to 1000 objects per request. Doing so would allow us to go from `N` to `ceil(N/1000)` requests, which is a massive improvement.
- In addition to the above, we can leverage goroutines and spawn `DeleteObjects` requests concurrently for every batch of 1000 `link` files to delete.
## Expectations

### Number of Requests
The current solution requires up to `2N` requests to delete `N` blobs. Theoretically, with the changes proposed in this MR, we can achieve a worst-case scenario of `ceil(N/1000)` requests.
It's important to note that this doesn't only provide better performance. With S3, `DeleteObjects` and `ListObjects` requests have a price per unit, so cutting requests is also beneficial in terms of monetary cost.
### Performance
We cannot get a performance improvement as big as the reduction in network requests. There is an inherent cost to every bulk delete request. S3 needs to delete all objects before returning a response, so we're "just" (understatement) avoiding the cost of building/parsing, serializing/deserializing and sending/receiving all those individual requests.
However, S3 should be able to parallelize the deletion of the objects within a bulk delete request on the server-side, so a bulk request should always be faster than multiple single sequential requests, even without accounting for the network trip time.
Overall, when compared to the current solution, we expect the performance improvement to grow with the number of blobs to delete, benefiting from the reduction in required network requests and from concurrency.
## Caveats
No benefit comes without a cost. For all supported blob storage backends (S3, GCS, Azure, OSS and Swift), empty "folders" are automatically "deleted" (in reality, they don't exist; they're just an abstraction, a grouping key). Therefore we can delete the `data` files directly and not worry about leaving an empty folder behind.
However, the above is not valid for the local filesystem and the in-memory (for test purposes only) storage backends. For the local filesystem, when the `Delete` method is invoked with the path of the parent folder of a blob `data` file, the folder is removed along with the file.
This caveat was tackled separately in !47 (merged).
## Test Results
We ran a series of benchmarks to validate the expected improvements, both in terms of the number of network requests and the overall performance.
### Methodology

#### Matrix
The following test matrix was used and allows us to simulate the deletion of `N` blobs:

| # of blobs (N) |
|---|
| 1 |
| 10 |
| 100 |
| 1000 |
#### Scenarios

- Current implementation, using the tip of the `release-2.7-gitlab` branch at the time of writing;
- This MR.
#### Notes
- For each execution we recorded the number of requests and the elapsed time (ns/op);
- For each blob, the benchmark creates and uploads a random image layer (~20MB) to S3. Therefore we decided to keep the number of blobs small, with a maximum of 1000 (20GB in total);
- Each benchmark was executed exactly once, due to the constraint described above.
#### Setup

Benchmarks ran against a Standard S3 bucket in the `eu-central-1` region. The test registry was kept on a local development machine. The measured latency between the local machine and the AWS region was 48ms.
Benchmarks were executed using the following command:

```
go test -v github.com/docker/distribution/registry/storage/driver/s3-aws -args -check.b -check.f "DriverSuite.BenchmarkRemoveBlob*" -gocheck.btime 1ns
```
### Results

#### Number of Requests
| # of blobs (N) | Current version (# of requests) | This MR (# of requests) | Times fewer requests |
|---|---|---|---|
| 1 | 2 | 1 | 2x |
| 10 | 20 | 1 | 20x |
| 100 | 200 | 1 | 200x |
| 1000 | 2000 | 1 | 2000x |
As observed in the table above, the reduction in the number of required network requests is outstanding. The results are aligned with the worst-case scenario of `2N` requests for the current behaviour, and `ceil(N/1000)` for the newly proposed behaviour.
The results get better as the number of blobs to delete increases, as each `DeleteObjects` request can include up to 1000 objects to delete.
We should keep in mind that every S3 request (except `DELETE` requests, which we don't use, as `DeleteObjects` uses `POST`) comes with a monetary cost.
#### Performance

| # of blobs (N) | Current version (ns/op) | This MR (ns/op) | Times faster |
|---|---|---|---|
| 1 | 700872944 | 686129724 | - |
| 10 | 3169730093 | 958745518 | 3x |
| 100 | 30290221138 | 2608343884 | 12x |
| 1000 | 264326869452 | 6248405859 | 42x |
These results show that, for our test scenario, the run time can be up to 42 times faster than the current implementation. The improvement ratio should continue to grow as `N` grows.

The performance improvement should be even more dramatic when deleting substantially more than 1000 blobs, as with !32 (merged) we implemented support for concurrent requests for each batch of 1000 blobs to delete.
## Conclusions
The benchmarks seem to confirm the expectations regarding the number of requests and the overall speedup. As expected, nothing comes without a cost. The foreseeable caveats have been discussed and can be addressed to mitigate concerns.
Note: Further testing against a real large repository is required to validate the feasibility and effectiveness of this change. None of the approaches and results highlighted here are guaranteed until then. The final details will be described later in !24 (merged).