
Bulk delete blobs during garbage collection for S3

Context

This change is part of !24 and related to #10.

The current blob deletion process for the S3 storage backend is very inefficient.

Blobs are stored in a single file named data, and its full path looks like the following (hex digest truncated):

docker/registry/v2/blobs/sha256/ff/fff95...2d823/data

You can read more about the path specification here.

Current Behaviour

In the current implementation, the storage.driver.s3.Delete method is invoked with the path to the parent "folder" of each blob data file (.../fff95...2d823). This leads to a ListObjects request on that path, which only contains a single object, the data file, followed by a DeleteObjects request to delete that file.

The storage.driver.s3.Delete method is a generic method designed to support both direct (single file) and recursive (multiple files) deletion, which is why it always lists the objects in a given path before actually deleting them.

Considering the above, for each blob to delete, the garbage collector (GC) always sends two sequential HTTP requests to S3, one ListObjects followed by a DeleteObjects. That's 2N network requests, where N is the number of blobs to delete.

The worst-case scenario can, therefore, be described as 2 requests per blob to delete (2N requests in total), which is inefficient.
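For illustration, here is a rough sketch of what this amounts to per blob, written against the aws-sdk-go v1 S3 client. The names and structure are simplified for this description and do not mirror the actual driver code:

```go
// Rough sketch of the current per-blob flow against the aws-sdk-go v1 S3
// client. Simplified for illustration; this is not the actual driver code.
package sketch

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// deleteBlobParent mimics what happens today for every single blob: one
// ListObjects call on the blob's parent "folder", followed by one
// DeleteObjects call for the single data file found there.
func deleteBlobParent(svc *s3.S3, bucket, parentPath string) error {
	// Request 1: list the contents of the parent "folder" (always just the
	// data file).
	listed, err := svc.ListObjects(&s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(parentPath),
	})
	if err != nil {
		return err
	}

	// Request 2: delete whatever was listed (a single object).
	objects := make([]*s3.ObjectIdentifier, 0, len(listed.Contents))
	for _, obj := range listed.Contents {
		objects = append(objects, &s3.ObjectIdentifier{Key: obj.Key})
	}
	_, err = svc.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: &s3.Delete{Objects: objects},
	})
	return err
}
```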

Rationale

  • Considering that blobs are stored in a single file, named data, there is no need to list the contents of its parent folder before deleting it. We can delete the data file directly, cutting the required network requests in half, from 2N to N.

  • All blob storage backends (S3, GCS, Azure, OSS and Swift) behave in the same way: empty "folders" are automatically deleted. Therefore, it's enough to remove the data file; there is no need to worry about its parent folder. This is actually how the existing Delete method of each storage driver works (S3 example); they list and delete the files within a path and do nothing about the path itself. However, the same is not valid for the local filesystem storage backend (see Caveats).

  • The current implementation does not make use of the bulk delete feature of the S3 DeleteObjects operation, which supports up to 1000 objects per request. By batching deletions, we would be able to go from N to ceil(N/1000) requests, which is a massive improvement.

  • In addition to the above, we can leverage goroutines and issue the DeleteObjects requests concurrently, one per batch of 1000 files to delete (see the sketch after this list).
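The following is a minimal sketch of what batched, concurrent deletion could look like with the aws-sdk-go v1 S3 client. The function name, error handling and batch orchestration are illustrative assumptions for this description, not the exact implementation in this MR:

```go
// Minimal sketch of batched, concurrent bulk deletion. Illustrative only;
// the actual implementation lives in the s3-aws storage driver.
package sketch

import (
	"sync"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

const maxBatchSize = 1000 // S3 DeleteObjects accepts at most 1000 keys per request

// bulkDelete splits keys into batches of up to 1000 and issues one
// DeleteObjects request per batch, each in its own goroutine.
func bulkDelete(svc *s3.S3, bucket string, keys []string) error {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		errs []error
	)

	for start := 0; start < len(keys); start += maxBatchSize {
		end := start + maxBatchSize
		if end > len(keys) {
			end = len(keys)
		}

		batch := make([]*s3.ObjectIdentifier, 0, end-start)
		for _, key := range keys[start:end] {
			batch = append(batch, &s3.ObjectIdentifier{Key: aws.String(key)})
		}

		wg.Add(1)
		go func(objects []*s3.ObjectIdentifier) {
			defer wg.Done()
			_, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
				Bucket: aws.String(bucket),
				Delete: &s3.Delete{Objects: objects},
			})
			if err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
			}
		}(batch)
	}

	wg.Wait()
	if len(errs) > 0 {
		return errs[0] // simplified error handling for the sketch
	}
	return nil
}
```

A real implementation would likely bound the number of in-flight goroutines and aggregate partial failures; the sketch omits that for brevity.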

Expectations

Number of Requests

The current solution requires up to 2N requests to delete N blobs.

Theoretically, with the changes proposed in this MR, we can achieve a worst-case scenario of ceil(N/1000) requests.
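As a quick sanity check on that arithmetic, a tiny illustrative helper (not part of the MR):

```go
// Illustrative helper (not part of the MR): expected number of DeleteObjects
// requests for n blobs, given S3's limit of 1000 objects per request.
package sketch

func numDeleteRequests(n int) int {
	return (n + 999) / 1000 // integer form of ceil(n/1000)
}

// Examples: numDeleteRequests(1) == 1, numDeleteRequests(1000) == 1,
// numDeleteRequests(1001) == 2, numDeleteRequests(2500) == 3.
```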

It's important to note that this isn't only about performance. With S3, DeleteObjects and ListObjects requests are charged per unit, so cutting the number of requests is also beneficial in terms of monetary cost.

Performance

We cannot get a performance improvement as big as the reduction in network requests. There is an inherent cost to every bulk delete request. S3 needs to delete all objects before returning a response, so we're "just" (understatement) avoiding the cost of building/parsing, serializing/deserializing and sending/receiving all those individual requests.

However, S3 should be able to parallelize the deletion of the objects within a bulk delete request on the server-side, so a bulk request should always be faster than multiple single sequential requests, even without accounting for the network trip time.

Overall, when compared to the current solution, we expect the relative performance improvement to grow with the number of blobs to delete, benefiting from the reduction in required network requests and from concurrency.

Caveats

No benefit comes without a cost. For all supported blob storage backends (S3, GCS, Azure, OSS and Swift) empty "folders" are automatically "deleted" (in reality, they don't exist, they're just an abstraction, a grouping key). Therefore we can delete the data files directly and not worry about leaving an empty folder behind.

However, the above is not valid for the local filesystem and the in-memory (for test purposes only) storage backends. For the local filesystem, when the Delete method is invoked with the path of the parent folder for a blob data file, the folder is removed along with the file.
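A small illustration of that difference using only the standard library (paths and function names are hypothetical, not the filesystem driver's actual code):

```go
// Illustration of the local filesystem caveat; paths and function names are
// hypothetical and this is not the registry's filesystem driver code.
package sketch

import (
	"os"
	"path/filepath"
)

// deleteParent mirrors the current behaviour: removing the blob's parent
// "folder" deletes the directory together with the data file inside it.
func deleteParent(parentDir string) error {
	return os.RemoveAll(parentDir)
}

// deleteDataFileOnly mirrors the proposed behaviour. On S3/GCS/Azure/OSS/Swift
// "folders" are just key prefixes, so nothing is left behind. On a real
// filesystem, however, this leaves an empty parent directory that has to be
// cleaned up separately.
func deleteDataFileOnly(parentDir string) error {
	return os.Remove(filepath.Join(parentDir, "data"))
}
```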

This caveat was tackled separately in !47 (merged).

Test Results

We ran a series of benchmarks to validate the expected improvements, both in terms of the number of network requests and the overall performance.

Methodology

Matrix

The following test matrix was used to simulate the deletion of N blobs:

| # blobs (N) |
|-------------|
| 1           |
| 10          |
| 100         |
| 1000        |

Scenarios

  • Current implementation, using the tip of the release-2.7-gitlab branch at the time of writing;
  • This MR.

Notes

  • For each execution we recorded the number of requests and the elapsed time (ns/op);
  • For each blob, the benchmark creates and uploads a random image layer (~20MB) to S3. We therefore decided to keep the number of blobs small, with a maximum of 1000 (~20GB in total);
  • Each benchmark was executed exactly once, due to the constraint described above.

Setup

Benchmarks ran against a Standard S3 bucket in the eu-central-1 region. The test registry was kept on a local development machine. The measured latency between the local machine and the AWS region was 48ms.

Benchmarks were executed using the following command:

go test -v github.com/docker/distribution/registry/storage/driver/s3-aws -args -check.b -check.f "DriverSuite.BenchmarkRemoveBlob*" -gocheck.btime 1ns

Results

Number of Requests

| # of blobs (N) | Current version (# of requests) | This MR (# of requests) | × fewer requests |
|----------------|---------------------------------|-------------------------|------------------|
| 1              | 2                               | 1                       | 2x               |
| 10             | 20                              | 1                       | 20x              |
| 100            | 200                             | 1                       | 200x             |
| 1000           | 2000                            | 1                       | 2000x            |

As observed in the table above, the reduction in the number of required network requests is outstanding. The results are aligned with the worst-case scenario of 2N requests for the current behaviour and ceil(N/1000) for the proposed behaviour.

The results get better as the number of blobs to delete increases, as each DeleteObjects request can include up to 1000 objects to delete.

We should keep in mind that every S3 request comes with a monetary cost (with the exception of single-object DELETE requests, which we don't use here, as bulk deletes are performed with DeleteObjects, a POST request).

Performance

| # of blobs (N) | Current version (ns/op) | This MR (ns/op) | × faster |
|----------------|-------------------------|-----------------|----------|
| 1              | 700872944               | 686129724       | -        |
| 10             | 3169730093              | 958745518       | 3x       |
| 100            | 30290221138             | 2608343884      | 12x      |
| 1000           | 264326869452            | 6248405859      | 42x      |

These results show that, for our test scenario, deletions can be up to 42 times faster than with the current implementation. The improvement ratio should continue to grow as N increases.

The performance improvement should be even more pronounced when deleting substantially more than 1000 blobs, as with !32 (merged) we implemented support for concurrent requests, one per batch of 1000 blobs to delete.

Conclusions

The benchmarks seem to confirm the expectations regarding the number of requests and the overall speedup. As expected, nothing comes without a cost. The foreseeable caveats have been discussed and can be addressed to mitigate concerns.

Note: Further testing against a real large repository is required to validate the feasibility and effectiveness of this change. None of the approaches and results highlighted here are guaranteed until then. The final details will be described later in !24 (merged).

