# Optimize the garbage collector sweep stage for S3
We're currently looking at optimizing the Container Registry garbage collection algorithm for S3 (gitlab#31071 (closed)).
In this issue we'll investigate possible improvements for the sweep stage.
## Current Behaviour
The GC sweep stage starts from two lists of objects to delete: manifests and blobs. The manifest list is built during the mark stage (a recursive path walk); the blob list is built in the same way, but during the sweep stage itself.
With the lists of manifests and blobs to delete in hand, the GC uses `storage.Vacuum` as an intermediary to the storage driver to remove the objects from the backend.
Note: Optimizations around the path walk are being investigated in #3 (closed), so they're out of scope for this issue. Here we focus exclusively on the delete operation.
### Blobs

- The GC, in `storage.MarkAndSweep`, iterates over the list of blobs to delete and calls `storage.Vacuum.RemoveBlob` for each one, passing the blob digest (e.g. `sha256:f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6`) as argument:
  - The `storage.Vacuum.RemoveBlob` method parses the blob digest, generates its full path (e.g. `docker/registry/v2/blobs/sha256/f1/f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6`) and invokes the storage driver `Delete` method:
    - The `storage.driver.s3.Delete` method will then do:
      - Send a `ListObjects` request to S3 to obtain the full list of all objects under a blob's path. In this case, a blob's path has a single object within, the `data` file.
      - With the list of objects in hand (a single `data` file), an S3 `DeleteObjects` request is built and sent. During the request, S3 will automatically delete the parent folder `f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6`, and its parent `f1` if empty.
The process described above is synchronous and sequential: for each blob to delete, we send two HTTP requests to S3 (`ListObjects`, then `DeleteObjects`) and wait for each response before moving on to the next blob. That's 2N network requests, where N is the number of blobs to delete.
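To make the request pattern concrete, here is a minimal sketch of the list-then-delete sequence, assuming the aws-sdk-go v1 client that the S3 driver is built on. `deleteBlob`, the bucket name and the example path are illustrative stand-ins, not the registry's actual code:

```go
// Minimal sketch of the current per-blob flow (illustrative, not the
// registry's actual code), using the aws-sdk-go v1 S3 client.
package main

import (
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

// deleteBlob mirrors the generic storage.driver.s3.Delete sequence for one
// blob path: a ListObjects round trip followed by a DeleteObjects round trip.
func deleteBlob(svc *s3.S3, bucket, blobPath string) error {
	// Request 1: list everything under the blob's directory. For a blob this
	// returns a single object, the "data" file.
	listed, err := svc.ListObjects(&s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(blobPath),
	})
	if err != nil {
		return err
	}

	objects := make([]*s3.ObjectIdentifier, 0, len(listed.Contents))
	for _, obj := range listed.Contents {
		objects = append(objects, &s3.ObjectIdentifier{Key: obj.Key})
	}
	if len(objects) == 0 {
		return nil
	}

	// Request 2: delete the listed objects (here, just the "data" file).
	_, err = svc.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: &s3.Delete{Objects: objects},
	})
	return err
}

func main() {
	svc := s3.New(session.Must(session.NewSession()))
	blobs := []string{
		"docker/registry/v2/blobs/sha256/f1/f13f9a179e1c60cfe6b7a8b8b983443f43d9c49f81e1e449617234bec10407e6",
	}
	// Two sequential round trips per blob: 2N requests for N blobs.
	for _, p := range blobs {
		if err := deleteBlob(svc, "example-bucket", p); err != nil {
			log.Fatal(err)
		}
	}
}
```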
#### Bottlenecks and Possible Solutions

- The `storage.driver.s3.Delete` method is generic, i.e., it's designed to work with all kinds of objects. That's why a `ListObjects` request always precedes a `DeleteObjects` request. For blob deletion we don't need the `ListObjects`; we could delete the `data` file directly. This would cut the required network requests in half.
- We're not making use of the bulk capabilities of the S3 `DeleteObjects` operation when deleting blobs. It supports up to 1000 objects per request (source), which means that we could potentially send a single request instead of 2N in some scenarios (a sketch follows this list). Again, the `storage.driver.s3.Delete` method is generic; in the blob deletion scenario, it currently does a `DeleteObjects` request with a single element (the `data` file) per blob.
- If using bulk requests isn't viable or enough, we can parallelize the calls to `storage.Vacuum.RemoveBlob` in `storage.MarkAndSweep`, spawning a goroutine per blob deletion.
- This is minor, but in `storage.MarkAndSweep` a blob's digest is converted to a string just to be passed as an argument to `storage.Vacuum.RemoveBlob`, where it's then parsed back to a digest. The `storage.Vacuum.RemoveBlob` method is only used in one other place, the `proxy.NewRegistryPullThroughCache` function, where the same thing happens (the digest is converted to a string). We can avoid these conversions.
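As a rough illustration of the first two bullets, a hedged sketch of what batched blob deletion could look like, assuming we know each blob directory holds a single `data` file so the `ListObjects` call can be skipped entirely (`bulkDeleteBlobs` and `maxDeleteBatch` are hypothetical names, not the driver's API):

```go
// Hedged sketch: skip ListObjects (each blob directory is known to hold a
// single "data" file) and batch keys into DeleteObjects requests of up to
// 1000 objects. Illustrative only, not the driver's API.
package sweep

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

const maxDeleteBatch = 1000 // S3 DeleteObjects hard limit per request

// bulkDeleteBlobs deletes the "data" file of every blob path directly:
// ceil(N/1000) requests instead of 2N.
func bulkDeleteBlobs(svc *s3.S3, bucket string, blobPaths []string) error {
	keys := make([]*s3.ObjectIdentifier, 0, len(blobPaths))
	for _, p := range blobPaths {
		keys = append(keys, &s3.ObjectIdentifier{Key: aws.String(p + "/data")})
	}

	for start := 0; start < len(keys); start += maxDeleteBatch {
		end := start + maxDeleteBatch
		if end > len(keys) {
			end = len(keys)
		}
		if _, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
			Bucket: aws.String(bucket),
			Delete: &s3.Delete{Objects: keys[start:end]},
		}); err != nil {
			return err
		}
	}
	return nil
}
```

The batches are independent, so if batching alone isn't enough, each `DeleteObjects` call could also be issued from its own goroutine behind a bounded worker pool, as suggested in the third bullet.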
The improvements related to blob deletion are being implemented in !31 (merged).
### Manifests

- In `storage.MarkAndSweep`, the garbage collector iterates over the list of manifests to delete and calls `storage.Vacuum.RemoveManifest` for each one, passing the manifest name, digest and tag references as argument. This method will then do two things:
  - Iterate over the tag references list and:
    - Generate the tag index directory full path (e.g. `docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/tags/index/sha256/82d...057`)
    - Send a `ListObjects` request to S3 to check if the tag index file exists (stat). If so, the storage driver `Delete` method is invoked to delete the tag reference. The `storage.driver.s3.Delete` method will then do:
      - Send a `ListObjects` request to S3 to obtain the list of objects under a tag index full path (second `ListObjects` request for the same object). There is only one, the tag index link file (e.g. `82d...057/link`).
      - With the list of objects in hand (only one), an S3 `DeleteObjects` request is built and sent.
  - Build the manifest revision directory full path (e.g. `docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/revisions/sha256/82d...057`) and invoke the storage driver `Delete` method to delete the manifest revision. The `storage.driver.s3.Delete` method will then do:
    - Send a `ListObjects` request to S3 to obtain the list of objects under a manifest revision full path. There is only one, the manifest revision link file (e.g. `82d...057/link`).
    - With the list of objects in hand (only one), an S3 `DeleteObjects` request is built and sent.
The process described above is synchronous and sequential.
For each manifest to delete, the GC always sends at least two HTTP requests to S3 (`ListObjects` and `DeleteObjects`) for the manifest revision. That's 2N network requests, where N is the number of manifests to delete. On top of that, each tag reference costs up to three more requests (a `ListObjects` to stat the tag index file, plus the `ListObjects`/`DeleteObjects` pair of the generic delete), i.e. another 3T requests, where T is the total number of tag references across those manifests. For example, deleting 100 manifests with one tag reference each means 200 + 300 = 500 sequential requests.
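For illustration, a sketch of the per-manifest request pattern described above, again assuming the aws-sdk-go v1 client. `removeManifest`, `deleteGeneric` and the inlined path templates are hypothetical stand-ins for `storage.Vacuum.RemoveManifest` and the driver internals; the tag index path here includes the tag name, following the upstream registry path layout:

```go
// Illustrative sketch of the current per-manifest flow, not the registry's
// actual code.
package sweep

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// deleteGeneric reproduces the generic storage.driver.s3.Delete sequence:
// one ListObjects, then one DeleteObjects for whatever was listed.
func deleteGeneric(svc *s3.S3, bucket, path string) error {
	listed, err := svc.ListObjects(&s3.ListObjectsInput{
		Bucket: aws.String(bucket),
		Prefix: aws.String(path),
	})
	if err != nil || len(listed.Contents) == 0 {
		return err
	}
	objects := make([]*s3.ObjectIdentifier, 0, len(listed.Contents))
	for _, o := range listed.Contents {
		objects = append(objects, &s3.ObjectIdentifier{Key: o.Key})
	}
	_, err = svc.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: &s3.Delete{Objects: objects},
	})
	return err
}

// removeManifest mirrors the flow above: per tag reference, a stat
// (ListObjects) plus a generic delete (two more requests); then a generic
// delete for the manifest revision (two requests). That's 2 + 3T requests
// for a manifest with T tag references.
func removeManifest(svc *s3.S3, bucket, repo, dgst string, tags []string) error {
	for _, tag := range tags {
		indexPath := fmt.Sprintf(
			"docker/registry/v2/repositories/%s/_manifests/tags/%s/index/sha256/%s",
			repo, tag, dgst)

		// Stat: one ListObjects just to check that the tag index file exists.
		listed, err := svc.ListObjects(&s3.ListObjectsInput{
			Bucket: aws.String(bucket),
			Prefix: aws.String(indexPath),
		})
		if err != nil {
			return err
		}
		if len(listed.Contents) > 0 {
			if err := deleteGeneric(svc, bucket, indexPath); err != nil {
				return err
			}
		}
	}

	revisionPath := fmt.Sprintf(
		"docker/registry/v2/repositories/%s/_manifests/revisions/sha256/%s",
		repo, dgst)
	return deleteGeneric(svc, bucket, revisionPath)
}
```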
#### Bottlenecks and Possible Solutions

- The bottlenecks identified for blobs also apply to manifests.
- We can potentially bundle the deletion of the manifest revision and tag index files in a single `DeleteObjects` request (a sketch follows this list).
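A sketch of that bundling idea: derive every link file for a manifest up front and remove them all with one `DeleteObjects` call. Names and path templates are illustrative; this assumes the link paths can be computed without listing and that the total stays under the 1000-object request limit:

```go
// Illustrative sketch of bundling all of a manifest's link files into a
// single DeleteObjects request.
package sweep

import (
	"fmt"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
)

// removeManifestBundled replaces the 2 + 3T round trips of the current flow
// with one request per manifest (T = number of tag references).
func removeManifestBundled(svc *s3.S3, bucket, repo, dgst string, tags []string) error {
	objects := make([]*s3.ObjectIdentifier, 0, len(tags)+1)
	for _, tag := range tags {
		objects = append(objects, &s3.ObjectIdentifier{Key: aws.String(fmt.Sprintf(
			"docker/registry/v2/repositories/%s/_manifests/tags/%s/index/sha256/%s/link",
			repo, tag, dgst))})
	}
	objects = append(objects, &s3.ObjectIdentifier{Key: aws.String(fmt.Sprintf(
		"docker/registry/v2/repositories/%s/_manifests/revisions/sha256/%s/link",
		repo, dgst))})

	// One round trip instead of 2 + 3T.
	_, err := svc.DeleteObjects(&s3.DeleteObjectsInput{
		Bucket: aws.String(bucket),
		Delete: &s3.Delete{Objects: objects},
	})
	return err
}
```

Note that on unversioned buckets S3's `DeleteObjects` reports missing keys as successfully deleted, so the per-tag existence check could potentially be dropped as well.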
The improvements related to manifest deletion are being implemented in !39 (merged).