# Optimize the garbage collector sweep phase for GCS
This is related to gitlab#201963 (closed).
The garbage collection sweep phase has two main tasks: blob deletion and manifest deletion.
Here we leverage the work done in !24 (merged), where we not only improved performance for S3 but also laid a solid foundation for improvements in other storage drivers.
Due to the above, for GCS we only need to change a single driver function: `DeleteFiles`. The existing driver-agnostic `DriverSuite.TestDeleteFiles*` tests (also from !24 (merged)) are reused.
## Blobs

### Context
Blobs are stored in a single file named `data`, and its full path looks like the following (hex digest truncated):

```
docker/registry/v2/blobs/sha256/ff/fff95...2d823/data
```
You can read more about the path specification here.
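To make the layout concrete, here is a minimal, illustrative helper that builds a blob's `data` path following the scheme above (a sketch for this write-up, not the registry's actual path-building code; the function name is hypothetical):

```go
package main

import "fmt"

// blobDataPath builds the storage path for a blob's data file. Per the
// path spec, the intermediate directory is the first two characters of
// the hex digest. Illustrative helper, not the registry implementation.
func blobDataPath(algorithm, hexDigest string) string {
	return fmt.Sprintf("docker/registry/v2/blobs/%s/%s/%s/data",
		algorithm, hexDigest[:2], hexDigest)
}

func main() {
	// Truncated digest used purely for illustration.
	fmt.Println(blobDataPath("sha256", "fff95abc"))
}
```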
In the current implementation, the `storage.driver.gcs.Delete` method is invoked for each blob to delete. In this method, a List request is sent to GCS to list the contents of a given path, and then a Delete request is sent to delete each object under that path. `storage.driver.gcs.Delete` is a generic method designed to support both direct and recursive deletion, which is why it always lists the objects in a given path before actually deleting them.
Considering the above, for each blob to delete, the garbage collector always sends two sequential HTTP requests to GCS: a List followed by a Delete. That's `2N` network requests, where `N` is the number of blobs to delete.

The worst-case scenario can, therefore, be described as `2N` requests for `N` blobs to delete.
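The request pattern can be sketched with an in-memory stand-in for GCS that counts API calls (an illustrative sketch only; `fakeGCS` and `genericDelete` are hypothetical names, not the driver's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// fakeGCS stands in for a GCS bucket; requests counts every API call.
type fakeGCS struct {
	objects  map[string]bool
	requests int
}

// list returns all object names under a prefix (one List request).
func (g *fakeGCS) list(prefix string) []string {
	g.requests++
	var names []string
	for name := range g.objects {
		if strings.HasPrefix(name, prefix) {
			names = append(names, name)
		}
	}
	return names
}

// deleteObject removes a single object (one Delete request).
func (g *fakeGCS) deleteObject(name string) {
	g.requests++
	delete(g.objects, name)
}

// genericDelete mirrors the generic list-then-delete approach: even
// though a blob is exactly one data file, it still costs two requests.
func genericDelete(g *fakeGCS, path string) {
	for _, name := range g.list(path) {
		g.deleteObject(name)
	}
}

func main() {
	g := &fakeGCS{objects: map[string]bool{
		"docker/registry/v2/blobs/sha256/ff/aaa/data": true,
		"docker/registry/v2/blobs/sha256/ff/bbb/data": true,
		"docker/registry/v2/blobs/sha256/ff/ccc/data": true,
	}}
	for _, p := range []string{
		"docker/registry/v2/blobs/sha256/ff/aaa",
		"docker/registry/v2/blobs/sha256/ff/bbb",
		"docker/registry/v2/blobs/sha256/ff/ccc",
	} {
		genericDelete(g, p)
	}
	fmt.Printf("deleted N=3 blobs with %d requests\n", g.requests) // 2N = 6
}
```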
### Proposal
Blobs are stored in a single file named `data`, so there is no need to list the contents of their path before deleting them. We can remove the `data` files directly, cutting network requests in half, from `2N` to `N`.
Instead of deleting blobs sequentially, we use goroutines to spawn concurrent Delete requests. To avoid an unbounded number of goroutines (which can lead to high resource usage or to being rate limited by GCS), we limit the number of concurrent requests.

From the GCS documentation: "There is no limit to writes across multiple objects, which include uploading, updating, and deleting objects. Buckets initially support roughly 1000 writes per second and then scale as needed" (source). Considering this, we limit the number of active concurrent Delete requests to 1000. We have tested this limit by deleting 10000 objects with zero issues.

The GCS storage driver currently relies on a custom regulator to enforce a concurrency limit across all requests, but it has some flaws (#33). Considering this, here we decided to bypass the regulator and use a local semaphore in the `DeleteFiles` method.
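The bounded-concurrency pattern can be sketched as follows. This is a minimal standalone sketch, not the MR's actual `DeleteFiles` implementation: `deleteFilesConcurrently` and its delete callback are hypothetical names, and the semaphore is a plain buffered channel:

```go
package main

import (
	"fmt"
	"sync"
)

// deleteFilesConcurrently issues one delete per path, with at most
// maxConcurrency deletes in flight, using a buffered channel as a
// local semaphore. It returns the number of successful deletions and
// any errors collected along the way.
func deleteFilesConcurrently(paths []string, maxConcurrency int, deleteFn func(string) error) (int, []error) {
	sem := make(chan struct{}, maxConcurrency)
	var wg sync.WaitGroup
	var mu sync.Mutex
	deleted := 0
	var errs []error

	for _, p := range paths {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks once the limit is reached
		go func(path string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if err := deleteFn(path); err != nil {
				mu.Lock()
				errs = append(errs, err)
				mu.Unlock()
				return
			}
			mu.Lock()
			deleted++
			mu.Unlock()
		}(p)
	}
	wg.Wait()
	return deleted, errs
}

func main() {
	// Stand-in for a GCS Delete request: just record the path.
	var mu sync.Mutex
	removed := map[string]bool{}
	deleteFn := func(path string) error {
		mu.Lock()
		defer mu.Unlock()
		removed[path] = true
		return nil
	}

	paths := make([]string, 100)
	for i := range paths {
		paths[i] = fmt.Sprintf("blobs/sha256/%02x/data", i)
	}
	// The MR caps concurrency at 1000; 8 keeps this example small.
	n, errs := deleteFilesConcurrently(paths, 8, deleteFn)
	fmt.Printf("deleted %d paths, %d errors\n", n, len(errs))
}
```

A buffered channel was chosen over the driver's shared regulator precisely because the limit here is local to one method call.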
### Benchmarks
We ran a series of benchmarks to validate the expected improvements.
#### Methodology
- We tested the deletion of 1, 10, 100, and 1000 blobs;
- We compared `v2.8.1-gitlab` against this MR;
- For each test we recorded the elapsed time (ns/op);
- For each blob to delete, the benchmark creates and uploads a random image layer (~20MB) to GCS.
#### Setup
Benchmarks ran against a Standard GCS bucket in the `europe-west1` region. The median latency between the development machine and the GCP region was 42ms.
Benchmarks were run using the following command:
```shell
go test -v -tags=include_gcs github.com/docker/distribution/registry/storage/driver/gcs -args -check.b -check.f "DriverSuite.BenchmarkRemoveBlobs*" -gocheck.btime 1ns
```
#### Results
| # of blobs (`N`) | v2.8.1 (ns/op) | This MR (ns/op) | Improvement |
|---|---|---|---|
| 1 | 474220379 | 339191277 | 28% |
| 10 | 2011813669 | 946950968 | 53% |
| 100 | 18148727744 | 7152357503 | 61% |
| 1000 | 193753453367 | 64053449582 | 67% |
These results show that, for our test scenario, the run time can be up to 67% faster. The improvement should continue to increase as `N` grows.
## Manifests

### Context
Manifest revisions and tag references are stored in a single file. While for blobs this file is named `data`, for manifest revisions and tag references it's called `link`.

Sample manifest revision full path (hex digest truncated):

```
docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/revisions/sha256/82d...057/link
```

Sample manifest tag reference full path:

```
docker/registry/v2/repositories/myorg/myproj/myimg/_manifests/tags/index/sha256/82d...057/link
```
You can read more about the path specification here.
As with blobs, in the current implementation the `storage.driver.gcs.Delete` method is invoked for each manifest to delete. An unnecessary List request is sent to GCS, followed by a Delete request, for each manifest revision and tag reference to delete. That's `2MT` network requests, where `M` is the number of manifests to delete and `T` is the number of tag references per manifest.

The worst-case scenario can, therefore, be described as `2MT` requests.
### Proposal
The improvements proposed for blob deletion also apply to manifests, as they rely on the same `DeleteFiles` method, which we're changing with this MR. This allows us to go from `2MT` to `MT` requests.
### Benchmarks
We ran a series of benchmarks to validate the expected improvements.
#### Methodology
The following test matrix was used, allowing us to simulate the deletion of `M` manifests with `T` tag references each:

| # manifests (`M`) | # tags per manifest (`T`) |
|---|---|
| 1 | 0 |
| 1 | 1 |
| 10 | 0 |
| 10 | 1 |
| 100 | 0 |
| 100 | 1 |
| 100 | 20 |
- We compared `v2.8.1-gitlab` against this MR;
- For each test we recorded the elapsed time (ns/op);
- For each manifest, the benchmark creates and uploads a random test image (~40MB) to GCS.
#### Setup
The setup is the same as the one used for blobs. Benchmarks were run using the following command:

```shell
go test -v -tags=include_gcs github.com/docker/distribution/registry/storage/driver/gcs -args -check.b -check.f "DriverSuite.BenchmarkRemoveManifests*" -gocheck.btime 1ns
```
#### Results
| # manifests (`M`) | # tags per manifest (`T`) | v2.8.1 (ns/op) | This MR (ns/op) | Improvement |
|---|---|---|---|---|
| 1 | 0 | 775504398 | 609772013 | 21% |
| 1 | 1 | 1039227908 | 783854142 | 25% |
| 10 | 0 | 5289274326 | 4087819040 | 23% |
| 10 | 1 | 7661681882 | 4516032792 | 41% |
| 100 | 0 | 64343605206 | 35716427792 | 44% |
| 100 | 1 | 89696142906 | 45270206885 | 50% |
| 100 | 20 | 437739607411 | 162902443774 | 63% |
These results show that, for our test scenario, the run time can be up to 63% faster. The improvement increases along with `M` and `T`.
## Overall Performance Improvement
To test the overall performance improvement, we created a sample repository, in the same way we did for S3 (please see https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/9033#note_285751258). We then ran garbage collection, comparing `v2.8.1-gitlab` against this MR.
### Results
- Logs from `v2.8.1-gitlab`: before.log
- Logs from this MR: after.log
Both runs started with the same object count: 1079 objects, 6.771 GBytes in size, and ended with the same object count as well: 688 objects, 2.528 GBytes in size.
The sweep stage took 49s for `v2.8.1-gitlab` and only 0.62s for this MR, with 351 blobs and 20 manifests (including one tag per manifest) successfully deleted. This means it's now 79 times faster, which translates to a performance improvement of 98.73%.
## Conclusions
Unfortunately, we could not rely on bulk delete requests for GCS as we did for S3 (see gitlab#201963 (comment 285361834)), but the achieved performance improvements are still significant. These were only possible thanks to the foundations created in !24 (merged).