Optimize the mark stage of Container Registry garbage collection for GCS
Problem to solve
For organizations that build and publish many Docker images to their GitLab Container Registry, it is vital that they can easily and efficiently delete old, unused images from storage. The problem is that the garbage collection algorithm is inefficient and can take a long time to run.
There are two key stages to the process: the **mark** stage, which identifies which images/tags can be deleted, and the **sweep** stage, which deletes them. We recently optimized both of these stages for S3 and saw significant improvements. We need to do the same for Google Cloud Storage (GCS), so that customers using GCS are no longer blocked from running garbage collection and lowering their storage costs.
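For context, here is a minimal sketch of how the two stages relate, in Go (the language of the Container Registry). The `Manifest` type and the in-memory data are illustrative stand-ins for the registry's storage driver calls, not its real API:

```go
package main

import "fmt"

// Manifest references the blobs (layers and config) that make up an image.
type Manifest struct{ BlobDigests []string }

// markAndSweep illustrates the two GC stages over in-memory data.
func markAndSweep(manifests []Manifest, blobs []string, deleteBlob func(string)) {
	// Mark stage: walk every manifest still referenced by a repository
	// and record the digests of the blobs that must be kept.
	marked := map[string]bool{}
	for _, m := range manifests {
		for _, d := range m.BlobDigests {
			marked[d] = true
		}
	}
	// Sweep stage: delete every blob that was not marked.
	for _, d := range blobs {
		if !marked[d] {
			deleteBlob(d)
		}
	}
}

func main() {
	manifests := []Manifest{{BlobDigests: []string{"sha256:aaa"}}}
	blobs := []string{"sha256:aaa", "sha256:bbb"} // bbb is unreferenced
	markAndSweep(manifests, blobs, func(d string) { fmt.Println("deleting", d) })
}
```

The mark stage tends to dominate the runtime because it must enumerate every manifest and blob in the storage backend, which is why it is the focus of this issue.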
This issue will focus on improving the **mark** stage.
Proposal
Identify and implement performance optimizations for the mark stage of the garbage collection algorithm for GCS. One possible direction is sketched after the list below.
- Since GitLab.com utilizes GCS for Container Registry storage, we can run performance tests and benchmarks on dev.gitlab.com. This will help inform how to scale the optimizations to production GitLab.com.
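As one example of what such an optimization could look like, the sketch below lists blobs concurrently by digest prefix, mirroring the parallel-walk approach used for S3. It assumes the standard `docker/registry/v2/blobs/sha256/<2-hex-char shard>/...` layout; the bucket name, shard count, and concurrency limit are illustrative assumptions, not the registry's actual configuration:

```go
package main

import (
	"context"
	"fmt"

	"cloud.google.com/go/storage"
	"golang.org/x/sync/errgroup"
	"google.golang.org/api/iterator"
)

// listPrefix collects the names of all objects under one prefix.
func listPrefix(ctx context.Context, bkt *storage.BucketHandle, prefix string) ([]string, error) {
	var names []string
	it := bkt.Objects(ctx, &storage.Query{Prefix: prefix})
	for {
		attrs, err := it.Next()
		if err == iterator.Done {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		names = append(names, attrs.Name)
	}
}

func main() {
	ctx := context.Background()
	client, err := storage.NewClient(ctx)
	if err != nil {
		panic(err)
	}
	defer client.Close()

	// Hypothetical bucket name. Blobs are sharded by the first two hex
	// characters of their digest, giving 256 natural units of parallelism.
	bkt := client.Bucket("my-registry-bucket")

	g, gctx := errgroup.WithContext(ctx)
	g.SetLimit(32) // illustrative bound on concurrent list calls
	results := make([][]string, 256)
	for i := 0; i < 256; i++ {
		i := i // capture loop variable for the goroutine
		g.Go(func() error {
			prefix := fmt.Sprintf("docker/registry/v2/blobs/sha256/%02x/", i)
			names, err := listPrefix(gctx, bkt, prefix)
			results[i] = names
			return err
		})
	}
	if err := g.Wait(); err != nil {
		panic(err)
	}

	total := 0
	for _, shard := range results {
		total += len(shard)
	}
	fmt.Println("marked candidate blob count:", total)
}
```

In practice the mark stage needs manifest contents as well as blob names, but bounded concurrent listing is typically where most of the wall-clock time can be recovered; the right concurrency limit is something the dev.gitlab.com benchmarks would determine.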
Permissions and Security
- There are no permissions changes needed for this issue.
Documentation
- There are no documentation changes needed for this issue.
Availability & Testing
What does success look like, and how can we measure that?
Success looks like
- We see performance gains similar to the S3 optimizations, where GC for 15k blobs went from 2 hours to 93 seconds.
- We enable customers using large amounts of GCS storage to run GC and greatly reduce their storage costs.
- We apply what we learn to better understand how to reduce storage costs for GitLab.com.
- #38052 breaks out the metrics we would like to track for understanding usage and adoption of garbage collection.
What is the type of buyer?
- This problem impacts our larger customers most, as they typically have many teams building many images.