Prevent GCS Rate Limiting Caused by Container Registry Phase 2 Migration
Context
In the near future, we'll be migrating the old container registry repositories to the new registry; see gitlab-org&6427 (closed).
Problem
This will generate a lot of traffic to the GCS bucket that the production registry uses. In particular, it will involve far more writes than normal, as referenced blobs are copied to the new registry. Both the old and the new registry depend on this bucket to serve API requests, so we need to balance the speed of the migration against its resource usage.
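One common way to trade migration speed against bucket load is to cap the migration's write rate with a token bucket. The following is a minimal sketch of that idea, not how the registry actually throttles the migration; the `TokenBucket` class, its parameters, and the rates used are hypothetical.

```python
import time


class TokenBucket:
    """Caps the average rate of operations while allowing bursts up to `capacity`.

    Hypothetical helper for illustration; not part of the registry code.
    """

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate_per_s = rate_per_s  # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.clock = clock            # injectable clock, useful for testing
        self.tokens = capacity        # start with a full bucket
        self.last = clock()

    def try_acquire(self, n=1):
        """Take n tokens if available; return False when the caller should wait."""
        now = self.clock()
        elapsed = now - self.last
        # Refill according to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_s)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A migration worker would call `try_acquire()` before each blob copy and sleep briefly when it returns `False`, so API traffic keeps whatever headroom is left under the bucket's limit.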
Proposal
Determine whether we can find out what our current rate limits are for this bucket (they autoscale: https://cloud.google.com/storage/docs/request-rate), and/or work with someone from Google to ensure that load caused by the migration doesn't crowd out API requests.
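Because the limits autoscale, the practical question is how quickly load may grow. As we read the linked guidelines, the request rate should be ramped up gradually, no faster than doubling every 20 minutes. A rough sketch of such a ramp follows; the function name and the example rates are illustrative assumptions, not measured limits for this bucket.

```python
def ramp_schedule(start_rate, target_rate, step_minutes=20):
    """Return (minute, requests_per_s) steps that double the rate each step.

    Hypothetical helper: the doubling interval is taken from our reading of the
    GCS request-rate guidelines, and the final step is capped at target_rate.
    """
    if start_rate <= 0 or target_rate <= 0:
        raise ValueError("rates must be positive")
    steps = []
    minute, rate = 0, start_rate
    while rate < target_rate:
        steps.append((minute, rate))
        minute += step_minutes
        rate *= 2
    steps.append((minute, target_rate))
    return steps


# Example with assumed numbers: ramp from 1000 to 5000 requests/s.
schedule = ramp_schedule(1000, 5000)
# schedule == [(0, 1000), (20, 2000), (40, 4000), (60, 5000)]
```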
Status
We have a specific metric in place to monitor the GCS request rate (link) and another one to detect rate limit events (link). The rate has stayed well under 5k requests/s (the first soft limit advertised by Google), and there have been no evident rate limit events. We're also running at full speed now, so neither should get worse before the migration completes.
We have also kept the Google TAMs up to date on the progress, so there is nothing left to do here.