Gradually proxy requests for new repositories to the new registry
Problem
To support Phase 1 of the gradual migration proposal for large registries (starting with GitLab.com), we have implemented a feature to enable proxying requests that target new repositories to another registry (see #218 (closed)). Once this feature is enabled, all requests targeting new repositories will be routed to the new (internal, non-public-facing) registry backed by a metadata database.
Although this allows us to roll out the new registry gradually, starting with new repositories only, we have no way to control how many new repositories will be routed to the new registry at once. We also have no way to include/exclude specific namespaces (GitLab groups).
Control the number of repositories proxied to the new registry
Ideally, we would only route a percentage of new repositories to the new registry, starting with, e.g., 1%, increasing it gradually as we build confidence in the new registry and its database.
We could technically add a new configuration parameter, e.g., `migration.proxy.percentage`, and have the existing registry filter proxied requests based on this percentage. For example, if configured to 1%, for every 100 requests (targeting a new repository), only 1 would be routed to the new registry, and the existing registry would serve the other 99.
However, during a push, for example, multiple layers can be uploaded concurrently to the same (new) repository using separate requests. If the existing registry blindly filtered requests by percentage, some of those layers could end up in the existing registry and others in the new one, leaving the repository in an inconsistent state and corrupting the image.
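The failure mode can be illustrated with a minimal sketch. It assumes the percentage filter is a simple request counter that ignores which repository a request targets. The names `make_percentage_router` and `route` are hypothetical and not part of the registry.

```python
from itertools import count


def make_percentage_router(percentage):
    """Build a per-request router (hypothetical): 1 in every 100/percentage
    requests goes to the new registry, regardless of the target repository."""
    every = round(100 / percentage)
    counter = count(1)  # running request number, starting at 1

    def route(_request):
        # The decision depends only on the request number, not on the repository.
        return "new" if next(counter) % every == 0 else "existing"

    return route


route = make_percentage_router(1)

# 98 unrelated requests arrive before the push starts.
for _ in range(98):
    route("unrelated request")

# One push to a single new repository uploads three layers as separate requests.
push = ["upload layer 1", "upload layer 2", "upload layer 3"]
targets = [route(request) for request in push]

# The second layer is request number 100, so it alone goes to the new registry.
print(targets)
```

With these inputs the three layers of one image are split across both registries, which is exactly the inconsistent state described above.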
Temporarily ignore new repositories under specific namespaces
Additionally, it would be good to have a way to ensure that requests for new repositories under specific namespaces (GitLab groups) would not be proxied to the new registry until we're confident enough about its behavior. This would be useful to exclude repositories from customers with very high availability requirements from the initial go-live phase (where bugs are more likely to happen).
Similarly, we could reverse this pattern and pick specific groups for which all new repositories should go to the new registry.
Proposal
Instead of filtering proxied repositories by percentage, we could filter them by name. For example, we could start by only proxying requests for new repositories that start with `a`. Then we could increase the scope by allowing repositories that start with `b` as well, and so on.
This could be implemented with an `include` regex configuration parameter:
migration:
  proxy:
    include:
      - '^a.*' # Only proxy requests when the new repository name starts with `a`
      - '^gitlab-org/.*' # OR the new repository is under the `gitlab-org` group
We could gradually increase the scope of new repositories proxied to the new registry with this in hand.
Similarly, we could filter out specific namespaces with an `exclude` regex configuration parameter:
migration:
  proxy:
    exclude:
      - '^group-a/.*' # Do not proxy requests when the new repository group is `group-a`
      - '^group-b/.*' # Do not proxy requests when the new repository group is `group-b`
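As a rough sketch of how the two rule lists might combine, the function below decides whether a new repository is proxied. It is illustrative only: `should_proxy` is a hypothetical name, and two behaviours are assumptions rather than anything stated in this proposal: an exclusion match always wins over an inclusion match, and an empty `include` list makes every non-excluded repository eligible (so that an exclude-only configuration works).

```python
import re


def should_proxy(repository, include=(), exclude=()):
    """Return True if requests for this new repository go to the new registry."""
    # Assumption: a match on any exclusion rule overrides inclusion rules.
    if any(re.search(pattern, repository) for pattern in exclude):
        return False
    # Assumption: with no inclusion rules, all non-excluded repositories qualify.
    if not include:
        return True
    # Otherwise at least one inclusion rule must match.
    return any(re.search(pattern, repository) for pattern in include)


include = [r"^a.*", r"^gitlab-org/.*"]
exclude = [r"^group-a/.*", r"^group-b/.*"]

print(should_proxy("alpine/base", include, exclude))
print(should_proxy("group-a/app", include, exclude))
```

`re.search` is used so that patterns are only anchored where they say so (`^`), which is how the example rules above are written. The registry's own regex engine may differ in syntax for more complex patterns.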
Caveats
- We'll need to change the existing registry configuration many times to adjust inclusions/exclusions;
- Filtering by name is not as predictable as filtering by percentage. Repositories whose names start with `a` might far outnumber those that start with `b`, so we need to be careful with inclusion rules. We can use the GitLab database to get a list of all existing repositories, group them by, e.g., the first letter, and then carefully choose inclusion rules based on that. If `x` is the least common start character among all existing repositories, then we should start with that. This is not perfect, as we're trying to predict the names of new, not-yet-existing repositories based on the existing ones, but it's the best we can do;
- All new repositories that do not match an inclusion rule or that match at least one exclusion rule will be stored in the existing registry. This increases the number of repositories to be migrated in Phase 2, thus increasing the time needed to complete it.
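The first-letter analysis mentioned in the second caveat could look roughly like the sketch below. It assumes the repository paths have already been exported from the GitLab database into a list. The function name `first_char_counts` and the sample paths are made up for illustration.

```python
from collections import Counter


def first_char_counts(repositories):
    """Count repositories per first character, least common first."""
    counts = Counter(path[0].lower() for path in repositories if path)
    # Sort by count, then by character so that ties are ordered deterministically.
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))


# Illustrative sample only, not real data.
existing = [
    "alpine/base",
    "acme/api",
    "acme/web",
    "busybox/core",
    "backend/jobs",
    "xray/scanner",
]

ranking = first_char_counts(existing)
least_common_char = ranking[0][0]

# A candidate first inclusion rule built from the least common character.
first_rule = "^" + least_common_char + ".*"
print(ranking, first_rule)
```

The ranking only reflects existing repositories, so it remains an estimate of how new repository names will be distributed.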
Overall, I think these are fair tradeoffs. It's critical that we're able to control the amount of load on the new registry and, equally important, to exclude new repositories from specific customers until we're confident that the new registry won't jeopardize their uptime requirements for the registry.