Selective sync for replicated Docker registries
Summary
Customers who use Geo to replicate very large Docker registries would like to replicate only a subset of the data to avoid copying terabytes of data. The existing Geo selective sync filters also apply to Docker registries, but they are not sufficient. We may need to consider adding other selection criteria.
Problem to solve
Customers' Docker registries can be massive, containing terabytes of data. One customer reported using a Docker registry with ~75 TB of data. Syncing all of this data is costly and time-consuming, and may not be required.
Intended users
- Geo users who have large Docker registries
Further details
Proposal
- Implement additional Docker registry selective sync rules, for example by tag and by image
- This should likely work like the existing selective sync, that is, you need to select what to sync
- The selection criteria could use wildcards, similar to wildcards for protected branches
- Add to Geo Administrator UI
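
As a rough illustration of the wildcard idea above, the sketch below filters `image:tag` references against a list of patterns. It is not Geo's implementation: the `should_sync` helper, the rule format, and the image names are all hypothetical. It also uses Python's `fnmatch`, which supports `?` and `[...]` in addition to `*`, so it is more permissive than the `*`-only wildcards used for protected branches.

```python
from fnmatch import fnmatchcase


def should_sync(image: str, tag: str, rules: list[str]) -> bool:
    """Return True if "image:tag" matches at least one wildcard rule.

    Hypothetical helper: the rule format is an assumption for illustration.
    """
    reference = f"{image}:{tag}"
    # fnmatchcase is case-sensitive; "*" matches any run of characters,
    # including "/" and ":".
    return any(fnmatchcase(reference, rule) for rule in rules)


# Hypothetical selective sync rules configured by an administrator.
rules = ["group/app/*:v*", "group/base:latest"]

# Hypothetical registry contents on the primary.
candidates = [
    ("group/app/web", "v1.2.0"),
    ("group/app/web", "nightly"),
    ("group/base", "latest"),
]

# Only references matching a rule would be replicated to the secondary.
selected = [(image, tag) for image, tag in candidates if should_sync(image, tag, rules)]
print(selected)
```

With these rules, the `nightly` tag is skipped while the versioned tag and `group/base:latest` are selected. Matching on the combined `image:tag` string keeps the rule format simple, at the cost of letting `*` span the `:` separator.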
I think we need to consider the scope of this feature and how it interacts with selective sync by project or shard. Are the two mutually exclusive for Docker registries?
Permissions and Security
Documentation
This needs to be added to the documentation: https://docs.gitlab.com/ee/administration/geo/replication/configuration.html#selective-synchronization
Testing
What does success look like, and how can we measure that?
- Geo users can decide which portion of a Docker registry they want to sync to a secondary, based on the configured selection criteria
What is the type of buyer?
- Premium
- Ultimate