Selective sync for replicated Docker registries

Summary

Customers who use Geo to replicate very large Docker registries would like to replicate only a subset of the data, to avoid copying terabytes of data. The existing filtering mechanisms for Geo selective sync also apply to Docker registries, but they are not sufficient. We may need to consider adding other selection criteria.

Problem to solve

Customer Docker registries can be massive, containing terabytes of data. One customer reported using a Docker registry with ~75TB of data. Syncing all of this data is costly and time-consuming, and may not be required.

Intended users

Geo users who have large Docker registries

Further details

Proposal

Implement additional Docker registry selective sync rules, for example by tag and image
- This should likely work like the existing selective sync, that is, you need to select what to sync
- The selection criteria could use wildcards, similar to wildcards for protected branches
- Add the configuration to the Geo administrator UI
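
To make the wildcard idea concrete, here is a minimal sketch in Python of how such rules might be evaluated. It is not based on any existing Geo code: `SyncRule`, `should_sync`, and the `image_pattern`/`tag_pattern` fields are hypothetical names, and it assumes that `*` is the only wildcard and that nothing is synced unless a rule selects it.

```python
import re
from dataclasses import dataclass
from typing import Iterable


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Compile a pattern in which `*` matches any run of characters (including none)."""
    # Escape every literal part so that only `*` acts as a wildcard.
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts))


@dataclass(frozen=True)
class SyncRule:
    """Hypothetical rule naming the image repositories and tags to replicate."""
    image_pattern: str
    tag_pattern: str

    def matches(self, image: str, tag: str) -> bool:
        # Both the image path and the tag must match in full.
        return (
            wildcard_to_regex(self.image_pattern).fullmatch(image) is not None
            and wildcard_to_regex(self.tag_pattern).fullmatch(tag) is not None
        )


def should_sync(rules: Iterable[SyncRule], image: str, tag: str) -> bool:
    """Assumed opt-in semantics: a tag is replicated only if some rule selects it."""
    return any(rule.matches(image, tag) for rule in rules)


# Example rule set (image paths and tags are made up for illustration).
rules = [
    SyncRule(image_pattern="group/app/*", tag_pattern="v*"),
    SyncRule(image_pattern="group/base-image", tag_pattern="latest"),
]
```

With these example rules, `should_sync(rules, "group/app/web", "v1.2.0")` is true, while the `latest` tag of the same image is skipped. How such rules would combine with selective sync by project or shard is left open here.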

I think we need to consider the scope of this feature and how it interacts with the existing selective sync by project or shard. Are the two mutually exclusive for Docker registries?

Permissions and Security

Documentation

This needs to be added to the documentation: https://docs.gitlab.com/ee/administration/geo/replication/configuration.html#selective-synchronization

Testing

What does success look like, and how can we measure that?

Geo users can decide which portion of a Docker registry they want to sync to a secondary, based on the selection criteria they configure.

What is the type of buyer?

Premium
Ultimate

Links / references

Edited Aug 14, 2020 by 🤖 GitLab Bot 🤖