Discovery of forks on other instances of GitLab.com
Problem to solve
Git is decentralized, but it is hard to find other public Git hosts that carry the same repository. Centralized solutions like GitLab are popular partly because they make forks easier to discover, and thus make collaboration easier. But fork discovery can also be done in a decentralized way, between GitLab instances on different servers.
This is one of the components of federated cross-server merge requests. By knowing about forks and branches on other servers, we could do cross-server merge requests once #32372 is implemented. But even before that, showing this information on the "Graph" page of the repository could help users manually pull from forks on other servers.
I propose that each GitLab instance publishes all of its public repositories into a shared DHT. Each repository can be identified by the hash of its first commit, so the DHT could use that hash as the key and the list of known repository URLs for that hash as the value. When an instance wants to publish a new URL, it would first fetch the existing list, append the new URL, and re-publish the list. Similarly, any server that discovers multiple lists for the same hash would merge them and re-publish the merged list.
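To make the fetch, append, and re-publish flow concrete, here is a minimal sketch. It assumes a DHT client that exposes plain `get` and `put` operations; `InMemoryDHT`, `merge_url_lists`, and `publish_repository` are hypothetical names used only for illustration, and a real DHT library would look different.

```python
from typing import Dict, Iterable, List


class InMemoryDHT:
    """Hypothetical stand-in for a real DHT client: key -> list of URLs."""

    def __init__(self) -> None:
        self._store: Dict[str, List[str]] = {}

    def get(self, key: str) -> List[str]:
        # Return a copy so callers cannot mutate the stored list in place.
        return list(self._store.get(key, []))

    def put(self, key: str, urls: Iterable[str]) -> None:
        self._store[key] = list(urls)


def merge_url_lists(*lists: Iterable[str]) -> List[str]:
    """Merge several URL lists, dropping duplicates and keeping first-seen order."""
    seen = set()
    merged: List[str] = []
    for urls in lists:
        for url in urls:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def publish_repository(dht: InMemoryDHT, root_commit: str, url: str) -> List[str]:
    """Fetch the list stored under the first-commit hash, add `url`, re-publish."""
    existing = dht.get(root_commit)
    merged = merge_url_lists(existing, [url])
    dht.put(root_commit, merged)
    return merged
```

The same `merge_url_lists` helper would cover the case where a server sees several lists for one hash: it merges them and puts the result back. The sketch ignores concurrent writers, which a real DHT would have to tolerate.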
For now, we could display the discovered forks on the "Graph" page of the repository.
In the future, when instances discover remote forks, they could use the API to fetch information about branches, commits, and other details, and display more than just the existence of the fork.
Moreover, gitlab.com can serve as a known bootstrapping node for the DHT.
Permissions and Security
My proposal is that, for now, we publish only public repositories into the DHT.
I would also propose making this publishing opt-out. Opt-out, because public repositories can already be discovered through other means, like Google; the DHT is just a more efficient way to do this.
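The publishing rule above could be sketched as a simple filter. The `Repo` class and its `dht_opt_out` field are assumptions made for this illustration and are not existing GitLab settings; the visibility values mirror GitLab's public, internal, and private levels.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Repo:
    """Hypothetical view of a repository, for illustration only."""
    url: str
    visibility: str            # "public", "internal", or "private"
    dht_opt_out: bool = False  # assumed per-repository opt-out switch


def should_publish(repo: Repo) -> bool:
    """Publish only public repositories whose owners have not opted out."""
    return repo.visibility == "public" and not repo.dht_opt_out


def repos_to_publish(repos: Iterable[Repo]) -> List[str]:
    """Return the URLs that would be announced to the DHT."""
    return [repo.url for repo in repos if should_publish(repo)]
```

Because the default for `dht_opt_out` is `False`, a public repository is announced unless its owner explicitly turns publishing off, which matches the opt-out model proposed here.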