Crawl storages from Rails to remove stale repositories
GitLab's single source of truth for which repositories should or shouldn't exist is the Rails app's database. Various failed operations can leave repositories on the storages that no longer link to a project in the Rails database. For Gitaly, this means stale repositories are left on the disks. For Gitaly Cluster, this means possibly stale database state is left behind, along with the replicas on the backing Gitaly storages.
We should have a crawler that goes through the storages configured in the Rails app and removes any repositories that do not link to any project. Because the crawler works on the storages as Rails sees them, it handles the problem for both Gitaly and Gitaly Cluster.
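As a rough illustration of the comparison such a crawler would perform, here is a minimal sketch in Python. Everything in it is hypothetical: the function name, the dictionary shapes, and the example paths are assumptions made for illustration, not an existing Rails or Gitaly API. It only computes removal candidates per storage and does not delete anything.

```python
from typing import Dict, Iterable, Set


def find_stale_repositories(
    on_storage: Dict[str, Iterable[str]],
    known_to_rails: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per storage, the repository paths that no project links to.

    on_storage:     storage name -> relative paths found on that storage
                    (hypothetical input; how the paths are listed is left open)
    known_to_rails: storage name -> relative paths the Rails database
                    still links to a project
    """
    stale: Dict[str, Set[str]] = {}
    for storage, paths in on_storage.items():
        # A storage with no known projects yields an empty set, so every
        # path found on it is treated as a removal candidate.
        known = known_to_rails.get(storage, set())
        orphans = {path for path in paths if path not in known}
        if orphans:
            stale[storage] = orphans
    return stale


# Example with made-up storage names and paths.
found = {
    "default": ["@hashed/aa/bb/1.git", "@hashed/cc/dd/2.git"],
    "storage2": ["@hashed/ee/ff/3.git"],
}
linked = {
    "default": {"@hashed/aa/bb/1.git"},
    "storage2": {"@hashed/ee/ff/3.git"},
}
candidates = find_stale_repositories(found, linked)
```

In this example only `@hashed/cc/dd/2.git` on `default` ends up as a candidate, since it is the only path without a matching project record.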
Gitaly Cluster maintains the records in the database. It's possible that the internal storages of the cluster get out of sync with the database state. To handle this problem, there should be a separate crawler in Praefect that removes anything it doesn't expect to be present on the internal storages: #3719 (closed). This allows Praefect to maintain the illusion of being a single storage without exposing internals to the crawler described in this issue.
/cc @zj-gitlab @mjwood