Create mechanism to invoke arbitrary commands on all repositories (visitor framework)
In the past, we have repeatedly had use cases where we needed to deterministically process a subset of repositories or all of them, for example to:
- Derive insights into how specific data is laid out in repositories.
- Detect repositories that are vulnerable to a specific bug.
- Make sure that a specific migration has run on a repository.
Until now, though, we lack a mechanism that performs such processing deterministically for all repositories hosted by a Gitaly node.
We should implement a new Gitaly subcommand that performs this job. Given a list of repositories and an executable, the subcommand would run the executable for each of the repositories. Its state would be tracked in a node-local database, such as SQLite or an embedded database with a native Go implementation, so that it is easy to see which repositories have already been processed, which are still outstanding, and whether processing succeeded for each repository. Usage would thus look similar to the following:
```shell
$ gitaly walker init --database=walker.db </path/to/repos.txt
$ gitaly walker exec 'git -C $REPO gc' --database=walker.db
```
Being executable-based gives us a lot of flexibility: we can, for example, execute arbitrary scripts or Git commands. As repository migrations would typically be implemented as part of repository housekeeping, running a simple `gitaly optimize-repository` subcommand for each repository would be enough to make sure that all preexisting repositories have been migrated.